Skip to main content

窗口函数(Window Functions)

概述

窗口函数对一组相关行("窗口")进行操作。对于每个输入行,窗口函数返回一个输出行,该输出行取决于传递给函数的特定行以及窗口中其他行的值。

有两种主要类型的顺序敏感窗口函数:

  • 排名相关函数:基于行的"排名"列出信息。例如,按年利润降序排列商店,利润最高的商店将排名第1,利润第二高的商店将排名第2,依此类推。

  • 窗口框架函数:支持对窗口中行的子集执行滚动操作,例如计算运行总计或移动平均值。

窗口函数类别

Databend 支持两大类窗口函数:

1. 专用窗口函数

这些函数专为窗口操作设计,提供排名、导航和值分析功能。

函数描述示例
RANK返回带间隙的排名RANK() OVER (ORDER BY salary DESC)1, 2, 2, 4, ...
DENSE_RANK返回无间隙的排名DENSE_RANK() OVER (ORDER BY salary DESC)1, 2, 2, 3, ...
ROW_NUMBER返回连续行号ROW_NUMBER() OVER (ORDER BY hire_date)1, 2, 3, 4, ...
CUME_DIST返回累积分布CUME_DIST() OVER (ORDER BY score)0.2, 0.4, 0.8, 1.0, ...
PERCENT_RANK返回相对排名 (0-1)PERCENT_RANK() OVER (ORDER BY score)0.0, 0.25, 0.75, ...
NTILE将行划分为 N 组NTILE(4) OVER (ORDER BY score)1, 1, 2, 2, 3, 3, 4, 4, ...
FIRST_VALUE返回窗口中的第一个值FIRST_VALUE(product) OVER (PARTITION BY category ORDER BY sales)
LAST_VALUE返回窗口中的最后一个值LAST_VALUE(product) OVER (PARTITION BY category ORDER BY sales)
NTH_VALUE返回窗口中的第 N 个值NTH_VALUE(product, 2) OVER (PARTITION BY category ORDER BY sales)
LEAD访问后续行的值LEAD(price, 1) OVER (ORDER BY date) → 下一天的价格
LAG访问前一行的值LAG(price, 1) OVER (ORDER BY date) → 前一天的价格
FIRST返回第一个值(别名)FIRST(product) OVER (PARTITION BY category ORDER BY sales)
LAST返回最后一个值(别名)LAST(product) OVER (PARTITION BY category ORDER BY sales)

2. 用作窗口函数的聚合函数

这些标准聚合函数可与 OVER 子句结合使用,执行窗口操作。

函数描述窗口框架支持示例
SUM计算窗口总和SUM(sales) OVER (PARTITION BY region ORDER BY date)
AVG计算窗口平均值AVG(score) OVER (ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
COUNT统计窗口行数COUNT(*) OVER (PARTITION BY department)
MIN返回窗口最小值MIN(price) OVER (PARTITION BY category)
MAX返回窗口最大值MAX(price) OVER (PARTITION BY category)
ARRAY_AGG将值收集到数组ARRAY_AGG(product) OVER (PARTITION BY category)
STDDEV_POP总体标准差STDDEV_POP(value) OVER (PARTITION BY group)
STDDEV_SAMP样本标准差STDDEV_SAMP(value) OVER (PARTITION BY group)
MEDIAN中位数值MEDIAN(response_time) OVER (PARTITION BY server)

条件变体

函数描述窗口框架支持示例
COUNT_IF条件计数COUNT_IF(status = 'complete') OVER (PARTITION BY dept)
SUM_IF条件求和SUM_IF(amount, status = 'paid') OVER (PARTITION BY customer)
AVG_IF条件平均值AVG_IF(score, passed = true) OVER (PARTITION BY class)
MIN_IF条件最小值MIN_IF(temp, location = 'outside') OVER (PARTITION BY day)
MAX_IF条件最大值MAX_IF(speed, vehicle = 'car') OVER (PARTITION BY test)

窗口函数语法

<function> ( [ <arguments> ] ) OVER ( { named_window | inline_window } )

其中:

named_window ::= window_name

inline_window ::=
[ PARTITION BY <expression_list> ]
[ ORDER BY <expression_list> ]
[ window_frame ]

关键组件

组件描述示例
<function>应用的窗口函数SUM(), RANK()
OVER指示窗口函数用法所有窗口函数必需
PARTITION BY将行分组为分区PARTITION BY department
ORDER BY分区内行排序ORDER BY salary DESC
window_frame定义计算行范围ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
named_window引用 WINDOW 子句定义的窗口SELECT sum(x) OVER w FROM t WINDOW w AS (PARTITION BY y)

窗口框架语法

窗口框架定义每行函数计算包含的行范围,包含两种类型:

1. 框架类型

框架类型描述示例
ROWS基于物理行的框架ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
RANGE基于逻辑值的框架RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

2. 框架范围

框架范围模式描述示例
累积框架
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW起始行到当前行运行总计
BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING当前行到结束行当前位置的运行总计
滑动框架
BETWEEN N PRECEDING AND CURRENT ROW当前行前 N 行 + 当前行3 天移动平均
BETWEEN CURRENT ROW AND N FOLLOWING当前行 + 后续 N 行前瞻性计算
BETWEEN N PRECEDING AND N FOLLOWING前 N 行 + 当前行 + 后 N 行居中移动平均
BETWEEN UNBOUNDED PRECEDING AND N FOLLOWING起始行到当前后 N 行扩展累积计算
BETWEEN N PRECEDING AND UNBOUNDED FOLLOWING当前前 N 行到结束行扩展向后计算

窗口函数示例

以下示例使用员工数据集演示常见窗口函数用例。

示例数据设置

-- 创建员工表
CREATE TABLE employees (
employee_id INT,
first_name VARCHAR,
last_name VARCHAR,
department VARCHAR,
salary INT
);

-- 插入示例数据
INSERT INTO employees VALUES
(1, 'John', 'Doe', 'IT', 75000),
(2, 'Jane', 'Smith', 'HR', 85000),
(3, 'Mike', 'Johnson', 'IT', 90000),
(4, 'Sara', 'Williams', 'Sales', 60000),
(5, 'Tom', 'Brown', 'HR', 82000),
(6, 'Ava', 'Davis', 'Sales', 62000),
(7, 'Olivia', 'Taylor', 'IT', 72000),
(8, 'Emily', 'Anderson', 'HR', 77000),
(9, 'Sophia', 'Lee', 'Sales', 58000),
(10, 'Ella', 'Thomas', 'IT', 67000);

示例 1:排名函数

按薪资降序排列员工:

SELECT 
employee_id,
first_name,
last_name,
department,
salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees
ORDER BY salary DESC;

结果:

employee_idfirst_namelast_namedepartmentsalaryrankdense_rankrow_num
3MikeJohnsonIT90000111
2JaneSmithHR85000222
5TomBrownHR82000333
8EmilyAndersonHR77000444
1JohnDoeIT75000555

示例 2:分区

计算部门统计信息:

SELECT DISTINCT
department,
COUNT(*) OVER (PARTITION BY department) AS employee_count,
SUM(salary) OVER (PARTITION BY department) AS total_salary,
AVG(salary) OVER (PARTITION BY department) AS avg_salary,
MIN(salary) OVER (PARTITION BY department) AS min_salary,
MAX(salary) OVER (PARTITION BY department) AS max_salary
FROM employees
ORDER BY department;

结果:

departmentemployee_counttotal_salaryavg_salarymin_salarymax_salary
HR324400081333.337700085000
IT430400076000.006700090000
Sales318000060000.005800062000

示例 3:运行总计与移动平均

计算部门内运行总计与移动平均:

SELECT 
employee_id,
first_name,
department,
salary,
-- 运行总计(累积求和)
SUM(salary) OVER (
PARTITION BY department
ORDER BY employee_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
-- 当前行与前一行移动平均
AVG(salary) OVER (
PARTITION BY department
ORDER BY employee_id
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
) AS moving_avg
FROM employees
ORDER BY department, employee_id;

结果:

employee_idfirst_namedepartmentsalaryrunning_totalmoving_avg
2JaneHR850008500085000.00
5TomHR8200016700083500.00
8EmilyHR7700024400079500.00
1JohnIT750007500075000.00
3MikeIT9000016500082500.00