我需要根据当前记录的其他一些列和前一条记录的X值(使用一些分区和顺序)计算某些列X的值.基本上我需要在表单中实现查询
SELECT <some fields>,<some expression using LAG(X) OVER(PARTITION BY ... ORDER BY ...) AS X FROM <table>
这是不可能的,因为只有现有的列可以在窗口函数中使用,所以我正在寻找如何克服这一点.
这是一个例子.我有一张活动表.每个事件都有type和time_stamp.
create table event (id serial,type integer,time_stamp integer);
我不想找到“重复”事件.副本我的意思是以下.让我们通过time_stamp升序来为给定类型的所有事件排序.然后
>第一个事件不重复
>所有跟随非重复且在其后的某个时间范围内的事件(即它们的time_stamp不大于前一个非重复的time_stamp加上一些常量TIMEFRAME)是重复的
>下一个事件,如果time_stamp大于之前的非重复次数超过TIMEFRAME则不重复
>依此类推
对于这个数据
insert into event (type,time_stamp) values (1,1),(1,2),(2,3),10),15),21),13),40);
和TIMEFRAME = 10结果应该是
time_stamp | type | duplicate ----------------------------- 1 | 1 | false 2 | 1 | true 3 | 1 | true 10 | 1 | true 15 | 1 | false 21 | 1 | true 40 | 1 | false 2 | 2 | false 10 | 2 | true 13 | 2 | false
我可以根据前一个非重复事件的当前time_stamp和time_stamp来计算重复字段的值,如下所示:
WITH evt AS ( SELECT time_stamp,CASE WHEN time_stamp - LAG(current_non_dupl_time_stamp) OVER w >= TIMEFRAME THEN time_stamp ELSE LAG(current_non_dupl_time_stamp) OVER w END AS current_non_dupl_time_stamp FROM event WINDOW w AS (PARTITION BY type ORDER BY time_stamp ASC) ) SELECT time_stamp,time_stamp != current_non_dupl_time_stamp AS duplicate
但这不起作用,因为无法在LAG中引用计算的字段:
ERROR: column "current_non_dupl_time_stamp" does not exist.
递归方法的替代方法是自定义聚合.掌握编写自己的聚合技术后,创建转换和最终函数既简单又合乎逻辑.
状态转换功能:
create or replace function is_duplicate(st int[],time_stamp int,timeframe int) returns int[] language plpgsql as $$ begin if st is null or st[1] + timeframe <= time_stamp then st[1] := time_stamp; end if; st[2] := time_stamp; return st; end $$;
最终功能:
create or replace function is_duplicate_final(st int[]) returns boolean language sql as $$ select st[1] <> st[2]; $$;
骨料:
create aggregate is_duplicate_agg(time_stamp int,timeframe int) ( sfunc = is_duplicate,stype = int[],finalfunc = is_duplicate_final );
查询:
select *,is_duplicate_agg(time_stamp,10) over w from event window w as (partition by type order by time_stamp asc) order by type,time_stamp; id | type | time_stamp | is_duplicate_agg ----+------+------------+------------------ 1 | 1 | 1 | f 2 | 1 | 2 | t 4 | 1 | 3 | t 5 | 1 | 10 | t 7 | 1 | 15 | f 8 | 1 | 21 | t 10 | 1 | 40 | f 3 | 2 | 2 | f 6 | 2 | 10 | t 9 | 2 | 13 | f (10 rows)