Get the latest NOT NULL value of each column in a table, grouped by a common column

In a PostgreSQL database, I have a measurements table that looks like this:

| sensor_group_id | ts                        | value_1 | value_2 | etc... |
|-----------------|---------------------------|---------|---------|--------|
| 1               | 2021-07-21T00:20:00+00:00 | 15      | NULL    |        |
| 1               | 2021-07-15T00:20:00+00:00 | NULL    | 23      |        |
| 2               | 2021-07-17T00:20:00+00:00 | NULL    | 11      |        |
| 1               | 2021-07-13T00:20:00+00:00 | 9       | 4       |        |
| 2               | 2021-07-10T00:20:00+00:00 | 99      | 36      |        |

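This table has many columns, each holding a different type of measurement. Each sensor group produces several types of measurement at the same time, but not always all of them, so we end up with partially filled rows.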

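What I want to do:

  • For each distinct sensor_group_id
  • For each distinct column (measurement type)
  • Get the latest timestamp at which that column is NOT NULL, and the value of that measurement at that timestamp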
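To pin down the requested semantics, here is a minimal pure-Python sketch (not SQL) over the sample rows above. It assumes only value_1 and value_2 exist and uses None for NULL; the function name latest_non_null is hypothetical.

```python
from collections import defaultdict

# Sample rows from the table above: (sensor_group_id, ts, value_1, value_2).
# None stands in for SQL NULL; ISO-8601 strings with the same offset sort chronologically.
rows = [
    (1, "2021-07-21T00:20:00+00:00", 15, None),
    (1, "2021-07-15T00:20:00+00:00", None, 23),
    (2, "2021-07-17T00:20:00+00:00", None, 11),
    (1, "2021-07-13T00:20:00+00:00", 9, 4),
    (2, "2021-07-10T00:20:00+00:00", 99, 36),
]
columns = ["value_1", "value_2"]


def latest_non_null(rows, columns):
    """Return {group_id: {column: (ts, value)}} for the newest non-NULL value of each column."""
    result = defaultdict(dict)
    for group_id, ts, *values in rows:
        for column, value in zip(columns, values):
            if value is None:
                continue  # a NULL never counts as a measurement
            best = result[group_id].get(column)
            if best is None or ts > best[0]:
                result[group_id][column] = (ts, value)
    return dict(result)


for group_id, latest in sorted(latest_non_null(rows, columns).items()):
    print(group_id, latest)
```

For the sample data this gives value_1 = 15 at 2021-07-21 and value_2 = 23 at 2021-07-15 for group 1, and value_1 = 99 at 2021-07-10 and value_2 = 11 at 2021-07-17 for group 2.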

My current solution looks very cumbersome:

```sql
WITH
    latest_value_1 AS (SELECT DISTINCT ON (sensor_group_id) sensor_group_id, ts, value_1
                       FROM measurements
                       WHERE value_1 IS NOT NULL
                       ORDER BY sensor_group_id, ts DESC),
    latest_value_2 AS (SELECT DISTINCT ON (sensor_group_id) sensor_group_id, ts, value_2
                       FROM measurements
                       WHERE value_2 IS NOT NULL
                       ORDER BY sensor_group_id, ts DESC),
    latest_value_3 AS (SELECT DISTINCT ON (sensor_group_id) sensor_group_id, ts, value_3
                       FROM measurements
                       WHERE value_3 IS NOT NULL
                       ORDER BY sensor_group_id, ts DESC)
    -- etc...
SELECT latest_value_1.sensor_group_id,
       latest_value_1.ts AS latest_value_1_ts, value_1,
       latest_value_2.ts AS latest_value_2_ts, value_2,
       latest_value_3.ts AS latest_value_3_ts, value_3
       -- etc...
FROM latest_value_1
         JOIN latest_value_2
              ON latest_value_1.sensor_group_id = latest_value_2.sensor_group_id
         JOIN latest_value_3
              ON latest_value_1.sensor_group_id = latest_value_3.sensor_group_id
         -- etc...
```

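This produces the following result:

| sensor_group_id | latest_value_1_ts         | value_1 | latest_value_2_ts         | value_2 | etc... |
|-----------------|---------------------------|---------|---------------------------|---------|--------|
| 1               | 2021-07-21T00:20:00+00:00 | 15      | 2021-07-15T00:20:00+00:00 | 23      |        |
| 2               | 2021-07-10T00:20:00+00:00 | 99      | 2021-07-17T00:20:00+00:00 | 11      |        |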

This seems very complicated, but I'm not sure whether there is a better way. Any help would be appreciated!

Answer by sealguyue:

Not sure whether this is any simpler...

```sql
with
  sensor_groups(sgr_id) as ( -- Replace with your own list of groups if you have one
    select distinct sensor_group_id from measurements)
select
  *
from
  sensor_groups as sg
    left join lateral (
      select ts, value_1
      from measurements
      where value_1 is not null and sensor_group_id = sg.sgr_id
      order by ts desc limit 1) as v1(ts_1, v_1) on true
    left join lateral (
      select ts, value_2
      from measurements
      where value_2 is not null and sensor_group_id = sg.sgr_id
      order by ts desc limit 1) as v2(ts_2, v_2) on true
    -- ... one more LEFT JOIN LATERAL per remaining column
```

P.S. Normalizing the data would help a lot.
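To try the "newest non-NULL row per group, per column" lookup without a Postgres server, here is a sketch using Python's built-in sqlite3 module. SQLite has no LEFT JOIN LATERAL, so, as an assumption of this sketch, each lateral subquery is replaced by two correlated scalar subqueries (one for ts, one for the value), and the table holds only value_1 and value_2.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL measurements table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements "
    "(sensor_group_id INTEGER, ts TEXT, value_1 INTEGER, value_2 INTEGER)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [
        (1, "2021-07-21T00:20:00+00:00", 15, None),
        (1, "2021-07-15T00:20:00+00:00", None, 23),
        (2, "2021-07-17T00:20:00+00:00", None, 11),
        (1, "2021-07-13T00:20:00+00:00", 9, 4),
        (2, "2021-07-10T00:20:00+00:00", 99, 36),
    ],
)

# Same shape as the LATERAL answer: one "ORDER BY ts DESC LIMIT 1" lookup per
# column, correlated on the outer sensor group.
QUERY = """
WITH sensor_groups(sgr_id) AS (SELECT DISTINCT sensor_group_id FROM measurements)
SELECT sg.sgr_id,
       (SELECT ts FROM measurements
         WHERE value_1 IS NOT NULL AND sensor_group_id = sg.sgr_id
         ORDER BY ts DESC LIMIT 1) AS ts_1,
       (SELECT value_1 FROM measurements
         WHERE value_1 IS NOT NULL AND sensor_group_id = sg.sgr_id
         ORDER BY ts DESC LIMIT 1) AS v_1,
       (SELECT ts FROM measurements
         WHERE value_2 IS NOT NULL AND sensor_group_id = sg.sgr_id
         ORDER BY ts DESC LIMIT 1) AS ts_2,
       (SELECT value_2 FROM measurements
         WHERE value_2 IS NOT NULL AND sensor_group_id = sg.sgr_id
         ORDER BY ts DESC LIMIT 1) AS v_2
FROM sensor_groups AS sg
ORDER BY sg.sgr_id
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

In PostgreSQL the LATERAL form avoids the duplicated lookup, because a single subquery can return both ts and the value.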

Another answer:

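What you really want is the IGNORE NULLS option on LAG() or LAST_VALUE(), but Postgres doesn't support it. Instead, you can use a two-level trick: assign a grouping to each value so that every NULL value lands in the same group as the preceding row that has a value, then "spread" the value across the group: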

```sql
select t.*,
       -- every row in a group gets that group's single non-NULL value
       max(value_1) over (partition by sensor_group_id, grp_1) as imputed_value_1,
       max(value_2) over (partition by sensor_group_id, grp_2) as imputed_value_2,
       max(value_3) over (partition by sensor_group_id, grp_3) as imputed_value_3
from (select t.*,
             -- count() skips NULLs, so the running count only increases on rows
             -- that have a value; the NULL rows after it share the same group number
             count(value_1) over (partition by sensor_group_id order by ts) as grp_1,
             count(value_2) over (partition by sensor_group_id order by ts) as grp_2,
             count(value_3) over (partition by sensor_group_id order by ts) as grp_3
      from measurements t
     ) t;
```
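The two-level trick can be checked in SQLite, which supports the same window functions from version 3.25 onward. This is a sketch with an in-memory table holding only value_1 and value_2, not run against Postgres itself. Since every row ends up carrying the most recent non-NULL value of each column, the newest row per sensor_group_id holds the values the question asks for (though not the timestamp at which each one was measured).

```python
import sqlite3

# In-memory SQLite stand-in for the measurements table (requires SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements "
    "(sensor_group_id INTEGER, ts TEXT, value_1 INTEGER, value_2 INTEGER)"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
    [
        (1, "2021-07-21T00:20:00+00:00", 15, None),
        (1, "2021-07-15T00:20:00+00:00", None, 23),
        (2, "2021-07-17T00:20:00+00:00", None, 11),
        (1, "2021-07-13T00:20:00+00:00", 9, 4),
        (2, "2021-07-10T00:20:00+00:00", 99, 36),
    ],
)

FILL_QUERY = """
SELECT sensor_group_id, ts,
       MAX(value_1) OVER (PARTITION BY sensor_group_id, grp_1) AS imputed_value_1,
       MAX(value_2) OVER (PARTITION BY sensor_group_id, grp_2) AS imputed_value_2
FROM (SELECT m.*,
             -- running count of non-NULL values = group number
             COUNT(value_1) OVER (PARTITION BY sensor_group_id ORDER BY ts) AS grp_1,
             COUNT(value_2) OVER (PARTITION BY sensor_group_id ORDER BY ts) AS grp_2
      FROM measurements m) g
ORDER BY sensor_group_id, ts
"""
filled = conn.execute(FILL_QUERY).fetchall()

# Rows are ordered by ts within each group, so the last row seen per group wins.
latest = {}
for group_id, ts, value_1, value_2 in filled:
    latest[group_id] = (value_1, value_2)

for row in filled:
    print(row)
print(latest)
```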
Article link: https://www.f2er.com/60951.html
