Hive SQL aggregation: merging multiple SQL statements into one

2024-05-20 • Q&A

I have a series of similar SQL statements that run one after another:

select count(distinct userId) from table where hour >= 0 and hour <= 0;
select count(distinct userId) from table where hour >= 0 and hour <= 1;
select count(distinct userId) from table where hour >= 0 and hour <= 2;
...
select count(distinct userId) from table where hour >= 0 and hour <= 14;

Is it possible to merge them into a single SQL statement?

It looks like you want to keep a cumulative count, bracketed by hour. To do that, you can use a window function like this:

SELECT DISTINCT
  A.hour AS hour,
  SUM(COALESCE(M.include, 0)) OVER (ORDER BY A.hour) AS cumulative_count
FROM ( -- get all records, with 0 for include
  SELECT name, hour, 0 AS include FROM table
) A
LEFT JOIN ( -- get the record with lowest `hour` for each `name`, and 1 for include
  SELECT name, MIN(hour) AS hour, 1 AS include FROM table GROUP BY name
) M
ON M.name = A.name AND M.hour = A.hour
;

There may be a simpler way, but this will generally yield the correct answer.

Explanation:

This uses 2 subqueries against the same input table, with a derived field called include to keep track of which records should contribute to the final total for each bucket. The first subquery simply takes all records in the table and assigns 0 AS include. The second subquery finds every unique name and the lowest hour slot in which that name appears, and assigns them 1 AS include. The enclosing query joins the 2 subqueries with a LEFT JOIN.

The outermost query applies COALESCE(M.include, 0) to fill in any NULLs produced by the LEFT JOIN, and those 1s and 0s are then summed with SUM, windowed by hour. This needs to be a SELECT DISTINCT rather than a GROUP BY, because a GROUP BY would require both hour and include to be listed, yet it would end up collapsing all the records in a given hour group into a single row (still with include=1). The DISTINCT is applied after the SUM, so it removes duplicates without discarding any input rows.
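To check the combined query against the original serial queries, here is a minimal sketch that runs both on a small in-memory table. It rests on several assumptions that are not part of the answer: SQLite (through Python's sqlite3 module, SQLite 3.25 or later for window functions) stands in for Hive, the table is given the hypothetical name events because table is a reserved word, the sample rows are invented, and name plays the role of userId from the question. An ORDER BY 1 is added only to make the output order deterministic.

```python
import sqlite3

# Invented sample data: (name, hour) pairs. `events` is a hypothetical
# stand-in for the placeholder `table` used in the question and answer.
ROWS = [
    ("alice", 0),
    ("bob", 0),
    ("alice", 1),
    ("carol", 2),
    ("bob", 2),
    ("dave", 4),
]

# The combined query from the answer, with `table` renamed to `events`
# and ORDER BY 1 added so the result order is deterministic.
COMBINED_SQL = """
SELECT DISTINCT
    A.hour AS hour,
    SUM(COALESCE(M.include, 0)) OVER (ORDER BY A.hour) AS cumulative_count
FROM (
    SELECT name, hour, 0 AS include FROM events
) A
LEFT JOIN (
    SELECT name, MIN(hour) AS hour, 1 AS include FROM events GROUP BY name
) M
ON M.name = A.name AND M.hour = A.hour
ORDER BY 1
"""

# One of the serial queries from the question, parameterised on the upper hour.
SERIAL_SQL = """
SELECT COUNT(DISTINCT name) FROM events WHERE hour >= 0 AND hour <= ?
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, hour INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", ROWS)

# Single pass: one row per hour that occurs in the table.
combined = conn.execute(COMBINED_SQL).fetchall()

# Serial approach: one COUNT(DISTINCT ...) query per hour bucket.
serial = []
for hour, _ in combined:
    (count,) = conn.execute(SERIAL_SQL, (hour,)).fetchone()
    serial.append((hour, count))

print(combined)  # [(0, 2), (1, 2), (2, 3), (4, 4)]
print(combined == serial)  # True
```

Two things stand out in this sketch. The combined query only returns hours that actually occur in the table, so hour 3 has no row here. And if the table can hold several rows for the same (name, hour) pair, each of them joins to the include = 1 row and is counted more than once; de-duplicating the first subquery (for example SELECT DISTINCT name, hour, 0 AS include) is one way to avoid that, although that adjustment is not part of the original answer.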
