Hive SQL aggregation: merging multiple SQL queries into one

I have a series of similar SQL queries:

select count(distinct userId) from table where hour >= 0 and hour <= 0;
select count(distinct userId) from table where hour >= 0 and hour <= 1;
select count(distinct userId) from table where hour >= 0 and hour <= 2;
...
select count(distinct userId) from table where hour >= 0 and hour <= 14;

Is it possible to merge them into a single SQL query?

xuexiaojuanxxj13400 answered:

You seem to want a cumulative count, bracketed by hour. To do this, you can use a window function, as follows:

SELECT DISTINCT
  A.hour AS hour,
  SUM(COALESCE(M.include, 0)) OVER (ORDER BY A.hour) AS cumulative_count
FROM ( -- get all records, with 0 for include
  SELECT name, hour, 0 AS include FROM table
) A
LEFT JOIN ( -- get the record with lowest `hour` for each `name`, and 1 for include
  SELECT name, MIN(hour) AS hour, 1 AS include FROM table GROUP BY name
) M
ON M.name = A.name AND M.hour = A.hour
;

There may be a simpler way, but this will generally produce the correct answer.

Explanation:

This uses 2 subqueries against the same input table, with a derived field called include to track which records should contribute to the final total for each bucket. The first subquery simply takes all records in the table and assigns 0 AS include. The second subquery finds every unique name and the lowest hour slot in which that name appears, and assigns those rows 1 AS include. The enclosing query LEFT JOINs the two subqueries.

The outermost query applies COALESCE(M.include, 0) to fill in any NULLs produced by the LEFT JOIN, and the resulting 1s and 0s are SUMmed and windowed by hour. This needs to be a SELECT DISTINCT rather than a GROUP BY, because a GROUP BY would require both hour and include to be listed, yet it would end up collapsing all records in a given hour group into a single row (still with include=1). The DISTINCT is applied after the SUM, so it removes duplicates without discarding any input rows.
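To make the behavior of the query concrete, here is a minimal sketch that runs it against a small in-memory SQLite database and compares the result with a brute-force count of distinct names per hour bracket. Several things here are assumptions and not part of the original answer: the table is renamed to events (table is a reserved word in SQLite), the sample rows are invented, SQLite stands in for Hive, and the SQLite build must support window functions (3.25 or later). The name column plays the role of userId from the question.

```python
import sqlite3

# Hypothetical sample data: one row per (name, hour) pair.
SAMPLE_ROWS = [
    ("a", 0), ("b", 0),
    ("a", 1), ("c", 1),
    ("b", 2),
    ("d", 3), ("a", 3),
]

# The answer's query, with the table renamed to `events` (an assumed name).
CUMULATIVE_SQL = """
SELECT DISTINCT
    A.hour AS hour,
    SUM(COALESCE(M.include, 0)) OVER (ORDER BY A.hour) AS cumulative_count
FROM (
    -- all records, with 0 for include
    SELECT name, hour, 0 AS include FROM events
) A
LEFT JOIN (
    -- first hour in which each name appears, with 1 for include
    SELECT name, MIN(hour) AS hour, 1 AS include FROM events GROUP BY name
) M
ON M.name = A.name AND M.hour = A.hour
"""


def cumulative_counts(rows):
    """Run the windowed query on an in-memory table and return sorted rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (name TEXT, hour INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        return sorted(conn.execute(CUMULATIVE_SQL).fetchall())
    finally:
        conn.close()


def brute_force_counts(rows):
    """Equivalent of running count(distinct ...) once per hour bracket."""
    hours = sorted({hour for _, hour in rows})
    return [(h, len({name for name, hour in rows if hour <= h})) for h in hours]


result = cumulative_counts(SAMPLE_ROWS)
expected = brute_force_counts(SAMPLE_ROWS)
print(result)  # [(0, 2), (1, 3), (2, 3), (3, 4)]
```

On this sample the two approaches agree. Note that the sample has no repeated (name, hour) pairs: if the same pair occurred more than once, each copy in the first-appearance hour would match the joined row and be summed, so deduplicating the input first (for example with SELECT DISTINCT name, hour) would likely be needed in that case.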

Article link: https://www.f2er.com/3146670.html
