How to avoid repeated values in a column

I have a table that describes agents who book flight tickets for different customers. The data below shows the bookings for one customer.


Based on the data above, what I expect is:


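The output means that I want to group the column in booking order: the agent first booked some tickets to Singapore, then to Austin, and then to Singapore again and to Delhi.

How can we achieve this in SQL? Please help.

It would also be helpful if the result could be presented as follows: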


netdyk answered:

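This is a gaps-and-islands problem. To solve it, you need to build groups of adjacent records. Typically, this is done by comparing row numbers computed over two different partitions: within a run of identical destinations both row numbers advance together, so their difference stays constant.

Consider: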

select 
    agent_id,
    travel_destination,
    min(date_of_booking) first_date_of_booking,
    max(date_of_booking) max_date_of_booking
from (
    select 
        t.*,
        row_number() over(partition by agent_id order by date_of_booking) rn1,
        row_number() over(partition by agent_id, travel_destination order by date_of_booking) rn2
    from mytable t
) t 
group by 
    agent_id,
    rn1 - rn2,
    travel_destination
order by first_date_of_booking

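Note that I added the start and end dates of each group to the answer, because I find they make the result more meaningful.

Another remark: from the sample data, it is unclear whether you want customer_id to be part of the group; I assumed not (if you do, you need to add that column to both partitions).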

Demo on DB Fiddle

Given this (simplified) dataset:

agent_id | travel_destination | customer_id | date_of_booking
:------- | :----------------- | :---------- | :--------------
A1001    | Singapore          | C1001       | 2019-06-10     
A1001    | Singapore          | C1001       | 2019-06-11     
A1001    | Austin             | C1001       | 2019-06-12     
A1001    | Singapore          | C1001       | 2019-06-13     
A1001    | Singapore          | C1001       | 2019-06-14     
A1001    | Dehli              | C1001       | 2019-06-15     

The query returns:

agent_id | travel_destination | first_date_of_booking | max_date_of_booking
:------- | :----------------- | :-------------------- | :------------------
A1001    | Singapore          | 2019-06-10            | 2019-06-11         
A1001    | Austin             | 2019-06-12            | 2019-06-12         
A1001    | Singapore          | 2019-06-13            | 2019-06-14         
A1001    | Dehli              | 2019-06-15            | 2019-06-15         
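
To see why grouping by rn1 - rn2 isolates each run, it can help to trace the two row numbers by hand. The following is a minimal Python sketch, not part of the original answer, that recomputes them for the simplified dataset above; the helper name `island_keys` is hypothetical.

```python
from collections import defaultdict

# Simplified dataset from the answer:
# (agent_id, travel_destination, date_of_booking)
rows = [
    ("A1001", "Singapore", "2019-06-10"),
    ("A1001", "Singapore", "2019-06-11"),
    ("A1001", "Austin", "2019-06-12"),
    ("A1001", "Singapore", "2019-06-13"),
    ("A1001", "Singapore", "2019-06-14"),
    ("A1001", "Dehli", "2019-06-15"),
]


def island_keys(rows):
    """Return (agent, destination, date, rn1, rn2, rn1 - rn2) for each row.

    rn1 mimics row_number() over (partition by agent_id order by date);
    rn2 mimics row_number() over (partition by agent_id, destination order by date).
    """
    per_agent = defaultdict(int)
    per_agent_dest = defaultdict(int)
    out = []
    # ISO-formatted dates sort correctly as plain strings.
    for agent, dest, date in sorted(rows, key=lambda r: (r[0], r[2])):
        per_agent[agent] += 1
        per_agent_dest[(agent, dest)] += 1
        rn1 = per_agent[agent]
        rn2 = per_agent_dest[(agent, dest)]
        out.append((agent, dest, date, rn1, rn2, rn1 - rn2))
    return out


for row in island_keys(rows):
    print(row)
```

The differences come out as 0, 0, 2, 1, 1, 5: rows in the same run share one value. The difference alone could collide between two different destinations, which is why travel_destination is also part of the GROUP BY.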

To get the second output you showed, you can add another level of aggregation and use string_agg():

select 
    agent_id,
    -- the delimiter argument is mandatory in PostgreSQL
    string_agg(travel_destination, ',' order by first_date_of_booking) travel_destination
from (
  -- above query
) t
group by agent_id

Try this, at least if your database has a function like LISTAGG, as Vertica does...

WITH
-- this is your input - next time put it in so it can be
-- copy-pasted and formatted to the below ....
-- assumption: values elided in some rows of the source are filled in
-- so that they match the result shown below
input(agent_id,travel_dest,cust_id,bookdt) AS (
          SELECT 'A1001','Singapore','C1001',DATE '2019-06-10'
UNION ALL SELECT 'A1001','Singapore','C1001',DATE '2019-06-11'
UNION ALL SELECT 'A1001','Austin','C1001',DATE '2019-06-19'
UNION ALL SELECT 'A1001','Austin','C1001',DATE '2019-06-20'
UNION ALL SELECT 'A1001','Singapore','C1001',DATE '2019-07-30'
UNION ALL SELECT 'A1001','Singapore','C1001',DATE '2019-07-31'
UNION ALL SELECT 'A1001','Delhi','C1001',DATE '2019-08-01'
UNION ALL SELECT 'A1001','Delhi','C1001',DATE '2019-08-10'
UNION ALL SELECT 'A1001','Singapore','C1001',DATE '2019-08-25'
)
-- real WITH clause starts here - substitute comma below with "WITH" ...
,with_prev AS (
  SELECT
    agent_id,travel_dest,bookdt,LAG(travel_dest,1,'') OVER (PARTITION BY agent_id ORDER BY bookdt) AS prev_dest
  FROM input
)
,de_duped AS (
  -- keep only the first row of each run of identical destinations
  SELECT
    agent_id,travel_dest
   FROM with_prev
   WHERE travel_dest <> prev_dest
)
SELECT
  agent_id,LISTAGG(travel_dest) AS travel_dest
FROM de_duped
GROUP BY 1
;

You get:

 agent_id |                travel_dest                 
----------+--------------------------------------------
 A1001    | Singapore,Austin,Singapore,Delhi,Singapore                                                                                                  

The following is for BigQuery Standard SQL:

#standardSQL
SELECT agent_id,STRING_AGG(DISTINCT travel_destination) AS travel_destination
FROM `project.dataset.table`
GROUP BY agent_id    

It will produce the following output:

Row agent_id    travel_destination   
1   A1001       Singapore,Austin,Delhi      

If the expected output is instead Singapore,Austin,Singapore,Delhi, here is another option:

#standardSQL
CREATE TEMP FUNCTION DedupConsecutive(line STRING) RETURNS STRING LANGUAGE js AS """
  return line.split(",").filter(function(value,index,arr){return value != arr[index+1];}).join(",");
""";
SELECT agent_id,DedupConsecutive(STRING_AGG(travel_destination ORDER BY date_of_booking)) destinations
FROM `project.dataset.table`
GROUP BY agent_id   
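
The JavaScript filter keeps an element only when it differs from the one that follows it, so each run of repeats collapses to a single entry. A rough Python equivalent, given only to illustrate that logic (the function name `dedup_consecutive` and the sample string are assumptions, not part of the answer):

```python
def dedup_consecutive(line: str) -> str:
    """Collapse consecutive duplicates in a comma-separated string."""
    parts = line.split(",")
    kept = [
        value
        for index, value in enumerate(parts)
        # Keep a value when it is the last element or differs from its successor.
        if index + 1 >= len(parts) or value != parts[index + 1]
    ]
    return ",".join(kept)


print(dedup_consecutive("Singapore,Singapore,Austin,Singapore,Singapore,Delhi"))
```

Because the string is split on commas, this approach only works as intended when no destination name itself contains a comma.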

Same sentiment as Gordon: I cannot think of a simpler solution. :o)


I would just use lag():

SELECT t.agent_id,t.travel_dest
FROM (SELECT t.*,LAG(travel_dest) OVER (PARTITION BY agent_id ORDER BY bookdt) as prev_travel_dest
      FROM t
     ) t
WHERE prev_travel_dest IS NULL OR prev_travel_dest <> travel_dest
ORDER BY agent_id,bookdt;
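
As a quick way to try the query without a database server, the sketch below runs it against an in-memory SQLite table from Python. It assumes SQLite 3.25 or later (the first release with window functions); the table contents are assumed sample rows modelled on the simplified dataset used earlier, not data from the question.

```python
import sqlite3

# In-memory table named t, using the column names from the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (agent_id TEXT, travel_dest TEXT, bookdt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        # Assumed sample rows; ISO dates stored as text sort chronologically.
        ("A1001", "Singapore", "2019-06-10"),
        ("A1001", "Singapore", "2019-06-11"),
        ("A1001", "Austin", "2019-06-12"),
        ("A1001", "Singapore", "2019-06-13"),
        ("A1001", "Singapore", "2019-06-14"),
        ("A1001", "Delhi", "2019-06-15"),
    ],
)

query = """
SELECT t.agent_id, t.travel_dest
FROM (SELECT t.*,
             LAG(travel_dest) OVER (PARTITION BY agent_id ORDER BY bookdt) AS prev_travel_dest
      FROM t
     ) t
WHERE prev_travel_dest IS NULL OR prev_travel_dest <> travel_dest
ORDER BY agent_id, bookdt
"""

# Each remaining row is the first booking of a run of identical destinations.
result = conn.execute(query).fetchall()
for agent_id, travel_dest in result:
    print(agent_id, travel_dest)
```

The IS NULL check matters because LAG() returns NULL for the first row of each agent, and a comparison with NULL would otherwise filter that row out.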

I cannot think of a simpler solution.
