寻找生成森林(WITH RECURSIVE,PostgreSQL 9.5)

我有一个任意数量的人的身份表(即别名).每行都有一个以前的名称和一个新名称.在生产中,大约有1M行.例如：

id,old,new
---
1,'Albert','Bob'
2,'Bob','Charles'
3,'Mary','Nancy'
4,'Charles','Albert'
5,'Lydia','Nancy'
6,'Zoe','Zoe'

我想要的是生成用户列表并引用他们各自的身份.这类似于查找连接标识的每个图形中的所有节点,或查找生成林：

User 1: Albert,Bob,Charles (identities: 1,2,4)
User 2: Mary,Nancy,Lydia (identities: 3,5)
User 3: Zoe (identities: 6)

我一直在修补Postgresql的WITH RECURSIVE,但它为每个产生了集合和子集.例如：

1,4 <-- spanning tree: good
2     <-- subset: discard
3,5   <-- spanning tree: good
4     <-- subset: discard
5     <-- subset: discard
6     <-- spanning tree: good

我需要做什么才能为每个用户生成完整的身份集(即生成树)？

sqlFiddle：http://sqlfiddle.com/#!15/9eaed/4我的最新尝试.这是代码：

WITH RECURSIVE search_graph AS (
   SELECT id,id AS min_id,ARRAY[id] AS path,ARRAY[old,new] AS emails
   FROM   identities

   UNION 

   SELECT identities.id,LEAST(identities.id,sg.min_id),(sg.path || identities.id),(sg.emails || identities.old || identities.new)

   FROM search_graph sg
   JOIN identities ON (identities.old = ANY(sg.emails) OR identities.new = ANY(sg.emails))
   WHERE  identities.id <> ALL(sg.path)
)
SELECT array_agg(DISTINCT(p)) from search_graph,unnest(path) p GROUP BY min_id;

结果如下：

解决方法

我刚才写了一个类似问题的答案： How to find all connected subgraphs of an undirected graph.在那个问题中,我使用了sql Server.有关中间CTE的详细说明,请参阅该答案.我将该查询改编为Postgres.

使用Postgres数组功能可以更有效地编写它,而不是将路径连接到文本列.

WITH RECURSIVE
CTE_Idents
AS
(
    SELECT old AS Ident
    FROM identities

    UNION

    SELECT new AS Ident
    FROM identities
),CTE_Pairs
AS
(
    SELECT old AS Ident1,new AS Ident2
    FROM identities
    WHERE old <> new

    UNION

    SELECT new AS Ident1,old AS Ident2
    FROM identities
    WHERE old <> new
),CTE_Recursive
AS
(
    SELECT
        CTE_Idents.Ident AS AnchorIdent,Ident1,Ident2,',' || Ident1 || ',' || Ident2 || ',' AS IdentPath,1 AS Lvl
    FROM 
        CTE_Pairs
        INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1

    UNION ALL

    SELECT 
        CTE_Recursive.AnchorIdent,CTE_Pairs.Ident1,CTE_Pairs.Ident2,CTE_Recursive.IdentPath || CTE_Pairs.Ident2 || ',CTE_Recursive.Lvl + 1 AS Lvl
    FROM
        CTE_Pairs
        INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
    WHERE
        CTE_Recursive.IdentPath NOT LIKE ('%,' || CTE_Pairs.Ident2 || ',%')
),CTE_RecursionResult
AS
(
    SELECT AnchorIdent,Ident2
    FROM CTE_Recursive
),CTE_CleanResult
AS
(
    SELECT AnchorIdent,Ident1 AS Ident
    FROM CTE_RecursionResult

    UNION

    SELECT AnchorIdent,Ident2 AS Ident
    FROM CTE_RecursionResult
),CTE_Groups
AS
(
  SELECT
    CTE_Idents.Ident,array_agg(COALESCE(CTE_CleanResult.Ident,CTE_Idents.Ident) 
        ORDER BY COALESCE(CTE_CleanResult.Ident,CTE_Idents.Ident)) AS AllIdents
  FROM
    CTE_Idents
    LEFT JOIN CTE_CleanResult ON CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
  GROUP BY CTE_Idents.Ident
)
SELECT AllIdents
FROM CTE_Groups
GROUP BY AllIdents
;

我在样本数据中添加了一行(7,X,Y).

SQL Fiddle

结果

|          allidents |
|--------------------|
|   Lydia,Mary,Nancy |
| Albert,Charles |
|                X,Y |
|                Zoe |

寻找生成森林(WITH RECURSIVE,PostgreSQL 9.5)

解决方法

猜你在找的Postgre SQL相关文章