Postgres慢速嵌套循环反连接用于大型且不断更新的表

2024-05-21 • 问答

我有两个表，ingestion_event和ingestion_dependency，其中第二个表中存在来自第一个表的一个或多个记录。

处理第一张表中的行时，应用程序将删除第二张表中的记录。

应用程序使用第一个表进行轮询以获取第二个表中不存在的行，如下所示：

SELECT a.*,dependency_key FROM (
    SELECT e.*,dependency_key
    FROM ingestion_event e
        LEFT JOIN ingestion_dependency d ON
                e.object_key = d.object_key AND e.root_task_id = d.root_task_id
    WHERE d.object_key is null
    FOR NO KEY UPDATE of e SKIP LOCKED
) AS a 
where  a.status = 'QUEUED' 
limit 100

"Limit  (cost=0.55..579.30 rows=100 width=1560) (actual time=16199.602..71206.977 rows=100 loops=1)"
"  ->  Subquery Scan on a  (cost=0.55..1436371.23 rows=248188 width=1560) (actual time=16199.600..71206.897 rows=100 loops=1)"
"        ->  LockRows  (cost=0.55..1433889.35 rows=248188 width=1470) (actual time=16199.599..71206.771 rows=100 loops=1)"
"              ->  nested Loop Anti Join  (cost=0.55..1431407.47 rows=248188 width=1470) (actual time=315.396..70222.109 rows=7457 loops=1)"
"                    ->  Seq Scan on ingestion_event e  (cost=0.00..424604.81 rows=917896 width=1362) (actual time=0.007..23196.192 rows=154371 loops=1)"
"                          Filter: (status = 'QUEUED'::text)"
"                          Rows Removed by Filter: 206432"
"                    ->  Index Scan using ingestion_dependency_object_key_idx on ingestion_dependency d  (cost=0.55..1.12 rows=1 width=219) (actual time=0.298..0.298 rows=1 loops=154371)"
"                          Index Cond: (e.object_key = object_key)"
"                          Filter: (e.root_task_id = root_task_id)"
"                          Rows Removed by Filter: 0"
"Planning time: 2.355 ms"
"Execution time: 71207.097 ms"

我对ingestion_event有以下索引：

(root_task_id ASC NULLS LAST,object_key ASC NULLS LAST);
(status COLLATE ASC NULLS LAST);
(root_task_id ASC NULLS LAST)

我在gestation_dependency列上具有以下btree索引：

（root_task_id ASC为空，dependency_key ASC为空）（root_task_id，object_key）（root_task_id，object_key）

我的问题是为什么嵌套循环反连接如此昂贵？我的postgres版本是9.6

Postgres慢速嵌套循环反连接用于大型且不断更新的表

sghhahaha 回答：Postgres慢速嵌套循环反连接用于大型且不断更新的表

大家都在问