我有两个表,ingestion_event和ingestion_dependency,其中第二个表中存在来自第一个表的一个或多个记录。
处理第一张表中的行时,应用程序将删除第二张表中的记录。
应用程序使用第一个表进行轮询以获取第二个表中不存在的行,如下所示:
SELECT a.*,dependency_key FROM (
SELECT e.*,dependency_key
FROM ingestion_event e
LEFT JOIN ingestion_dependency d ON
e.object_key = d.object_key AND e.root_task_id = d.root_task_id
WHERE d.object_key is null
FOR NO KEY UPDATE of e SKIP LOCKED
) AS a
where a.status = 'QUEUED'
limit 100
"Limit (cost=0.55..579.30 rows=100 width=1560) (actual time=16199.602..71206.977 rows=100 loops=1)"
" -> Subquery Scan on a (cost=0.55..1436371.23 rows=248188 width=1560) (actual time=16199.600..71206.897 rows=100 loops=1)"
" -> LockRows (cost=0.55..1433889.35 rows=248188 width=1470) (actual time=16199.599..71206.771 rows=100 loops=1)"
" -> nested Loop Anti Join (cost=0.55..1431407.47 rows=248188 width=1470) (actual time=315.396..70222.109 rows=7457 loops=1)"
" -> Seq Scan on ingestion_event e (cost=0.00..424604.81 rows=917896 width=1362) (actual time=0.007..23196.192 rows=154371 loops=1)"
" Filter: (status = 'QUEUED'::text)"
" Rows Removed by Filter: 206432"
" -> Index Scan using ingestion_dependency_object_key_idx on ingestion_dependency d (cost=0.55..1.12 rows=1 width=219) (actual time=0.298..0.298 rows=1 loops=154371)"
" Index Cond: (e.object_key = object_key)"
" Filter: (e.root_task_id = root_task_id)"
" Rows Removed by Filter: 0"
"Planning time: 2.355 ms"
"Execution time: 71207.097 ms"
我对ingestion_event有以下索引:
(root_task_id ASC NULLS LAST,object_key ASC NULLS LAST);
(status COLLATE ASC NULLS LAST);
(root_task_id ASC NULLS LAST)
我在gestation_dependency列上具有以下btree索引:
(root_task_id ASC为空,dependency_key ASC为空) (root_task_id,object_key) (root_task_id,object_key)
我的问题是为什么嵌套循环反连接如此昂贵? 我的postgres版本是9.6