PostgreSQL中的词典排序非常慢?

前端之家收集整理的这篇文章主要介绍了PostgreSQL中的词典排序非常慢?前端之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。
我有一个vote_pairs视图,看起来像这样:
  1. CREATE VIEW vote_pairs AS
  2. SELECT
  3. v1.name as name1,v2.name as name2,...
  4. FROM votes AS v1
  5. JOIN votes AS v2
  6. ON v1.topic_id = v2.topic_id;

并且,在投票表中有大约100k行,跨此视图的查询大约需要3秒钟才能执行.

但是,当我在名称添加额外的过滤器时:

  1. ON v1.topic_id = v2.topic_id AND v1.name < v2.name;

运行时间翻了四倍,在vote_pairs上完成查询需要大约12秒.

无论限制的位置如何,此运行时都是一致的…例如,如果将过滤器移动到外部查询的WHERE子句,则查询同样很慢:

  1. SELECT * FROM vote_pairs WHERE name1 < name2;

这是怎么回事? Postgres的词典比较速度慢吗?这是别的吗?我怎么能提高这个查询的速度?

投票表:

  1. CREATE TABLE votes (
  2. topic_id INTEGER REFERENCES topics(id),name VARCHAR(64),vote VARCHAR(12)
  3. )
  4.  
  5. CREATE INDEX votes_topic_name ON votes (topic_id,name);
  6. CREATE INDEX votes_name ON votes (name);

没有名称过滤器的EXPLAIN ANALYZE的输出

  1. db=# CREATE OR REPLACE VIEW vote_pairs AS
  2. db-# SELECT
  3. db-# v1.name as name1,db-# v2.name as name2
  4. db-# FROM votes AS v1
  5. db-# JOIN votes AS v2
  6. db-# ON v1.topic_id = v2.topic_id;
  7. CREATE VIEW
  8. db=# EXPLAIN ANALYZE SELECT * FROM vote_pairs; QUERY PLAN
  9. -----------------------------------------------------------------------------------------------------------------------------
  10. Hash Join (cost=3956.38..71868.56 rows=5147800 width=28) (actual time=51.810..1236.673 rows=5082750 loops=1)
  11. Hash Cond: (v1.topic_id = v2.topic_id)
  12. -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.019..18.358 rows=112950 loops=1)
  13. -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=50.671..50.671 rows=112950 loops=1)
  14. -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.004..20.306 rows=112950 loops=1)
  15. Total runtime: 1495.963 ms
  16. (6 rows)

并使用过滤器:

  1. db=# CREATE OR REPLACE VIEW vote_pairs AS
  2. db-# SELECT
  3. db-# v1.name as name1,db-# v2.name as name2
  4. db-# FROM votes AS v1
  5. db-# JOIN votes AS v2
  6. db-# ON v1.topic_id = v2.topic_id AND v1.name < v2.name;
  7. CREATE VIEW
  8. db=# EXPLAIN ANALYZE SELECT * FROM vote_pairs;
  9. QUERY PLAN
  10. -----------------------------------------------------------------------------------------------------------------------------
  11. Hash Join (cost=3956.38..84738.06 rows=1715933 width=28) (actual time=66.688..6900.478 rows=2484900 loops=1)
  12. Hash Cond: (v1.topic_id = v2.topic_id)
  13. Join Filter: ((v1.name)::text < (v2.name)::text)
  14. -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.023..24.539 rows=112950 loops=1)
  15. -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=65.603..65.603 rows=112950 loops=1)
  16. -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.004..26.756 rows=112950 loops=1)
  17. Total runtime: 7048.740 ms
  18. (7 rows)

EXPLAIN(ANALYZE,BUFFERS):

  1. db=# EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM vote_pairs;
  2. QUERY PLAN
  3. -----------------------------------------------------------------------------------------------------------------------------
  4. Hash Join (cost=3956.38..71345.89 rows=5152008 width=28) (actual time=56.230..1204.522 rows=5082750 loops=1)
  5. Hash Cond: (v1.topic_id = v2.topic_id)
  6. Buffers: shared hit=129 read=1377 written=2,temp read=988 written=974
  7. -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.008..20.492 rows=112950 loops=1)
  8. Buffers: shared hit=77 read=676
  9. -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=55.742..55.742 rows=112950 loops=1)
  10. Buckets: 2048 Batches: 8 Memory Usage: 752kB
  11. Buffers: shared hit=52 read=701 written=2,temp written=480
  12. -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.004..22.954 rows=112950 loops=1)
  13. Buffers: shared hit=52 read=701 written=2
  14. Total runtime: 1499.302 ms
  15. (11 rows)
  16.  
  17.  
  18. db=# EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM vote_pairs WHERE name1 > name2;
  19. QUERY PLAN
  20. -----------------------------------------------------------------------------------------------------------------------------
  21. Hash Join (cost=3956.38..84225.91 rows=1717336 width=28) (actual time=51.214..6422.592 rows=2484900 loops=1)
  22. Hash Cond: (v1.topic_id = v2.topic_id)
  23. Join Filter: ((v1.name)::text > (v2.name)::text)
  24. Rows Removed by Join Filter: 2597850
  25. Buffers: shared hit=32 read=1477,temp read=988 written=974
  26. -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.008..22.605 rows=112950 loops=1)
  27. Buffers: shared hit=27 read=726
  28. -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=50.678..50.678 rows=112950 loops=1)
  29. Buckets: 2048 Batches: 8 Memory Usage: 752kB
  30. Buffers: shared hit=2 read=751,temp written=480
  31. -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.005..21.337 rows=112950 loops=1)
  32. Buffers: shared hit=2 read=751
  33. Total runtime: 6573.308 ms
  34. (13 rows)

杂项说明:

>已经运行了VACCUM FULL和ANALYZE投票
> 8.4.11和9.2.3都以相同的方式运行

是的,文本比较有时很慢.你可能想尝试:
  1. SELECT * FROM vote_pairs WHERE name1 > name2 collate "C";

这应该更快一些,因为它不会考虑特定于语言环境的比较规则.此外,您的解释分析结果表明您的shared_buffers可能设置得太低.

猜你在找的Postgre SQL相关文章