我有一个vote_pairs视图,看起来像这样:
- CREATE VIEW vote_pairs AS
- SELECT
- v1.name as name1,v2.name as name2,...
- FROM votes AS v1
- JOIN votes AS v2
- ON v1.topic_id = v2.topic_id;
并且,在投票表中有大约100k行,跨此视图的查询大约需要3秒钟才能执行.
- … ON v1.topic_id = v2.topic_id AND v1.name < v2.name;
运行时间翻了四倍,在vote_pairs上完成查询需要大约12秒.
无论限制的位置如何,此运行时都是一致的…例如,如果将过滤器移动到外部查询的WHERE子句,则查询同样很慢:
- SELECT * FROM vote_pairs WHERE name1 < name2;
这是怎么回事? Postgres的词典比较速度慢吗?这是别的吗?我怎么能提高这个查询的速度?
投票表:
- CREATE TABLE votes (
- topic_id INTEGER REFERENCES topics(id),name VARCHAR(64),vote VARCHAR(12)
- )
- CREATE INDEX votes_topic_name ON votes (topic_id,name);
- CREATE INDEX votes_name ON votes (name);
- db=# CREATE OR REPLACE VIEW vote_pairs AS
- db-# SELECT
- db-# v1.name as name1,db-# v2.name as name2
- db-# FROM votes AS v1
- db-# JOIN votes AS v2
- db-# ON v1.topic_id = v2.topic_id;
- CREATE VIEW
- db=# EXPLAIN ANALYZE SELECT * FROM vote_pairs; QUERY PLAN
- -----------------------------------------------------------------------------------------------------------------------------
- Hash Join (cost=3956.38..71868.56 rows=5147800 width=28) (actual time=51.810..1236.673 rows=5082750 loops=1)
- Hash Cond: (v1.topic_id = v2.topic_id)
- -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.019..18.358 rows=112950 loops=1)
- -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=50.671..50.671 rows=112950 loops=1)
- -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.004..20.306 rows=112950 loops=1)
- Total runtime: 1495.963 ms
- (6 rows)
并使用过滤器:
- db=# CREATE OR REPLACE VIEW vote_pairs AS
- db-# SELECT
- db-# v1.name as name1,db-# v2.name as name2
- db-# FROM votes AS v1
- db-# JOIN votes AS v2
- db-# ON v1.topic_id = v2.topic_id AND v1.name < v2.name;
- CREATE VIEW
- db=# EXPLAIN ANALYZE SELECT * FROM vote_pairs;
- QUERY PLAN
- -----------------------------------------------------------------------------------------------------------------------------
- Hash Join (cost=3956.38..84738.06 rows=1715933 width=28) (actual time=66.688..6900.478 rows=2484900 loops=1)
- Hash Cond: (v1.topic_id = v2.topic_id)
- Join Filter: ((v1.name)::text < (v2.name)::text)
- -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.023..24.539 rows=112950 loops=1)
- -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=65.603..65.603 rows=112950 loops=1)
- -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.004..26.756 rows=112950 loops=1)
- Total runtime: 7048.740 ms
- (7 rows)
EXPLAIN(ANALYZE,BUFFERS):
- db=# EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM vote_pairs;
- QUERY PLAN
- -----------------------------------------------------------------------------------------------------------------------------
- Hash Join (cost=3956.38..71345.89 rows=5152008 width=28) (actual time=56.230..1204.522 rows=5082750 loops=1)
- Hash Cond: (v1.topic_id = v2.topic_id)
- Buffers: shared hit=129 read=1377 written=2,temp read=988 written=974
- -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.008..20.492 rows=112950 loops=1)
- Buffers: shared hit=77 read=676
- -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=55.742..55.742 rows=112950 loops=1)
- Buckets: 2048 Batches: 8 Memory Usage: 752kB
- Buffers: shared hit=52 read=701 written=2,temp written=480
- -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.004..22.954 rows=112950 loops=1)
- Buffers: shared hit=52 read=701 written=2
- Total runtime: 1499.302 ms
- (11 rows)
- db=# EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM vote_pairs WHERE name1 > name2;
- QUERY PLAN
- -----------------------------------------------------------------------------------------------------------------------------
- Hash Join (cost=3956.38..84225.91 rows=1717336 width=28) (actual time=51.214..6422.592 rows=2484900 loops=1)
- Hash Cond: (v1.topic_id = v2.topic_id)
- Join Filter: ((v1.name)::text > (v2.name)::text)
- Rows Removed by Join Filter: 2597850
- Buffers: shared hit=32 read=1477,temp read=988 written=974
- -> Seq Scan on votes v1 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.008..22.605 rows=112950 loops=1)
- Buffers: shared hit=27 read=726
- -> Hash (cost=1882.50..1882.50 rows=112950 width=18) (actual time=50.678..50.678 rows=112950 loops=1)
- Buckets: 2048 Batches: 8 Memory Usage: 752kB
- Buffers: shared hit=2 read=751,temp written=480
- -> Seq Scan on votes v2 (cost=0.00..1882.50 rows=112950 width=18) (actual time=0.005..21.337 rows=112950 loops=1)
- Buffers: shared hit=2 read=751
- Total runtime: 6573.308 ms
- (13 rows)
杂项说明:
>已经运行了VACCUM FULL和ANALYZE投票
> 8.4.11和9.2.3都以相同的方式运行
是的,文本比较有时很慢.你可能想尝试:
- SELECT * FROM vote_pairs WHERE name1 > name2 collate "C";
这应该更快一些,因为它不会考虑特定于语言环境的比较规则.此外,您的解释分析结果表明您的shared_buffers可能设置得太低.