postgresql – 带日期的索引优化

前端之家收集整理的这篇文章主要介绍了postgresql – 带日期的索引优化前端之家小编觉得挺不错的,现在分享给大家,也给大家做个参考。
我在Postgresql 9.0.8中有一个大型对象表(15M行),我想查询它过时的字段.

我希望将查询除以数百万,以实现可扩展性和并发目的,我希望使用updated_at字段获取所有数据,日期为几天前.

我已经在一百万个ID上尝试了很多索引和查询,而且我似乎无法使用Heroku的Ronin硬件在100秒内获得性能.

我正在寻找我没有尽力提高效率的建议.

尝试#1

  1. EXPLAIN ANALYZE SELECT count(*) FROM objects
  2. WHERE (date(updated_at)) < (date(now())-7) AND id >= 5000001 AND id < 6000001;
  3. INDEX USED: (date(updated_at),id)
  4. 268578.934 ms

尝试#2

  1. EXPLAIN ANALYZE SELECT count(*) FROM objects
  2. WHERE ((date(now()) - (date(updated_at)) > 7)) AND id >= 5000001 AND id < 6000001;
  3. INDEX USED: primary key
  4. 335555.144 ms

尝试#3

  1. EXPLAIN ANALYZE SELECT count(*) FROM objects
  2. WHERE (date(updated_at)) < (date(now())-7) AND id/1000000 = 5;
  3. INDEX USED: (date(updated_at),(id/1000000))
  4. 243427.042 ms

尝试#4

  1. EXPLAIN ANALYZE SELECT count(*) FROM objects
  2. WHERE (date(updated_at)) < (date(now())-7) AND id/1000000 = 5 AND updated_at IS NOT NULL;
  3. INDEX USED: (date(updated_at),(id/1000000)) WHERE updated_at IS NOT NULL
  4. 706714.812 ms

尝试#5(对于一个月的过期数据)

  1. EXPLAIN ANALYZE SELECT count(*) FROM objects
  2. WHERE (EXTRACT(MONTH from date(updated_at)) = 8) AND id/1000000 = 5;
  3. INDEX USED: (EXTRACT(MONTH from date(updated_at)),(id/1000000))
  4. 107241.472 ms

尝试#6

  1. EXPLAIN ANALYZE SELECT count(*) FROM objects
  2. WHERE (date(updated_at)) < (date(now())-7) AND id/1000000 = 5;
  3. INDEX USED: ( (id/1000000 ) ASC,updated_at DESC NULLS LAST)
  4. 106842.395 ms

尝试#7(见:http://explain.depesz.com/s/DQP)

  1. EXPLAIN ANALYZE SELECT count(*) FROM objects
  2. WHERE id/1000000 = 5 and (date(updated_at)) < (date(now())-7);
  3. INDEX USED: ( (id/1000000 ) ASC,date(updated_at) DESC NULLS LAST);
  4. 100732.049 ms
  5. Second try: 87280.728 ms

尝试#8

  1. EXPLAIN ANALYZE SELECT count(*) FROM objects
  2. WHERE (date(updated_at)) < (date(now())-7) AND id/1000000 = 5 AND updated_at IS NOT NULL;
  3. INDEX USED: ( (id/1000000 ) ASC,date(updated_at) ASC NULLS LAST);
  4. 129133.022 ms

尝试#9(部分索引根据欧文的建议,见:
http://explain.depesz.com/s/p9A)

  1. EXPLAIN ANALYZE SELECT count(*) FROM objects
  2. WHERE id BETWEEN 5000000 AND 5999999 AND (date(updated_at)) < '2012-10-23'::date;
  3. INDEX USED: (date(updated_at) DESC NULLS LAST)
  4. WHERE id BETWEEN 5000000 AND 6000000 AND date(updated_at) < '2012-10-23'::date;
  5. 73861.047 ms

尝试#10(根据Erwin的建议,CLUSTER).

  1. CREATE INDEX ix_8 on objects ( (id/1000000 ) ASC,date(updated_at) DESC NULLS LAST);
  2. CLUSTER entities USING ix_8;
  3. EXPLAIN ANALYZE SELECT count(*) FROM objects
  4. WHERE id/1000000 = 5 and (date(updated_at)) < (date(now())-7) ;
  5. 4745.595 ms
  6.  
  7. EXPLAIN ANALYZE SELECT count(*) FROM objects
  8. WHERE id/1000000 = 10 and (date(updated_at)) < (date(now())-7) ;
  9. 17573.639 ms

==>这个解决方案似乎是赢家.我必须彻底测试以验证我的应用程序中的所有反击.

数据库设置:

从pg_settings中选择name,min_val,max_val,boot_val;

  1. name | min_val | max_val | boot_val
  2. --------------------------------+-----------+--------------+-------------------
  3. allow_system_table_mods | | | off
  4. application_name | | |
  5. archive_command | | |
  6. archive_mode | | | off
  7. archive_timeout | 0 | 2147483647 | 0
  8. array_nulls | | | on
  9. authentication_timeout | 1 | 600 | 60
  10. autovacuum | | | on
  11. autovacuum_analyze_scale_factor | 0 | 100 | 0.1
  12. autovacuum_analyze_threshold | 0 | 2147483647 | 50
  13. autovacuum_freeze_max_age | 100000000 | 2000000000 | 200000000
  14. autovacuum_max_workers | 1 | 536870911 | 3
  15. autovacuum_naptime | 1 | 2147483 | 60
  16. autovacuum_vacuum_cost_delay | -1 | 100 | 20
  17. autovacuum_vacuum_cost_limit | -1 | 10000 | -1
  18. autovacuum_vacuum_scale_factor | 0 | 100 | 0.2
  19. autovacuum_vacuum_threshold | 0 | 2147483647 | 50
  20. backslash_quote | | | safe_encoding
  21. bgwriter_delay | 10 | 10000 | 200
  22. bgwriter_lru_maxpages | 0 | 1000 | 100
  23. bgwriter_lru_multiplier | 0 | 10 | 2
  24. block_size | 8192 | 8192 | 8192
  25. bonjour | | | off
  26. bonjour_name | | |
  27. bytea_output | | | hex
  28. check_function_bodies | | | on
  29. checkpoint_completion_target | 0 | 1 | 0.5
  30. checkpoint_segments | 1 | 2147483647 | 3
  31. checkpoint_timeout | 30 | 3600 | 300
  32. checkpoint_warning | 0 | 2147483647 | 30
  33. client_encoding | | | sql_ASCII
  34. client_min_messages | | | notice
  35. commit_delay | 0 | 100000 | 0
  36. commit_siblings | 1 | 1000 | 5
  37. constraint_exclusion | | | partition
  38. cpu_index_tuple_cost | 0 | 1.79769e+308 | 0.005
  39. cpu_operator_cost | 0 | 1.79769e+308 | 0.0025
  40. cpu_tuple_cost | 0 | 1.79769e+308 | 0.01
  41. cursor_tuple_fraction | 0 | 1 | 0.1
  42. custom_variable_classes | | |
  43. DateStyle | | | ISO,MDY
  44. db_user_namespace | | | off
  45. deadlock_timeout | 1 | 2147483 | 1000
  46. debug_assertions | | | off
  47. debug_pretty_print | | | on
  48. debug_print_parse | | | off
  49. debug_print_plan | | | off
  50. debug_print_rewritten | | | off
  51. default_statistics_target | 1 | 10000 | 100
  52. default_tablespace | | |
  53. default_text_search_config | | | pg_catalog.simple
  54. default_transaction_isolation | | | read committed
  55. default_transaction_read_only | | | off
  56. default_with_oids | | | off
  57. effective_cache_size | 1 | 2147483647 | 16384
  58. effective_io_concurrency | 0 | 1000 | 1
  59. enable_bitmapscan | | | on
  60. enable_hashagg | | | on
  61. enable_hashjoin | | | on
  62. enable_indexscan | | | on
  63. enable_material | | | on
  64. enable_mergejoin | | | on
  65. enable_nestloop | | | on
  66. enable_seqscan | | | on
  67. enable_sort | | | on
  68. enable_tidscan | | | on
  69. escape_string_warning | | | on
  70. extra_float_digits | -15 | 3 | 0
  71. from_collapse_limit | 1 | 2147483647 | 8
  72. fsync | | | on
  73. full_page_writes | | | on
  74. geqo | | | on
  75. geqo_effort | 1 | 10 | 5
  76. geqo_generations | 0 | 2147483647 | 0
  77. geqo_pool_size | 0 | 2147483647 | 0
  78. geqo_seed | 0 | 1 | 0
  79. geqo_selection_bias | 1.5 | 2 | 2
  80. geqo_threshold | 2 | 2147483647 | 12
  81. gin_fuzzy_search_limit | 0 | 2147483647 | 0
  82. hot_standby | | | off
  83. ignore_system_indexes | | | off
  84. integer_datetimes | | | on
  85. IntervalStyle | | | postgres
  86. join_collapse_limit | 1 | 2147483647 | 8
  87. krb_caseins_users | | | off
  88. krb_srvname | | | postgres
  89. lc_collate | | | C
  90. lc_ctype | | | C
  91. lc_messages | | |
  92. lc_monetary | | | C
  93. lc_numeric | | | C
  94. lc_time | | | C
  95. listen_addresses | | | localhost
  96. lo_compat_privileges | | | off
  97. local_preload_libraries | | |
  98. log_autovacuum_min_duration | -1 | 2147483 | -1
  99. log_checkpoints | | | off
  100. log_connections | | | off
  101. log_destination | | | stderr
  102. log_disconnections | | | off
  103. log_duration | | | off
  104. log_error_verbosity | | | default
  105. log_executor_stats | | | off
  106. log_hostname | | | off
  107. log_line_prefix | | |
  108. log_lock_waits | | | off
  109. log_min_duration_statement | -1 | 2147483 | -1
  110. log_min_error_statement | | | error
  111. log_min_messages | | | warning
  112. log_parser_stats | | | off
  113. log_planner_stats | | | off
  114. log_rotation_age | 0 | 35791394 | 1440
  115. log_rotation_size | 0 | 2097151 | 10240
  116. log_statement | | | none
  117. log_statement_stats | | | off
  118. log_temp_files | -1 | 2147483647 | -1
  119. log_timezone | | | UNKNOWN
  120. log_truncate_on_rotation | | | off
  121. logging_collector | | | off
  122. maintenance_work_mem | 1024 | 2097151 | 16384
  123. max_connections | 1 | 536870911 | 100
  124. max_files_per_process | 25 | 2147483647 | 1000
  125. max_function_args | 100 | 100 | 100
  126. max_identifier_length | 63 | 63 | 63
  127. max_index_keys | 32 | 32 | 32
  128. max_locks_per_transaction | 10 | 2147483647 | 64
  129. max_prepared_transactions | 0 | 536870911 | 0
  130. max_stack_depth | 100 | 2097151 | 100
  131. max_standby_archive_delay | -1 | 2147483 | 30000
  132. max_standby_streaming_delay | -1 | 2147483 | 30000
  133. max_wal_senders | 0 | 536870911 | 0
  134. password_encryption | | | on
  135. port | 1 | 65535 | 5432
  136. post_auth_delay | 0 | 2147483647 | 0
  137. pre_auth_delay | 0 | 60 | 0
  138. random_page_cost | 0 | 1.79769e+308 | 4
  139. search_path | | | "$user",public
  140. segment_size | 131072 | 131072 | 131072
  141. seq_page_cost | 0 | 1.79769e+308 | 1
  142. server_encoding | | | sql_ASCII
  143. server_version | | | 9.0.8
  144. server_version_num | 90008 | 90008 | 90008
  145. session_replication_role | | | origin
  146. shared_buffers | 16 | 1073741823 | 1024
  147. silent_mode | | | off
  148. sql_inheritance | | | on
  149. ssl | | | off
  150. ssl_renegotiation_limit | 0 | 2097151 | 524288
  151. standard_conforming_strings | | | off
  152. statement_timeout | 0 | 2147483647 | 0
  153. superuser_reserved_connections | 0 | 536870911 | 3
  154. synchronize_seqscans | | | on
  155. synchronous_commit | | | on
  156. syslog_facility | | | local0
  157. syslog_ident | | | postgres
  158. tcp_keepalives_count | 0 | 2147483647 | 0
  159. tcp_keepalives_idle | 0 | 2147483647 | 0
  160. tcp_keepalives_interval | 0 | 2147483647 | 0
  161. temp_buffers | 100 | 1073741823 | 1024
  162. temp_tablespaces | | |
  163. TimeZone | | | UNKNOWN
  164. timezone_abbreviations | | | UNKNOWN
  165. trace_notify | | | off
  166. trace_recovery_messages | | | log
  167. trace_sort | | | off
  168. track_activities | | | on
  169. track_activity_query_size | 100 | 102400 | 1024
  170. track_counts | | | on
  171. track_functions | | | none
  172. transaction_isolation | | |
  173. transaction_read_only | | | off
  174. transform_null_equals | | | off
  175. unix_socket_group | | |
  176. unix_socket_permissions | 0 | 511 | 511
  177. update_process_title | | | on
  178. vacuum_cost_delay | 0 | 100 | 0
  179. vacuum_cost_limit | 1 | 10000 | 200
  180. vacuum_cost_page_dirty | 0 | 10000 | 20
  181. vacuum_cost_page_hit | 0 | 10000 | 1
  182. vacuum_cost_page_miss | 0 | 10000 | 10
  183. vacuum_defer_cleanup_age | 0 | 1000000 | 0
  184. vacuum_freeze_min_age | 0 | 1000000000 | 50000000
  185. vacuum_freeze_table_age | 0 | 2000000000 | 150000000
  186. wal_block_size | 8192 | 8192 | 8192
  187. wal_buffers | 4 | 2147483647 | 8
  188. wal_keep_segments | 0 | 2147483647 | 0
  189. wal_level | | | minimal
  190. wal_segment_size | 2048 | 2048 | 2048
  191. wal_sender_delay | 1 | 10000 | 200
  192. wal_sync_method | | | fdatasync
  193. wal_writer_delay | 1 | 10000 | 200
  194. work_mem | 64 | 2097151 | 1024
  195. xmlbinary | | | base64
  196. xmloption | | | content
  197. zero_damaged_pages | | | off
  198. (195 rows)
首先,可以吗?你写:

I want to fetch all data with the updated_at field with a date of a
few days ago
.

但你的WHERE条件是:

  1. (date(updated_at)) < (date(now())-7)

不应该是>?

索引

为了获得最佳性能,您可以……

>分区索引
>从索引中排除不相关的行
>使用更新的谓词在非工作时间自动重新创建索引.

您的索引可能如下所示:

  1. CREATE INDEX objects_id_updated_at_idx (updated_at::date DESC NULLS LAST)
  2. WHERE id BETWEEN 0 AND 999999
  3. AND updated_at > '2012-10-01 0:0'::timestamp -- some minimum date
  4.  
  5. CREATE INDEX objects_id_updated_at_idx (updated_at::date DESC NULLS LAST)
  6. WHERE id BETWEEN 1000000 AND 1999999
  7. AND updated_at > '2012-10-01 0:0'::timestamp -- some minimum date

第二个条件立即从索引中排除不相关的行,这应该使它更小更快 – 取决于您的实际数据分布.根据我的初步评论,我假设你想要更新的行.

该条件还会自动排除updated_at中的NULL值 – 您似乎在表中允许这些值,并且显然希望在查询中排除.指数的有用性随着时间的推移而恶化.查询始终检索最新条目.定期使用更新的WHERE子句重新创建索引.这需要在桌子上进行独占锁定,所以在非工作时间进行.还有CREATE INDEX CONCURRENTLY以最小化锁定的持续时间:

  1. CREATE INDEX CONCURRENTLY objects_id_up_201211_idx; -- create new idx
  2. DROP INDEX objects_id_up_201210_idx; -- then drop old

关于SO的相关答案:

> Postgres returns records in wrong order

为了进一步优化,您可以像我们在评论中提到的那样使用CLUSTER.但是你需要一个完整的索引.不适用于部分索引.你会暂时创建:

  1. CREATE INDEX objects_full_idx (id/1000000,updated_at::date DESC NULLS LAST);

完整索引的这种形式匹配上面部分索引的排序顺序.

  1. CLUSTER objects USING objects_full_idx;
  2. ANALYZE objects;

这需要一段时间,因为表是物理重写的.它实际上也是真空.它需要在桌子上进行独占的写锁定,所以在非工作时间进行 – 如果您可以负担得起的话.同样,存在一种侵入性较小的替代方案:pg_repack

然后,您可以再次删除索引.这是一次性的影响.我至少会尝试一次,看看你的查询能从中获益多少.随后的写入操作会使效果恶化.如果你看到相当大的影响,你可以在非工作时间重复这个程序.

如果您的表收到大量写入操作,则必须权衡此步骤的成本和收益.对于许多UPDATE,请考虑将FILLFACTOR设置为低于100.在CLUSTER之前执行此操作.

询问

  1. SELECT count(*)
  2. FROM objects
  3. WHERE id BETWEEN 0 AND 999999 -- match conditions of partial index!
  4. AND updated_at > '2012-10-01 0:0'::timestamp
  5. AND updated_at::date > (now()::date - 7)

更多

这个相关的答案提出了一种更先进的索引分区技术:

> Can spatial index help a “range – order by – limit” query

除此之外,它还提供了自动索引(重新)创建的示例代码.

Postgresql 9.2为您提供了几项新功能.仅凭Index-only scans就可以让你值得拥有.

确保autovacuum正常运行.您报告的CLUSTER的巨大收益可能部分归因于您从CLUSTER获得的隐含的VACUUM FULL.也许这是由Heroku自动设置的,不确定.问题中的设置看起来不错.所以这可能不是问题,CLUSTER真的很有效.

猜你在找的Postgre SQL相关文章