linux – PostgreSQL slow commit performance

We are having some problems with our PostgreSQL configuration. After some benchmarking I found that very simple queries were taking a relatively long time; on closer inspection it turns out the actual COMMIT command is really slow.

I ran a very simple test using the following table:

  CREATE TABLE test (
      id serial primary key,
      foo varchar(16)
  );

After turning on logging of all statements, I ran the following query 10000 times:

  BEGIN;
  INSERT INTO test (foo) VALUES ('bar');
  COMMIT;

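The timing loop above can be sketched in a few lines of Python. This is a sketch, not the exact benchmark I ran: it assumes any DB-API 2.0 connection (e.g. from psycopg2) and the hypothetical helper name `bench_commits` is mine.

```python
import time

def bench_commits(conn, n=10000):
    """Run n single-row INSERT transactions through an open DB-API
    connection; return the average time per statement in milliseconds."""
    totals = {"BEGIN": 0.0, "INSERT": 0.0, "COMMIT": 0.0}
    cur = conn.cursor()
    for _ in range(n):
        for stmt in ("BEGIN",
                     "INSERT INTO test (foo) VALUES ('bar')",
                     "COMMIT"):
            start = time.perf_counter()
            cur.execute(stmt)
            # Attribute the elapsed time to BEGIN, INSERT, or COMMIT.
            totals[stmt.split()[0]] += time.perf_counter() - start
    return {k: v * 1000.0 / n for k, v in totals.items()}
```

With psycopg2 you would pass a connection with `conn.autocommit = True`, so the driver does not wrap its own transaction around the explicit BEGIN/COMMIT being measured.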
The BEGIN and INSERT take < 1 ms to complete, but the COMMIT takes an average of 22 ms.

Running the same benchmark on my own PC, which is much slower overall, yields the same averages for the BEGIN and INSERT statements, but the average COMMIT is around 0.4 ms (more than 20 times faster).

After some reading I tried the pg_test_fsync tool to track down the problem. On the server I get these results:

  $ ./pg_test_fsync -o 1024
  1024 operations per test
  O_DIRECT supported on this platform for open_datasync and open_sync.

  Compare file sync methods using one 8kB write:
  (in wal_sync_method preference order, except fdatasync
  is Linux's default)
          open_datasync                14.875 ops/sec
          fdatasync                    11.920 ops/sec
          fsync                        30.524 ops/sec
          fsync_writethrough                      n/a
          open_sync                    30.425 ops/sec

  Compare file sync methods using two 8kB writes:
  (in wal_sync_method preference order, except fdatasync
  is Linux's default)
          open_datasync                19.956 ops/sec
          fdatasync                    23.299 ops/sec
          fsync                        21.955 ops/sec
          fsync_writethrough                      n/a
          open_sync                     3.619 ops/sec

  Compare open_sync with different write sizes:
  (This is designed to compare the cost of writing 16kB
  in different write open_sync sizes.)
          16kB open_sync write          5.923 ops/sec
           8kB open_sync writes         3.120 ops/sec
           4kB open_sync writes        10.246 ops/sec
           2kB open_sync writes         1.787 ops/sec
           1kB open_sync writes         0.830 ops/sec

  Test if fsync on non-write file descriptor is honored:
  (If the times are similar, fsync() can sync data written
  on a different descriptor.)
          write, fsync, close          34.371 ops/sec
          write, close, fsync          36.527 ops/sec

  Non-Sync'ed 8kB writes:
          write                    248302.619 ops/sec

On my own PC I get:

  $ ./pg_test_fsync -o 1024
  1024 operations per test
  O_DIRECT supported on this platform for open_datasync and open_sync.

  Compare file sync methods using one 8kB write:
  (in wal_sync_method preference order, except fdatasync
  is Linux's default)
          open_datasync                69.862 ops/sec
          fdatasync                    68.871 ops/sec
          fsync                        34.593 ops/sec
          fsync_writethrough                      n/a
          open_sync                    26.595 ops/sec

  Compare file sync methods using two 8kB writes:
  (in wal_sync_method preference order, except fdatasync
  is Linux's default)
          open_datasync                26.872 ops/sec
          fdatasync                    59.056 ops/sec
          fsync                        34.031 ops/sec
          fsync_writethrough                      n/a
          open_sync                    17.284 ops/sec

  Compare open_sync with different write sizes:
  (This is designed to compare the cost of writing 16kB
  in different write open_sync sizes.)
          16kB open_sync write          7.412 ops/sec
           8kB open_sync writes         3.942 ops/sec
           4kB open_sync writes         8.700 ops/sec
           2kB open_sync writes         4.161 ops/sec
           1kB open_sync writes         1.492 ops/sec

  Test if fsync on non-write file descriptor is honored:
  (If the times are similar, fsync() can sync data written
  on a different descriptor.)
          write, fsync, close          35.086 ops/sec
          write, close, fsync          34.043 ops/sec

  Non-Sync'ed 8kB writes:
          write                    240544.985 ops/sec

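pg_test_fsync's first test (one 8 kB write followed by an fsync) can be approximated in plain Python, which is handy for sanity-checking a disk on a machine without the PostgreSQL source tree. A minimal sketch; the function name is mine, and unlike the real tool it does not test open_datasync, O_DIRECT, or the other sync methods:

```python
import os
import tempfile
import time

def fsync_ops_per_sec(directory=None, seconds=2.0):
    """Rewrite one 8 kB block at offset 0 and fsync after every write,
    mimicking pg_test_fsync's 'one 8kB write' test; returns ops/sec.
    Pass directory= to target a specific filesystem."""
    block = b"\0" * 8192
    fd, name = tempfile.mkstemp(dir=directory)
    try:
        ops = 0
        deadline = time.perf_counter() + seconds
        while time.perf_counter() < deadline:
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, block)
            os.fsync(fd)  # force the write down to stable storage
            ops += 1
        return ops / seconds
    finally:
        os.close(fd)
        os.unlink(name)
```

Like pg_test_fsync itself, this measures the filesystem and drive the temp file lives on, so run it with a directory on the array under suspicion.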
Configuration of the server:

  CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  RAM: 32GB
  Disk: 2x 2TB SATA disk in Software RAID 1

The machine used for comparison is an i5 with 16GB of RAM and plain SATA disks (no RAID).

More info:

> OS: Ubuntu Server 12.10
> Kernel: Linux … 3.5.0-22-generic #34-Ubuntu SMP Tue Jan 8 21:47:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> Software RAID 1
> Filesystem is ext4
> No other mount options specified.
> PostgreSQL version 9.1
> Linux mdadm RAID

Output of dumpe2fs:

  dumpe2fs 1.42.5 (29-Jul-2012)
  Filesystem volume name:   <none>
  Last mounted on:          /
  Filesystem UUID:          16e30b20-0531-4bcc-877a-818e1f5d5fb2
  Filesystem magic number:  0xEF53
  Filesystem revision #:    1 (dynamic)
  Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
  Filesystem flags:         signed_directory_hash
  Default mount options:    (none)
  Filesystem state:         clean
  Errors behavior:          Continue
  Filesystem OS type:       Linux
  Inode count:              182329344
  Block count:              729289039
  Reserved block count:     36464451
  Free blocks:              609235080
  Free inodes:              182228152
  First block:              0
  Block size:               4096
  Fragment size:            4096
  Reserved GDT blocks:      850
  Blocks per group:         32768
  Fragments per group:      32768
  Inodes per group:         8192
  Inode blocks per group:   256
  RAID stride:              1
  Flex block group size:    16
  Filesystem created:       Sat Jan 19 12:42:19 2013
  Last mount time:          Wed Jan 23 16:23:11 2013
  Last write time:          Sat Jan 19 12:46:13 2013
  Mount count:              8
  Maximum mount count:      30
  Last checked:             Sat Jan 19 12:42:19 2013
  Check interval:           15552000 (6 months)
  Next check after:         Thu Jul 18 13:42:19 2013
  Lifetime writes:          257 GB
  Reserved blocks uid:      0 (user root)
  Reserved blocks gid:      0 (group root)
  First inode:              11
  Inode size:               128
  Journal inode:            8
  First orphan inode:       17304375
  Default directory hash:   half_md4
  Directory Hash Seed:      a71fa518-7696-4a28-bd89-b21c10d4265b
  Journal backup:           inode blocks
  Journal features:         journal_incompat_revoke
  Journal size:             128M
  Journal length:           32768
  Journal sequence:         0x000df5a4
  Journal start:            31733

Output of mdadm --detail:

  /dev/md2:
          Version : 1.2
    Creation Time : Sat Jan 19 12:42:05 2013
       Raid Level : raid1
       Array Size : 2917156159 (2782.02 GiB 2987.17 GB)
    Used Dev Size : 2917156159 (2782.02 GiB 2987.17 GB)
     Raid Devices : 2
    Total Devices : 2
      Persistence : Superblock is persistent

      Update Time : Fri Mar 22 11:16:45 2013
            State : clean
   Active Devices : 2
  Working Devices : 2
   Failed Devices : 0
    Spare Devices : 0

             Name : rescue:2
             UUID : d87b98e7:d584a4ed:5dac7907:ae5639b0
           Events : 38

      Number   Major   Minor   RaidDevice State
         0       8        3        0      active sync   /dev/sda3
         1       8       19        1      active sync   /dev/sdb3

Update 2013-03-25:
I ran a long SMART test on both disks, which completed without any problems. Both disks are from Seagate, model: ST3000DM001-9YN166.

Update 2013-03-27:
I ran pg_test_fsync from the latest version (9.2.3) on a completely idle machine:

  $ ./pg_test_fsync -s 3
  3 seconds per test
  O_DIRECT supported on this platform for open_datasync and open_sync.

  Compare file sync methods using one 8kB write:
  (in wal_sync_method preference order, except fdatasync
  is Linux's default)
          open_datasync                39.650 ops/sec
          fdatasync                    34.283 ops/sec
          fsync                        19.309 ops/sec
          fsync_writethrough                      n/a
          open_sync                    55.271 ops/sec

It is slightly better than before, but still deplorable. Partitions on both disks are aligned:

  $ sudo parted /dev/sdb unit s print
  Model: ATA ST3000DM001-9YN1 (scsi)
  Disk /dev/sdb: 5860533168s
  Sector size (logical/physical): 512B/4096B
  Partition Table: gpt

  Number  Start      End          Size         File system  Name  Flags
   4      2048s      4095s        2048s                           bios_grub
   1      4096s      25169919s    25165824s                       raid
   2      25169920s  26218495s    1048576s                        raid
   3      26218496s  5860533134s  5834314639s                     raid

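The alignment claim is easy to verify from the parted output: with 512 B logical and 4096 B physical sectors, a partition start is aligned when the start sector is divisible by 8. A quick check of the four start sectors above:

```python
# Sector sizes from the parted output: 512 B logical, 4096 B physical.
SECTORS_PER_PHYSICAL = 4096 // 512  # 8 logical sectors per physical sector

# Partition number -> start sector, copied from the parted listing.
starts = {4: 2048, 1: 4096, 2: 25169920, 3: 26218496}

for number, start in sorted(starts.items()):
    aligned = start % SECTORS_PER_PHYSICAL == 0
    print(f"partition {number}: start {start}s, aligned={aligned}")
```

All four starts are multiples of 8, so misalignment on the 4 KiB physical sectors is ruled out as the cause.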
mount -v output:

  $ mount -v | grep ^/dev/
  /dev/md2 on / type ext4 (rw,noatime)
  /dev/md1 on /boot type ext3 (rw)

The md2 device is being used for the tests. I am going to destroy the swap partition and run pg_test_fsync on the individual disks.

If I run pg_test_fsync on both disks individually I get roughly the same performance. The partition was mounted with noatime:

  $ pg_test_fsync/pg_test_fsync -s 3

  3 seconds per test
  O_DIRECT supported on this platform for open_datasync and open_sync.

  Compare file sync methods using one 8kB write:
  (in wal_sync_method preference order, except fdatasync
  is Linux's default)
          open_datasync                75.111 ops/sec
          fdatasync                    71.925 ops/sec
          fsync                        37.352 ops/sec
          fsync_writethrough                      n/a
          open_sync                    33.746 ops/sec

  Compare file sync methods using two 8kB writes:
  (in wal_sync_method preference order, except fdatasync
  is Linux's default)
          open_datasync                38.204 ops/sec
          fdatasync                    49.907 ops/sec
          fsync                        32.126 ops/sec
          fsync_writethrough                      n/a
          open_sync                    13.642 ops/sec

  Compare open_sync with different write sizes:
  (This is designed to compare the cost of writing 16kB
  in different write open_sync sizes.)
           1 * 16kB open_sync write    25.325 ops/sec
           2 *  8kB open_sync writes   12.539 ops/sec
           4 *  4kB open_sync writes    6.207 ops/sec
           8 *  2kB open_sync writes    3.098 ops/sec
          16 *  1kB open_sync writes    1.208 ops/sec

  Test if fsync on non-write file descriptor is honored:
  (If the times are similar, fsync() can sync data written
  on a different descriptor.)
          write, fsync, close          27.275 ops/sec
          write, close, fsync          20.561 ops/sec

  Non-Sync'ed 8kB writes:
          write                    562902.020 ops/sec

After running the test several times, both on the array and on the individual disks, the numbers seem to vary wildly. In the worst case the performance is about 50% of what I posted here (somewhere around 30 ops/sec for the first test). Is this normal? The machine was completely idle the entire time.

Also, according to the dmesg output, the controller is in AHCI mode.

Solution

The server has incredibly, unspeakably, amazingly slow fsync performance. There is something very badly wrong with your software RAID 1 setup. The terrible fsync performance is almost certainly the cause of your performance problems.

The desktop merely has very slow fsync.

You can work around the performance issues, at the cost of losing some data after a crash, by setting synchronous_commit = off and setting a commit_delay. You really do need to sort out the disk performance on the server, though; it is abysmally slow.
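If that trade-off is acceptable, the workaround looks like this in postgresql.conf (the numeric values are illustrative, not tuned for this machine):

```
# Trade durability of the most recent commits for speed. A crash can
# lose transactions that were reported committed in roughly the last
# wal_writer_delay interval, but it cannot corrupt the database.
synchronous_commit = off

# Alternatively, keep synchronous_commit on and let concurrent commits
# share a single WAL flush. Only helps when several sessions commit at
# the same time (delay is in microseconds).
commit_delay = 10000
commit_siblings = 5
```

synchronous_commit can also be changed per session or per transaction with `SET synchronous_commit = off;`, so you can keep full durability for the transactions that matter.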

For comparison, here's what I get on my laptop (i7, 8GB RAM, mid-range 128G SSD, pg_test_fsync from 9.2):

  Compare file sync methods using one 8kB write:

          open_datasync              4445.744 ops/sec
          fdatasync                  4225.793 ops/sec
          fsync                      2742.679 ops/sec
          fsync_writethrough                      n/a
          open_sync                  2907.265 ops/sec

Admittedly this SSD is probably not safe against hard power loss, but it's not like a decent power-fail-safe SSD costs a great deal when we're talking about server costs.
