linux – ext4 filesystem corruption – possibly a hardware error?

About half an hour after I power on my computer, I get these errors in dmesg:
    [ 1355.677957] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1318420: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251700offset=0(0), inode=1802725748, rec_len=179136, name_len=32
    [ 1355.677973] Aborting journal on device sda2-8.
    [ 1355.678101] EXT4-fs (sda2): Remounting filesystem read-only
    [ 1355.690144] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1318416: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251699offset=0(0), inode=2194783952, rec_len=53280, name_len=152
    [ 1356.864720] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1312795: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251176offset=1460(13748), inode=1432317541, rec_len=208208, name_len=119
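
To see at a glance which inodes and blocks are affected, messages like these can be condensed to one short line each. This is only a rough sketch: the function name `ext4_bad_dir_entries` is made up for illustration, and the pattern assumes the exact message wording shown above, which may differ between kernel versions.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "device inode block" for each ext4 "bad entry in directory" error.
# Lines that do not match (journal aborts, remounts, ...) are dropped.
ext4_bad_dir_entries() {
    sed -n 's/.*EXT4-fs error (device \([^)]*\)).*inode #\([0-9]*\):.*bad entry in directory.*block=\([0-9]*\).*/\1 \2 \3/p'
}

# Usage (reading the kernel ring buffer may require root on some systems):
#   dmesg | ext4_bad_dir_entries | sort -u
```

Block numbers that sit close together, as they appear to here, may hint that the damage is confined to one region of the disk, but that is a guess and not a diagnosis.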

/dev/sda is an SSD, and it uses the noop scheduler.

/etc/fstab entry:

    UUID=acb4eefa-48ff-4ee1-bb5f-2dccce7d011f / ext4 errors=remount-ro,noatime,discard,user_xattr 0 1

System information:

    $ cat /proc/mounts | grep /dev/sd
    /dev/sda1 /boot ext2 rw,errors=continue 0 0
    $ cat /etc/lsb-release
    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=10.04
    DISTRIB_CODENAME=lucid
    DISTRIB_DESCRIPTION="Ubuntu 10.04.3 LTS"
    $ uname -a
    Linux leetpad 2.6.35-30-generic-pae #61~lucid1-Ubuntu SMP Thu Oct 13 21:14:29 UTC 2011 i686 GNU/Linux

Output of smartctl -a:

    smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
    Home page is http://smartmontools.sourceforge.net/

    === START OF INFORMATION SECTION ===
    Device Model:     STT_FTM28GX25H
    Serial Number:    P637510-MIBY-706A009
    Firmware Version: 1916
    User Capacity:    128,035,676,160 bytes
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   8
    ATA Standard is:  Exact ATA specification draft version not indicated
    Local Time is:    Thu Nov 24 20:53:48 2011 UTC
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    See vendor-specific Attribute list for marginal Attributes.

    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                                            was never started.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      ( 0)   The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                 ( 0)   seconds.
    Offline data collection
    capabilities:                    (0x1d) SMART execute Offline immediate.
                                            No Auto Offline data collection support.
                                            Abort Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            No Conveyance Self-test supported.
                                            No Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x00) Error logging NOT supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        ( 0)   minutes.
    Extended self-test routine
    recommended polling time:        ( 0)   minutes.

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME           FLAG    VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate      0x0000  005   000   000    Old_age  Offline  In_the_past 0
      9 Power_On_Hours           0x0000  141   002   000    Old_age  Offline  -           0
     12 Power_Cycle_Count        0x0000  115   002   000    Old_age  Offline  -           0
    184 Unknown_Attribute        0x0000  084   000   000    Old_age  Offline  In_the_past 0
    195 Hardware_ECC_Recovered   0x0000  000   000   000    Old_age  Offline  FAILING_NOW 0
    196 Reallocated_Event_Count  0x0000  000   000   000    Old_age  Offline  FAILING_NOW 0
    197 Current_Pending_Sector   0x0000  000   000   000    Old_age  Offline  FAILING_NOW 0
    198 Offline_Uncorrectable    0x0000  002   107   000    Old_age  Offline  -           21198
    199 UDMA_CRC_Error_Count     0x0000  063   003   000    Old_age  Offline  -           26957
    200 Multi_Zone_Error_Rate    0x0000  099   124   000    Old_age  Offline  -           446
    201 Soft_Read_Error_Rate     0x0000  024   154   000    Old_age  Offline  -           328
    202 TA_Increase_Count        0x0000  115   254   000    Old_age  Offline  -           115
    203 Run_Out_Cancel           0x0000  247   245   000    Old_age  Offline  -           83
    204 Shock_Count_Write_Opern  0x0000  000   000   000    Old_age  Offline  FAILING_NOW 0
    205 Shock_Rate_Write_Opern   0x0000  016   039   000    Old_age  Offline  -           0
    206 Flying_Height            0x0000  005   000   000    Old_age  Offline  In_the_past 0
    207 Spin_High_Current        0x0000  055   015   000    Old_age  Offline  -           0
    208 Spin_Buzz                0x0000  248   001   000    Old_age  Offline  -           0
    209 Offline_Seek_Performnce  0x0000  095   000   000    Old_age  Offline  In_the_past 0
    211 Unknown_Attribute        0x0000  000   000   000    Old_age  Offline  FAILING_NOW 0
    212 Unknown_Attribute        0x0000  000   000   000    Old_age  Offline  FAILING_NOW 0
    213 Unknown_Attribute        0x0000  000   000   000    Old_age  Offline  FAILING_NOW 0

    Warning: device does not support Error Logging
    Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
    SMART Error Log Version: 1
    No Errors Logged

    Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
    SMART Self-test log structure revision number 1
    No self-tests have been logged. [To run self-tests, use: smartctl -t]

    Device does not support Selective Self Tests/Logging

I ran memtest for 7 hours and it found no memory errors.

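Are there any obvious ideas about what could be going wrong here? The most plausible explanation I can think of is that the SSD is silently dropping some write requests, which eventually leaves the ext4 filesystem inconsistent (but without any disk I/O errors). How could this happen? Are there any relevant configuration options that I should make sure are set correctly?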

Which tools should I use to diagnose a hardware fault? Is it possible to diagnose an SSD fault without overwriting the data?

Solution

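First, you may want to run a full fsck on the root disk. I have found that the quick check sometimes misses important errors. You can do this by touching a file in the root directory (the details may depend on your Linux distribution), but you can try
    touch /forcefsck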

and reboot, or boot a rescue CD and run fsck on the root filesystem from there. By "full" I mean using fsck's -f option.
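
From a rescue environment, the forced check might look like the sketch below. It assumes the root filesystem is /dev/sda2 (the device named in the dmesg output) and calls e2fsck, the ext2/3/4 checker that fsck runs for ext4. Here -f forces a check even when the filesystem is marked clean, and -n opens it read-only and answers "no" to every repair prompt, so the first pass changes nothing on disk. The function names are made up for illustration.

```shell
# Hypothetical helper: succeed if DEVICE is listed as a mount source in
# MOUNTS_FILE (defaults to /proc/mounts). The comparison is literal, so a
# filesystem mounted via /dev/disk/by-uuid/... will not match /dev/sda2.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Hypothetical helper: run a forced, read-only check on an unmounted device.
full_check_readonly() {
    if is_mounted "$1" "${2:-/proc/mounts}"; then
        echo "refusing to check $1: it is mounted" >&2
        return 1
    fi
    e2fsck -f -n "$1"
}

# Usage from a rescue CD (as root); /dev/sda2 is an assumption:
#   full_check_readonly /dev/sda2
# If errors are reported and you have a backup, repeat with repairs enabled:
#   e2fsck -f /dev/sda2
```

Checking read-only first is a deliberate choice: if the drive really is dropping writes, letting fsck rewrite metadata before the data is backed up could make things worse.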

Second, do your system logs indicate any hardware errors?
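
One way to check is to filter the kernel log for messages that commonly accompany storage hardware trouble. This is a rough sketch: the function name `storage_error_lines` and the pattern list are assumptions chosen for illustration, not an exhaustive or authoritative set, so an empty result does not prove the hardware is healthy.

```shell
# Hypothetical helper: keep only log lines that often accompany storage
# hardware problems (ATA link or command errors, block-layer I/O errors).
# The pattern list is an assumption and is certainly incomplete.
storage_error_lines() {
    grep -E -i '(^|[^[:alnum:]])ata[0-9]+(\.[0-9]+)?: .*(exception|error|failed|timeout)|I/O error|medium error|hard resetting link'
}

# Usage:
#   dmesg | storage_error_lines
#   storage_error_lines < /var/log/kern.log
```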

As Mr. Kario said, you might consider using smartctl to check the disk's health. I have found that some of the disks I use do not report this information.
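
A minimal sketch of such a check follows. It prints the attributes that smartctl flags as FAILING_NOW; the function name `failing_smart_attributes` is made up for illustration, and it assumes the ten-column attribute table layout shown in the question. For a drive that is not in the smartctl database, as is the case here, the attribute names and flags may not mean what they say, so treat the result as a hint only.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the ID and
# name of every attribute whose WHEN_FAILED column (field 9) says FAILING_NOW.
# Header and non-table lines are skipped because field 1 is not a number.
failing_smart_attributes() {
    awk '$1 ~ /^[0-9]+$/ && $9 == "FAILING_NOW" { print $1, $2 }'
}

# Usage (needs root and the smartmontools package; /dev/sda is an assumption):
#   smartctl -H /dev/sda                          # overall health verdict
#   smartctl -A /dev/sda | failing_smart_attributes
```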
