About half an hour after I power on my computer, I get these errors in dmesg:
- [ 1355.677957] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1318420: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251700offset=0(0), inode=1802725748, rec_len=179136, name_len=32
- [ 1355.677973] Aborting journal on device sda2-8.
- [ 1355.678101] EXT4-fs (sda2): Remounting filesystem read-only
- [ 1355.690144] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1318416: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251699offset=0(0), inode=2194783952, rec_len=53280, name_len=152
- [ 1356.864720] EXT4-fs error (device sda2): htree_dirblock_to_tree: inode #1312795: (comm updatedb.mlocat) bad entry in directory: directory entry across blocks - block=5251176offset=1460(13748), inode=1432317541, rec_len=208208, name_len=119
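To make the numbers in these messages easier to compare, here is a minimal Python sketch that pulls the fields out of each "bad entry in directory" line and checks whether the reported entry would run past the end of its block. The 4096-byte block size is an assumption (the post does not state it), the first offset value is assumed to be the offset within the block, and `parse_bad_entry` and `crosses_block` are hypothetical helper names.

```python
import re

# Fields of interest in an "EXT4-fs error ... bad entry in directory" line.
# Whitespace after commas is optional because pasted logs often lose it.
ENTRY_RE = re.compile(
    r"inode #(?P<dir_inode>\d+):.*?"
    r"block=(?P<block>\d+)\s*offset=(?P<offset>\d+)\((?P<offset_paren>\d+)\),\s*"
    r"inode=(?P<inode>\d+),\s*rec_len=(?P<rec_len>\d+),\s*name_len=(?P<name_len>\d+)"
)

ASSUMED_BLOCK_SIZE = 4096  # assumption: the file system block size is not given in the post


def parse_bad_entry(line):
    """Return the numeric fields of one error line as a dict, or None if it does not match."""
    match = ENTRY_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def crosses_block(entry, block_size=ASSUMED_BLOCK_SIZE):
    """True if the entry, starting at its in-block offset, extends past the block end."""
    return entry["offset"] + entry["rec_len"] > block_size
```

With the three lines above, every `rec_len` (179136, 53280, 208208) is far larger than a 4096-byte block, so the directory blocks appear to contain data that is not valid directory entries at all, not just a slightly damaged entry.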
/dev/sda is an SSD, and it uses the noop I/O scheduler.
/etc/fstab entry:
- UUID=acb4eefa-48ff-4ee1-bb5f-2dccce7d011f / ext4 errors=remount-ro,noatime,discard,user_xattr 0 1
System information:
- $ cat /proc/mounts | grep /dev/sd
- /dev/sda1 /boot ext2 rw,errors=continue 0 0
- $ cat /etc/lsb-release
- DISTRIB_ID=Ubuntu
- DISTRIB_RELEASE=10.04
- DISTRIB_CODENAME=lucid
- DISTRIB_DESCRIPTION="Ubuntu 10.04.3 LTS"
- $ uname -a
- Linux leetpad 2.6.35-30-generic-pae #61~lucid1-Ubuntu SMP Thu Oct 13 21:14:29 UTC 2011 i686 GNU/Linux
smartctl -a output:
- smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
- Home page is http://smartmontools.sourceforge.net/
- === START OF INFORMATION SECTION ===
- Device Model: STT_FTM28GX25H
- Serial Number: P637510-MIBY-706A009
- Firmware Version: 1916
- User Capacity: 128,035,676,160 bytes
- Device is: Not in smartctl database [for details use: -P showall]
- ATA Version is: 8
- ATA Standard is: Exact ATA specification draft version not indicated
- Local Time is: Thu Nov 24 20:53:48 2011 UTC
- SMART support is: Available - device has SMART capability.
- SMART support is: Enabled
- === START OF READ SMART DATA SECTION ===
- SMART overall-health self-assessment test result: PASSED
- See vendor-specific Attribute list for marginal Attributes.
- General SMART Values:
- Offline data collection status: (0x00) Offline data collection activity
- was never started.
- Auto Offline Data Collection: Disabled.
- Self-test execution status: ( 0) The previous self-test routine completed
- without error or no self-test has ever
- been run.
- Total time to complete Offline
- data collection: ( 0) seconds.
- Offline data collection
- capabilities: (0x1d) SMART execute Offline immediate.
- No Auto Offline data collection support.
- Abort Offline collection upon new
- command.
- Offline surface scan supported.
- Self-test supported.
- No Conveyance Self-test supported.
- No Selective Self-test supported.
- SMART capabilities: (0x0003) Saves SMART data before entering
- power-saving mode.
- Supports SMART auto save timer.
- Error logging capability: (0x00) Error logging NOT supported.
- General Purpose Logging supported.
- Short self-test routine
- recommended polling time: ( 0) minutes.
- Extended self-test routine
- recommended polling time: ( 0) minutes.
- SMART Attributes Data Structure revision number: 16
- Vendor Specific SMART Attributes with Thresholds:
- ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
- 1 Raw_Read_Error_Rate 0x0000 005 000 000 Old_age Offline In_the_past 0
- 9 Power_On_Hours 0x0000 141 002 000 Old_age Offline - 0
- 12 Power_Cycle_Count 0x0000 115 002 000 Old_age Offline - 0
- 184 Unknown_Attribute 0x0000 084 000 000 Old_age Offline In_the_past 0
- 195 Hardware_ECC_Recovered 0x0000 000 000 000 Old_age Offline FAILING_NOW 0
- 196 Reallocated_Event_Count 0x0000 000 000 000 Old_age Offline FAILING_NOW 0
- 197 Current_Pending_Sector 0x0000 000 000 000 Old_age Offline FAILING_NOW 0
- 198 Offline_Uncorrectable 0x0000 002 107 000 Old_age Offline - 21198
- 199 UDMA_CRC_Error_Count 0x0000 063 003 000 Old_age Offline - 26957
- 200 Multi_Zone_Error_Rate 0x0000 099 124 000 Old_age Offline - 446
- 201 Soft_Read_Error_Rate 0x0000 024 154 000 Old_age Offline - 328
- 202 TA_Increase_Count 0x0000 115 254 000 Old_age Offline - 115
- 203 Run_Out_Cancel 0x0000 247 245 000 Old_age Offline - 83
- 204 Shock_Count_Write_Opern 0x0000 000 000 000 Old_age Offline FAILING_NOW 0
- 205 Shock_Rate_Write_Opern 0x0000 016 039 000 Old_age Offline - 0
- 206 Flying_Height 0x0000 005 000 000 Old_age Offline In_the_past 0
- 207 Spin_High_Current 0x0000 055 015 000 Old_age Offline - 0
- 208 Spin_Buzz 0x0000 248 001 000 Old_age Offline - 0
- 209 Offline_Seek_Performnce 0x0000 095 000 000 Old_age Offline In_the_past 0
- 211 Unknown_Attribute 0x0000 000 000 000 Old_age Offline FAILING_NOW 0
- 212 Unknown_Attribute 0x0000 000 000 000 Old_age Offline FAILING_NOW 0
- 213 Unknown_Attribute 0x0000 000 000 000 Old_age Offline FAILING_NOW 0
- Warning: device does not support Error Logging
- Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
- SMART Error Log Version: 1
- No Errors Logged
- Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
- SMART Self-test log structure revision number 1
- No self-tests have been logged. [To run self-tests, use: smartctl -t]
- Device does not support Selective Self Tests/Logging
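Since the drive is not in the smartctl database, the attribute names and values above may well be misinterpreted. A rough Python sketch for sanity-checking the attribute table follows: it parses the rows and flags those marked FAILING_NOW, plus those whose VALUE is below WORST. WORST is meant to be the lowest normalized value ever recorded, so VALUE below WORST suggests the table is not being decoded correctly. The function names are hypothetical, and the parser assumes the ten-column layout shown in the output.

```python
def parse_smart_attributes(lines):
    """Parse rows shaped like: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW."""
    rows = []
    for line in lines:
        fields = line.lstrip("- ").split()
        # Skip headers and free-text lines: attribute rows start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "when_failed": fields[8],
            "raw": fields[9],
        })
    return rows


def failing_now(rows):
    """IDs of attributes that smartctl reports as FAILING_NOW."""
    return [row["id"] for row in rows if row["when_failed"] == "FAILING_NOW"]


def inconsistent(rows):
    """IDs whose current VALUE is below the recorded WORST, which should not happen."""
    return [row["id"] for row in rows if row["value"] < row["worst"]]
```

Run over the table above, `inconsistent` would flag attributes such as 198 (VALUE 002, WORST 107), which is one more hint that these numbers should not be taken at face value.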
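I ran memtest for 7 hours and it found no memory errors.
Any obvious ideas about what could be going wrong here? The most plausible explanation I can think of is that the SSD is silently dropping some write requests, which eventually leaves the EXT4 file system inconsistent (but without any disk I/O errors). How can this happen? Are there any relevant configuration options I should make sure are set correctly?
Which tools should I use to diagnose the hardware failure? Is it possible to diagnose an SSD failure without overwriting the data?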
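As an illustration of what a check that does not overwrite data could look like, here is a minimal Python sketch that reads a device or file twice, strictly read-only, and compares per-chunk SHA-256 digests between the two passes. It is a sketch under assumptions, not a definitive diagnostic: it can only reveal reads that return different data over time, not writes that were silently lost; the page-cache drop via `posix_fadvise` is advisory, so the second pass may still be served from RAM; and the target must not change between passes (for example, a file system mounted read-only). `hash_chunks` and `mismatched_chunks` are hypothetical names.

```python
import hashlib
import os


def hash_chunks(path, chunk_size=1 << 20, max_bytes=None):
    """Read path sequentially, read-only, and return one SHA-256 hex digest per chunk."""
    digests = []
    remaining = max_bytes
    with open(path, "rb") as handle:
        # Ask the kernel to drop cached pages so the data is more likely to be
        # re-read from the device. This is only a hint and may be ignored.
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(handle.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            data = handle.read(want)
            if not data:
                break
            digests.append(hashlib.sha256(data).hexdigest())
            if remaining is not None:
                remaining -= len(data)
    return digests


def mismatched_chunks(first, second):
    """Chunk indices where two passes disagree, including chunks missing from one pass."""
    longest = max(len(first), len(second))
    return [
        index for index in range(longest)
        if index >= len(first) or index >= len(second) or first[index] != second[index]
    ]
```

A possible use would be to call `hash_chunks` on the device node (which needs root) for two passes some time apart and pass both results to `mismatched_chunks`; an empty list means the two passes agreed, which does not by itself prove the drive is healthy.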