
smartmontools: Should I replace my SSHD?

Unix & Linux user
Asked 2022-10-23 13:51:07
Answers: 3 · Views: 659 · Followers: 0 · Votes: 2

Today, while I was watching a video in Firefox, the following window suddenly popped up:

Or, as output from GSmartControl:

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.19.0-22-amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Laptop SSHD
Device Model:     ST500LM000-1EJ162-SSHD
Serial Number:    W3715AR9
LU WWN Device Id: 5 000c50 06e236b9f
Firmware Version: HPD3
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Oct 23 14:41:09 2022 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (  634) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (  99) minutes.
SCT capabilities:          (0x1081) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   118   099   006    -    195697992
  3 Spin_Up_Time            PO---K   099   099   000    -    0
  4 Start_Stop_Count        -O--CK   093   093   020    -    7676
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR-K   082   060   030    -    4473742513
  9 Power_On_Hours          -O--CK   087   087   000    -    11853
 10 Spin_Retry_Count        PO--CK   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   093   093   020    -    7668
180 Unknown_HDD_Attribute   -O-R-K   100   100   000    -    64025461
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        PO--CK   100   100   097    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   099   000    -    2
189 High_Fly_Writes         -O-RCK   063   063   000    -    37
190 Airflow_Temperature_Cel -O---K   069   055   045    -    31 (Min/Max 28/32)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    228
193 Load_Cycle_Count        -O--CK   097   097   000    -    7777
194 Temperature_Celsius     -O---K   031   045   000    -    31 (0 14 0 0 0)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O--CK   100   100   000    -    16
198 Offline_Uncorrectable   ----CK   100   100   000    -    16
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
254 Free_Fall_Sensor        -O--CK   100   100   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O   1223  Current Device Internal Status Data log
0x25       GPL     R/O   1223  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      20  Device vendor specific log
0xa2       GPL     VS    3900  Device vendor specific log
0xa8       GPL,SL  VS     129  Device vendor specific log
0xa9       GPL,SL  VS       1  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xae       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    4580  Device vendor specific log
0xb6       GPL     VS    1918  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc1       GPL,SL  VS      10  Device vendor specific log
0xc2       GPL,SL  VS      50  Device vendor specific log
0xc4       GPL,SL  VS       5  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 1
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 8134 hours (338 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 a0 3a 40 00 00  Error: UNC at LBA = 0x00a03a40 = 10500672

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 2a 00 00 00 a0 3a 40 e0 00     01:31:49.827  READ DMA EXT
  25 00 00 00 35 00 00 00 a0 42 0b e0 00     01:31:49.348  READ DMA EXT
  25 00 00 00 0b 00 00 00 a0 42 00 e0 00     01:31:49.345  READ DMA EXT
  25 00 00 00 15 00 00 03 93 ac 6b e0 00     01:31:49.342  READ DMA EXT
  25 00 00 00 2b 00 00 03 93 ac 40 e0 00     01:31:49.339  READ DMA EXT

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     11852         -
# 2  Short offline       Completed without error       00%     11847         -
# 3  Short offline       Completed without error       00%     11844         -
# 4  Short offline       Completed without error       00%     11835         -
# 5  Short offline       Completed without error       00%     11830         -
# 6  Short offline       Completed without error       00%     11823         -
# 7  Short offline       Completed without error       00%     11818         -
# 8  Short offline       Completed without error       00%     11814         -
# 9  Short offline       Completed without error       00%     11806         -
#10  Short offline       Completed without error       00%     11801         -
#11  Short offline       Completed without error       00%     11792         -
#12  Short offline       Completed without error       00%     11790         -
#13  Short offline       Completed without error       00%     11780         -
#14  Short offline       Completed without error       00%     11772         -
#15  Short offline       Completed without error       00%     11765         -
#16  Short offline       Completed without error       00%     11756         -
#17  Short offline       Completed without error       00%     11751         -
#18  Short offline       Completed without error       00%     11747         -
#19  Short offline       Completed without error       00%     11740         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
Device State:                        Active (0)
Current Temperature:                    31 Celsius
Power Cycle Min/Max Temperature:     25/32 Celsius
Lifetime    Min/Max Temperature:     16/44 Celsius
Under/Over Temperature Limit Count:   0/2

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

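Today, when I booted Linux, the boot failed. I restarted, and the second boot went through without problems. This happened before the warning appeared, so I don't know whether the boot problem is related to the smartmontools error.

What confuses me: the report contains the line "Error 1 [0] occurred at disk power-on lifetime: 8134 hours (338 days + 22 hours)", but no date. I expected to find the date on which the error occurred, so that I could see today's date and attribute the error to today. Since I could not find a date anywhere in the output, I looked for the actual lifetime of my SSHD instead, because the error is said to have occurred at 8134 hours. I expected to find the number of hours my SSHD has been running up to now, but I did not find that either.

Which host syslog does the warning refer to? Perhaps this one: /var/log/syslog?

If so, here it is: https://workupload.com/file/NVD2gpdrvHp

But my main question is: is there a high risk that my SSHD will die soon?

The warning says that the health status of the drive has changed. Where can I find the current health status?

Thanks.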


3 Answers

Unix & Linux user

Answered 2022-10-24 09:07:42

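Offline uncorrectable sectors

Judging from the picture and the text you posted, there are already 16 sectors that cannot be read or written.

As someone who used to work in data recovery, I recommend using ddrescue (see its man page) to copy the remaining healthy parts of the disk to external media as soon as possible.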
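A common way to run GNU ddrescue is in two passes, sketched below. The device name `/dev/sdX` and the paths under `/mnt/backup` are placeholders (assumptions, not taken from the question), and the function only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch only: print a two-pass GNU ddrescue plan instead of running it.
# $1 = failing source disk, $2 = image file, $3 = map file.
# The image and the map file must live on a different, healthy disk.
ddrescue_plan() {
    src=$1; img=$2; map=$3
    # Pass 1: copy everything that reads cleanly and skip the slow scraping phase (-n).
    echo "ddrescue -n $src $img $map"
    # Pass 2: revisit the bad areas with direct disc access (-d), up to 3 retry passes (-r3).
    echo "ddrescue -d -r3 $src $img $map"
}

# Placeholder names; substitute your own device and destination.
ddrescue_plan /dev/sdX /mnt/backup/sshd.img /mnt/backup/sshd.map
```

Keeping the same map file for both passes lets the second run resume where the first one stopped instead of rereading the whole disk.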

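At this point, the fact that SMART still reports PASSED is irrelevant.

Once you have run ddrescue and confirmed that there is a real problem, a completely separate task is finding out which files are affected. You cannot tell that from the ddrescue log file.

You need to mount the ddrescue image successfully, as root: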

# START_SECTOR is a placeholder: the partition's first sector inside the image (512 is the usual sector size)
mount -o ro,loop,offset=$(( 512 * START_SECTOR )) /path/to/ddrescue/image /mnt/point/
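The offset is the partition's start sector multiplied by the logical sector size. Here is a minimal sketch of that arithmetic, assuming a partition that starts at sector 2048. That value is only an illustration; the real one appears in the Start column of `fdisk -l /path/to/ddrescue/image`.

```shell
#!/bin/sh
sector_size=512      # logical sector size, matches "512 bytes logical" in the smartctl output
start_sector=2048    # ASSUMED example value; take the real one from fdisk -l on the image
offset=$(( sector_size * start_sector ))
echo "offset=$offset"
# The mount command would then read as follows (print only; run it as root yourself):
echo "mount -o ro,loop,offset=$offset /path/to/ddrescue/image /mnt/point/"
```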

Then find the errors, which point to the affected files:

cp -PRv /mnt/point/ /path/to/extracted/files/ 2>>/path/to/extracted/files/ERRORS.txt

These are only examples. Always check the paths, and do not copy and paste blindly.

Votes: 4

Unix & Linux user

Answered 2022-10-23 14:22:36

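The drive itself does not know any date, and there is no way to set one. It only counts powered-on hours, and even that counter may be coarse: if the drive only ever runs for a few minutes at a time, the count may be inaccurate.

Your current power-on time is 11853 hours, so you may be able to extrapolate a date from the average number of hours per day this system is running. Or perhaps you have logged the power-on hours value somewhere else, which would let you infer a more precise date.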
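As a rough worked example of that extrapolation, the sketch below subtracts the logged error time from the current counter. The two hour values come from the report; the 8 hours of use per day is purely an assumption and should be replaced with your own figure.

```shell
#!/bin/sh
current_hours=11853   # Power_On_Hours raw value from the report
error_hours=8134      # "Error 1 [0] occurred at disk power-on lifetime: 8134 hours"
hours_per_day=8       # ASSUMPTION: average daily uptime; adjust to your own usage

hours_since_error=$(( current_hours - error_hours ))
days_since_error=$(( hours_since_error / hours_per_day ))   # integer division

echo "Error logged $hours_since_error power-on hours ago"
echo "Roughly $days_since_error calendar days ago at $hours_per_day h/day"
```

With these numbers the logged error lies 3719 power-on hours in the past, so it was not recorded today, whatever the daily usage is.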

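Your drive has unreadable (pending, uncorrectable) sectors, so you may already have lost some data. Do you have backups to compare against, or checksums you can verify?

Personally, I would replace it first (using ddrescue to deal with the read errors) and test it more thoroughly afterwards. The error counters reported by SMART are always a minimum: they only cover problems the drive ran into by itself, without deliberately looking for them.

So there may well be more errors that have not been reported yet.

In the future, also consider running long self-tests (or selective self-tests), since the short test may not be enough to detect read errors reliably.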
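The commands for such tests could look like the sketch below. The device name `/dev/sdX` and the margin of 1000 sectors are placeholders chosen for illustration; 10500672 is the LBA from the error log in the question. The function only prints the commands, since they need root and a real device.

```shell
#!/bin/sh
# Sketch only: print smartctl self-test commands for review instead of running them.
# $1 = device, $2 = LBA of interest, $3 = number of sectors to test on each side.
selftest_plan() {
    dev=$1; lba=$2; margin=$3
    first=$(( lba - margin ))
    last=$(( lba + margin ))
    echo "smartctl -t long $dev"                  # extended test of the whole surface
    echo "smartctl -t select,$first-$last $dev"   # selective test around one LBA
    echo "smartctl -l selftest $dev"              # read the results afterwards
}

# Placeholder device and margin; the LBA is the one reported as UNC in the error log.
selftest_plan /dev/sdX 10500672 1000
```

According to the report, the extended test on this drive takes about 99 minutes, and the drive remains usable while it runs.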

Votes: 3

Unix & Linux user

Answered 2022-10-23 18:46:22

I would be particularly worried about this:

  7 Seek_Error_Rate         POSR-K   082   060   030    -    4473742513

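You have a significant seek error rate (and it was worse in the past).

A single uncorrectable block error can happen and is not in itself a cause for concern, and even 16 pending sectors can happen. But given the seek error rate, I would not trust this drive. When these drives fail, they usually fail quickly and with little warning.

Run a bad block scan, run a long self-test, and decide what to do based on the results. The disk may be fine for system files (or anything else you can easily restore), but I probably would not keep important data on it.

"Which host syslog is meant? /var/log/syslog"

Yes. It will probably show the same error as the drive's internal log, namely an uncorrectable READ DMA EXT at LBA 0x00a03a40.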
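When searching logs for that address it helps to have it in decimal as well, since the smartctl error log prints both forms. A small sketch of the conversion, using the sector sizes stated in the report (512 bytes logical, 4096 bytes physical):

```shell
#!/bin/sh
lba_hex=0x00a03a40
lba_dec=$(printf '%d' "$lba_hex")    # decimal form of the failing LBA
byte_offset=$(( lba_dec * 512 ))     # position on the disk with 512-byte logical sectors
phys_sector=$(( lba_dec / 8 ))       # a 4096-byte physical sector holds 8 logical ones
echo "LBA $lba_dec, byte offset $byte_offset, physical sector $phys_sector"
```

The decimal result, 10500672, matches the value smartctl prints next to the hexadecimal LBA.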

"I looked for the actual lifetime of my SSHD"

  9 Power_On_Hours          -O--CK   087   087   000    -    11853

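SMART values are normalized, typically to 100 (lower is worse), and the drive is considered "failed" when they drop below the indicated threshold. That is why your drive still passes: all values are above their thresholds.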
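The threshold rule can be checked mechanically. The sketch below applies it to three rows copied from the attribute table in the question; the function name `check_attrs` is made up for this illustration.

```shell
#!/bin/sh
# Three attribute rows copied from the report (ID NAME FLAGS VALUE WORST THRESH FAIL RAW).
attrs='  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0
  7 Seek_Error_Rate         POSR-K   082   060   030    -    4473742513
197 Current_Pending_Sector  -O--CK   100   100   000    -    16'

check_attrs() {
    # $4 = normalized VALUE, $6 = THRESH; "+ 0" forces a numeric comparison of "082" etc.
    awk '{ status = ($4 + 0 > $6 + 0) ? "ok" : "FAILING"
           printf "%-24s value=%d thresh=%d %s\n", $2, $4, $6, status }'
}

result=$(printf '%s\n' "$attrs" | check_attrs)
printf '%s\n' "$result"
```

All three rows come out as ok, including Current_Pending_Sector, whose threshold is 000 and whose normalized value is still 100 despite a raw count of 16. The overall PASSED verdict therefore says nothing about those pending sectors; the raw value has to be read separately.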

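It still works, it has some bad blocks (which can happen), and it is possible that once those blocks have been reallocated it will be fine for a while. So you can keep using it, but as I wrote, when it fails it may fail suddenly, because the high seek error rate already points to a problem (possibly a mechanical one).

Votes: 2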
Original content provided by Unix & Linux (Stack Exchange); translation support by the Tencent Cloud Xiaowei IT-domain engine.
Original link: https://unix.stackexchange.com/questions/722110
