Edlug Archive, Mar 2004
Re: [edlug] Failing hdd and mystery crash
Mike Moran wrote:
... I've now seen the voice (face?) of doom in my logs:
Mar 2 19:13:57 xxx kernel: hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Mar 2 19:13:57 xxx kernel: hda: drive_cmd: error=0x04 { DriveStatusError }
This is sometimes accompanied by a 'chunk, kechunk' noise, and can
happen during normal running (i.e. not just at startup). I've now opened
it up and made sure the cables are secure; no kechunks yet.
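For reference, the two hex bytes in those log lines are the ATA status and error registers, and the words in braces are the bit names the Linux IDE driver prints for them. Decoding by hand: 0x51 is 0x40 (DriveReady) + 0x10 (SeekComplete) + 0x01 (Error), and error=0x04 is only the ABRT (command aborted) bit, with none of the bad-sector bits such as 0x40 (UncorrectableError) or 0x10 (SectorIdNotFound) set. A minimal POSIX sh sketch of that decoding, assuming the standard ATA register layout; `decode_status` and `decode_error` are hypothetical helper names, and the names for bits not seen in this log follow the kernel's IDE messages as far as I know them:

```shell
#!/bin/sh
# Sketch: decode the status=/error= bytes from "hda: drive_cmd:" log lines
# into the bit names the Linux IDE driver prints. Assumes the standard ATA
# status/error register layout; helper names are hypothetical.

decode_status() {
    s=$(( $1 ))
    # With BSY set, the remaining status bits are not meaningful.
    if [ $(( s & 0x80 )) -ne 0 ]; then
        echo "Busy"
        return 0
    fi
    out=""
    [ $(( s & 0x40 )) -ne 0 ] && out="$out DriveReady"      # DRDY
    [ $(( s & 0x20 )) -ne 0 ] && out="$out DeviceFault"     # DF
    [ $(( s & 0x10 )) -ne 0 ] && out="$out SeekComplete"    # DSC
    [ $(( s & 0x08 )) -ne 0 ] && out="$out DataRequest"     # DRQ
    [ $(( s & 0x04 )) -ne 0 ] && out="$out CorrectedError"  # CORR
    [ $(( s & 0x02 )) -ne 0 ] && out="$out Index"           # IDX
    [ $(( s & 0x01 )) -ne 0 ] && out="$out Error"           # ERR: see error register
    echo "${out# }"
}

decode_error() {
    e=$(( $1 ))
    out=""
    [ $(( e & 0x04 )) -ne 0 ] && out="$out DriveStatusError"   # ABRT: command aborted
    if [ $(( e & 0x80 )) -ne 0 ]; then
        # Bit 7 is reported as BadCRC when ABRT is also set, else BadSector.
        if [ $(( e & 0x04 )) -ne 0 ]; then out="$out BadCRC"; else out="$out BadSector"; fi
    fi
    [ $(( e & 0x40 )) -ne 0 ] && out="$out UncorrectableError" # UNC
    [ $(( e & 0x10 )) -ne 0 ] && out="$out SectorIdNotFound"   # IDNF
    [ $(( e & 0x02 )) -ne 0 ] && out="$out TrackZeroNotFound"  # TK0NF
    [ $(( e & 0x01 )) -ne 0 ] && out="$out AddrMarkNotFound"   # AMNF
    echo "${out# }"
}

decode_status 0x51   # prints: DriveReady SeekComplete Error
decode_error 0x04    # prints: DriveStatusError
```

The same two functions can be fed the values from any later log lines, which makes it easy to spot whether a different error bit (for example 0x40, UncorrectableError) ever turns up.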
I was wondering what I could do to stress-test it enough to make the
problem appear again. I was thinking of some sort of recursive find
that would peek inside each file. I presume that just doing
"find / -print" would only read the directory entries and not the
contents of any files?
I've come up with this so far:
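# Timestamp, then wc every regular file on ext2 filesystems (skipping
# /proc and /dev) so each file's contents are read, then timestamp again.
date > /var/tmp/find5.txt
find / -path /proc -prune -o -path /dev -prune -o \
    -fstype ext2 -type f -print -exec wc {} \; >> /var/tmp/find5.txt 2>&1
date >> /var/tmp/find5.txt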
It does the entire thing in about 1.5 minutes. I'm not sure whether this
is really doing what I want, i.e. reading a large part of the
filesystem. I expect various caches may be scuppering my efforts.
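One way round the caches is to go underneath the filesystem and read the whole disk device sequentially: the page cache can only hold as much data as fits in RAM, so a full pass over a disk much larger than memory has to fetch nearly all of it from the platters, including areas that hold no files at all. A minimal sketch, assuming GNU dd and that the disk is /dev/hda as in the log (reading it needs root); `read_device` is a hypothetical name:

```shell
#!/bin/sh
# Sketch: read every block of a device (or file) sequentially, bypassing the
# filesystem. The path is passed as $1, e.g. /dev/hda (assumption; needs root).
# Nothing is written to the device: output goes to /dev/null.
read_device() {
    # 64 KiB reads; the data is thrown away. GNU dd stops at the first
    # unreadable block and exits non-zero, which the caller can check.
    dd if="$1" of=/dev/null bs=65536
}

# Example (as root), timestamped like the find run:
#   date; read_device /dev/hda || echo "read error on /dev/hda"; date
```

`badblocks -sv /dev/hda` does a similar pass (a read-only test is its default mode) and also lists any blocks it could not read.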
Anyway, after running this a few times, I've heard the kechunk noise a
few times as well (though not every time). The thing is, whilst I was
running these checks, there were *no* log entries in /var/log/messages
of the form I saw before, i.e. no hda: drive_cmd entries. However, now, 6
minutes after the last find completed, I get these:
Mar 3 12:36:11 xxx kernel: hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Mar 3 12:36:11 xxx kernel: hda: drive_cmd: error=0x04 { DriveStatusError }
Mar 3 12:36:11 xxx kernel: hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Mar 3 12:36:11 xxx kernel: hda: drive_cmd: error=0x04 { DriveStatusError }
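To line the errors up with the test runs, the kernel messages can be bucketed by minute and compared with the `date` stamps written at the start and end of /var/tmp/find5.txt. A minimal sketch, assuming the standard syslog line format shown above; `drive_cmd_by_minute` is a hypothetical name, and only the `status=` line of each pair is counted so each event counts once:

```shell
#!/bin/sh
# Sketch: count "hda: drive_cmd" error events per minute in a syslog file.
# Assumes lines start "Mon DD HH:MM:SS" (standard syslog); name is hypothetical.
drive_cmd_by_minute() {
    # Keep only the status= line of each status/error pair, then print
    # month, day and HH:MM, and count runs of identical minutes.
    grep 'hda: drive_cmd: status=' "$1" |
        awk '{ print $1, $2, substr($3, 1, 5) }' |
        uniq -c
}

# Example:
#   drive_cmd_by_minute /var/log/messages
#   head -n 1 /var/tmp/find5.txt; tail -n 1 /var/tmp/find5.txt
```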
Does anyone have any ideas on what could be causing these kechunk/error
symptoms?
FYI, according to SMART, the disk model is "WDC WD400BB-75AUA1". I've
searched Google for "WDC" plus the log error, and the responses mostly
seem to range from "You've got a duff disk, throw it away/send it back"
through "Your disk controller is radge" to "Your cable/s are
broken/loose". Note that this machine, in its former guise as a Windows
2000 box, has been running fine with these occasional kechunk noises for
the past 2 or 3 years.
--
Mike
-
----------------------------------------------------------------------
You can find the EdLUG mailing list FAQ list at:
http://www.edlug.org.uk/list_faq.html