Edlug Archive Mar 2004

Re: [edlug] Failing hdd and mystery crash




Mike Moran wrote:

... I've now seen the voice (face?) of doom in my logs:


Mar 2 19:13:57 xxx kernel: hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Mar 2 19:13:57 xxx kernel: hda: drive_cmd: error=0x04 { DriveStatusError }
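For reference, the names in braces are the kernel's decoding of two ATA registers. status=0x51 is 0x40 (DriveReady) + 0x10 (SeekComplete) + 0x01 (Error). error=0x04 has only the ABRT bit set, which the IDE driver prints as DriveStatusError: the drive aborted the command. None of the media-error bits (UncorrectableError, SectorIdNotFound and so on) are set. Below is a minimal POSIX sh sketch of that decoding. It assumes the standard ATA bit layout and the names the Linux IDE driver prints, and decode_status and decode_error are hypothetical helper names:

```shell
#!/bin/sh
# decode_status BYTE: name the bits set in an ATA status byte, e.g. 0x51.
decode_status() {
    s=$(($1))                      # accepts hex such as 0x51
    if [ $((s & 0x80)) -ne 0 ]; then
        echo "Busy"                # while BSY is set, the other bits are not valid
        return
    fi
    out=""
    [ $((s & 0x40)) -ne 0 ] && out="$out DriveReady"
    [ $((s & 0x20)) -ne 0 ] && out="$out DeviceFault"
    [ $((s & 0x10)) -ne 0 ] && out="$out SeekComplete"
    [ $((s & 0x08)) -ne 0 ] && out="$out DataRequest"
    [ $((s & 0x04)) -ne 0 ] && out="$out CorrectedError"
    [ $((s & 0x02)) -ne 0 ] && out="$out Index"
    [ $((s & 0x01)) -ne 0 ] && out="$out Error"
    echo "${out# }"                # drop the leading space
}

# decode_error BYTE: name the bits set in an ATA error byte, e.g. 0x04.
decode_error() {
    e=$(($1))
    out=""
    [ $((e & 0x04)) -ne 0 ] && out="$out DriveStatusError"   # ABRT: command aborted
    if [ $((e & 0x80)) -ne 0 ]; then
        # Bit 7 is reported as a CRC error when ABRT is also set,
        # otherwise as a bad sector.
        if [ $((e & 0x04)) -ne 0 ]; then out="$out BadCRC"; else out="$out BadSector"; fi
    fi
    [ $((e & 0x40)) -ne 0 ] && out="$out UncorrectableError"
    [ $((e & 0x10)) -ne 0 ] && out="$out SectorIdNotFound"
    [ $((e & 0x02)) -ne 0 ] && out="$out TrackZeroNotFound"
    [ $((e & 0x01)) -ne 0 ] && out="$out AddrMarkNotFound"
    echo "${out# }"
}
```

For example, `decode_status 0x51` prints `DriveReady SeekComplete Error`, and `decode_error 0x04` prints `DriveStatusError`.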


This is sometimes accompanied by a 'chunk, kechunk' noise, and can happen during normal running (i.e. not just at startup). I've now opened the machine up and made sure the cables are secure; no kechunks yet.

I was wondering what I could do to stress test it enough to make the problem appear again. I was thinking of some sort of recursive find that would peek into each file. I presume that just doing "find / -print" would only access the directory entries and not the files themselves?
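On the question above: "find / -print" only needs directory entries (plus inode metadata for tests such as -type f). It never opens the files, so no file data comes off the disk. To force the data itself to be read, every regular file has to be opened and read through. Below is a minimal sketch that uses cat as the reader and discards the output. read_all_files is a hypothetical helper name, and the exit-status behaviour noted in the comments is that of GNU find:

```shell
#!/bin/sh
# read_all_files DIR: read the full contents of every regular file under
# DIR and throw the data away.
#   -xdev        stay on DIR's filesystem, so /proc and other mounts are skipped
#   -type f      only regular files, so device nodes in /dev are not read
#   -exec ... +  pass files to cat in batches rather than one process per file
# Read errors are reported on stderr; with GNU find the exit status is
# non-zero if any file or directory could not be read.
read_all_files() {
    find "$1" -xdev -type f -exec cat {} + > /dev/null
}
```

For example, run as root: `read_all_files / 2>> /var/tmp/find5.txt`, so any read errors land in the same log as the dates.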

I've come up with this so far:


date > /var/tmp/find5.txt; find / -path /proc -prune -o -path /dev -prune -o -fstype ext2 -type f -print -exec wc {} \; >> /var/tmp/find5.txt 2>&1; date >> /var/tmp/find5.txt

It does the entire thing in about 1.5 minutes. I'm not sure whether this is really doing what I want, i.e. accessing a large part of the filesystem. I suspect various caches may be scuppering my efforts.
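On the cache question: the wc in the command above has to read every byte of each file it is given, so the find does touch file data. A repeat run, however, can be served largely from the page cache. One way round this is to read the raw device sequentially instead of going through the filesystem. Below is a minimal sketch that assumes root access and takes /dev/hda (the device in the log) as the disk to test. read_device is a hypothetical helper name:

```shell
#!/bin/sh
# read_device DEV: read DEV from start to end in 1 MiB blocks and discard
# the data. On a disk much larger than RAM, the page cache cannot hold the
# whole device, so most of these reads have to go to the platters.
# dd stops at the first read error and reports it on stderr, along with
# the number of blocks read so far.
read_device() {
    dd if="$1" of=/dev/null bs=1048576
}
```

For example: `read_device /dev/hda`. The e2fsprogs tool badblocks performs a similar read-only scan by default (`badblocks -sv /dev/hda`) and lists any unreadable blocks. On kernels from 2.6.16 on, `sync; echo 3 > /proc/sys/vm/drop_caches` empties the page cache, so a repeat of the find has to read from the disk again.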

Anyway, after running this a few times, I've heard the kechunk noise a few times as well (though not every time). The thing is, whilst I was running these checks, there were *no* entries in /var/log/messages of the form I saw before, i.e. no hda: drive_cmd entries. However, now, 6 minutes after the last find completed, I get these:

Mar 3 12:36:11 xxx kernel: hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Mar 3 12:36:11 xxx kernel: hda: drive_cmd: error=0x04 { DriveStatusError }
Mar 3 12:36:11 xxx kernel: hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
Mar 3 12:36:11 xxx kernel: hda: drive_cmd: error=0x04 { DriveStatusError }
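To compare these kernel messages with the start and end times that the find loop writes to /var/tmp/find5.txt, the drive_cmd entries can be pulled out of the log together with their time stamps. Below is a minimal sketch. It assumes the classic syslog line format shown above (month, day, time, host, message), and drive_cmd_events is a hypothetical helper name:

```shell
#!/bin/sh
# drive_cmd_events LOGFILE: print "Mon D HH:MM:SS status=0x.." (or error=0x..)
# for every "hda: drive_cmd" line in LOGFILE; other lines are ignored.
drive_cmd_events() {
    sed -n 's/^\([A-Z][a-z][a-z] *[0-9]* [0-9:]*\) .*hda: drive_cmd: \([a-z]*=0x[0-9a-f]*\).*/\1 \2/p' "$1"
}
```

For example, run `drive_cmd_events /var/log/messages`. Then compare its output with `head -n 1 /var/tmp/find5.txt` and `tail -n 1 /var/tmp/find5.txt`, which are the two date lines written around the find.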


Does anyone have any ideas on what could be causing these kechunk/error symptoms?

FYI, according to SMART, the disk model is "WDC WD400BB-75AUA1". I've searched Google for "WDC" plus the log error, and the responses mostly range from "You've got a duff disk, throw it away/send it back" through "Your disk controller is radge" to "Your cable/s are broken/loose". Note that this machine, in its former guise as a Windows 2000 box, has been running fine with these occasional kechunk noises for the past 2 or 3 years.

--
Mike

-
----------------------------------------------------------------------
You can find the EdLUG mailing list FAQ list at:
http://www.edlug.org.uk/list_faq.html



This archive is kept by wibble@morpheux.org.DONTSPAMME