Can’t Win, Can’t Break Even, Can’t Get Out Of The Game

You know what you never want to see? Ever? This:

Kernel Stack Inpage Error

That’s a particularly nasty variant of that old companion of people who do computer things for money: the BSOD. This one usually means one of three things: you have a virus on your Master Boot Record, the drive with your swap partition on it is hosed, or your memory chips are shot.

For those that don’t speak geek, none of those things is good.

And I started getting that BSOD intermittently today on my primary work machine. (Hey, some of my personal machines are Linux, but I need Window$ for the job.)

I could rule out the MBR virus (my computer has more prophylaxis than you would believe), which left memory or disk.

Eventually, in the course of trying to diagnose the problem, I took a look in the Windows system event log. I was thinking (perhaps foolishly) that when the system stopped it would write something into the log about what the problem was. Well, there was no entry at crash time, nor anything near it that looked like any kind of serious problem: no error icons, no scary event names, etc.

At this point I had to go get WinDiag, burn it to a disk, and then let my computer run through a couple of hours of memory tests. This turned out to be a waste of time–both because the tests passed, and because the actual clue I needed was in the event log all along.

It wasn’t at the point of the crash, nor was it marked in a way that would make it stand out, but eventually I noticed it: a little entry, at the warning level, labelled only as “disk”. I clicked through for more info, and saw this:

Holy shit!

Dios Mio!

So, the operating system knew there was a serious problem with my primary hard drive, and it didn’t think I should know that? It just wrote a warning (a warning, not even an error??!?!) into the event log and went on its way, without notifying the user? I’m a geek, I know the event log, so I eventually found this, but how many Windows users would never see it at all, much less in time to do anything about it? Hell, if I hadn’t been seeing the Death screen I never would have found this message.

Oh, and apparently it’s been going on for a while:

Event List

Really nice, Microsoft, really nice.

A little bit of research led me to find out all about S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) and how modern hard drives essentially keep an eye on themselves so they can warn you when they’re getting terminal. Which would be a great feature IF THE OS ACTUALLY PASSED ON THE NOTIFICATION.

So I grabbed a S.M.A.R.T. tool to check out the drive, and saw this:

Drivesitter scares me brown

Apparently if I had owned that tool all along, it would have told me when the drive reported it was dying. I wish my OS thought telling me was a good idea.
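For the curious, the basic check a S.M.A.R.T. monitor performs isn’t rocket science. Each attribute the drive reports carries a normalized value (higher is healthier), the worst value it has ever hit, and a vendor-set threshold; if the value drops to or below the threshold, the drive is telling you that attribute has failed. Here’s a minimal Python sketch of that comparison. It assumes the attributes have already been read off the drive somehow, and the `SmartAttribute` class, the function names, and the sample readings are all made up for illustration. This is not how DriveSitter actually works, and not what my drive reported. It also assumes the common convention that a threshold of 0 means “informational only, never trips.”

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One S.M.A.R.T. attribute as a monitoring tool might see it (hypothetical layout)."""
    attr_id: int
    name: str
    value: int      # current normalized value; higher means healthier
    worst: int      # lowest normalized value recorded so far
    threshold: int  # vendor-set limit; at or below it, the attribute has "failed"


def failing_now(attrs):
    """Attributes whose current value has dropped to or below the threshold.

    Assumption: a threshold of 0 marks an informational attribute that can never trip.
    """
    return [a for a in attrs if a.threshold > 0 and a.value <= a.threshold]


def failed_in_past(attrs):
    """Attributes whose worst-ever value reached the threshold, even if they have since recovered."""
    return [a for a in attrs if a.threshold > 0 and a.worst <= a.threshold]


# Made-up sample readings for illustration only.
SAMPLE = [
    SmartAttribute(3, "Spin_Up_Time", value=90, worst=20, threshold=21),
    SmartAttribute(5, "Reallocated_Sector_Ct", value=3, worst=3, threshold=36),
    SmartAttribute(9, "Power_On_Hours", value=95, worst=95, threshold=0),
    SmartAttribute(194, "Temperature_Celsius", value=40, worst=30, threshold=0),
]

if __name__ == "__main__":
    # This is the bit where a tool (or an OS) would pop up a big scary dialog.
    for a in failing_now(SAMPLE):
        print(f"FAILING NOW: {a.name} (value {a.value}, threshold {a.threshold})")
    for a in failed_in_past(SAMPLE):
        print(f"Failed at some point: {a.name} (worst {a.worst}, threshold {a.threshold})")
```

With those sample numbers, only the reallocated-sector count is failing right now, while spin-up time dipped below its threshold at some point and then recovered. The drive does the hard part of watching itself; reading the numbers and telling the user is the easy part.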

At this point what I really needed was one of these:

Calm The Hell Down Boy

Actually, it wasn’t that bad–most of my “I can’t lose that” files are backed up on one of my RAID arrays, but still it’s a pretty tremendous pain to have a machine’s primary drive die when you’re working to a harsh deadline.

Anyway, I called Dell and invoked my Corporate Computer Guy status to get them to overnight me a new drive for the laptop, and in the meantime I slapped a copy of Norton Ghost onto the box and quickly copied off an image of the machine before the drive died all the way. I was very relieved when I finally saw this:

What A Relief

Now I just need to wait for the new drive, and I can slap the image back onto it and be back to “productive” without needing to reinstall everything and restore all my backups, etc.

What a pain in the ass, though, and how easily it could all have been avoided if Windows had just told me the first time it knew the drive was dying.

This work by Chris McLaren is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Canada.