WU crash

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0
Topic 191912

Hi folks,

I've had some stability issues (bluescreens, random reboots and freezing while playing certain games) with my computer for a while. My guess is that the problem is related to my antivirus causing a weird access violation in combination with my graphics card driver, but I'm really just guessing here. What I know for sure is that the problem is software-side, not voltage- or heat-related. I'm using Asus Probe and the readings are completely okay.
The reason I ask is that when I was playing the computer had a bluescreen again (for about the 4th time today, which is extreme but not unheard of) and, for the first time, it also affected BOINC. My half completed WU crashed with the following error code:
2006-10-06 20:04:31 [Einstein@Home] Unrecoverable error for result h1_1314.5_S5R1__1509_S5R1a_0 ( - exit code 99 (0x63))
which would probably be an "illegal function" or something. As a HashClash WU showed just the same behaviour it can't really be related to the WU as such.
Now what I'd like to know is: Is there any way to find out the exact cause of my problems? Or, if I can't stop my computer from crashing, how can I at least make sure my WUs aren't affected? I'm only getting long WUs atm and it's really annoying to lose one in the middle.
Thanks in advance
Annika

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0


Generally speaking most crashes only result in EAH picking up from the last checkpoint.

However, as you've just seen, it is possible to have the app crash in such a way that it starts from scratch or worse.

Unfortunately, unless you can figure out what the source of the BSOD is and eliminate it, the only sure way to avoid losing the result is to exit BOINC when doing known risky tasks, like game playing and full virus scans for example.

The problem with tracking it down, since it involves a BOINC project, is that you may end up trashing a bunch of results during the usual troubleshooting procedures. This shouldn't be much of a problem, although some people get peeved when they see a host trashing a lot of results, but tough for them in this case. ;-)

Alinator

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752699655
RAC: 1467489


Quote:

Hi folks,

I've had some stability issues (bluescreens, random reboots and freezing while playing certain games) with my computer for a while. My guess is that the problem is related to my antivirus causing a weird access violation in combination with my graphics card driver, but I'm really just guessing here. What I know for sure is that the problem is software-side, not voltage- or heat-related. I'm using Asus Probe and the readings are completely okay.
The reason I ask is that when I was playing the computer had a bluescreen again (for about the 4th time today, which is extreme but not unheard of) and, for the first time, it also affected BOINC. My half completed WU crashed with the following error code:
2006-10-06 20:04:31 [Einstein@Home] Unrecoverable error for result h1_1314.5_S5R1__1509_S5R1a_0 ( - exit code 99 (0x63))
which would probably be an "illegal function" or something. As a HashClash WU showed just the same behaviour it can't really be related to the WU as such.
Now what I'd like to know is: Is there any way to find out the exact cause of my problems? Or, if I can't stop my computer from crashing, how can I at least make sure my WUs aren't affected? I'm only getting long WUs atm and it's really annoying to lose one in the middle.
Thanks in advance
Annika


Looking at the result text for the crashed WU, it contains:

2006-10-06 20:04:28.0625 [CRITICAL]: ERROR: could not parse line 29499 in skyGrid-file './grid_1320_h_T10_S5R1.dat'

Level 0: $Id: ComputeFStatistic.c,v 1.371 2006/06/09 12:48:58 reinhard Exp $
Function call `InitSearchGrid(status, &thisScan, &GV)' failed.
file ComputeFStatistic.c, line 509

just after a restart. Judging from my machine, the skyGrid file is downloaded from Einstein as data, and shouldn't be changed while it's running.

So if it read the file OK when it started running, but can't read it now when restarting after a crash, there's a possibility that you've got errors on your hard disk. Probably the result of the crashes, rather than the cause of them (sorry, can't help you there), but perhaps it would be wise to check the drive.
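
If you want to see whether the grid file itself is damaged before reaching for a disk tool, a small script can scan it for bytes that shouldn't appear in a text file. This is only a sketch: it assumes the skyGrid file is plain ASCII text (the "could not parse line" message hints at that, but I haven't checked the actual format), and find_suspect_lines is a made-up helper name, not part of BOINC or the Einstein app.

```python
def find_suspect_lines(path, max_report=10):
    """Return (line_number, reason) pairs for lines that look damaged.

    Assumption: the file is meant to be plain ASCII text, so NUL bytes,
    control characters and bytes above 0x7E are treated as corruption.
    """
    suspects = []
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.rstrip(b"\r\n")
            if b"\x00" in line:
                reason = "contains NUL bytes"
            elif any((b < 0x20 and b != 0x09) or b > 0x7E for b in line):
                reason = "contains non-printable bytes"
            else:
                continue  # line looks like ordinary text
            suspects.append((lineno, reason))
            if len(suspects) >= max_report:
                break  # enough evidence, stop scanning
    return suspects
```

Pointing it at the grid file in the project directory should show whether anything near the reported line number looks like garbage rather than numbers. An empty list doesn't prove the file is good, only that it contains nothing but printable text.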

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Agreed. In my experience, when this happens the grid file has ended up getting cross-linked during the crash.

On my 9x boxes I have been able to recover from it sometimes, but only because I don't let Scandisk autofix the drive on the reboot, and use third-party tools to clean up the mess. ;-)

Alinator

Should have noted here that the grid_xxxxx_S5R1.dat file is the actual result you're running and is not the same thing as the main data pack file, which is the big one in the project directory.

Alinator

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0


Message 47572 in response to message 47570

Quote:
So if it read the file OK when it started running, but can't read it now when restarting after a crash, there's a possibility that you've got errors on your hard disk. Probably the result of the crashes, rather than the cause of them (sorry, can't help you there), but perhaps it would be wise to check the drive.

Any suggestions as to which tools I should use? I'm running Win XP and it's a SATA RAID HDD (and less than two months old, so I do hope it's not a hardware issue).

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

That's one of the problems I've run into, in that I really haven't found the same kind of tools for NTFS on 2K/XP which were commonly available for FAT and FAT32 on 9x (at least at a cost I was willing to pay for my own personal use). Maybe someone else has a suggestion in this regard? (Hint, hint) ;-)

At least in my case, the kind of result-killing BSOD you just had is so infrequent on my NT-based machines (I can think of maybe 1 or 2 in the last year, and those were basically my own fault) that I never felt the need to look into working around it, and just wrote off the result as an occupational hazard for the serious cruncher. :-)

FWIW, I don't think this is an indication of a hardware fault with your SATA RAID, since your description and the files involved closely match my 9x experience. Unless of course you are having some kind of bizarro driver conflict as you mentioned earlier.

Alinator

What I use on 9x is a more comprehensive and flexible disk utility (of which there were several good ones in the "old" days) and sometimes a Hex Editor to put the pieces back together.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752699655
RAC: 1467489


Message 47574 in response to message 47573

Quote:
Maybe someone else has a suggestion in this regard? (Hint, hint) ;-)


If you're looking at me, 'fraid not.

This WU has gone for good. Einstein seems to be very good at re-sending any missing files it's going to need again. Although the skyGrid isn't the main datafile, you don't get one with every download, so I guess it has some kind of persistence, but you could safely delete it in this sort of situation.

So I wouldn't put any effort into trying to preserve fragments of cross-linked files. Just run the standard WinXP checker tool to sort out logical inconsistencies on the volume - there may be some in your games folder too - and keep an eye on how it behaves for the next few days. Like Alinator, I don't think it sounds like any sort of hardware problem with the drive array itself - just some random corruption of files, or bits of the filing system, that were in use when the crash happened.
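
For reference, the standard checker on WinXP is chkdsk. Run without switches it only reports problems; with /F it fixes them, and on the system volume it will offer to schedule the check for the next reboot. A minimal Python sketch of wrapping it follows. The helper names are hypothetical; only chkdsk and its /F switch come from Windows itself.

```python
import subprocess
import sys


def build_chkdsk_command(volume="C:", fix=False):
    """Build the argument list for the Windows chkdsk tool.

    Without /F, chkdsk runs read-only and just reports what it finds.
    """
    cmd = ["chkdsk", volume]
    if fix:
        cmd.append("/F")  # ask chkdsk to repair the errors it finds
    return cmd


def run_chkdsk(volume="C:", fix=False):
    """Run chkdsk and return its exit code (Windows only, needs admin rights)."""
    if sys.platform != "win32":
        raise RuntimeError("chkdsk is only available on Windows")
    return subprocess.run(build_chkdsk_command(volume, fix)).returncode
```

A read-only pass first, i.e. run_chkdsk("C:"), is the cautious choice: it shows what is wrong without touching anything, and you can rerun with fix=True once you've exited BOINC.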

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752699655
RAC: 1467489


Message 47575 in response to message 47573

Quote:
That's one of the problems I've run into, in that I really haven't found the same kind of tools for NTFS on 2K/XP which were commonly available for FAT and FAT32 on 9x (at least at a cost I was willing to pay for my own personal use).


When I come across really serious data corruption on a drive, such as a WinXP boot disk which won't, I use GetDataBack from Runtime Software. Reasonably priced, and hasn't let me down yet. [That's for getting important data off a failed/failing drive, not necessarily putting it back into working order.]

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

Okay, so if I understand this correctly you mean this kind of problem can occur whenever Windows crashes, for whatever reason. In that case it's okay with me... I can live with the risk of having this kind of bad luck once in two months or so. Thanks for clearing that up, guys :-)
Besides, I did some cleaning up today, meaning I installed a new sound card driver and deactivated an unnecessary and not entirely reliable part of my skinpack, and it looks like this might have helped (meaning the PC hasn't crashed since, although I had a hardcore gaming evening).
