The Win App 4.24 dropped nearly 40 WUs on my computer in the early hours with exit code 99 (0x63), "Recursive error", but it works well now.
My computer automatically switched to another project because the daily quota was reduced and I didn't get any new WUs.
RE: The Win App 4.24
The straight 4.24 app or one you had optimised?
What caused it to start behaving again do you think? New (different frequency) data file perhaps?
Cheers,
Gary.
RE: I'm currently waiting for some internal tests to finish ...
Is there likely to be any performance improvement? Now that validation appears to be fixed, it would be nice to see some potential relief from the deadline pressure issue that is troubling many participants and not just those with older boxes :).
Cheers,
Gary.
RE: RE: The Win App 4.24
The official 4.24 app.
I don't know yet. But I've seen this problem on other computers too; they also reported client errors after a second. So it doesn't waste much time, but these computers run out of their daily quota. And this problem usually hits a whole series of WUs.
An exit code of 99 means that the App terminated due to a failing internal sanity check. There should be a dump of a "status structure", similar to a stack dump, at the end of stderr_out, indicating the check that failed.
The most common cause of this type of error shows up as the following lines at the end of the dump:
[...] Status code 3: Incorrect header in file
[...] function LALSFTdataFind, file SFTfileIO.c, line 270
This means that a data file the App is trying to access has a broken header signature. The BOINC Core Client checks the md5 checksum of downloaded files only once, right after downloading, so the file might have gone bad on disk at some later point. The fact that it has "cured itself" might be because you recently got work that doesn't require this particular file anymore.
There is a chance, though, that something else (that I don't know of yet) is going wrong while accessing the file (e.g. it is blocked by a virus scanner), something that the boinc_fopen() function we are using doesn't catch.
Akos, what are the last few lines of stderr_out of the results in question? What other tools accessing the filesystem (virus scanner, malware removal etc.) are you using?
BTW: Does anyone know whether the standard Microsoft Malware removal tool has any influence on BOINC Apps?
BM
RE: Akos, what are the last
I see the same stderr output in every case.
[pre]5.4.11
- exit code 99 (0x63)
2007-07-12 23:57:31.9531 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R2_4.24_windows_intelx86.exe'.
2007-07-12 23:57:32.2968 [debug]: Reading SFTs and setting up stacks ... Level 0: $Id: HierarchicalSearch.c,v 1.170 2007/06/08 20:58:34 bema Exp $
Function call `SetUpSFTs( &status, &stackMultiSFT, &stackMultiNoiseWeights, &stackMultiDetStates, &usefulParams)' failed.
file HierarchicalSearch.c, line 677
2007-07-12 23:57:33.5937 [normal]:
Level 1: $Id: HierarchicalSearch.c,v 1.170 2007/06/08 20:58:34 bema Exp $
2007-07-12 23:57:33.5937 [normal]: Status code -1: Recursive error
2007-07-12 23:57:33.5937 [normal]: function SetUpSFTs, file HierarchicalSearch.c, line 1250
2007-07-12 23:57:33.5937 [normal]:
Level 2: $Id: SFTfileIO.c,v 1.123 2007/04/24 15:32:38 bema Exp $
2007-07-12 23:57:33.5937 [normal]: Status code 3: Incorrect header in file
2007-07-12 23:57:33.5937 [normal]: function LALSFTdataFind, file SFTfileIO.c, line 270
2007-07-12 23:57:33.5937 [CRITICAL]: BOINC_LAL_ErrHand(): now calling boinc_finish()[/pre]
Only Total Commander.
OS is a Win2000 with SP4.
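For anyone who wants to sift through many stderr outputs at once, here is a rough Python sketch that pulls the "Level N" entries out of a dump like the one above and reports the deepest entry that isn't just the generic -1 "Recursive error". The regular expressions are my own guess based purely on the lines quoted in this thread, and the function names are made up for illustration; other App versions may format the dump differently.

```python
import re

# Patterns derived only from the dump quoted in this thread (an assumption,
# not an official description of the format).
LEVEL_RE = re.compile(r"Level (\d+):")
STATUS_RE = re.compile(r"Status code (-?\d+): (.+)")
WHERE_RE = re.compile(r"function (\w+), file ([\w.]+), line (\d+)")


def parse_status_dump(stderr_text):
    """Return one dict per 'Level N' entry found in the dump, in order."""
    entries = []
    current = None
    for line in stderr_text.splitlines():
        match = LEVEL_RE.search(line)
        if match:
            current = {"level": int(match.group(1))}
            entries.append(current)
            continue
        if current is None:
            continue  # nothing to attach to before the first "Level" line
        match = STATUS_RE.search(line)
        if match:
            current["code"] = int(match.group(1))
            current["message"] = match.group(2).strip()
            continue
        match = WHERE_RE.search(line)
        if match:
            current["function"] = match.group(1)
            current["file"] = match.group(2)
            current["line"] = int(match.group(3))
    return entries


def root_cause(entries):
    """Deepest entry whose status code is present and not -1, or None."""
    real = [e for e in entries if e.get("code") not in (None, -1)]
    return max(real, key=lambda e: e["level"]) if real else None
```

Fed the dump above, `root_cause(parse_status_dump(text))` should point at the Level 2 entry (status code 3 in LALSFTdataFind), which is the line that actually matters.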
RE: BTW: Anyone knows if
The one that you get on Microsoft Patchday? I never had any problems with it, but I guess if it caused problems, they would kind of "stick out" statistically because most people will execute this tool automatically on MS Patchday, which is always a Wednesday, right? Might be worthwhile to group errors by weekdays.
As to file corruption, the MD5 checksums are in client_state.xml, right? So one could check unless the file is now already deleted.
CU
BRM
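In case someone wants to try the weekday grouping, a minimal Python sketch could look like this. It assumes you have already collected the error timestamps as strings starting with the `YYYY-MM-DD` date, as in the stderr lines quoted in this thread; the function name is just for illustration.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def errors_by_weekday(timestamps):
    """Count error timestamps per weekday; only the leading date is parsed."""
    counts = Counter()
    for stamp in timestamps:
        day = datetime.strptime(stamp[:10], "%Y-%m-%d")
        counts[WEEKDAYS[day.weekday()]] += 1
    return counts


if __name__ == "__main__":
    # The two error timestamps that appear in this thread.
    stamps = ["2007-07-12 23:57:31.9531", "2007-07-13 18:59:59.8281"]
    for name, count in errors_by_weekday(stamps).items():
        print(name, count)
```

Two data points obviously prove nothing; a Patchday effect would only be visible over a large number of results, and the timestamps are in each host's local time.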
RE: RE: BTW: Anyone knows
Good shots!
Akos, can you dig out the checksums from client_state.xml and check your data files? There's probably a simple tool for Windows that does this (I usually use md5sum from Cygwin).
BM
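If no md5sum is at hand, the comparison could also be done with a few lines of Python. This is only a sketch under assumptions: it takes any element in client_state.xml that has both a `<name>` and an `<md5_cksum>` child as a file entry (please check the tag names against your own file), it assumes the file parses as well-formed XML, and the function names and the project directory argument are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large files are fine."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_checksums(client_state_path):
    """Map file name -> MD5 for elements with <name> and <md5_cksum> children.

    The tag names are an assumption about client_state.xml.
    """
    root = ET.parse(client_state_path).getroot()
    sums = {}
    for elem in root.iter():
        name = elem.findtext("name")
        md5 = elem.findtext("md5_cksum")
        if name and md5:
            sums[name.strip()] = md5.strip().lower()
    return sums


def check_files(client_state_path, project_dir):
    """Return {file name: 'ok' | 'MISMATCH' | 'missing'} for each listed file."""
    results = {}
    for name, expected in expected_checksums(client_state_path).items():
        path = Path(project_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif file_md5(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "MISMATCH"
    return results
```

Calling `check_files("client_state.xml", "projects/einstein.phys.uwm.edu")` from the BOINC directory would then list each file with its status. A "missing" entry is expected for files the client has already deleted.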
RE: Akos, can you dig out
I probably can't check it before Tuesday, but I'll keep it in mind.
Here's another client error that looks kind of interesting:
http://einsteinathome.org/task/85564323
Computation stopped near the end with this message:
45172, 45173, 45174, 45175, 45176, 45177, 45178, 45179, 45180, 45181, 45182, 45183, 45184, c
45185, 45186, 45187, 2007-07-13 18:59:59.8281 [CRITICAL]: Required frequency-bins [-8, 8] not covered by SFT-interval [788941, 789228]
XLAL Error - LocalXLALComputeFaFb (LocalComputeFstat.c:534): Input domain error
Level 0: $Id: HierarchicalSearch.c,v 1.170 2007/06/08 20:58:34 bema Exp $
Function call `COMPUTEFSTATFREQBAND ( &status, fstatVector.data + k, &thisPoint, stackMultiSFT.data[k], stackMultiNoiseWeights.data[k], stackMultiDetStates.data[k], &CFparams)' failed.
file HierarchicalSearch.c, line 1019
2007-07-13 18:59:59.8281 [normal]:
Level 1: $Id: LocalComputeFstat.c,v 1.34 2007/06/09 22:11:24 bema Exp $
2007-07-13 18:59:59.8281 [normal]: Status code -1: Recursive error
2007-07-13 18:59:59.8281 [normal]: function LocalComputeFStatFreqBand, file LocalComputeFstat.c, line 207
2007-07-13 18:59:59.8281 [normal]:
Level 2: $Id: LocalComputeFstat.c,v 1.34 2007/06/09 22:11:24 bema Exp $
2007-07-13 18:59:59.8281 [normal]: Status code 5: XLAL function call failed
2007-07-13 18:59:59.8281 [normal]: function LocalComputeFStat, file LocalComputeFstat.c, line 342
2007-07-13 18:59:59.8281 [CRITICAL]: BOINC_LAL_ErrHand(): now calling boinc_finish()
The wingman has completed its result successfully, also with Windows version 4.24. Go figure...