K, I guess I just assumed that BOINC would be used for all transfers like this. You can tell that I wasn't really paying attention earlier when you talked about the app "phoning home" to download the debugging symbols when needed :). So why didn't it do what it was supposed to do? Is it some misconfiguration with my LAN? It would be really galling if there are no more crashes to analyse :).
Don't know yet. I'll do a bit more of testing myself when I have more internet access than just a browser again.
There is a small chance that something is wrong with the symbol store on the server (http://einstein.phys.uwm.edu/symstore/einstein_S5R2_4.33_windows_intelx86.pdb/2545763533B44450B32318278AD17A631/einstein_S5R2_4.33_windows_intelx86.pd_), or was at that time. Apparently a DNS server (possibly @UWM) has occasional problems. There is also a small chance that, for whatever reason, the checksum in the symbol store doesn't match the one encoded in the App, e.g. if something went wrong during the build or the transfer.
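For anyone who wants to check the URL by hand, here is a minimal sketch of how it breaks down. It assumes the usual Microsoft symbol store layout: the PDB name, then a directory made of a 32-digit GUID followed by an age value (presumably the "checksum" mentioned above), then the stored file, where a trailing underscore marks a compressed copy. The function names are made up for illustration and are not part of the App or BOINC.

```python
from urllib.parse import urlparse

# URL quoted above, split only for readability.
URL = ("http://einstein.phys.uwm.edu/symstore/"
       "einstein_S5R2_4.33_windows_intelx86.pdb/"
       "2545763533B44450B32318278AD17A631/"
       "einstein_S5R2_4.33_windows_intelx86.pd_")


def describe_symstore_url(url):
    """Split a symbol store URL into its parts (hypothetical helper)."""
    parts = urlparse(url).path.strip("/").split("/")
    # Last three path components: <pdb name>/<GUID + age>/<stored file>
    pdb_name, signature, stored_file = parts[-3:]
    return {
        "pdb_name": pdb_name,
        "guid": signature[:32],   # 32 hex digits
        "age": signature[32:],    # whatever follows the GUID
        "compressed": stored_file.endswith("_"),
    }


def signature_matches(url, app_signature):
    """Compare the URL's GUID+age with the value built into the App.

    app_signature is an assumption: it stands for whatever GUID+age
    string the App reports; nothing in this thread says how to read it.
    """
    info = describe_symstore_url(url)
    return (info["guid"] + info["age"]).upper() == app_signature.upper()


info = describe_symstore_url(URL)
print(info)
```

If the GUID and age taken from the URL differ from what the App was built with, the debugger would normally refuse the PDB, which would fit the mismatch scenario described above.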
The "Breakpoint" feature I described was put into the App to test the symbol store communication, so you may want to give it a shot. If you get a -nosymbols- line again, something still went wrong. Otherwise you should get a proper stackdump that lists source code lines in the "worker thread" section instead of the "SymGetLineFromAddr()" calls of the previous dumps.
BM
Bernd, if it is the App rather than BOINC initiating the download, could a firewall be blocking this? Dave.
Interesting point. I allow BOINC through the firewall but didn't make any provision for the app.
Cheers,
Gary.
OK, well just as Murphy predicted. The result in progress finished normally and has now been uploaded. I took the opportunity to use the EAH_MSC_BREAKPOINT "feature" to cause the next two results to fail (after setting NNW of course). Both of these artificial failures have been uploaded as well and the machine has now resumed normal operations.
A quick examination of the result output for the two "failures" reveals the comforting message "PDB Symbols loaded" so I presume everything is now in readiness for a real failure to come along :).
Let me know if there is anything else that needs to be done.
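With a lot of hosts, eyeballing each result output gets tedious. A minimal sketch of the same check in Python, assuming the result output has been saved as plain text: it looks only for the two markers mentioned in this thread ("-nosymbols-" and "PDB Symbols loaded"). The function name and the sample lines are hypothetical, not real App output.

```python
def symbol_status(result_output):
    """Classify a result's output by the symbol markers seen in this thread."""
    if "-nosymbols-" in result_output:
        return "missing"   # the dump was written without debugging symbols
    if "PDB Symbols loaded" in result_output:
        return "loaded"    # symbols were found, the dump should show source lines
    return "unknown"       # neither marker present, e.g. a normal run


# Made-up sample outputs, only to exercise the function.
samples = {
    "breakpoint test": "...\nPDB Symbols loaded\n...",
    "earlier crash": "...\n-nosymbols-\n...",
    "normal result": "... finished ...",
}

for label, text in samples.items():
    print(label, "->", symbol_status(text))
```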
Cheers,
Gary.
I saw it coming. Just at 98% crunching this Einstein result, my computer went into a STOP 0x0000009c error.
Rebooting again, the result ended with "The environment is incorrect. (0xa) - exit code 10 (0xa)".
An earlier result I tried on Windows (2000) ended in the same error.
My hopes are now pinned on the Linux result I still have.
Or one more week and I won't be able to post here anymore as my RAC has fallen off the charts. :(
Well, the MS Knowledge Base article suggests this is a hardware problem. Is the Linux result on the same host (dual boot)?
CU
BRM
It is, hence why I want to see if that one can run to the end (using 4.35).
It's only the Einstein results that crash my computer at the 95-99% range in Windows. All other projects (Leiden, Primegrid, Cosmo, Predictor) ran rock stable.
Gary, your host #704557 has just returned 4 results with access violations and -nosymbols-. No clue what happened there, apparently not the "Fixing..." problem. It's probably not the same host where you adjusted the firewall or manually added the PDB? Can you do it there, too?
BM
I have probably switched about 40 hosts (I wasn't particularly counting) to the latest betas for both Windows and Linux. The machine you spotted just happens to be one of them. It, like many others, doesn't get monitored very often. It doesn't usually have keyboard, mouse or screen attached. All these machines were switched before I realised there was a problem with getting the symbol information so no others (as yet) have the .pdb installed.
The machine is actually at home so when I arrived at work and read your message I just turned around and went back home to install the .pdb. That's now been done and I'm back at work. The machine was chugging along and had racked up about 15 hours on the next job. When I plugged the screen in, there were some Microsoft dialog boxes informing me that the science app had performed an illegal operation and had been shut down, please report this to Microsoft - or words to that effect. I'm wondering if the Microsoft protection mechanism was shutting down the science app each time and BOINC was simply restarting the next task in the queue. Something must have changed because the next task was still running, apparently OK, some 15 hours later.
When I get some time, I'll go find all the Windows boxes running 4.33 and put a copy of the .pdb on each one.
The original machine that was playing up and on which I first installed a .pdb is still chugging along without issue. The next result is now over 80% completed. Murphy strikes again :).
PS: If anything is going to fail it should be today. It's midwinter here and today's temp is going to exceed 28 C so they say. Normal for this time of year is around 19 - 20 C.
Cheers,
Gary.
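Putting a copy of the .pdb on every Windows box running 4.33 could be scripted roughly as below. This is only a sketch under stated assumptions: the .pdb name is taken from the symbol store URL earlier in the thread, the matching .exe name and the idea that the .pdb belongs next to the executable in the project directory are guesses, and `install_pdb` is a made-up helper.

```python
import shutil
from pathlib import Path

# PDB base name as seen in the symbol store URL; the .exe name is an assumption.
APP_NAME = "einstein_S5R2_4.33_windows_intelx86"


def install_pdb(pdb_source, project_dir):
    """Copy the .pdb next to the 4.33 executable, if this box has one.

    Returns True if a copy was made, False if the box does not run 4.33
    or already has the .pdb in place.
    """
    project_dir = Path(project_dir)
    exe = project_dir / (APP_NAME + ".exe")
    target = project_dir / (APP_NAME + ".pdb")
    if not exe.exists():
        return False          # not running 4.33, nothing to do
    if target.exists():
        return False          # symbols already installed
    shutil.copy2(pdb_source, target)
    return True
```

Calling it once per host's project directory (for example over a mapped network share) and logging the return value would also give a list of which boxes still lacked the symbols.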
Whilst adding .pdb files to those boxes that are running the beta app, I noticed another machine which has some recent client errors. This machine now has the debugging symbols in place for the next time.
There are three error results, one with the message -
Quote:
The device does not recognize the command. (0x16) - exit code 22 (0x16)
and two with the message -
Cheers,
Gary.
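For reference, messages of this kind are the standard Windows texts for system error codes: 10 is ERROR_BAD_ENVIRONMENT ("The environment is incorrect.") and 22 is ERROR_BAD_COMMAND ("The device does not recognize the command."). A small sketch that pulls the number out of such a line and looks it up; the table holds only these two codes, and the helper name is made up. The mapping does not by itself say why the app exited with that number.

```python
import re

# Only the two codes discussed here; names are from the Windows system error list.
WINDOWS_ERRORS = {
    10: "ERROR_BAD_ENVIRONMENT",
    22: "ERROR_BAD_COMMAND",
}

EXIT_RE = re.compile(r"exit code (\d+) \(0x([0-9a-fA-F]+)\)")


def decode_exit_line(line):
    """Return (code, name) for a line like '... - exit code 22 (0x16)'."""
    match = EXIT_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # The decimal and hex forms in the message should agree.
    if code != int(match.group(2), 16):
        raise ValueError("decimal and hex exit codes disagree: " + line)
    return code, WINDOWS_ERRORS.get(code, "UNKNOWN")


print(decode_exit_line("The environment is incorrect. (0xa) - exit code 10 (0xa)"))
print(decode_exit_line("The device does not recognize the command. (0x16) - exit code 22 (0x16)"))
```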