Windows S5R2 App 4.33 available for Beta Test

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250214833
RAC: 35613

Message 70631 in response to message 70630

Quote:
K, I guess I just assumed that BOINC would be used for all transfers like this. You can tell that I wasn't really paying attention earlier when you talked about the app "phoning home" to download the debugging symbols when needed :). So why didn't it do what it was supposed to do? Is it some misconfiguration with my LAN? It would be really galling if there are no more crashes to analyse :).


Don't know yet. I'll do a bit more testing myself once I again have more internet access than just a browser.

There is a small chance that something is wrong with the symbol store on the server (http://einstein.phys.uwm.edu/symstore/einstein_S5R2_4.33_windows_intelx86.pdb/2545763533B44450B32318278AD17A631/einstein_S5R2_4.33_windows_intelx86.pd_), or was wrong at that time. Apparently a DNS server (possibly @UWM) has occasional problems. There is also a small chance that, for whatever reason, the checksum in the symbol store doesn't match the one encoded in the App, e.g. if something went wrong during the build or the transfer.
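
For reference, the URL above appears to follow the usual layout of a Microsoft-style symbol store: the PDB file name, then a directory named after the PDB's identifier (GUID followed by age), then the file itself, with the last character replaced by an underscore when it is stored compressed. Here is a minimal Python sketch that rebuilds that path. The function name and the split of the identifier into a 32-digit GUID plus an age of 1 are my assumptions for illustration, not something taken from the App's code.

```python
def symstore_url(base: str, pdb_name: str, guid_hex: str, age: int,
                 compressed: bool = True) -> str:
    """Build the download URL for a PDB in a symbol-store style directory tree."""
    # Index directory: GUID as 32 hex digits without dashes, followed by the age in hex.
    index = guid_hex.replace("-", "").upper() + format(age, "X")
    # Compressed files keep their name but end in an underscore (.pdb -> .pd_).
    file_name = pdb_name[:-1] + "_" if compressed else pdb_name
    return "/".join([base.rstrip("/"), pdb_name, index, file_name])


url = symstore_url(
    "http://einstein.phys.uwm.edu/symstore",
    "einstein_S5R2_4.33_windows_intelx86.pdb",
    "2545763533B44450B32318278AD17A63",  # assumed GUID part of the identifier
    1,                                    # assumed age part of the identifier
)
print(url)
```

If the identifier built into the App differed from the one in the store, the index directory would not match and the lookup would presumably fail, which is the mismatch case mentioned above.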

The "Breakpoint" feature I described had been put into the App for testing the symbol store communication, you may want to give it a shot. If you get a -nosymbols- line again, then something went wrong again, otherwise you should get a proper stackdump, listing source code lines in the "worker thread" section instead of the "SymGetLineFromAddr()" calls of the previous dumps.

BM

Sou'westerly
Joined: 9 Jun 06
Posts: 57
Credit: 715838
RAC: 0

Message 70632 in response to message 70631

Bernd, if it is the App rather than BOINC initiating the download, could a firewall be blocking this? Dave.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117320948156
RAC: 35926678

Interesting point. I allow BOINC through the firewall but didn't make any provision for the app.
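
A rough way to see whether a machine can open an outbound connection to the symbol server at all is a plain TCP connect test. The sketch below is only that: the function name is made up, port 80 is assumed because the symbol store URL uses http, and a per-application firewall rule could still block the science app even if this script gets through, since the script runs as a different program.

```python
import socket


def can_connect(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example call (assumed host, taken from the symbol store URL):
#   can_connect("einstein.phys.uwm.edu", 80)
```

A False result does not say whether DNS, the firewall or the server is at fault; it only narrows things down.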

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117320948156
RAC: 35926678

Message 70634 in response to message 70631

Quote:
The "Breakpoint" feature I described had been put into the App for testing the symbol store communication, you may want to give it a shot. If you get a -nosymbols- line again, then something went wrong again, otherwise you should get a proper stackdump, listing source code lines in the "worker thread" section instead of the "SymGetLineFromAddr()" calls of the previous dumps.

OK, well just as Murphy predicted. The result in progress finished normally and has now been uploaded. I took the opportunity to use the EAH_MSC_BREAKPOINT "feature" to cause the next two results to fail (after setting NNW of course). Both of these artificial failures have been uploaded as well and the machine has now resumed normal operations.

A quick examination of the result output for the two "failures" reveals the comforting message "PDB Symbols loaded" so I presume everything is now in readiness for a real failure to come along :).

Let me know if there is anything else that needs to be done.

Cheers,
Gary.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

I saw it coming. Just at 98% crunching this Einstein result, my computer went into a STOP 0x0000009c error.
Rebooting again, the result ended with "The environment is incorrect. (0xa) - exit code 10 (0xa)".

An earlier result I tried on Windows (2000) ended in the same error.
My hopes are now pinned on the Linux result I still have.
Or one more week and I won't be able to post here anymore as my RAC has fallen off the charts. :(

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 715192062
RAC: 941040

Message 70636 in response to message 70635

Quote:

I saw it coming. Just at 98% crunching this Einstein result, my computer went into a STOP 0x0000009c error.
Rebooting again, the result ended with "The environment is incorrect. (0xa) - exit code 10 (0xa)".

An earlier result I tried on Windows (2000) ended in the same error.
My hopes are now pinned on the Linux result I still have.
Or one more week and I won't be able to post here anymore as my RAC has fallen off the charts. :(

Well, the MS Knowledge Base article suggests this is a hardware problem. Is the Linux result on the same host (dual boot)?

CU

BRM

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

Message 70637 in response to message 70636

Quote:
Well, the MS Knowledge Base article suggests this is a hardware problem. Is the Linux result on the same host (dual boot)?


It is, which is why I want to see if that one can run to the end (using 4.35).

It's only the Einstein results that crash my computer at the 95-99% range in Windows. All other projects (Leiden, Primegrid, Cosmo, Predictor) ran rock stable.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250214833
RAC: 35613

Gary, your host #704557 has just returned 4 results with access violations and -nosymbols-. No clue what happened there; apparently it's not the "Fixing..." problem. It's probably not the same host where you adjusted the firewall or manually added the PDB? Can you do it there, too?

BM

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117320948156
RAC: 35926678

I have probably switched about 40 hosts (I wasn't particularly counting) to the latest betas for both Windows and Linux. The machine you spotted just happens to be one of them. It, like many others, doesn't get monitored very often. It doesn't usually have keyboard, mouse or screen attached. All these machines were switched before I realised there was a problem with getting the symbol information so no others (as yet) have the .pdb installed.

The machine is actually at home so when I arrived at work and read your message I just turned around and went back home to install the .pdb. That's now been done and I'm back at work. The machine was chugging along and had racked up about 15 hours on the next job. When I plugged the screen in, there were some Microsoft dialog boxes informing me that the science app had performed an illegal operation and had been shut down, please report this to Microsoft - or words to that effect. I'm wondering if the Microsoft protection mechanism was shutting down the science app each time and BOINC was simply restarting the next task in the queue. Something must have changed because the next task was still running, apparently OK, some 15 hours later.

When I get some time, I'll go find all the Windows boxes running 4.33 and put a copy of the .pdb on each one.

The original machine that was playing up and on which I first installed a .pdb is still chugging along without issue. The next result is now over 80% completed. Murphy strikes again :).

PS: If anything is going to fail it should be today. It's midwinter here and today's temp is going to exceed 28 C so they say. Normal for this time of year is around 19 - 20 C.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117320948156
RAC: 35926678

Whilst adding .pdb files to those boxes that are running the beta app, I noticed another machine which has some recent client errors. This machine now has the debugging symbols in place for the next time.

There are three error results, one with the message -

Quote:
The device does not recognize the command. (0x16) - exit code 22 (0x16)

and two with the message -

Quote:
- exit code -1073741819 (0xc0000005)
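
For anyone decoding these: the negative number is just the 32-bit code printed as a signed integer, and 0xc0000005 is the NTSTATUS value for an access violation. A small Python sketch of the conversion follows. The lookup table is illustrative and only covers codes quoted in this thread, and the helper names are made up.

```python
# Names for exit codes seen in this thread; illustrative, not exhaustive.
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION (NTSTATUS)",
    0x16: "ERROR_BAD_COMMAND (Win32)",
    0x0A: "ERROR_BAD_ENVIRONMENT (Win32)",
}


def as_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format an exit code with its hex form and a name, if one is known."""
    value = as_unsigned32(exit_code)
    name = KNOWN_CODES.get(value, "unknown")
    return f"{exit_code} = 0x{value:08x}: {name}"


for code in (22, -1073741819):
    print(describe(code))
```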

Cheers,
Gary.
