Windows S5R2 App 4.33 available for Beta Test

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057134931
RAC: 1603667

RE: I maybe found why it

Message 70661 in response to message 70660

Quote:
I may have found why it went wrong for me. In the app_info that was attached to the beta, the app name is "einstein_S5R2", but now that I'm running the official app, BOINC is calling it "Hierarchical all-sky pulsar search". That must be why BOINC exited the unfinished WU for me.


I had one result in progress, and one in queue. I marked the unstarted one "suspend", and when the in-progress one had reported in, I exited boincmgr and deleted app_info.xml.

On restart and unsuspend, the unstarted one started and instantly errored.

On the reporting page, this result lists as:
Outcome: client error
Client State: compute error

The stderr output notes the error as:
-185 (0xffffff47)
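
The two forms of the error code above are the same number: the hex value is the signed code written as a 32-bit two's-complement integer. A minimal sketch of the conversion, with hypothetical helper names (this is not part of BOINC):

```python
def to_unsigned32(code: int) -> str:
    """Return the 32-bit two's-complement hex form of a signed error code."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def from_unsigned32(hex_text: str) -> int:
    """Turn a 32-bit hex value back into the signed error code."""
    value = int(hex_text, 16)
    # Values with the top bit set represent negative numbers.
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


print(to_unsigned32(-185))
print(from_unsigned32("0xffffff47"))
```

The same helpers work for any other negative code reported on a result page.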

While it is a bit antisocial to allow results to error out this way, I think Einstein at the moment has enough spare server capacity that this is an acceptable alternative to running one's queue all the way down to zero. All the results I have in queue are pretty fresh, so I'm not delaying my quorum partners by much. Suspending Einstein fetch, and running the queues to zero, would give me a huge overfetch of SETI (I give it about 8%).

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245213101
RAC: 12863

As some firewalls etc.

As some firewalls etc. apparently block the App from downloading the PDB, we have again bundled the PDB file with the official App. This might be an interim measure while we rely so strongly on the debugger feedback.

I don't know for sure, but my guess is that this error when deleting the app_info.xml happens because the client finds the PDB file missing and thus errors the result. To test (and hopefully avoid) this, try to download the PDB manually from the link I gave in a message to Gary before removing the app_info.xml.

Gary, are you using a proxy? I'm afraid the App doesn't inherit the proxy settings from the client.

BM


archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057134931
RAC: 1603667

RE: As some firewalls etc.

Message 70663 in response to message 70662

Quote:

As some firewalls etc. apparently block the App from downloading the PDB, we have again bundled the PDB file with the official App. This might be an interim measure while we rely so strongly on the debugger feedback.

I don't know for sure, but my guess is that this error when deleting the app_info.xml happens because the client finds the PDB file missing and thus errors the result. To test (and hopefully avoid) this, try to download the PDB manually from the link I gave in a message to Gary before removing the app_info.xml.

Gary, are you using a proxy? I'm afraid the App doesn't inherit the proxy settings from the client.

BM


Just now, I tried downloading the .pdb file from the special download location you specified in a post about July 30. I then stopped boincmgr and removed the app_info.xml file which had been pointing to einstein_S5R2_4.33_windows_intelx86.exe.

On restarting boincmgr, all my current Einstein results instantly errored out, with these two lines in red in the boincmgr messages pane:

8/19/2007 4:39:21 PM|Einstein@Home|[error] Application file einstein_S5R2_4.33_windows_intelx86.exe missing signature
8/19/2007 4:39:21 PM|Einstein@Home|[error] BOINC cannot accept this file

If I am a typical up-to-date 4.33 beta user, and properly did what you suggested, it appears this method won't give an error-free crossover.
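
For anyone scanning a saved copy of the messages pane for lines like the two above, here is a minimal sketch of a parser. It assumes the pipe-separated `timestamp|project|text` layout and the US-style date format seen in those two lines; other locales print the date differently, and the function names are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Message(NamedTuple):
    timestamp: datetime
    project: str
    text: str


def parse_line(line: str) -> Optional[Message]:
    """Split one 'timestamp|project|text' line; return None if it does not fit."""
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    stamp, project, text = parts
    try:
        # Assumed format, taken from the two lines quoted in this post.
        when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    except ValueError:
        return None
    return Message(when, project, text)


def is_error(message: Message) -> bool:
    """True for lines the client tagged with [error]."""
    return message.text.startswith("[error]")


sample = [
    "8/19/2007 4:39:21 PM|Einstein@Home|[error] Application file "
    "einstein_S5R2_4.33_windows_intelx86.exe missing signature",
    "8/19/2007 4:39:21 PM|Einstein@Home|[error] BOINC cannot accept this file",
]
for raw in sample:
    parsed = parse_line(raw)
    if parsed is not None and is_error(parsed):
        print(parsed.timestamp.isoformat(), parsed.project, parsed.text)
```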

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245213101
RAC: 12863

RE: [8/19/2007 4:39:21

Message 70664 in response to message 70663

Quote:
8/19/2007 4:39:21 PM|Einstein@Home|[error] Application file einstein_S5R2_4.33_windows_intelx86.exe missing signature
8/19/2007 4:39:21 PM|Einstein@Home|[error] BOINC cannot accept this file


Surprising. I've never seen that message before. Did switching back from a previous Beta App work before with the same version of the BOINC Client?

BM


archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057134931
RAC: 1603667

RE: RE: [8/19/2007

Message 70665 in response to message 70664

Quote:
Quote:
8/19/2007 4:39:21 PM|Einstein@Home|[error] Application file einstein_S5R2_4.33_windows_intelx86.exe missing signature
8/19/2007 4:39:21 PM|Einstein@Home|[error] BOINC cannot accept this file

Surprising. I've never seen that message before. Did switching back from a previous Beta App work before with the same version of the BOINC Client?

BM


Not sure on that point. Over on SETI I found I could use the app_info.xml to get a new "optimized" app running, then remove it and have the app keep running, getting automatically replaced when the official app moved on to a new version. But that may have been with older clients, and anyway it is a somewhat different setup.

Anyway, I'm converted, just posted that in case it might be helpful to someone.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109968192443
RAC: 30478943

RE: As some firewalls etc.

Message 70666 in response to message 70662

Quote:
As some firewalls etc. apparently block the App from downloading the PDB, we have again bundled the PDB file with the official App. This might be an interim measure while we rely so strongly on the debugger feedback.

I'm not using a proxy at all. I do prevent apps in general from accessing anything outside the LAN unless they have a specific need to do so. Initially I didn't realise that the EAH app needed to grab the PDB symbols. However once I realised this I simply installed the PDB everywhere it might be needed rather than changing the firewall.

Quote:
I don't know for sure, but my guess is that this error when deleting the app_info.xml happens because the client finds the PDB file missing and thus errors the result. To test (and hopefully avoid) this, try to download the PDB manually from the link I gave in a message to Gary before removing the app_info.xml.

I'm not 100% certain, but here is my take on this problem, and it has nothing to do with whether the PDB file is missing or not. When you distribute a new app normally, it gets recorded in the state file with an entry just like the following:-

In other words it comes packaged with a file signature which I imagine is calculated by the server and sent along with the executable. Before running the executable, the BOINC client calculates its own signature from the executable and compares it with the value sent by the server to see if any tampering has occurred with the executable. If the signatures don't agree, or if the signature is missing, the BOINC client refuses to run the executable, with the results that people have seen.
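
The compare-and-refuse logic described above can be sketched as follows. This is a simplified stand-in, not BOINC's actual code: it uses a plain SHA-256 digest in place of whatever signature scheme the client really applies, and the function and file names are hypothetical.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> str:
    """Hash the file in chunks so large executables need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def may_run(path: Path, recorded: Optional[str]) -> bool:
    """Refuse when no value was recorded or when the file no longer matches it."""
    if not recorded:
        return False
    return hmac.compare_digest(file_digest(path), recorded)


with tempfile.TemporaryDirectory() as tmp:
    exe = Path(tmp) / "example_app.exe"  # hypothetical file name
    exe.write_bytes(b"original bytes")
    recorded = file_digest(exe)
    print(may_run(exe, recorded))  # file matches the recorded value
    print(may_run(exe, None))      # nothing recorded, as with app_info.xml
    exe.write_bytes(b"changed bytes")
    print(may_run(exe, recorded))  # file was modified afterwards
```

The second call is the case that matters in this thread: an executable that was installed without any recorded value is rejected even though the file itself is untouched.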

When you use the app_info.xml mechanism, you are taking personal responsibility for the integrity of the executable you intend to use so there is no signature recorded in the state file. Here is an example of just such a situation:-

Quote:

einstein_S5R2_4.33_windows_intelx86.exe
0.000000
0.000000
1


You can see that minimal information about your intended executable gets recorded - no file signature and no download URLs.

There are quite a few things about this I don't understand, but I reckon that the easy way for a person to solve the current problem would be to delete both the app_info.xml and the executable after stopping BOINC. That way the BOINC client, on restarting, would be forced to get a properly signed file from the server, and all should then proceed normally after that.
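
A minimal sketch of that clean-up step, assuming BOINC has already been stopped and that both files sit in the project directory. The function name and the directory you pass in are hypothetical, and whether the client then fetches a signed copy is only the conjecture above, not something tested here. It defaults to a dry run that only lists what would be deleted.

```python
import tempfile
from pathlib import Path
from typing import List


def revert_to_official(project_dir: Path, exe_name: str,
                       dry_run: bool = True) -> List[Path]:
    """Return app_info.xml and the executable if present; delete them unless dry_run."""
    targets = [project_dir / "app_info.xml", project_dir / exe_name]
    present = [path for path in targets if path.is_file()]
    if not dry_run:
        for path in present:
            path.unlink()
    return present


# Demonstration on a throwaway directory rather than a real BOINC install.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    exe_name = "einstein_S5R2_4.33_windows_intelx86.exe"
    (project / "app_info.xml").write_text("<app_info/>")
    (project / exe_name).write_bytes(b"")
    for path in revert_to_official(project, exe_name):
        print("would delete:", path.name)
```

Passing `dry_run=False` performs the deletion, so check the listed paths first.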

I also am of the opinion that it was previously possible to revert from the anonymous platform mechanism simply by deleting the app_info.xml file. There must be some variation or extra code needed in the app_info.xml to achieve this. Perhaps we need the assistance of whoever put this mechanism into the BOINC software in the first place :).

Cheers,
Gary.

Klimax
Joined: 27 Apr 07
Posts: 87
Credit: 1370205
RAC: 0

I have following

I have the following error:
16.8.2007 20:40:03|Einstein@Home|Reason: Unrecoverable error for result h1_0485.20_S5R2__160_S5R2c_1 ( h1_0485.20_S5R2__160_S5R2c_1_0 -161)

What could have gone wrong this time? It happened after hibernation of Win XP. Internet access was deactivated.

Result

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245213101
RAC: 12863

Probably switching back

Probably switching back doesn't work since we enabled verify_files_on_app_start. Sorry, we didn't think about this side effect for Beta tests. I'll see if there's any non-destructive way of switching back to the official App path from the Beta.

For now I recommend NOT removing the app_info.xml while you have tasks in progress with more than 5% done.

Probably this week we'll get a new Windows Beta App anyway (still debugging...).

BM


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109968192443
RAC: 30478943

RE: Probably switching back

Message 70670 in response to message 70669

Quote:
Probably switching back doesn't work since we enabled verify_files_on_app_start. Sorry, we didn't think about this side effect for Beta tests....

Aahhh.... I remember reading that at the time but thought it only applied to data files like sun, earth, etc, so it didn't strike me as likely to have any nasty side effects like deciding that the app itself is now not acceptable. Instead of simply trashing the current work in progress, maybe a better outcome would be simply for BOINC not to restart the app at all and put up a dialog box explaining what it's upset about. This could give the user an option or two on how to proceed - like restoring the app_info.xml file for example :).

So in the meantime all beta testers should keep their app_info.xml files. I hadn't touched any of mine - probably more than 50 by now - I've sort of lost count :). I reckon I'll simply await the next beta and deal with the problem then :).

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109968192443
RAC: 30478943

RE: I have following

Message 70671 in response to message 70668

Quote:

I have the following error:
16.8.2007 20:40:03|Einstein@Home|Reason: Unrecoverable error for result h1_0485.20_S5R2__160_S5R2c_1 ( h1_0485.20_S5R2__160_S5R2c_1_0 -161)

What could have gone wrong this time? It happened after hibernation of Win XP. Internet access was deactivated.

Result

You can look up the BOINC FAQs on the Mundayweb site and find that -161 is essentially a "File not found" type error. You can also look at the stderr.out output in your oldest problem result on the EAH website and you will find a couple of little excerpts like this:-

Quote:
2007-08-15 23:30:13.4375 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R2_4.33_windows_intelx86.exe'.
2007-08-15 23:30:31.4687 [normal]: WARNING: boinc-resolved result file "Hough.out" in local directory - will zip into "Hough.out.zip"
2007-08-15 23:31:16.4375 [debug]: Reading SFTs and setting up stacks ... done
2007-08-15 23:40:12.4375 [debug]: Found checkpoint - reading...
2007-08-15 23:40:12.4375 [debug]: Read checkpoint - reading previous output...
2007-08-15 23:40:42.4375 [debug]: Read exactly 690137 == maxbytes from Fstat-file, that's enough.
2007-08-15 23:40:42.4375 [debug]: DEBUG: read_fstat_toplist_from_fp() returned 690137
2007-08-15 23:40:42.4375 [debug]: Total skypoints = 57808. Progress: 49410,
$Revision: 1.45 $ OPT:0 SCV:2, SCTRIM:8

To me it would appear that the result output file Hough.out (which normally can be seen in the appropriate "slots" folder) is being written in the main BOINC folder, perhaps when your machine hibernates. Each time BOINC restarts it seems to be complaining that the file is in the wrong place. The crunching seems to finish normally, but then the "file not found" error occurs, so you get no credit for what would appear to be an otherwise successful result.

Quote:

57792, 57793, 57794, 57795, 57796, 57797, 57798, 57799, 57800, 57801, 57802, 57803, 57804, 57805, 57806, 57807, done.

h1_0485.20_S5R2__160_S5R2c_1_0
-161

The "done" usually signifies successful completion of crunching. I imagine the next step is the uploading of the result, which is where you get the -161 error, which sort of implies that the result (or perhaps Hough.out) can't be found where it's supposed to be.

If it were my machine, I'd try to ensure that it didn't hibernate, unless you know that it has successfully recovered from hibernation previously. Please realise that this is just speculation on my part as I haven't seen this before.
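
To check the speculation above on a given machine, a small sketch like the following could report where copies of Hough.out actually sit. It assumes the layout described in this post, with a "slots" folder directly under the main BOINC folder. The function name is hypothetical, and the demonstration runs on a throwaway directory.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def locate_output(boinc_dir: Path, name: str = "Hough.out") -> Dict[str, List[Path]]:
    """Report copies of the output file in the main folder and in slot folders."""
    main_copy = boinc_dir / name
    stray = [main_copy] if main_copy.is_file() else []
    slots_dir = boinc_dir / "slots"
    in_slots = sorted(slots_dir.glob(f"*/{name}")) if slots_dir.is_dir() else []
    return {"main_folder": stray, "slots": in_slots}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "slots" / "0").mkdir(parents=True)
    (root / "slots" / "0" / "Hough.out").write_text("expected location")
    (root / "Hough.out").write_text("stray copy")
    report = locate_output(root)
    print("main folder:", [p.name for p in report["main_folder"]])
    print("slots:", [str(p.relative_to(root)) for p in report["slots"]])
```

A non-empty "main folder" list on a real install would be consistent with the file being written in the wrong place.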

Cheers,
Gary.
