App info question

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

RE: Believe me - I've been

Message 96619 in response to message 96618

Quote:

Believe me - I've been monitoring to see if the devs have got it right! This was v6.10.25:

Are any of the versions with the debt bug fixed stable enough to upgrade to? I'm running 6.10.11 on my CUDA PC. A non-CUDA system with a newer install has 6.10.18.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2977141034
RAC: 775748

RE: RE: Believe me - I've

Message 96620 in response to message 96619

Quote:
Quote:
Believe me - I've been monitoring to see if the devs have got it right! This was v6.10.25:

Are any of the versions with the debt bug fixed stable enough to upgrade to? I'm running 6.10.11 on my CUDA PC. A non-CUDA system with a newer install has 6.10.18.


They've been steadily improving (under constant nagging from alpha testers), but unfortunately I took them back a few steps with a bug report yesterday.

For Windows use (my platform), I'd say that any of v6.10.19, .20 or .21 gives the widest range of available projects. AQUA, which is pioneering multi-threaded apps, is the hardest to get right.

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7262301849
RAC: 1562981

RE: Well, but why are you

Message 96621 in response to message 96614

Quote:
Well, but why are you using an app_info.xml in the first place?


That is a fair question, with more than one answer. Dan has mentioned the transition handling issue as it regards already downloaded work, especially in the presence of multiple projects with a high mismatch in effort allocation.

A matter which motivates me, but perhaps few others, is the fact that I run a COMODO firewall with settings which mean that the system asks me to give a ruling the first time any new executable is invoked. In the case of BOINC apps, if I am not there to give permission, after a little while the app errors out in a way that trashes the result, and BOINC then flies right along, burning off the inventory of work in a few minutes. By the time I notice, typically there is no work left, and work fetch is disabled until the next day because the errors have reduced my 1 result/core/day quota and I have used it up.

Using app_info is one way to avoid this happening, especially when I leave a host unattended for days. It does, however, have other disadvantages.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 756882355
RAC: 1160914

Dan, RE: What I was

Message 96622 in response to message 96615

Dan,

Quote:

What I was hoping to be able to do was to have my app info assign new s5r6 WU's to the s5r6 default executable while not causing those assigned to the old s5r5 exe to crash.

I would not bother for S5R5 anymore, it's basically over. I don't think there are still WUs around for S5R5, but if there are, they can easily be collected by ATLAS or whoever.

CU
H-B

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

RE: Dan, RE: What I was

Message 96623 in response to message 96622

Quote:

Dan,

Quote:

What I was hoping to be able to do was to have my app info assign new s5r6 WU's to the s5r6 default executable while not causing those assigned to the old s5r5 exe to crash.

I would not bother for S5R5 anymore, it's basically over. I don't think there are still WUs around for S5R5, but if there are, they can easily be collected by ATLAS or whoever.

CU
H-B

You misunderstand. Lingering stuff in my app info (from when s5r6 was still using the old r5 app???) is sending all my s5r6 wu's to the s5r5 executable. I'm wondering if the problem is that the person who wrote the app info consuming code never considered that critical version data would be stored somewhere other than in the version field and that as a result the version_num value of a newer application might be lower than that of an older one.

Shorn of the ABP2 entries (assumed to be irrelevant) my current appinfo file is the following:

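[pre]
<app_info>
    <app>
        <name>einstein_S5R6</name>
    </app>
    <file_info>
        <name>einstein_S5R6_3.01_windows_intelx86__S5R6sse2.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einstein_S5R5_3.05_windows_intelx86.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einstein_S5R5_3.05_windows_intelx86_0.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einstein_S5R5_3.05_windows_intelx86_1.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einstein_S5R5_3.05_windows_intelx86_2.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einstein_S5R5_3.05_graphics_windows_intelx86.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>einstein_S5R6</app_name>
        <version_num>301</version_num>
        <api_version>6.7.0</api_version>
        <file_ref>
            <file_name>einstein_S5R6_3.01_windows_intelx86__S5R6sse2.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>einstein_S5R6_3.01_graphics_windows_intelx86.exe</file_name>
            <open_name>graphics_app</open_name>
        </file_ref>
    </app_version>
    <app_version>
        <app_name>einstein_S5R6</app_name>
        <version_num>305</version_num>
        <api_version>6.3.0</api_version>
        <file_ref>
            <file_name>einstein_S5R5_3.05_windows_intelx86.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>einstein_S5R5_3.05_windows_intelx86_0.exe</file_name>
        </file_ref>
        <file_ref>
            <file_name>einstein_S5R5_3.05_windows_intelx86_1.exe</file_name>
        </file_ref>
        <file_ref>
            <file_name>einstein_S5R5_3.05_windows_intelx86_2.exe</file_name>
        </file_ref>
        <file_ref>
            <file_name>einstein_S5R5_3.05_graphics_windows_intelx86.exe</file_name>
            <open_name>graphics_app</open_name>
        </file_ref>
    </app_version>
</app_info>
[/pre]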

Everything is being crunched by the s5r5 3.05 application.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 756882355
RAC: 1160914

Upps.... I don't think this

Upps.... I don't think this is healthy, currently there is no 3.05 S5R6 app for Windows (the app version didn't change so it's x.01 on all platforms), but there could be in the future, so you don't want to have a 305 S5R6 version in your app_info.xml.

And given the fact that the ABP2 app will probably undergo some change (optimizations) in the near future, everyone's better off without app_info.xml to get the latest ABP2 apps as soon as they are rolled out.

What about suspending the other projects while the Einstein@Home cache is drained with "no new work" set?

As plan B, you could also disable network traffic, stop BOINC, back up the whole BOINC installation, edit the client_state.xml file to change all false 305 version "brandings", remove the 305 stuff from app_info.xml and restart BOINC. See if it does the trick; if not, restore from backup ...
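The backup step of plan B could be scripted along the following lines. This is only a minimal sketch, assuming BOINC has already been stopped; `backup_boinc_dir` and both paths are hypothetical names chosen for illustration, not anything BOINC provides.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_dir(data_dir, backup_root):
    """Copy the whole BOINC data directory into a timestamped folder.

    Assumption: the BOINC client is stopped, so client_state.xml is not
    being rewritten while the copy runs. Both paths come from the caller.
    """
    src = Path(data_dir)
    # Refuse to run on a folder that does not look like a BOINC data dir.
    if not (src / "client_state.xml").is_file():
        raise FileNotFoundError(f"no client_state.xml in {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"boinc-backup-{stamp}"
    # copytree creates dest (and missing parents) and fails if dest exists.
    shutil.copytree(src, dest)
    return dest
```

Restoring is then a matter of stopping BOINC again and copying the saved folder back over the data directory.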

CU
H-B

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118246736396
RAC: 24682359

RE: Upps.... I don't think

Message 96625 in response to message 96624

Quote:
Upps.... I don't think this is healthy, currently there is no 3.05 S5R6 app for Windows (the app version didn't change so it's x.01 on all platforms), but there could be in the future, so you don't want to have a 305 S5R6 version in your app_info.xml.

Actually, he doesn't have an S5R6_3.05 app in his app_info.xml. He has an S5R5_3.05 app.

Effectively his app_info.xml says he is doing the S5R6 run but that if he has tasks branded as 301, he will use the app S5R6_3.01 (which is correct) but if he has tasks branded as 305, he will use the app S5R5_3.05. Of course, because 305 is the higher version, all R6 tasks are being crunched with the S5R5_3.05 app.
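The selection behaviour described above can be modelled in a few lines. This is a sketch of the behaviour observed in this thread, not BOINC's actual code; `AppVersion` and `pick_version` are hypothetical names, and the "highest version_num wins" rule is taken from the symptoms rather than from the client source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppVersion:
    app_name: str      # the <app_name> the entry is filed under
    version_num: int   # e.g. 301 for 3.01
    main_program: str  # executable that actually runs


def pick_version(app_versions, app_name):
    """Return the entry with the highest version_num for app_name.

    Assumed rule: entries are grouped by app_name only, and the file
    names they point at play no part in the choice.
    """
    candidates = [v for v in app_versions if v.app_name == app_name]
    if not candidates:
        raise LookupError(f"no app_version for {app_name}")
    return max(candidates, key=lambda v: v.version_num)


versions = [
    AppVersion("einstein_S5R6", 301,
               "einstein_S5R6_3.01_windows_intelx86__S5R6sse2.exe"),
    AppVersion("einstein_S5R6", 305,
               "einstein_S5R5_3.05_windows_intelx86.exe"),
]
chosen = pick_version(versions, "einstein_S5R6")
print(chosen.main_program)  # einstein_S5R5_3.05_windows_intelx86.exe
```

With both entries filed under einstein_S5R6, the 305 entry wins even though its executable belongs to the older run.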

This is actually working. I've checked his host and his tasks are indeed being crunched with the former S5R5_3.05 app and they are validating. So it must be that there were no changes in the app between S5R5 and S5R6. Of course this is pure luck and a highly undesirable state of affairs.

You could attempt surgery on client_state.xml and app_info.xml to get out of this but I believe the best course of action would be not to attempt that, mainly because the editing task would be quite onerous and the chance of mistakes, quite high. Instead, if it were my host, I'd set NNT and abort all unstarted tasks. I'd wait for the currently crunching tasks to finish and be reported. I'd delete app_info.xml and then I'd stop and restart BOINC. If you didn't want to waste quite so many tasks (seeing as they are validating) you could set NNT and run the cache down for a while before doing the abort thing.

If Dan were to take copies of the currently legitimate GW (S5R6_3.01) and ABP2 apps that are in his project folder, he could check they were still there just before restarting BOINC. If they were not, he could replace them and BOINC will find them and actually skip the downloading of all the apps once again. If he's not concerned about bandwidth, he could ignore this tip and let BOINC download afresh after a project reset.

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7262301849
RAC: 1562981

RE: much

Message 96626 in response to message 96623

Quote:

much snippage
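[pre]
<app_version>
    <app_name>einstein_S5R6</app_name>
    <version_num>305</version_num>
    <api_version>6.3.0</api_version>
    <file_ref>
        <file_name>einstein_S5R5_3.05_windows_intelx86.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>einstein_S5R5_3.05_windows_intelx86_0.exe</file_name>
    </file_ref>
    <file_ref>
        <file_name>einstein_S5R5_3.05_windows_intelx86_1.exe</file_name>
    </file_ref>
    <file_ref>
        <file_name>einstein_S5R5_3.05_windows_intelx86_2.exe</file_name>
    </file_ref>
    <file_ref>
        <file_name>einstein_S5R5_3.05_graphics_windows_intelx86.exe</file_name>
        <open_name>graphics_app</open_name>
    </file_ref>
</app_version>
[/pre]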

Everything is being crunched by the s5r5 3.05 application.


I'm not an app_info guy with total control of the subject, but the symptom you describe is actually what this version asks for. The primary key is the app_name, and this section routes S5R6 straight to an S5R5 executable.

Edit: I see that Gary made a more useful version of this comment while I was away typing.
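A mismatch of this kind can be spotted mechanically before BOINC ever reads the file. The sketch below assumes the usual app_info.xml element names (`app_version`, `app_name`, `version_num`, `file_ref`, `file_name`) and relies on the naming habit seen in this thread, where each executable's file name starts with the app name; that habit is a convention of these Einstein@Home files, not a BOINC rule. `find_mismatches` and `SAMPLE` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the file discussed in this thread.
SAMPLE = """<app_info>
  <app_version>
    <app_name>einstein_S5R6</app_name>
    <version_num>301</version_num>
    <file_ref>
      <file_name>einstein_S5R6_3.01_windows_intelx86__S5R6sse2.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app_version>
    <app_name>einstein_S5R6</app_name>
    <version_num>305</version_num>
    <file_ref>
      <file_name>einstein_S5R5_3.05_windows_intelx86.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>"""


def find_mismatches(xml_text):
    """List (app_name, version_num, file_name) for every file_ref whose
    file name does not start with the app_name it is filed under."""
    root = ET.fromstring(xml_text)
    problems = []
    for av in root.iter("app_version"):
        app_name = av.findtext("app_name", default="").strip()
        version = av.findtext("version_num", default="").strip()
        for ref in av.iter("file_ref"):
            file_name = ref.findtext("file_name", default="").strip()
            if not file_name.startswith(app_name):
                problems.append((app_name, version, file_name))
    return problems


for app_name, version, file_name in find_mismatches(SAMPLE):
    print(f"{app_name} v{version} points at {file_name}")
```

A hit is only a hint to look closer, since a project is free to name its files differently.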

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

I am currently NNT, but I

I am currently NNT, but I normally keep a 4 day cache to ride out any network outages and on my i7 that's ~ 100 tasks remaining. Were I to attempt an abortfest now I'd end up starving my client when E@H decided to stop sending me WU's because of my problems.

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

RE: You could attempt

Message 96628 in response to message 96625

Quote:

You could attempt surgery on client_state.xml and app_info.xml to get out of this but I believe the best course of action would be not to attempt that, mainly because the editing task would be quite onerous and the chance of mistakes, quite high. Instead, if it were my host, I'd set NNT and abort all unstarted tasks. I'd wait for the currently crunching tasks to finish and be reported. I'd delete app_info.xml and then I'd stop and restart BOINC. If you didn't want to waste quite so many tasks (seeing as they are validating) you could set NNT and run the cache down for a while before doing the abort thing.

12 minutes and 3 restores from backup to have the existing work units running on the current executable via app info.

First restore PEBKAC diagnosis of the problem

Second restore PEBKAC diagnosis of the problem

Third restore realized the binaries I'd copied over earlier had been cleaned up, also copied app_version data from second machine.

When I nuke the app_info.xml file itself all my WU's are still imploding and I'm not sure what needs thumped to fix that.
