Your help is needed for a new ABP2 app release

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109392643411
RAC: 35864530
Topic 194727

Mike has posted an update from Bernd in the ABP2 sticky threads which probably needs further comment.

You have probably noticed, in the last couple of days, a small initial release of ABP2 tasks which then dried up. These 'initial release' tasks showed up some validation problems which will need an updated science application to correct. The new ABP2 application can't be released until the bulk of the initial release tasks each have a canonical result. If the updated apps were 'out in the field' and grabbing resends of the initial release tasks, those resends would all fail validation, as the new ABP2 app and the old ABP2 app are not compatible.

To speed up the process of arriving at a canonical result (i.e. 2 matching results), the Initial Replication (IR) of just these few thousand tasks has been put up from 2 to 4 copies in the hope that this will get 2 matching results back faster. This IR change applies to ABP2 only. There is no effect on ABP1 or the normal GW tasks. The observant among you may have noticed more ABP2 tasks arriving several hours ago as the extra two copies of each task were distributed.
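
For anyone who finds the quorum idea easier to follow in code, here is a rough Python sketch. It is not the project's validator, just an illustration: the result 'fingerprints' and the function name are made up, and the real validator compares results with its own science-specific checks rather than simple equality.

```python
from collections import Counter

# Two matching results are needed for a canonical result, as described above.
MIN_QUORUM = 2


def canonical_result(returned):
    """Return the first result value reported at least MIN_QUORUM times, else None.

    `returned` is a list of hypothetical result fingerprints, in arrival order.
    """
    counts = Counter(returned)
    for value in returned:
        if counts[value] >= MIN_QUORUM:
            return value
    return None


# Two copies out, and they disagree: no canonical result yet.
print(canonical_result(["a", "b"]))
# Extra copies out: as soon as any two agree, the quorum is complete.
print(canonical_result(["a", "b", "a"]))
```

With only two copies out, a single slow or mismatching host holds up the whole quorum. With four copies out, any two matching returns are enough.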

The problem with this technique is that if you get any of these extra copies of tasks, they will be at the bottom of your cache and may not be started for quite a while.

So here is where you can make a real difference. On each host you have attached to the project, check whether you have any ABP2 tasks in your task list in BOINC Manager. If you do, you need to get these tasks crunched and returned ASAP. An easy way to do this is to 'suspend' any E@H tasks ahead of the ABP2 task in your cache of work. You can 'click' to highlight each task ahead of it and then use the 'suspend' button to temporarily suspend that task. Once you have suspended all tasks ahead of the ABP2 task, you can then 'suspend' the running task and computation should transfer to the ABP2 task immediately.

If you have multiple ABP2 tasks you can leave the others suspended until crunching has started on the last ABP2 task of the group. At that point you can 'unsuspend' all suspended tasks and when the ABP2 task finishes, crunching of tasks will return to the normal sequence.

If you understand the above, your assistance in getting the ABP2 tasks back quickly would be appreciated. If you don't understand what I'm talking about, please take no action in case you inadvertently do the wrong thing. Please note, I'm not talking about aborting tasks, just rearranging the order in which they are crunched. Suspending and resuming tasks is quite a safe thing to do.
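
If it helps, the reordering described above can be sketched in a few lines of Python. This is only an illustration of which tasks to pick, not something that talks to BOINC: the task names and the `abp2` prefix are invented for the example, and the queue is assumed to list the running task first, followed by the rest of the cache in order.

```python
def tasks_to_suspend(queue, is_priority):
    """Return the tasks that sit ahead of the last priority task in the queue.

    `queue` is the cache in run order (running task first).
    `is_priority` is a function that says whether a task is one to fast-track.
    Priority tasks themselves are never included in the returned list.
    """
    positions = [i for i, task in enumerate(queue) if is_priority(task)]
    if not positions:
        return []  # nothing to fast-track, so leave the cache alone
    last = positions[-1]
    return [task for task in queue[:last] if not is_priority(task)]


# Hypothetical cache: GW tasks with two ABP2 tasks buried among them.
cache = ["gw_1", "gw_2", "abp2_1", "gw_3", "abp2_2", "gw_4"]
print(tasks_to_suspend(cache, lambda name: name.startswith("abp2")))
```

Anything behind the last ABP2 task (here `gw_4`) is left untouched, and everything that was suspended gets resumed once that last ABP2 task has started.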

Cheers,
Gary.

Elphidieus
Joined: 20 Feb 05
Posts: 245
Credit: 20603702
RAC: 0

Your help is needed for a new ABP2 app release

So I assume this validation problem affects all platforms...?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109392643411
RAC: 35864530

RE: So I assume this

Message 96587 in response to message 96586

Quote:
So I assume this validation problem affects all platforms...?


I guess so - since there are new apps for Win, Linux and Mac ready to go once the current tasks are cleaned up.

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752761280
RAC: 1449461

Only got one (WU 65782429), but it's bumped and running.

Unfortunately, I'm Windows and the only one back so far is Linux, so it isn't going to help immediately....

Reading between the lines of all the new issues recently, the underlying BOINC server code must have been updated a lot to handle all the plan_class stuff. At some point when the dust has settled, would you give a moment's consideration to updating the web display code too, please? Newer versions are much more informative about applications, BOINC versions, GPUs etc., and provide useful tools for filtering task lists.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109392643411
RAC: 35864530

RE: Only got one (WU

Message 96589 in response to message 96588

Quote:

Only got one (WU 65782429), but it's bumped and running.

Unfortunately, I'm Windows and the only one back so far is Linux, so it isn't going to help immediately....


It's amazing how assumptions can be wrong :-).

That Linux one was one of my hosts and between us we have created a canonical result and we did reduce the list of outstanding quorums by one ... So you see, the different OS's don't always produce an invalid result :-).

In the last 18 hours, I've submitted around 600 ABP2 results and seen quite a smattering of invalids amongst them - perhaps 5-10%. I'll really be pleased when the last of these are put to bed and the new apps can be rolled out.

Cheers,
Gary.

Svenie25
Joined: 21 Mar 05
Posts: 139
Credit: 2436862
RAC: 0

RE: RE: Only got one (WU

Message 96590 in response to message 96589

Quote:
Quote:

Only got one (WU 65782429), but it's bumped and running.

Unfortunately, I'm Windows and the only one back so far is Linux, so it isn't going to help immediately....


It's amazing how assumptions can be wrong :-).

That Linux one was one of my hosts and between us we have created a canonical result and we did reduce the list of outstanding quorums by one ... So you see, the different OS's don't always produce an invalid result :-).

In the last 18 hours, I've submitted around 600 ABP2 results and seen quite a smattering of invalids amongst them - perhaps 5-10%. I'll really be pleased when the last of these are put to bed and the new apps can be rolled out.

It looks like the new apps are installed already: http://einstein.phys.uwm.edu/apps.php

I have 5 ABP2 3.03 left in the cache. Around 2 hours to go.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752761280
RAC: 1449461

RE: RE: Only got one (WU

Message 96591 in response to message 96589

Quote:
Quote:

Only got one (WU 65782429), but it's bumped and running.

Unfortunately, I'm Windows and the only one back so far is Linux, so it isn't going to help immediately....


It's amazing how assumptions can be wrong :-).

That Linux one was one of my hosts and between us we have created a canonical result and we did reduce the list of outstanding quorums by one ... So you see, the different OS's don't always produce an invalid result :-).

In the last 18 hours, I've submitted around 600 ABP2 results and seen quite a smattering of invalids amongst them - perhaps 5-10%. I'll really be pleased when the last of these are put to bed and the new apps can be rolled out.


Buoyed up by that success, I fast-tracked another three I've just been issued....

....only to look more closely, and find that they were quorum 2, version 3.06

Then I saw that version 3.07 had been installed on the applications page about two and a half hours ago. Anything we ought to be doing about this?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109392643411
RAC: 35864530

RE: Buoyed up by that

Message 96592 in response to message 96591

Quote:


Buoyed up by that success, I fast-tracked another three I've just been issued....

....only to look more closely, and find that they were quorum 2, version 3.06


No need to fast-track those as the X.06 is the new CPU app and the X.07 is the new CUDA version. The cleanup of the X.02/X.03 tasks must have been deemed 'good enough' for the new apps to be rolled out.

Quote:
Then I saw that version 3.07 had been installed on the applications page about two and a half hours ago. Anything we ought to be doing about this?


No, just allow things to return to normal. The apps page only shows the single physically highest version number for each platform so you need to interpret X.07 as referring to the X.06/X.07 combination of CPU/GPU apps.

So, thanks to all who may have helped with the cleanup. I would imagine there will still be a few of the X.02/X.03 tasks without a completed quorum (i.e. no canonical result yet) still out there. Since they will have an Initial Replication (IR) of 5, they shouldn't last long. If any such tasks actually fail, the scheduler will create a further copy. It won't create a further copy if a canonical result exists. If any new copies actually get crunched by the new app, they will crunch OK but will fail validation. Hopefully the IR=5 trick will have put enough copies already out in the field so that a canonical result can be achieved quickly without any need for this new copy.

You could protect yourself against wasted effort by making sure any new ABP2 tasks sent to you are the IR=2 type rather than IR=5, which is the 'signature' of the 'old app' tasks still floating around. You can check this 'signature' by going to the website and clicking on the WUID of these new ABP2 tasks. If you do see an 'IR=5' quorum with other tasks as well as yours, just suspend your task and wait to see if one of the other tasks comes back to complete the quorum and produce a canonical result. Then you could safely abort yours. If you abort without checking, you just throw the problem onto some other poor bugger. That's why I'm religiously completing and sending back every last one of these that I've got on my hosts.
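
The check above boils down to a small decision rule, sketched here in Python for a host that is already running the new app. The function name and its inputs are invented for illustration. You would read the IR value and the canonical-result status off the workunit page yourself; nothing here queries the website.

```python
def advise(initial_replication, has_canonical_result):
    """Suggest what to do with a newly received ABP2 task on a new-app host.

    Both arguments are values you read from the workunit page by hand.
    """
    if initial_replication != 5:
        return "crunch"  # IR=2 is a new-style task, so run it normally
    if has_canonical_result:
        return "abort"  # quorum already complete, so your copy is not needed
    # Old-app task with no canonical result yet: don't abort blindly,
    # as that just passes the problem on to another host.
    return "suspend and wait"


print(advise(2, False))
print(advise(5, False))
print(advise(5, True))
```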

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752761280
RAC: 1449461

RE: No, just allow things

Message 96593 in response to message 96592

Quote:
No, just allow things to return to normal. The apps page only shows the single physically highest version number for each platform so you need to interpret X.07 as referring to the X.06/X.07 combination of CPU/GPU apps.


Thus reinforcing my plea for a web display code update that would distinguish between them by plan_class and show all active applications.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109392643411
RAC: 35864530

RE: RE: No, just allow

Message 96594 in response to message 96593

Quote:
Quote:
No, just allow things to return to normal. The apps page only shows the single physically highest version number for each platform so you need to interpret X.07 as referring to the X.06/X.07 combination of CPU/GPU apps.

Thus reinforcing my plea for a web display code update that would distinguish between them by plan_class and show all active applications.


Normally, I'm very slow on the uptake as far as server-side BOINC nitty gritty is concerned. Also for newish client-side BOINC stuff - you'll recall having to correct my ramblings at various times because I was resisting moving on from the old versions I was happy with and staying right away from the bleeding edge and so I simply wasn't aware of the recent changes.

Continually being corrected by you served as sufficient stimulus for me to start lurking on alpha and when I acquired a few ATI 4850s, that soon forced me to jump on the 6.10.x BOINC bleeding edge. So, hopefully, my knowledge of recent BOINC developments has improved a smidgen :-).

So what's all this got to do with the applications page you may well ask? It so happens that in my new found drive to be better informed, I had been paying attention to several of your postings where you had commented on the 'difficulties' for the average user in keeping track of new versions and version numbers. I had actually sent an email to Bernd quite recently seeking better information to be included on the apps page so that casual readers might better understand the full range of apps available. One of the snippets of information that Bernd sent back to me was the fact that only the highest version number of an app group could be displayed due to limitations in the code version being used. Also, it's unlikely to get fixed by upgrading as there are alternative plans being worked on at the moment. I'm not sure if I'm at liberty to say any more than that. I'm sure Bernd will if he wants to.

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752761280
RAC: 1449461

Message 96595 in response to message 96594

OK, that's fine. Point made and noted - all I can ask for. We await developments with bated breath!
