S5R3 countdown thread

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119038688194
RAC: 24781146

RE: http://einstein.phys.uw

Message 85643 in response to message 85642

If you are saying that 41527980 is the last remaining workunit with no final result then I think you are absolutely correct.

This particular quorum contains several tasks in progress and even more listed as unsent. Someone (project staff) has bumped the initial replication up to 11, which is rather high, in order to get the extra tasks into circulation. So the tasks in progress visible here would be part of the almost 9K results listed as still in the database on the server status page. There are quite a few other similar quorums that show a much higher than usual initial replication. This was obviously done to clean up the remaining R3 work, but at the cost of much unnecessary redundant crunching. There are almost 9K redundant tasks that are still being crunched, with many more that have already disappeared from that list by being returned in recent days.

If you want to see many examples of this redundancy, take a look at the results lists for hosts 1037131, 1029834 and 53011 as just three hosts that have very large numbers of these redundant tasks. Undoubtedly there are many more. If you drill down into the results lists, you find many tasks still to be crunched, all bar one of which are completely unnecessary. If you drill down into recently completed tasks, the waste continues. It's not uncommon to see completed quorums with three, four, five or even six completed tasks, and maybe even an extra one or two still crunching, for good measure :-(.

It seems rather unfortunate that there has to be such waste just to get the job finished quickly. Surely there could have been a better way.

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3001418606
RAC: 699984

RE: It seems rather

Message 85644 in response to message 85643

Quote:
It seems rather unfortunate that there has to be such waste just to get the job finished quickly. Surely there could have been a better way.


BOINC servers have code to handle exactly this situation - it just needs to be activated in the configuration. I'm afraid I don't know the exact file/flag to use, but the effect is to cancel outstanding tasks in a user's cache once the quorum has been filled (tasks are only cancelled if computation hasn't started - tasks which have any run time to their credit are allowed to finish and claim cobblestones). Superfluous tasks are flagged up on the website as "Redundant result Cancelled by server": there has been a lot of discussion, and some negative comment, at LHC recently, but it would have been appropriate to use it in conjunction with this 'end of run clean-up'. Worth remembering for next time, even if it's a bit late now for this one.
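A minimal sketch of the cancellation rule described above, not the actual BOINC server code. The `Task` class, its field names and the `cancel_redundant` function are hypothetical, and "quorum filled" is assumed to mean that enough tasks in the workunit have completed:

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    host_id: int
    cpu_time: float = 0.0        # seconds of run time already accumulated
    state: str = "in_progress"   # "in_progress", "completed" or "cancelled"


def cancel_redundant(tasks, quorum):
    """Cancel unstarted tasks once the quorum is filled; return their ids."""
    completed = sum(1 for t in tasks if t.state == "completed")
    if completed < quorum:
        return []  # quorum not yet filled, so every task is still needed

    cancelled = []
    for t in tasks:
        # Tasks with any run time to their credit are allowed to finish.
        if t.state == "in_progress" and t.cpu_time == 0.0:
            t.state = "cancelled"
            cancelled.append(t.task_id)
    return cancelled
```

With a quorum of two and two completed tasks, an unstarted third task would be cancelled while a fourth that has already logged run time would be left alone.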

It does seem odd that this clean-up has flushed out a significant number of machines which are still configured to run S5R3 only - the three you list are all anonymous power users. Is there a significant correlation between Power Users and Absent-Minded Users? There might even be enough to explain (some of) the decrease in active crunchers, discussed elsewhere.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3001418606
RAC: 699984

RE: 41527980 is that last

Message 85645 in response to message 85643

Quote:
41527980 is the last remaining workunit with no final result.


Not any longer, it isn't. Break out the champagne - it validated at 25 Sep 2008 14:07:03 UTC. Now where did I put that sweepstake ticket?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 798373749
RAC: 1199291


Great news, and a big milestone!

So the scientists have some reason to celebrate a little after-conference party in Amsterdam, I guess, where I understand there's a joint LSC/Virgo meeting right now.

CU
Bikeman

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119038688194
RAC: 24781146


It's interesting that the approximately 8.5K tasks still in progress are now all totally redundant. I suspect that the majority of those are concentrated in relatively few (say maybe 100 or so) hosts similar to the three I previously identified. If so, they could easily be aborted and significant savings of resources could be made. These hosts could be diverted to the R4 tasks where those saved resources could be much better used. It's interesting that they seemingly haven't been as yet.

If someone is paying attention, the outstanding 8.5K tasks should go to zero extremely quickly :-).

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3001418606
RAC: 699984

RE: It's interesting that

Message 85648 in response to message 85647

Quote:

It's interesting that the approximately 8.5K tasks still in progress are now all totally redundant. I suspect that the majority of those are concentrated in relatively few (say maybe 100 or so) hosts similar to the three I previously identified. If so, they could easily be aborted and significant savings of resources could be made. These hosts could be diverted to the R4 tasks where those saved resources could be much better used. It's interesting that they seemingly haven't been as yet.

If someone is paying attention, the outstanding 8.5K tasks should go to zero extremely quickly :-).


Unfortunately, even if that 'someone' has their hand on the 'Cancel redundant results' configuration switch, it still requires each host to contact the scheduler twice before the result disappears from the database pending list: once to receive the notification that the result is no longer needed, and a second time to report that it has acted on the request (which it will only do if it's running a recent version of BOINC and hasn't already started running the task in question). It'll help, of course (and I see no reason why it couldn't be permanently active - we'd hardly ever notice), but I suspect there'll still be quite a few that have to wait for deadline.
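To make the two-contact sequence concrete, here is a rough simulation under stated assumptions. It is not the real scheduler protocol: `Host`, `honours_abort` and `scheduler_contact` are hypothetical names, and each contact is assumed to carry the host's reports first and the server's notices second:

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    tasks: set                                    # task ids held in the local cache
    running: set = field(default_factory=set)     # ids that have started computing
    honours_abort: bool = True                    # assumed: only recent clients act on the notice
    to_report: set = field(default_factory=set)   # aborted locally, not yet reported


def scheduler_contact(pending, redundant, host):
    """Simulate one scheduler contact; mutates `pending` and `host` in place."""
    # The host first reports any aborts it carried out since the last contact,
    # which is the only point at which the server can clear them.
    pending.difference_update(host.to_report)
    host.to_report.clear()

    # The server then flags redundant tasks. A client that honours the notice
    # aborts those not yet started and reports them on its NEXT contact.
    if host.honours_abort:
        for tid in sorted(host.tasks & redundant):
            if tid not in host.running:
                host.tasks.discard(tid)
                host.to_report.add(tid)
```

In this model an unstarted redundant task leaves the pending list only after the second contact, while a task that is already running, or one held by a host that ignores the notice, stays pending until it is returned or reaches its deadline.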

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119038688194
RAC: 24781146


As the hosts in question are probably under the control of the project (my assumption), it should be extremely easy to get rid of most of the redundant tasks in a flash.

With this project's initial replication usually being two, it's hard to see the justification for server-side aborts. Of course it would be useful now, seeing as somebody bumped the IR to 11 at some point for those R3 resends :-).

Cheers,
Gary.

Mats Nilsson
Joined: 10 Dec 05
Posts: 94
Credit: 15011147
RAC: 0


0 WU in database.
