S5R3 countdown thread

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5883

Credit: 119038688194

RAC: 24781146

RE: http://einstein.phys.uw

25 Sep 2008 5:34:48 UTC

Message 85643 in response to message 85642

Quote:

http://einsteinathome.org/workunit/41527980 -- This WU was.

If you are saying that 41527980 is the last remaining workunit with no final result, then I think you are absolutely correct.

This particular quorum contains several tasks in progress and even more listed as unsent. Someone (project staff) has bumped the initial replication up rather high, to 11, in order to get the extra tasks into circulation. So the tasks in progress visible here would be part of the almost 9K results listed as still in the database on the server status page. Quite a few other quorums show a much higher than usual initial replication. This was obviously done to clean up the remaining R3 work, but at the cost of much unnecessary redundant crunching. Almost 9K redundant tasks are still being crunched, and many more have already disappeared from that list after being returned in recent days.

If you want to see many examples of this redundancy, take a look at the results lists for hosts 1037131, 1029834 and 53011, just three hosts that have very large numbers of these redundant tasks. Undoubtedly there are many more. If you drill down into the results lists, you find many tasks still to be crunched, all bar one of which are completely unnecessary. If you drill down into recently completed tasks, the waste continues. It's not uncommon to see completed quorums with three, four, five or even six completed tasks, and maybe an extra one or two still crunching, for good measure :-(.

It seems rather unfortunate that there has to be such waste just to get the job finished quickly. Surely there could have been a better way.

Cheers,
Gary.

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 3001418606

RAC: 699984

RE: It seems rather

25 Sep 2008 9:08:13 UTC

Message 85644 in response to message 85643

Quote:

It seems rather unfortunate that there has to be such waste just to get the job finished quickly. Surely there could have been a better way.

BOINC servers have code to handle exactly this situation; it just needs to be activated in the configuration. I'm afraid I don't know the exact file or flag to use, but the effect is to cancel outstanding tasks in a user's cache once the quorum has been filled. Tasks are only cancelled if computation hasn't started; tasks which have any run time to their credit are allowed to finish and claim cobblestones. Superfluous tasks are flagged on the website as "Redundant result Cancelled by server". There has been a lot of discussion, and some negative comment, about this at LHC recently, but it would have been appropriate to use it in conjunction with this 'end of run clean-up'. It's worth remembering for next time, even if it's a bit late now for this one.
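A minimal Python sketch of the cancellation rule just described, assuming a simplified task model. `Task`, `cancel_redundant` and their fields are hypothetical illustrations, not BOINC's actual server code, database schema or configuration; "completed" is treated as "counts towards the quorum", and a quorum of 2 is just an example value.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical model of one task (result) in a quorum; not BOINC's real schema.
    task_id: int
    state: str              # "in_progress" or "completed"
    run_time: float = 0.0   # seconds of computation already done on the host
    outcome: str = ""


def cancel_redundant(tasks: list[Task], quorum: int) -> list[int]:
    """Once `quorum` tasks have completed, flag unstarted in-progress tasks as cancelled.

    Tasks that have accumulated any run time are left alone so they can finish
    and claim credit. Returns the ids of the cancelled tasks.
    """
    completed = sum(1 for t in tasks if t.state == "completed")
    if completed < quorum:
        return []  # quorum not yet filled: every outstanding task is still needed
    cancelled = []
    for t in tasks:
        if t.state == "in_progress" and t.run_time == 0.0:
            t.outcome = "Redundant result Cancelled by server"
            cancelled.append(t.task_id)
    return cancelled


# Usage: two completed tasks fill a quorum of 2; task 3 hasn't started, task 4 has.
tasks = [
    Task(1, "completed", 3600.0),
    Task(2, "completed", 3500.0),
    Task(3, "in_progress", 0.0),
    Task(4, "in_progress", 1200.0),
]
print(cancel_redundant(tasks, quorum=2))  # [3]
```

Task 4 keeps running because it already has run time to its credit, which matches the "allowed to finish and claim cobblestones" behaviour described in the post.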

It does seem odd that this clean-up has flushed out a significant number of machines which are still configured to run S5R3 only; the three you list are all anonymous power users. Is there a significant correlation between Power Users and Absent-Minded Users? There might even be enough of them to explain (some of) the decrease in active crunchers, discussed elsewhere.

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 3001418606

RAC: 699984

RE: 41527980 is that last

25 Sep 2008 14:13:02 UTC

Message 85645 in response to message 85643

Quote:

41527980 is the last remaining workunit with no final result.

Not any longer, it isn't. Break out the champagne - it validated at 25 Sep 2008 14:07:03 UTC. Now where did I put that sweepstake ticket?

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 798373749

RAC: 1199291

Great news, and a big

25 Sep 2008 16:49:24 UTC

Message 85646

Great news, and a big milestone!

So the scientists have some reason for a little after-conference celebration in Amsterdam, I guess, where I understand there's a joint LSC/Virgo meeting right now.

CU
Bikeman

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5883

Credit: 119038688194

RAC: 24781146

It's interesting that the

25 Sep 2008 20:33:25 UTC

Message 85647

It's interesting that the approximately 8.5K tasks still in progress are now all totally redundant. I suspect that the majority of those are concentrated in relatively few (say maybe 100 or so) hosts similar to the three I previously identified. If so, they could easily be aborted and significant savings of resources could be made. These hosts could be diverted to the R4 tasks, where those saved resources could be much better used. Curiously, this seemingly hasn't been done as yet.

If someone is paying attention, the outstanding 8.5K tasks should go to zero extremely quickly :-).

Cheers,
Gary.

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 3001418606

RAC: 699984

RE: It's interesting that

25 Sep 2008 20:41:12 UTC

Message 85648 in response to message 85647

Quote:

It's interesting that the approximately 8.5K tasks still in progress are now all totally redundant. I suspect that the majority of those are concentrated in relatively few (say maybe 100 or so) hosts similar to the three I previously identified. If so, they could easily be aborted and significant savings of resources could be made. These hosts could be diverted to the R4 tasks, where those saved resources could be much better used. Curiously, this seemingly hasn't been done as yet.

If someone is paying attention, the outstanding 8.5K tasks should go to zero extremely quickly :-).

Unfortunately, even if that 'someone' has their hand on the 'Cancel redundant results' configuration switch, each host still has to contact the scheduler twice before the result disappears from the database pending list: once to receive the notification that the result is no longer needed, and a second time to report that it has acted on the request (which it will only do if it's running a recent version of BOINC and hasn't already started the task in question). It'll help, of course (and I see no reason why it couldn't be permanently active; we'd hardly ever notice), but I suspect there'll still be quite a few that have to wait for the deadline.
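A rough Python sketch of the two-contact sequence just described, under stated assumptions: `HostTask` and `scheduler_contact` are hypothetical names, `client_supports_abort` stands in for "running a recent version of BOINC", and tasks the host never aborts are simply left pending (deadline expiry is not modelled). This is an illustration of the reasoning in the post, not BOINC's actual scheduler protocol.

```python
from dataclasses import dataclass


@dataclass
class HostTask:
    # Hypothetical model; field names are illustrative, not BOINC's.
    task_id: int
    started: bool
    client_supports_abort: bool  # stands in for "running a recent version of BOINC"
    abort_requested: bool = False
    aborted: bool = False


def scheduler_contact(task: HostTask, pending: set[int]) -> None:
    """Simulate one host-to-scheduler contact for a task the server wants cancelled."""
    if task.aborted:
        # Second contact: the host reports the abort and the server drops the task.
        pending.discard(task.task_id)
        return
    if not task.abort_requested:
        # First contact: the host is told the task is no longer needed.
        task.abort_requested = True
        # The host only acts if its client understands the request
        # and the task hasn't already started.
        if task.client_supports_abort and not task.started:
            task.aborted = True


# Usage: only task 1 (recent client, not started) leaves the pending list,
# and only after the second contact.
pending = {1, 2, 3}
tasks = [
    HostTask(1, started=False, client_supports_abort=True),
    HostTask(2, started=True, client_supports_abort=True),
    HostTask(3, started=False, client_supports_abort=False),
]
for _ in range(2):
    for t in tasks:
        scheduler_contact(t, pending)
print(sorted(pending))  # [2, 3]
```

Tasks 2 and 3 stay in the pending list, which is why some of the outstanding tasks would still have to wait for the deadline even with the switch turned on.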

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5883

Credit: 119038688194

RAC: 24781146

As the hosts in question are

25 Sep 2008 21:07:08 UTC

Message 85649

As the hosts in question are probably under the control of the project (my assumption), it should be extremely easy to get rid of most of the redundant tasks in a flash.

With this project's initial replication usually being two, it's hard to see the justification for server-side aborts. Of course, it would be useful now, seeing as somebody bumped the IR to 11 at some point for those R3 resends :-).

Cheers,
Gary.

Mats Nilsson

Joined: 10 Dec 05

Posts: 94

Credit: 15011147

RAC: 0

0 WU in database.

25 Oct 2008 14:10:42 UTC

Message 85650

0 WU in database.
