RE: http://einstein.phys.uw
If you are saying that 41527980 is the last remaining workunit with no final result, then I think you are absolutely correct.
This particular quorum contains several tasks in progress and even more listed as unsent. Someone on the project staff has bumped the initial replication up rather high - to 11 - in order to get the extra tasks into circulation; with a quorum needing only 2 matching results, that means up to 9 superfluous copies of each such workunit. So the tasks in progress visible here would be part of the almost 9K results listed as still in the database on the server status page. Quite a few other similar quorums show a much higher than usual initial replication. This was obviously done to clean up the remaining R3 work, but at the cost of much unnecessary redundant crunching. Almost 9K redundant tasks are still being crunched, with many more having already disappeared from that list by being returned in recent days.
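Purely as a hypothetical sketch, a bump like that would presumably amount to an update of the standard BOINC workunit table along these lines (the appid value is a made-up placeholder; I haven't seen what the staff actually ran):

    -- Hypothetical sketch: raising the initial replication on straggler
    -- workunits via standard columns of the BOINC 'workunit' table.
    UPDATE workunit
       SET target_nresults   = 11,  -- the raised initial replication
           max_total_results = 20,  -- keep the ceiling above the new target
           transition_time   = 0    -- wake the transitioner immediately
     WHERE appid = 99               -- made-up placeholder for the S5R3 app id
       AND canonical_resultid = 0;  -- only quorums without a final result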
If you want to see many examples of this redundancy, take a look at the results lists for hosts 1037131, 1029834 and 53011 as just three hosts that have very large numbers of these redundant tasks. Undoubtedly there are many more. If you drill down into the results lists, you find many tasks still to be crunched, all bar one of which are completely unnecessary. If you drill down into recently completed tasks, the waste continues. It's not uncommon to see completed quorums with three, four, five or even six completed tasks, and maybe even an extra one or two still crunching, for good measure :-(.
It seems rather unfortunate that there has to be such waste just to get the job finished quickly. Surely there could have been a better way.
Cheers,
Gary.
RE: It seems rather
BOINC servers have code to handle exactly this situation - it just needs to be activated in the configuration. I'm afraid I don't know the exact file/flag to use, but the effect is to cancel outstanding tasks in a user's cache once the quorum has been filled (tasks are only cancelled if computation hasn't started - tasks which have any run time to their credit are allowed to finish and claim cobblestones). Superfluous tasks are flagged up on the website as "Redundant result Cancelled by server". There has been a lot of discussion, and some negative comment, about this at LHC recently, but it would have been an appropriate tool to use in conjunction with this 'end of run clean-up'. Worth remembering for next time, even if it's a bit late now for this one.
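A minimal sketch of what that configuration might look like, assuming the flag in question is <send_result_abort/> in the project's config.xml (the flag name is my assumption - worth verifying against the BOINC server documentation):

    <!-- config.xml fragment - a sketch, flag name assumed.
         When set, the scheduler tells a host to drop in-progress results
         whose workunit already has a canonical result, was cancelled, or
         errored out; tasks that have started crunching are left alone. -->
    <boinc>
      <config>
        <!-- other project settings omitted -->
        <send_result_abort/>
      </config>
    </boinc>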
It does seem odd that this clean-up has flushed out a significant number of machines which are still configured to run S5R3 only - the three you list are all anonymous power users. Is there a significant correlation between Power Users and Absent-Minded Users? There might even be enough to explain (some of) the decrease in active crunchers, discussed elsewhere.
RE: 41527980 is that last
Not any longer, it isn't. Break out the champagne - it validated at 25 Sep 2008 14:07:03 UTC. Now where did I put that sweepstake ticket?
Great news, and a big
Great news, and a big milestone!
So the scientists have some reason to celebrate with a little after-conference party in Amsterdam, I guess - I understand there's a joint LSC/Virgo meeting going on there right now.
CU
Bikeman
It's interesting that the
It's interesting that the approximately 8.5K tasks still in progress are now all totally redundant. I suspect that the majority of them are concentrated in relatively few hosts (maybe 100 or so) similar to the three I previously identified. If so, they could easily be aborted and significant resources could be saved. Those hosts could then be diverted to the R4 tasks, where the saved resources would be put to much better use. Interestingly, this seemingly hasn't happened as yet.
If someone is paying attention, the outstanding 8.5K tasks should go to zero extremely quickly :-).
Cheers,
Gary.
RE: It's interesting that
Unfortunately, even if that 'someone' has their hand on the 'cancel redundant results' configuration switch, it still requires each host to contact the scheduler twice before the result disappears from the pending list in the database: once to receive the notification that the result is no longer needed, and a second time to report that it has acted on the request (which it won't do unless it's running a recent version of BOINC, and unless it isn't already running the task in question). It'll help, of course - and I see no reason why it couldn't be permanently active; we'd hardly ever notice - but I suspect there'll still be quite a few that have to wait for their deadlines.
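For illustration, the first of those two contacts would look something like this fragment of a scheduler reply (element names are sketched from memory of the BOINC protocol, not verified, and the task name is made up):

    <!-- Sketch of a scheduler reply fragment.
         result_abort_if_not_started asks the client to abort the named
         task only if it hasn't begun crunching; a plain result_abort
         would drop it unconditionally. -->
    <scheduler_reply>
      <result_abort_if_not_started>
        <name>h1_0000.00_S5R3__000_S5R3a_9</name>  <!-- made-up task name -->
      </result_abort_if_not_started>
    </scheduler_reply>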
As the hosts in question are
As the hosts in question are probably under the control of the project (my assumption), it should be extremely easy to get rid of most of the redundant tasks in a flash.
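For instance, here's a sketch of what could be run on each such host - assuming shell access, that S5R3 task names contain the string 'S5R3', and a client recent enough to support these commands (older installs spell it boinc_cmd and use --get_results rather than --get_tasks):

    # Abort every remaining S5R3 task on this host (sketch only).
    URL=http://einstein.phys.uwm.edu/
    boinccmd --get_tasks \
      | awk -F': ' '/^ *name: /{print $2}' \
      | grep S5R3 \
      | while read name; do
          boinccmd --task "$URL" "$name" abort
        done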
With this project's initial replication usually being two, it's hard to see the justification for server-side aborts. Of course it would be useful now, seeing as somebody bumped the IR to 11 at some point for those R3 resends :-).
Cheers,
Gary.
0 WU in database.
0 WU in database.