Scheduler Bug with work requests for O2MD1 work - ### Staff Please Read ###

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,049
Credit: 224,634,206
RAC: 22,778

I got a few reports elsewhere from people whose issues with "lost tasks" have been resolved in the meantime. I lost track: is there still anyone left with that kind of trouble, i.e. who doesn't find the resent tasks in the local (client's) task list?

The "resend lost tasks" feature is pretty old, and I'm not sure it's really worth the trouble it's causing occasionally.

The original idea was to 1. keep the database small and healthy by avoiding the creation of unnecessary new tasks, and 2. keep the download volume of the clients low, hoping that the data files for the "lost tasks" may still be there. The limits of DB performance and download bandwidth, however, have been stretched quite a bit over the years.

OTOH, features have been added to BOINC that this code isn't aware of and that thus aren't handled very well. 1. The system only recognizes the tasks, not the application version (or type, CPU/GPU) each was originally assigned to. In fact the scheduler doesn't record the application version a task is to be run with at all; it just records what the client reports once the task has been run and reported. 2. The handling is opaque and counter-intuitive to uninitiated users. It's actually pretty hard to get rid of tasks that have once been assigned to you: setting "no new tasks" doesn't help, nor does resetting the project or even detaching and re-attaching (the only thing that helps is to accept, abort and then report the tasks).
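
To make the mechanism above a bit more concrete, here is a rough sketch of the idea only. It is not the actual BOINC scheduler code; the names `Task`, `app_version` and `find_lost_tasks` are made up for illustration. It assumes the server keeps a list of tasks it considers in progress on a host and compares that list with the task names the client reports in a scheduler request.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Task:
    """Hypothetical server-side record of a task assigned to a host."""
    name: str
    # Assumption for illustration: as described in the post, the application
    # version is not recorded at assignment time, so it stays unknown (None)
    # until the client has run and reported the task.
    app_version: Optional[str] = None


def find_lost_tasks(server_in_progress: Iterable[Task],
                    client_reported_names: Iterable[str]) -> List[Task]:
    """Return tasks the server believes are on the host but the client
    did not list in its request. These are the candidates for resending."""
    reported = set(client_reported_names)
    return [task for task in server_in_progress if task.name not in reported]


# Example: the server thinks three tasks are in progress, the client knows one.
server_side = [Task("wu_1_0"), Task("wu_2_1"), Task("wu_3_0")]
lost = find_lost_tasks(server_side, ["wu_2_1"])
for task in lost:
    # app_version is None here, so a resend cannot tell whether the task
    # was originally meant for a CPU or a GPU application.
    print(task.name, task.app_version)
```

In this sketch the resend decision is based on the task name alone, which is why a task first sent for a GPU app could come back as a CPU task. It also shows why detaching or resetting doesn't make the tasks go away: they stay on the server-side list until they are reported.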

I really wonder whether we should continue to scan for "lost tasks" and resend these, or should just drop this feature.

BM

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,542
Credit: 76,850,142,365
RAC: 65,212,086

If I have any vote in this, I would be very strongly against losing this feature.  The only time problems show up is if people attempt to get rid of tasks other than by the correct method of aborting them and reporting the results in the proper manner.

The recent examples have more to do with lost GPU tasks being replaced with CPU tasks, so that what could easily have been crunched by a GPU becomes impossible to crunch by CPU because of the much longer crunch time needed.  If people get caught, they can easily abort the excess.

This problem will disappear when the test status is eventually removed from the GPU app.  Now that we understand what happens with test GPU tasks, it won't be an ongoing problem anyway.  People who sign up for test GPU tasks will know about the behaviour for future reference.

Cheers,
Gary.
