Task duration

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 314283196

RAC: 264881

RE: Firstly a bit of

6 Aug 2008 10:48:12 UTC

Message 83565 in response to message 83564


Quote:

Firstly a bit of history. The estimate of crunch time is set by the work unit generator (WUG). Back at the start of S5R3, the apps were rather slower than they were at the end. The WUG estimate was OK at the start of S5R3 but towards the end, the tasks were being done much faster than the WUG estimate. BOINC handles this by lowering the duration correction factor (DCF) stored in your host's state file (client_state.xml). BOINC knows nothing about the efficacy of the WUG estimate. It simply reacts to reality as it perceives it over time. BOINC learns to correct the now incorrect estimate built into the WUG by lowering the DCF. At the end of S5R3, it would not be uncommon for the DCF to be 0.25 or less just to cope with the old WUG estimate which is now hopelessly too long for reality.

Enter S5R4, with a new WUG providing new and updated estimates. Your BOINC client cannot possibly know in advance about the changes in the new WUG, so until it knows better (by relearning) it will assume that a DCF of 0.25 is still OK. The new WUG, on the other hand, now has a much better estimate of the real time, so a new task contains a realistic estimate of the true time. BOINC then applies the existing correction of, say, 0.25 and comes up with its own estimate that is a quarter of what it should be. When the first new task is completed, BOINC will get a very sudden wakeup call and will (in one big hit) change the DCF from 0.25 to something perhaps a lot closer to the "ideal" value of 1.0.

Until BOINC has a chance to do this, the only real problem is that you will likely get a whole bunch of new work - more than your machine can handle - until the new DCF takes effect. This will only be a problem if you have left your cache size at something like 10 days. In that case you might end up with about 40 days of actual work.

Is this a big deal? Well actually, no it's not! All you have to do is firstly return your cache settings to something a bit more realistic, say 1 - 2 days at the most and then simply abort whatever tasks you feel are in excess of what your machine can comfortably handle. Immediately after you abort the excess, hit the "update" function so that the server can immediately be notified and then it will simply resend these excess tasks to someone else. End of problem. Very little bandwidth has been wasted because a task is just a very small set of parameters that tell the science app how to crunch the data that is already on your machine. Aborting tasks is NOT aborting large data files.

Is there a way to avoid all this? No, not really. Whenever the WUG changes, BOINC can't know in advance if the new crunch time estimates are good, bad or indifferent. It will always have to relearn the new reality. We can assist by not leaving machines with overly large caches.
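
For anyone who wants to check the numbers in the explanation above, here is a minimal sketch of the arithmetic in Python. It is not BOINC's actual code: the function names and the 8-hour task estimate are made up for illustration, and it assumes the client simply multiplies the WUG estimate by the DCF and fills the cache on that basis.

```python
def client_estimate(wug_estimate_hours: float, dcf: float) -> float:
    """Runtime the client expects: the WUG estimate scaled by the DCF (simplified model)."""
    return wug_estimate_hours * dcf


def actual_days_fetched(cache_days: float, true_dcf: float, stale_dcf: float) -> float:
    """Real work downloaded when the cache is filled using a stale DCF (simplified model)."""
    return cache_days * true_dcf / stale_dcf


stale_dcf = 0.25     # value learned by the end of S5R3, as in the post
true_dcf = 1.0       # the "ideal" value once the WUG estimate is realistic
wug_estimate = 8.0   # hypothetical task estimate in hours, for illustration only

print(client_estimate(wug_estimate, stale_dcf))        # 2.0 hours instead of 8.0
print(actual_days_fetched(10.0, true_dcf, stale_dcf))  # 40.0 days of work for a 10 day cache
```

With a cache setting of 1 - 2 days, the same stale DCF would only pull in about 4 - 8 days of real work, which is why a small cache keeps the overshoot manageable.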

Well done Gary, that should go to the Wiki! :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 719549757

RAC: 1068875

Indeed! See also Bernd's

6 Aug 2008 17:24:21 UTC

Message 83566


Indeed!

See also Bernd's post here on, among other things, plans to fix the misprediction problem.

CU
Bikeman
