Albert: Predicted/Actual crunch times

Winterknight
Joined: 4 Jun 05
Posts: 480
Credit: 79,757,569
RAC: 68,620
Topic 190519

Due to a large negative LTD for Einstein, my computer didn't download any Albert units until 1/1/2006, and then some more on the 2nd. Judging from the predicted processing times, I had units of at least four different lengths. I didn't make any notes on the predicted times.

Doing Einstein units, my crunch time was 27 ksecs. After the first unit had completed, the predicted times jumped alarmingly, and suddenly instead of approx 100 hrs of work I had a prediction of over 150 hrs of work. No problem, I thought, it isn't bad enough to miss the deadline; I'll just keep an eye on things to see how it goes.

The shorter units were received and crunched first, but the predicted problem units were the 12 longest units. The predicted time for these had gone from about 7 hrs to nearly 11 hrs. A quick check on my DCF showed that it was now greater than 2.

Now, the first of the longest units has been processed, in 17,278 sec, under 5 hrs.

The number of units that a host can download is based on the 'faulty' benchmark, adjusted by the DCF. During this initial phase of Albert units, though, my DCF has gone from just over 1 to over 2 and, according to JM7, will adjust to an approximately correct figure after processing 10 units. I have 12 of the longer units, which are going to take about 30% less time than first predicted, and nearly 60% less than the 'adjusted' predicted time. So my DCF should return to about 1, maybe.
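As a rough check of those percentages, here is a small sketch in Python. It assumes "about 7 hrs" and "nearly 11 hrs" mean exactly 7 and 11 hours, which is a rounding made for illustration only; the 17,278 sec figure is the one quoted above. The function name is made up for this example.

```python
# Figures from the post. The hour values are approximations
# ("about 7 hrs", "nearly 11 hrs") rounded to whole hours for illustration.
FIRST_PREDICTED_S = 7 * 3600      # prediction before the DCF jumped
ADJUSTED_PREDICTED_S = 11 * 3600  # prediction after the DCF rose above 2
ACTUAL_S = 17278                  # measured run time of the first long unit


def percent_less_time(predicted_s, actual_s):
    """Percentage by which the actual run time undercuts a prediction."""
    return 100.0 * (predicted_s - actual_s) / predicted_s


print(f"actual run time: {ACTUAL_S / 3600:.2f} hrs")
print(f"vs first prediction:    {percent_less_time(FIRST_PREDICTED_S, ACTUAL_S):.0f}% less time")
print(f"vs adjusted prediction: {percent_less_time(ADJUSTED_PREDICTED_S, ACTUAL_S):.0f}% less time")
```

With those rounded inputs the unit comes in roughly 31% under the first prediction and roughly 56% under the adjusted one.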

If the DCF is going to vary so much, then a host is not going to download the correct amount of work next time it connects. It could be half or twice what is required.

This, I fear, will not please the masses. One, if the host is on a modem, it needs to make twice as many calls because the units only take half the predicted time to process. Two, if a host has a 7+ day cache and the crunch time suddenly doubles, then deadlines will not be met; or, if the host crunches for more than one project as recommended, it is put into 'panic' mode to finish Einstein (Albert units) at the expense of the other projects. Also, because the DCF is adjusted upwards immediately but downwards slowly, it can put a host into EDF on the 2 * connect rule, and therefore enforce no downloads.
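To illustrate the "up immediately, down slowly" behaviour, here is a toy model in Python. It is not the BOINC client's actual code: the function name `update_dcf` and the downward smoothing weight of 0.1 are assumptions picked only to show the shape of the effect, and the run times reuse the rounded figures from this post (a 7 hr uncorrected estimate and a 17,278 sec actual time).

```python
def update_dcf(dcf, estimated_s, actual_s, down_weight=0.1):
    """One step of a toy duration correction factor (DCF).

    estimated_s is the uncorrected estimate for the unit.
    down_weight is an assumed smoothing weight, not a value taken from BOINC.
    """
    ratio = actual_s / estimated_s
    if ratio > dcf:
        # Unit ran longer than the corrected estimate: jump straight up.
        return ratio
    # Unit ran shorter: move only a fraction of the way down.
    return dcf + down_weight * (ratio - dcf)


# Start from a DCF of 2.0 and feed in ten long units that each take
# 17,278 sec against an assumed uncorrected estimate of 7 hrs.
dcf = 2.0
history = []
for _ in range(10):
    dcf = update_dcf(dcf, estimated_s=7 * 3600, actual_s=17278)
    history.append(dcf)

for n, value in enumerate(history, start=1):
    print(f"after unit {n:2d}: DCF = {value:.3f}")
```

In this model a single slow unit raises the DCF in one step, while a run of fast units only pulls it down gradually, so the host keeps over-estimating its queued work for some time afterwards.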

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5,385,205
RAC: 0

I think they are still learning how to specify the time.

On the Windows machines I see a variation between about 5-12 hours so far.

On the PowerMac I see, so far, 50 minute or 2:30 work units.

The r1_0190.0 work seems to be consistent at about 50 minutes, and this looks to be the second series of these work units I have gotten.

The r1_0806.0 work seems to be 2 1/2 hours ...

In this case, if this continues, I think faster PowerMacs with a high resource share are in jeopardy of running out of work by running into the quota max. In particular, if the one-hour work keeps being issued, more than 26 of those could be done in a day ...

I mean, I love the idea of doing that much work ... but it would be nice to have no chance of running "dry" ...

gravywavy
Joined: 22 Jan 05
Posts: 392
Credit: 68,962
RAC: 0

Paul is right that the project team, having created these new WU for good scientific reasons, are only now learning how to adjust all the computing parameters to handle them properly.

However, Winterknight makes a separate valid point. The two failure points for many users are deadline exceeded and cache insufficient.

If on a given machine the WU vary exactly in line with the estimate, no problem.

On the other hand, if there is variation with some going (say) 3x estimate and others 1x, then the deadlines need to be extended. That is the only way to accommodate a user who has a run of jobs that run short followed by a cacheful that run long.

I don't think the project can shorten the 10-day limit on a cache, but it could certainly advise people not to go over (say) an 8-day cache. If the uncertainty on the length of the WU is 2:1, then on an 8-day advised max cache a 16-day deadline would be appropriate; if the uncertainty is 3:1, then the same max advised cache of 8 days needs a 24-day deadline, and so on.
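The rule of thumb in the paragraph above is simply advised cache size multiplied by the uncertainty ratio. A minimal sketch in Python, where the function name is made up for this example:

```python
def required_deadline_days(advised_cache_days, uncertainty_ratio):
    """Deadline needed so that a full cache of worst-case units still finishes.

    uncertainty_ratio is the N in an N:1 spread between the longest and
    shortest run times for the same estimate.
    """
    return advised_cache_days * uncertainty_ratio


for ratio in (2, 3):
    deadline = required_deadline_days(8, ratio)
    print(f"{ratio}:1 uncertainty, 8-day cache -> {deadline}-day deadline")
```

This reproduces the 16-day and 24-day figures; a different advised cache or spread can be plugged in the same way.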

This creates a trade-off: the deadline wants to be as long as really needed but no longer than that, or it will add unnecessary delay to WU where one result gets lost for any reason. I'd hope to see some advisory limits on cache size followed by a gradual, one day at a time, increase in deadline till these complaints stopped.

River~~

~~gravywavy
