v4.45, EDF and dual cpu

gravywavy
Joined: 22 Jan 05
Posts: 392
Credit: 68962
RAC: 0
Topic 189532

This client tries to avoid returning a wu late when you run mixed projects - it does this by checking from time to time if the computer is likely to miss a deadline.

When it detects a deadline problem it goes into EDF mode (earliest deadline first) and then results are processed in the order they are due.

I think there is a problem in the code when there is more than 1 cpu. I recently had 3 Einstein WU all with the same date/time deadline. The client waited till it was too late to run all of them before going over to EDF mode. One of two things may have gone wrong:

- the client may still be believing the Einstein predictions of run time for a wu; this box predicts 18:30 but actually takes almost 25:00 to complete a wu. The overshoot was consistent with this discrepancy.

- the client, in assessing whether there is a deadline problem, may not take into account the fact that the time cannot be split evenly between two cpu if there is an odd number of wu. Here there were three wu, total run time 75 hrs, but with 2 cpu it takes 50 hours to run all three, not 37.5! This is because for the first 25 hrs both cpu are running Einstein, but for the second 25 hrs only one cpu is running it. It actually clicked over to EDF about 42 hours before the deadline, so again this explanation is consistent with the observation.
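The arithmetic in the second point can be sketched in a few lines. This is only an illustration of the numbers in the post, not the client's actual code; the function names are made up, and it assumes every result has the same run time and is never split across cpus.

```python
import math

def naive_hours(n_results, hours_each, n_cpus):
    # Total work divided evenly over the cpus (the optimistic estimate).
    return n_results * hours_each / n_cpus

def whole_result_hours(n_results, hours_each, n_cpus):
    # Results run in "rounds" of up to n_cpus at a time; each round
    # lasts as long as one result, even if some cpus have nothing to run.
    return math.ceil(n_results / n_cpus) * hours_each

print(naive_hours(3, 25, 2))         # 37.5
print(whole_result_hours(3, 25, 2))  # 50
```

With an even number of results the two estimates agree; the gap only opens when the last round leaves a cpu without an Einstein result.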

In fact I was able to rescue the situation by manually suspending one of the results half way through, and unsuspending it 12 hrs later.

Unless the client is clever enough to do this, it would seem better to build into the deadline spotter the ability to predict how the wu are likely to be 'chunked' between cpu, by adding up N different sets of predicted runtime working forward from the present on the basis of EDF allocation of wu to processors as they become free. This is, of course, harder than simply adding up the hours of estimated runtime.
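A rough sketch of that idea, for anyone who wants to play with it. It is not taken from the BOINC source; the function names and the tuple layout are my own, and it assumes a result runs to completion once started (so it does not model the manual suspend/resume trick described above).

```python
import heapq

def predict_finish_times(results, n_cpus):
    """results: list of (name, run_hours, hours_until_deadline).

    Hands results out in earliest-deadline-first order, each one going
    to whichever cpu becomes free first, and returns the predicted
    finish time (hours from now) of every result.
    """
    free_at = [0.0] * n_cpus  # min-heap: when each cpu next becomes free
    heapq.heapify(free_at)
    finish = {}
    for name, run_hours, _deadline in sorted(results, key=lambda r: r[2]):
        start = heapq.heappop(free_at)
        finish[name] = start + run_hours
        heapq.heappush(free_at, finish[name])
    return finish

def results_at_risk(results, n_cpus):
    # A result is at risk if its predicted finish is past its deadline.
    finish = predict_finish_times(results, n_cpus)
    return [name for name, _run, deadline in results if finish[name] > deadline]

# Three 25-hour results all due in 42 hours, on a dual-cpu box.
work = [("einstein_1", 25, 42), ("einstein_2", 25, 42), ("einstein_3", 25, 42)]
print(results_at_risk(work, 2))  # ['einstein_3']
```

Simply adding up the hours gives 75 / 2 = 37.5, which is under 42 and so looks safe, while the per-cpu walk-through shows the third result finishing at hour 50.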

A tempting kludge is to identify the result with the longest run time and add it in twice - but you'd then get into EDF unnecessarily if there was a CPDN wu around.
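To put numbers on the kludge: the sketch below assumes the simple estimate is "sum of run times divided by number of cpus", which matches the 37.5 figure above but is my reading rather than anything confirmed from the client. The 1000-hour CPDN run time is a made-up figure purely for illustration.

```python
def kludge_hours(run_hours, n_cpus):
    # Count the longest result twice, then share the total across cpus.
    return (sum(run_hours) + max(run_hours)) / n_cpus

# Three 25-hour Einstein results on 2 cpus: gives the right answer here.
print(kludge_hours([25, 25, 25], 2))  # 50.0

# One 25-hour Einstein result plus a hypothetical 1000-hour CPDN result:
# the long result dominates the estimate, even though the Einstein
# result could simply finish on the other cpu in 25 hours.
print(kludge_hours([25, 1000], 2))    # 1012.5
```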

By the way, this is not a complaint: in most cases v4.45 handles things well, certainly far better than 4.19, say.

~~gravywavy

Blank Reg
Joined: 18 Jan 05
Posts: 228
Credit: 40599
RAC: 0

After looking at your list of boxes, I would say you need to scale back your connect to time, because you are pushing the limit. I do not keep that many on hand for my P4 3.0 hts or my 840ee dual ht.....

gravywavy
Joined: 22 Jan 05
Posts: 392
Credit: 68962
RAC: 0

Message 14055 in response to message 14054

Quote:
After looking at your list of boxes, I would say you need to scale back your connect to time, because you are pushing the limit. I do not keep that many on hand for my P4 3.0 hts or my 840ee dual ht.....

absolutely so... and well spotted!

I don't usually run with such a high 'connect every' setting.

I pushed up the number of wu held to check I'd seen what I thought I'd seen, then it turned up again on another box anyway. The setting has already been turned down again.

I've had to abort 5 results on one box and 1 result on another in order to prevent wasted crunching, so apols to anyone who has been adversely affected; it's just that I don't like to report something I've not been able to repeat.

~~gravywavy

John McLeod VII
Moderator
Joined: 10 Nov 04
Posts: 547
Credit: 632255
RAC: 0

Yes, there was a bug with that code: it did not take multiple CPUs into account correctly. This has been taken care of in the latest Alpha release (4.70).
