Underestimating WU Completion Times

The Gas Giant
Joined: 18 Jan 05
Posts: 72
Credit: 3,109,569
RAC: 0
Topic 189274

Why is it that Einstein underestimates the WU completion time so badly? On my 3.2GHz HT machine it estimates around 7hrs, but it takes more like 11hrs. Predictor, on the other hand, estimates 1hr 45min and takes nearer 1hr. Because of the Einstein anomaly (oh, I like that), BOINC downloads more Einstein work than it really needs, and then it gets its knickers in a knot over deadlines etc. Severely overestimating completion times, as Predictor does, is another problem.
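To put rough numbers on the over-download: the sketch below is a deliberately simplified model, not BOINC's real work-fetch logic (which also weighs resource shares between projects). It assumes the client fills the cache using only the estimated hours per WU, that the HT chip counts as 2 logical CPUs, and a 4-day cache; the function names are made up for illustration.

```python
# Simplified, hypothetical model of how an underestimated run time inflates
# the work queue. Assumption: the client fills `cache_days` of work per CPU
# using only its *estimated* hours per WU.

def wus_fetched(cache_days: float, cpus: int, est_hours: float) -> int:
    """WUs the client would fetch to fill the cache, based on the estimate."""
    return int(cache_days * 24 * cpus // est_hours)

def days_to_finish(n_wus: int, cpus: int, actual_hours: float) -> float:
    """Wall-clock days the queue really needs at the actual run time."""
    return n_wus * actual_hours / cpus / 24

cache_days, cpus = 4, 2      # assumed 4-day cache, HT as 2 logical CPUs
est, actual = 7.0, 11.0      # hours per WU: estimated vs. observed

n = wus_fetched(cache_days, cpus, est)
print(f"estimate is off by a factor of {actual / est:.2f}")
print(f"fetched {n} WUs, which really need {days_to_finish(n, cpus, actual):.1f} days")
```

Under these assumptions a "4-day" cache of 27 WUs actually takes a bit over 6 days, which is the kind of gap that pushes the client toward deadline worries.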

Live long and crunch!

Paul

Mike
Joined: 20 Feb 05
Posts: 151
Credit: 5,536,135
RAC: 0


Hi

I can see you are using BOINC 4.44, so what I do with long-running projects is allow a limited number of units per week.
I always download two units and then switch to "No new work" on the Projects tab.
The client will surely go into panic mode because the deadline is closer, but the results will be finished in time, and you can switch back to "Allow new work" whenever you want.
That way you have better control of it.

Greetz and respect from Germany to Australia
Mike

gravywavy
Joined: 22 Jan 05
Posts: 392
Credit: 68,962
RAC: 0


Why is it that Einstein underestimates the wu completion time so badly? On my 3.2GHz HT machine it estimates around 7hrs but it takes more like 11hrs. [...]

My theory is that this happens because FFTs are hard to pipeline and you have a pipelined processor (all processors are pipelined for floating point these days).

The benchmark program is not an FFT, so the pipeline runs as Intel intended. The real code repeatedly stalls the pipeline, so that your processor runs slower than it did when benchmarked.

The pipeline is how the processor can start one floating point op before it has finished the previous one. If your benchmark speed for floating point is faster than for integers, or even if the speeds are equal, then you have a pipelined processor. Without a pipe, expect to see floats run more slowly than integers.

Stalling the pipe means that the code cannot figure out which calc is needed next, or that it took a likely guess but got it wrong. Either way, floating point speed drops to about half the integer speed for that one calc. If it happened every time, you'd knock 75% or so off the speed of your processor.
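Here is one way the "75% or so" figure can come out, as a back-of-envelope sketch. The key assumption (mine, not stated above) is that an unstalled pipelined FPU runs at about 2x integer speed; a stalled op runs at half integer speed, as described above. Since stalled ops take longer, the average is computed over time per op, not speed per op.

```python
# Hypothetical numbers: all speeds are relative to integer speed = 1.0.
INT_SPEED = 1.0
FP_PIPELINED = 2.0              # assumption: unstalled FP ~2x integer speed
FP_STALLED = 0.5 * INT_SPEED    # "about half the integer speed" for a stalled calc

def effective_fp_speed(stall_fraction: float) -> float:
    """Average FP throughput when a fraction of ops stall the pipeline.

    Averages time per op (a harmonic mean), because stalled ops are slower.
    """
    time_per_op = (1 - stall_fraction) / FP_PIPELINED + stall_fraction / FP_STALLED
    return 1 / time_per_op

for f in (0.0, 0.1, 0.25, 1.0):
    loss = 1 - effective_fp_speed(f) / FP_PIPELINED
    print(f"stall on {f:4.0%} of ops -> {loss:5.1%} slower than benchmark")
```

With every op stalled the loss is exactly 75% in this model; in the same toy model, stalling only one op in ten already costs a bit over 20%.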

My theory is based on two things. First, people with newer, more heavily pipelined chips see a bigger effect: your WUs take over 1.5x the estimate, while mine, on a years-old 700MHz Pentium, take 1.3x the estimate. That is consistent with your Pentium having a longer pipeline than mine.

Second, E@H contains a lot of FFTs, and these are really hard work for any pipelined processor. I said more about this issue here, in another thread, but that posting may be a more technical answer than you were hoping for.

Finally, I hope you are getting two WUs done in the 11 hours you quote? My 700MHz box takes almost exactly 24hrs per WU. If you are getting only 2.2 WUs per day, something funny is happening; whereas if you are running the HT as a dual processor, you'd be getting about 4.4x my throughput, which is what I'd expect.
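The throughput arithmetic above, spelled out as a small sketch (the helper name is just for illustration): one WU at a time versus two at once on the HT machine, both compared against a box that does one WU per 24 hours.

```python
HOURS_PER_DAY = 24

def wus_per_day(hours_per_wu: float, concurrent: int = 1) -> float:
    """WUs completed per day when `concurrent` WUs run side by side."""
    return concurrent * HOURS_PER_DAY / hours_per_wu

mine = wus_per_day(24.0)                   # 700MHz box: ~24 h per WU
ht_single = wus_per_day(11.0)              # HT box crunching one WU at a time
ht_dual = wus_per_day(11.0, concurrent=2)  # HT used as two logical CPUs

print(f"single: {ht_single:.1f}/day ({ht_single / mine:.1f}x)")
print(f"dual:   {ht_dual:.1f}/day ({ht_dual / mine:.1f}x)")
```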

~~gravywavy

jerry
Joined: 20 Apr 05
Posts: 6
Credit: 35,784
RAC: 0


I have a 2GHz Celeron and am having the same trouble with the estimator for the projects I run. I have a feeling that the time given assumes we were in the next room from the server.

Nightbird
Joined: 17 Feb 05
Posts: 79
Credit: 561,723
RAC: 0


Perhaps an idea: don't use too large a cache.

The Gas Giant
Joined: 18 Jan 05
Posts: 72
Credit: 3,109,569
RAC: 0


It's nothing to do with deadlines per se, but an issue with BOINC going into deadline mode (even though there is no chance of missing the deadline). Since it then crunches Einstein in deadline mode, its LT debt gets too +ve, and then it doesn't download any more work until its LT debt gets very -ve, and we start going into limit cycling of the projects.

People with a background in process control technology (PID loops etc.) would be aghast at the situation BOINC gets itself into. Limit cycling, where it swings from one extreme to the other, is called "out of control", and only a novice would think it works.
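For anyone without the process-control background, here is a toy illustration of the difference. It is not BOINC's actual scheduler: `debt` is just CPU time a project is owed, it gains the project's share each step and loses whatever it runs, and the band and gain values are invented. An all-or-nothing controller with a deadband swings between the two extremes forever; a proportional correction settles.

```python
# Toy model, NOT BOINC's real scheduler. `debt` = CPU time a project is owed
# (arbitrary units); each step it gains `share` and loses the CPU it runs.

def simulate(controller, steps=40, share=0.5, debt=3.0):
    history, state = [], {"running": 0.0}
    for _ in range(steps):
        run = controller(debt, state)   # fraction of the CPU given this step
        debt += share - run
        history.append(debt)
    return history

def bang_bang(debt, state, band=2.0):
    """All-or-nothing with hysteresis: flat out when far behind, idle when far ahead."""
    if debt > band:
        state["running"] = 1.0
    elif debt < -band:
        state["running"] = 0.0
    return state["running"]

def proportional(debt, state, share=0.5, gain=0.3):
    """Run the fair share plus a correction proportional to the debt (clamped to 0..1)."""
    return min(1.0, max(0.0, share + gain * debt))

bb = simulate(bang_bang)
p = simulate(proportional)
print("bang-bang tail:   ", [round(d, 2) for d in bb[-6:]])
print("proportional tail:", [round(d, 4) for d in p[-6:]])
```

In this model the bang-bang debt keeps oscillating between about -2.5 and +2.5, while the proportional version shrinks the debt by 30% per step once it is out of saturation.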

Live long and crunch!

Paul.

PS. I don't think a cache of 4 days' work is too large... does anyone? I know BOINC does!

Paul

Nightbird
Joined: 17 Feb 05
Posts: 79
Credit: 561,723
RAC: 0


"ps. I don't think a cache of 4 days work is too large...does anyone? "
I don't use a 4-day cache.

Divide Overflow
Joined: 9 Feb 05
Posts: 91
Credit: 183,220
RAC: 0


Message 12092 in response to message 12090

I don't think a cache of 4 days work is too large...does anyone?

You know your cache size is too large when you start missing WU deadlines! The rest is entirely subjective. :)

Celtic Wolf
Joined: 9 Feb 05
Posts: 34
Credit: 18,196
RAC: 0


There is a known issue with BOINC 4.4x not computing deadlines correctly.

It seems that Einstein WUs compute faster at the beginning than they do near the end, or some such thing. I am waiting to see what BOINC does when my Climate WU gets near its end... I'll let you know 4 months from now what happened...

This is supposed to be fixed in the highly anticipated 4.45 release. Also, if you are crunching Protein WUs and have a Protein version greater than 4.28, you need to abort them, because versions above 4.28 are broken. They will appear to be working but are stalled. This puts the long-term debt (LTD) for SETI and Einstein into a bad state.

After I aborted mine, I had to manually edit client_state.xml and zero out all the "debt" values to get any new WUs from Einstein or SETI.
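If you'd rather script that edit than do it by hand, here is a minimal sketch using Python's standard ElementTree. Assumptions: the debt values live in elements whose tag names contain "debt" (the sample's `short_term_debt` / `long_term_debt` tags and its values are illustrative, not copied from a real file), and the BOINC client is stopped while you edit. Note that ElementTree rewrites the file's formatting when it saves.

```python
import xml.etree.ElementTree as ET

def zero_debts(root: ET.Element) -> int:
    """Set every element whose tag contains 'debt' to 0; return how many changed."""
    changed = 0
    for elem in root.iter():
        if "debt" in elem.tag:
            elem.text = "0.000000"
            changed += 1
    return changed

# For the real file (client stopped, backup kept), something like:
#   tree = ET.parse("client_state.xml")
#   zero_debts(tree.getroot())
#   tree.write("client_state.xml")
sample = """<client_state>
  <project>
    <project_name>Einstein@Home</project_name>
    <short_term_debt>-1234.5</short_term_debt>
    <long_term_debt>86400.0</long_term_debt>
  </project>
</client_state>"""

root = ET.fromstring(sample)
print(zero_debts(root), "values zeroed")
print(ET.tostring(root, encoding="unicode"))
```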


The Gas Giant
Joined: 18 Jan 05
Posts: 72
Credit: 3,109,569
RAC: 0


Message 12094 in response to message 12092

I don't think a cache of 4 days work is too large...does anyone?

You know your cache size is too large when you start missing WU deadlines! The rest is entirely subjective. :)

I never said I missed a deadline... I just feel that the severe overestimation of WU completion times causes BOINC to prematurely have kittens over potential deadline issues, when in reality there are none!

Paul

Grimm
Joined: 22 Jan 05
Posts: 40
Credit: 231,537,603
RAC: 89,187


Message 12095 in response to message 12094

This has been an issue since the Einstein beta test and has never been addressed. The time estimates are even worse on "older" PIII machines. I have tried running Einstein on 4 machines, and the *only* one where BOINC doesn't grossly underestimate completion times is my PIV 3.2 H/T at work, where I limit BOINC to one processor. On that machine, the estimated completion times are very close.
