Why is it that Einstein underestimates the wu completion time so badly? On my 3.2GHz HT machine it estimates around 7hrs but it takes more like 11hrs, whereas Predictor estimates 1hr 45min and takes nearer 1hr. Because of the Einstein anomaly (oh, I like that) BOINC downloads more work than it really needs for Einstein and then gets its knickers in a knot over deadlines etc. Severely overestimating completion times, as Predictor does, is another problem.
Live long and crunch!
Paul
Underestimating WU Completion Times
Hi
I can see you are using BOINC 4.44, so what I do with long runners is allow them only a set amount of units per week.
I always download two units and then switch on "No new work" on the Projects tab.
Otherwise the client surely goes into panic mode as the deadline gets closer.
The results will be finished in time, and you can switch "Allow new work" back on whenever you want to.
That way you have better control over it.
Greetings and respect from Germany to Australia
Mike
Why is it that Einstein
Why is it that Einstein underestimates the wu completion time so badly? On my 3.2GHz HT machine it estimates around 7hrs but it takes more like 11hrs. [...]
My theory is that this is due to the fact that FFTs are hard to pipeline and you have a pipelined processor (all processors are pipelined for floating point, these days).
The benchmark program is not an FFT, so the pipeline runs as Intel intended. The real code repeatedly stalls the pipeline, so that your processor runs slower than it did when benchmarked.
The pipeline is how the processor can start one floating point op before it has finished the previous one. If your benchmark speed for floating point is faster than for integers, or even if the speeds are equal, then you have a pipelined processor. Without a pipe, expect to see floats run more slowly than integers.
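To make that concrete, here is a minimal C sketch (my own illustration, nothing from the E@H or benchmark code): both loops do the same number of floating point adds, but the first uses four independent accumulators so the pipeline can overlap them, while the second is a single dependent chain that has to pay the full add latency on every iteration.

```c
/* Illustration only: pipelined (independent) vs serialised (dependent) FP adds.
   Build with something like:  gcc -O1 pipe.c -o pipe
   (low optimisation, so the loops are not rewritten out from under us)        */
#include <stdio.h>
#include <time.h>

#define N 100000000L

int main(void) {
    volatile double one = 1.0;              /* volatile blocks constant folding */
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0, chain = 0;
    clock_t t;

    /* Four independent accumulators: several adds can be "in flight" at once,
       so the floating point pipeline stays full.                              */
    t = clock();
    for (long i = 0; i < N; i++) { s0 += one; s1 += one; s2 += one; s3 += one; }
    printf("independent adds: %.2f s (%.0f)\n",
           (double)(clock() - t) / CLOCKS_PER_SEC, s0 + s1 + s2 + s3);

    /* One dependent chain of the same total length: each add must wait for the
       previous result, so the pipeline can never be kept full.                */
    t = clock();
    for (long i = 0; i < 4 * N; i++) { chain += one; }
    printf("dependent chain:  %.2f s (%.0f)\n",
           (double)(clock() - t) / CLOCKS_PER_SEC, chain);
    return 0;
}
```

On a pipelined FPU the first loop normally finishes well ahead of the second, even though the arithmetic is identical.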
If you stall the pipe, it means that the processor cannot figure out which calculation is needed next, or that it took a likely guess but got it wrong. Either way the floating point speed drops to about half the integer speed for that one calc. If it happened every time you'd knock 75% or so off the speed of your processor.
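That "likely guess that got it wrong" is branch misprediction, and you can see what it costs with a toy program like this (again just an illustration of the effect, not project code): the same floating point work is done behind a branch that is first unpredictable and then, after sorting the data, trivially predictable.

```c
/* Illustration only: identical FP work runs slower when the branch guarding it
   is unpredictable, because every mispredict flushes the pipeline.
   Build with something like:  gcc -O1 branch.c -o branch                      */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 20000000L

static double sum_big(const int *v) {       /* add only the "big" entries      */
    double sum = 0.0;
    for (long i = 0; i < N; i++)
        if (v[i] >= 128)                    /* this branch is the whole point   */
            sum += (double)v[i];
    return sum;
}

static int cmp(const void *a, const void *b) { return *(const int *)a - *(const int *)b; }

int main(void) {
    int *data = malloc(N * sizeof *data);
    if (!data) return 1;
    for (long i = 0; i < N; i++) data[i] = rand() % 256;

    clock_t t = clock();                    /* random order: ~50% mispredicted  */
    double a = sum_big(data);
    printf("unsorted: %.2f s (%.0f)\n", (double)(clock() - t) / CLOCKS_PER_SEC, a);

    qsort(data, N, sizeof *data, cmp);      /* sorted: the branch is predictable */
    t = clock();
    double b = sum_big(data);
    printf("sorted:   %.2f s (%.0f)\n", (double)(clock() - t) / CLOCKS_PER_SEC, b);

    free(data);
    return 0;
}
```

The arithmetic is the same in both runs; only the predictability of the branch changes, and on most machines the unsorted pass is noticeably slower (unless the compiler turns the branch into a conditional move, which hides the effect).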
My theory is based on two things. First, people with newer, more heavily pipelined chips see a bigger effect: your WUs take over 1.5x the estimate, while mine on a years-old 700MHz Pentium take 1.3x the estimate, which is consistent with the fact that your Pentium has a longer pipeline than mine.
Secondly, E@H contains a lot of FFTs, and these are really hard work for any pipelined processor. I said more about this issue here, in another thread, but that posting may be a more technical answer than you were hoping for.
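For anyone curious what the inside of an FFT looks like, here is a bare-bones radix-2 version in C (a textbook sketch of my own, not the E@H code). Notice that the butterfly stages touch pairs of elements ever further apart, and that the twiddle update w *= wlen is a serial dependency; that combination of strided memory access and short dependent chains is exactly the kind of pattern that keeps a deep floating point pipeline from staying full.

```c
/* Minimal in-place iterative radix-2 FFT (n must be a power of two).
   Build with something like:  gcc -std=c99 fft.c -lm -o fft                  */
#include <complex.h>
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static void fft(double complex *x, size_t n) {
    /* bit-reversal permutation: already a scattered access pattern */
    for (size_t i = 1, j = 0; i < n; i++) {
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) { double complex t = x[i]; x[i] = x[j]; x[j] = t; }
    }
    /* butterfly stages: the stride (len/2) doubles every pass */
    for (size_t len = 2; len <= n; len <<= 1) {
        double complex wlen = cexp(-2.0 * M_PI * I / (double)len);
        for (size_t i = 0; i < n; i += len) {
            double complex w = 1.0;
            for (size_t j = 0; j < len / 2; j++) {
                double complex u = x[i + j];
                double complex v = x[i + j + len / 2] * w;  /* far-apart load    */
                x[i + j]           = u + v;
                x[i + j + len / 2] = u - v;
                w *= wlen;                                  /* serial dependency */
            }
        }
    }
}

int main(void) {
    double complex x[8] = { 1, 1, 1, 1, 0, 0, 0, 0 };
    fft(x, 8);
    for (int k = 0; k < 8; k++)
        printf("X[%d] = %6.3f %+6.3fi\n", k, creal(x[k]), cimag(x[k]));
    return 0;
}
```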
Finally, I hope you are getting two WUs done in the 11 hours you quote? My 700MHz box takes almost exactly 24hrs per WU. If you are getting only 2.2 WUs per day (one WU per 11 hours) something funny is happening; whereas if you are running the HT as a dual processor with two WUs in flight at once, you'd be getting about 4.4x my throughput (two WUs per 11 hours against my one per day), which is what I'd expect.
~~gravywavy
I have a 2 gig celeron and am
I have a 2 gig Celeron and am having the same trouble with the estimator for the projects I run. I have a feeling that the time given is as if we were in the next room from the server.
Perhaps an idea : don't use a
Perhaps an idea: don't use too large a cache.
It's nothing to do with
It's nothing to do with deadlines per se, but an issue with BOINC going into deadline mode (even though there is no real danger of missing the deadline). Since it then crunches Einstein in deadline mode, its LT debt gets too +ve, and then it doesn't download any more work until its LT debt goes well -ve, and we start limit cycling between the projects.
People with a background in process control technology (PID loops etc) would be aghast at the situation BOINC gets itself into. Limit cycling, where the system swings from one extreme to the other, is called "out of control" and is only something a novice would think works.
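For anyone who hasn't met the term, a limit cycle is easy to reproduce with a toy on/off controller whose correction step is too big for its dead band. This little sketch has nothing to do with BOINC's actual debt code; it just shows what "swinging from one extreme to the other" looks like.

```c
/* Toy limit-cycle demo (NOT BOINC's scheduler): a two-state controller with a
   step size too large for its hysteresis band never settles near the target,
   it just oscillates between the extremes.                                    */
#include <stdio.h>

int main(void) {
    double level  = 0.0;    /* the quantity being controlled, e.g. queued work */
    double target = 5.0;
    int fetching  = 1;      /* 1 = fetching work, 0 = starving the project     */

    for (int t = 0; t < 40; t++) {
        if (level > target + 4.0) fetching = 0;   /* way over: stop fetching   */
        if (level < target - 4.0) fetching = 1;   /* way under: start fetching */

        level += fetching ? 2.0 : -2.0;           /* gain too high for the band */
        printf("t=%2d  fetching=%d  level=%5.1f\n", t, fetching, level);
    }
    return 0;
}
```

Run it and the level ping-pongs between 0 and 10 around the target of 5 forever, which is the "out of control" behaviour a control engineer would flag immediately.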
Live long and crunch!
Paul.
ps. I don't think a cache of 4 days work is too large...does anyone? I know BOINC does!
"ps. I don't think a cache of
"ps. I don't think a cache of 4 days work is too large...does anyone? "
I don't use a cache of 4 days.
I don't think a cache of 4
I don't think a cache of 4 days work is too large...does anyone?
You know your cache size is too large when you start missing WU deadlines! The rest is entirely subjective. :)
There is a known issue with
There is a known issue with the 4.4x versions of BOINC not computing deadlines correctly.
It seems that Einstein WUs compute faster in the beginning than they do near the end, or some such thing. I am waiting to see what BOINC does when my Climate WU gets near its end... I'll let you know 4 months from now what happened...
This is supposed to be fixed in the highly anticipated 4.45 release. Also, if you are crunching Protein WUs and have a Protein application version greater than 4.28, you need to abort them, because versions above 4.28 are broken: they will appear to be working but are stalled. This puts the LTD for SETI and Einstein into a bad state.
After I aborted mine I had to manually edit client_state.xml and zero out all the "debt" values to get any new WUs from Einstein or SETI.
I don't think a cache of 4
I don't think a cache of 4 days work is too large...does anyone?
You know your cache size is too large when you start missing WU deadlines! The rest is entirely subjective. :)
Never said I ever missed a deadline... I just feel that the severe overestimation of the WU completion times causes BOINC to prematurely have kittens over potential deadline issues, when in reality there are none!
Paul
This has been an issue since
This has been an issue since the beta test of Einstein and it has never been addressed. The time estimates are even worse on "older" PIII machines. I have tried running Einstein on 4 machines, and the *only* one where BOINC doesn't grossly underestimate completion times is my work PIV 3.2 H/T, where I am limiting BOINC to one processor. On that machine the estimated completion times are very close.