Ran out of WUs

history
Joined: 22 Jan 05
Posts: 127
Credit: 7573923
RAC: 0

CDN: I concur. The curse of 32 has yet to be addressed by any Mod or Admin for this project. I have begun rotating computers into and out of my KVMs. When one rig runs dry, I substitute in a rig that ran dry the day before. Sometimes it works; occasionally a day sitting in the corner will attract a full set of 4-hour WUs that will keep the machine busy for 24 hours. Imagine the joy when I get a slug of the 6-minute specials. It seems the Admins have determined that this issue is solved with silence.

Regards-tweakster

Erik
Joined: 14 Feb 06
Posts: 2815
Credit: 2645600
RAC: 0

Message 29390 in response to message 29389

deleted

Michael Roycraft
Joined: 10 Mar 05
Posts: 846
Credit: 157718
RAC: 0

CDNgeezer, tweakster,

I believe that the 32 MDQ issue has been addressed by an admin, to the effect that increasing the MDQ would cause overload on the server(s). Patience, my brothers, patience. You have perhaps noticed that E@H has had at least 3 or 4 outages in the past 3 weeks? That is more than we've experienced here in the preceding year. Coincidence? Previously it was quite rare for someone to run up against the MDQ limitation, but with the optimized apps it is not nearly as rare as before, and with the extra stress from the increased productivity, well ... we must have been close to the overload point already. Please, bear up :-)

Respects,

Michael R.

microcraft
"The arc of history is long, but it bends toward justice" - MLK

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2961389276
RAC: 692926

But it's well over a month since Bruce Allen posted that

Quote:
... we're upgrading the disk controllers and should be ready for this increased load soon.

My first post ever on this board quoted that link to try to persuade Linux users not to leave after the start of Akosf's wonderful optimisation push.

None of the outages since then have been attributed to the disk controller upgrade (and I don't think we can attribute two power losses to the increased crunch rate!). I agree with CDNgeezer and tweakster that it would be appreciated if we could have a further information update, and maybe some more detail on that magic "soon".

history
Joined: 22 Jan 05
Posts: 127
Credit: 7573923
RAC: 0

Bravo Mr. Haselgrove, it seems a solid state project is having some growing pains with unanticipated optimized code. Too bad they didn't include this feature at inception. It would seem that the reliability of the physical plant's electrical supply was also left out of the "business plan". I have 5 rigs offline "in the corner", waiting to relieve those currently crunching the all too coveted 6-minute specials. Any relief in sight?

Regards-tweakster

Honza
Joined: 10 Nov 04
Posts: 136
Credit: 3332354
RAC: 0

Bad news - at 45 minutes per 'long' WU, the latest akosf D41 app gives exactly 32 WUs/day/core on my AMD X2 4400+, which makes my machine meet the daily quota and run dry on Einstein.
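
For what it's worth, a minimal sketch of the arithmetic behind that "exactly 32" figure, assuming the core crunches non-stop and every WU is a 45-minute long one:

    # Quick check of the 32 WU/day/core figure (assumes non-stop crunching
    # and that every WU is a 45-minute 'long' one, as reported above)
    minutes_per_day = 24 * 60                      # 1440
    minutes_per_long_wu = 45
    print(minutes_per_day / minutes_per_long_wu)   # 32.0 -- exactly the daily quota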

Please, consider some action on the quota setting... soon.

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

Something has changed in the WUs, I think.

One host (2xMP2600+) using S41.06 was down to about 44 min./Result

Now it got a new set of work (p2_0550) which takes about 1:30 (wuid=7705954)

If that's intended and they enlarged the workunits, this would surely solve the quota problem - at least if one receives a mix of long and short ones.
______________

Nothing to report from my other hosts yet - I switched those to S41.06 later, after the first result was valid on my test box, and they haven't got a fresh WU set yet.
______________

Disadvantage :

CC calibration: blocked [exaggerated credit limit] 25.95 >> 69.60 (time: 5428s >> 6986s / Gfpops: 2.12 >> 11.08)

Oh well ...

Honza
Joined: 10 Nov 04
Posts: 136
Credit: 3332354
RAC: 0

Message 29396 in response to message 29395

Quote:
If that's intended and they enlarged the workunits, this would sure solve the quota problem - at least if one receives a mix of long and short ones.

Not quite - once you reach the daily quota, you are out.
Unless there is a mechanism to send extra-long WUs to fast machines, it's not a cure... it just lowers the probability that you run out.
For example, you might just download 8 WUs: 7 long, 1 short, none extra-long.

AFAIK, the primitive scheduler has no ability to supply WUs according to machine specifications (i.e. WU cache size, turn-around time, pseudo-speed in terms of benchmarks, CPU type and other relevant characteristics that affect crunch time).

Generally, enlarging the WU size (i.e. prolonging crunch time) may be a way to hit the "Ran out of WUs" issue less frequently.
A more successful way to overcome this is to set your machine to report more CPUs than you actually have and enable downloading more WUs per CPU.
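
As a rough illustration of why that workaround helps (a sketch of the assumed behaviour only, not actual BOINC scheduler code): the daily allowance appears to scale with the number of CPUs the host reports, so claiming extra CPUs raises the cap.

    # Illustrative sketch only, not BOINC scheduler code: assume the daily
    # allowance is (per-CPU quota) x (number of CPUs the host reports).
    PER_CPU_DAILY_QUOTA = 32     # the project's current limit

    def daily_allowance(reported_cpus):
        return PER_CPU_DAILY_QUOTA * reported_cpus

    print(daily_allowance(2))    # 64  -- a real dual-core such as an X2 4400+
    print(daily_allowance(4))    # 128 -- same box configured to report 4 CPUs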

Still, I don't understand - why not make the daily quota higher?
It's just a couple of seconds' work... I can't think of any major side-effects right now...

With akosf's latest S41.06, AMD machines with 2.2+ GHz clock speeds will run dry even when only long WUs are provided. We can only hope that extra-long WUs will counterbalance those with short crunch times.

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

Subtracting only 0.2 from the used-up quota for short WUs would help a bit too.

It should not be a daily quota though, but rather a limit on the cache size for unfinished work, so that if someone reaches his quota and returns 10 results with exit code 0, he is given 10 new WUs in exchange.
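
A minimal sketch of how that could look (illustrative only, not actual project code; the 0.2 cost for short WUs is the figure proposed above):

    # Illustrative sketch of the proposal: short WUs debit only 0.2 of the
    # budget, and every result returned with exit code 0 frees its share,
    # so the limit acts on unfinished work rather than on a calendar day.
    SHORT_WU_COST = 0.2
    LONG_WU_COST = 1.0

    class HostWorkBudget:
        def __init__(self, cache_limit=32.0):
            self.cache_limit = cache_limit   # cap on work in progress
            self.in_progress = 0.0

        def try_send(self, cost):
            # hand out a WU only if it still fits under the cache limit
            if self.in_progress + cost <= self.cache_limit:
                self.in_progress += cost
                return True
            return False

        def report_result(self, cost, exit_code):
            # a result returned with exit code 0 frees its slot immediately
            if exit_code == 0:
                self.in_progress = max(0.0, self.in_progress - cost)

    # Example: a host that returns 10 good long results gets room for 10 more
    budget = HostWorkBudget()
    sent = sum(budget.try_send(LONG_WU_COST) for _ in range(40))   # 32 accepted
    for _ in range(10):
        budget.report_result(LONG_WU_COST, exit_code=0)
    print(sent, budget.in_progress)   # 32 22.0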

Honza
Joined: 10 Nov 04
Posts: 136
Credit: 3332354
RAC: 0

Message 29398 in response to message 29397

Quote:
Subtracting only 0.2 from the used-up quota for short WUs would help a bit too.

Still no cure, just another hack. It would need another adjustment whenever different types of WUs are in the queue... not a solution to me.

Quote:
It should not be a daily quota though but limit the cache size for unfinished work, so if someone reaches his quota and returns 10 results with exit code 0, he should be given 10 new WUs in exchange.

That's something to consider. This concept has one advantage - it lowers the number of unfinished WUs sitting in the validator.
AFAIK, the daily quota is there to prevent faulty hosts from draining WUs, so it has its place... when set large enough for fast machines - which is not the case right now.
I think what you suggest technically means that the quota needs to be higher, because exit status already plays a part at the quota level. But it could be machine-driven - if a host meets its quota while all its WUs succeed, raise the quota for that particular machine.
That would need a little change in the code, I guess.

Setting the daily quota to suit a 'standard' machine but enabling a "run as fast as you can" mode for good hosts is a nice idea :-)
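
A rough sketch of that machine-driven adjustment (illustrative only; the step sizes are invented, not taken from the project):

    # Illustrative sketch of a per-host quota: creep upwards while results
    # keep validating, cut back hard when a host starts returning errors.
    BASE_QUOTA = 32
    MAX_QUOTA = 256

    def adjust_quota(current_quota, result_ok):
        if result_ok:
            return min(MAX_QUOTA, current_quota + 1)
        return max(1, current_quota // 2)

    quota = BASE_QUOTA
    for _ in range(32):                  # a full day of valid results
        quota = adjust_quota(quota, True)
    print(quota)                         # 64 -- headroom for the next day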

Still - why not set the quota to 50 or 64 in the meantime???
