Too fast

ADDMP

Joined: 25 Feb 05

Posts: 104

Credit: 7332049

RAC: 0

21 Jan 2006 20:34:09 UTC

Topic 190653


I have a dual-core Athlon 64 computer that has been stopped by E@H for a 6-hour enforced delay because it completed its daily allotment of 32 units before 24 hours. It has been running some "albert" units in about 4000 sec or 1.1 hours for each unit for each core. That means in 24 hours it should complete 2*(24/1.1)= 43.6 units with both cores running.
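The arithmetic above can be checked with a short sketch. It assumes every unit takes about 4000 seconds per core and that the quota of 32 units applies to the whole host per day; the function names are only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def units_per_day(seconds_per_unit: float, cores: int) -> float:
    """Units a host can finish in 24 hours if every core stays busy."""
    return cores * SECONDS_PER_DAY / seconds_per_unit


def hours_until_quota_used(quota: int, seconds_per_unit: float, cores: int) -> float:
    """Hours of crunching before the host has used up its daily quota."""
    return quota * seconds_per_unit / cores / 3600


capacity = units_per_day(4000, cores=2)
busy_hours = hours_until_quota_used(32, 4000, cores=2)

print(f"capacity: {capacity:.1f} units/day")
print(f"a quota of 32 units lasts about {busy_hours:.1f} hours")
```

With exactly 4000 seconds per unit the capacity comes out at 43.2 units a day (the 43.6 figure comes from rounding to 1.1 hours), and the 32-unit quota is used up after roughly 17.8 hours, which leaves about 6 hours of idle time.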

I think there might be some problem with the allotment calculation at E@H that assigned 32 units max per day.

I can delete the installation & re-install if that is the only fix.

[This computer is named XBOX...]

ADDMP

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3587991219

RAC: 991292

Too fast

21 Jan 2006 21:20:50 UTC

Message 24371


No, there's nothing wrong with your install. The problem is that, unlike Einsteins, which were fairly consistent in size, Alberts can vary by a factor of 4, and the scheduler doesn't take the difference into account when handing out work. You're not the first person to have problems with a really fast machine and short Alberts; some high-end Macs have been burned as well. For the moment the best you can do is to add a 2nd project and set the work distribution to 99/1. The 2nd project will then (almost) only run when you're out of work for e@h, and a starvation attack here will put it far enough ahead that the 2nd project won't do anything at all for a time.
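As a rough illustration of the 99/1 split: BOINC gives each attached project a long-term fraction of processing time equal to its resource share divided by the sum of all shares. The sketch below only shows that proportion; the project names and the helper function are made up for the example and are not part of BOINC.

```python
def share_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Turn per-project resource shares into fractions of total processing time."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {name: value / total for name, value in shares.items()}


# Hypothetical setup: Einstein@Home as the main project, one backup project.
fractions = share_fractions({"Einstein@Home": 99, "backup project": 1})

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%}")
```

The backup project is entitled to only about 1% of the time in the long run, so it mostly fills the gaps when no E@H work is available.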

Bruce Allen

Moderator

Joined: 15 Oct 04

Posts: 1119

Credit: 172127663

RAC: 0

RE: I have a dual-core

22 Jan 2006 1:08:03 UTC

Message 24372


Quote:

I have a dual-core Athlon 64 computer that has been stopped by E@H for a 6-hour enforced delay because it completed its daily allotment of 32 units before 24 hours. It has been running some "albert" units in about 4000 sec or 1.1 hours for each unit for each core. That means in 24 hours it should complete 2*(24/1.1)= 43.6 units with both cores running.

I think there might be some problem with the allotment calculation at E@H that assigned 32 units max per day.

I can delete the installation & re-install if that is the only fix.

[This computer is named XBOX...]

ADDMP

Looking at the scheduler logs (available by following on-line links) I see that your computer has:

2006-01-21 23:40:03.0533 [PID=19068] [debug   ] CONTENT_LENGTH=4514 
2006-01-21 23:40:03.1788 [PID=19068] [normal  ] Handling request:   host 522041, platform i686-pc-linux-gnu, version 5.2.13, RSF 1.000000
2006-01-21 23:40:03.1788 [PID=19068] [normal  ] OS version Linux 2.6.13-15-smp
2006-01-21 23:40:03.1876 [PID=19068] [debug   ] Request [HOST#522041] Database [HOST#522041] Request [RPC#0] Database [RPC#0]
2006-01-21 23:40:03.1884 [PID=19068] [normal  ] Processing request  [HOST#522041]  [RPC#0] core client version 5.2.13
2006-01-21 23:40:03.5179 [PID=19068] [debug   ]   Result is on [HOST#522041]: r1_0148.5__190_S4R2a_2
2006-01-21 23:40:03.5180 [PID=19068] [debug   ]   Result is on [HOST#522041]: r1_0148.5__189_S4R2a_1
2006-01-21 23:40:03.5180 [PID=19068] [debug   ]   Result is on [HOST#522041]: r1_0148.5__188_S4R2a_1
2006-01-21 23:40:03.5190 [PID=19068] [normal  ]   [HOST#522041] got request for 1831.035555 seconds of work; available disk 16.098672 GB

So really the question in my mind is, why doesn't this machine have a LOT more results on it? And why isn't it requesting more than 1800 seconds of work?
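To put the request size in perspective, here is a small sketch that pulls the requested seconds out of the last log line above and compares it with the roughly 4000-second run time mentioned earlier in the thread. The regular expression is written only against the line format shown here, so treat it as an assumption rather than a description of the scheduler's log format in general.

```python
import re

# Matches lines like: "[HOST#522041] got request for 1831.035555 seconds of work"
WORK_REQUEST = re.compile(
    r"\[HOST#(?P<host>\d+)\] got request for (?P<seconds>[\d.]+) seconds of work"
)


def requested_seconds(log_line: str):
    """Return (host_id, seconds_requested), or None if the line is not a work request."""
    match = WORK_REQUEST.search(log_line)
    if match is None:
        return None
    return int(match["host"]), float(match["seconds"])


line = (
    "2006-01-21 23:40:03.5190 [PID=19068] [normal  ]   [HOST#522041] "
    "got request for 1831.035555 seconds of work; available disk 16.098672 GB"
)
host, seconds = requested_seconds(line)
print(f"host {host} asked for {seconds:.0f} s, "
      f"about {seconds / 4000:.2f} of one 4000 s unit")
```

A request of about 1831 seconds is less than half of a single 4000-second unit, which is why the size of the request looks surprisingly small for this host.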

PS: one of your intel boxes is reporting results with zero CPU time. Consider updating BOINC to fix this problem.

Director, Einstein@Home

ADDMP

Joined: 25 Feb 05

Posts: 104

Credit: 7332049

RAC: 0

RE: No it's nothing wrong

22 Jan 2006 3:37:01 UTC

Message 24373 in response to message 24371


Quote:

No, there's nothing wrong with your install. The problem is that, unlike Einsteins, which were fairly consistent in size, Alberts can vary by a factor of 4, and the scheduler doesn't take the difference into account when handing out work. You're not the first person to have problems with a really fast machine and short Alberts; some high-end Macs have been burned as well. For the moment the best you can do is to add a 2nd project and set the work distribution to 99/1. The 2nd project will then (almost) only run when you're out of work for e@h, and a starvation attack here will put it far enough ahead that the 2nd project won't do anything at all for a time.

Thanks for that news & suggestion. I had some trouble getting BOINC to allow me to run two E@H versions at once, but I now have both a linux/wine/windows version and a native linux version running simultaneously at different levels of "nice"ness. That should be OK, but I'll see what happens as they return results.

ADDMP

Joined: 25 Feb 05

Posts: 104

Credit: 7332049

RAC: 0

RE: RE: I think there

22 Jan 2006 4:05:25 UTC

Message 24374 in response to message 24372


Quote:

Quote:
I think there might be some problem with the allotment calculation at E@H that assigned 32 units max per day.

I can delete the installation & re-install if that is the only fix.

[This computer is named XBOX...]

ADDMP

Bruce Allen wrote:
Looking at the scheduler logs (available by following on-line links) I see that your computer has:
[code]...

So really the question in my mind is, why doesn't this machine have a LOT more results on it? And why isn't it requesting more than 1800 seconds of work?

Sorry, I can't interpret the info you listed, but very likely you are looking at logs after I did a lot of tinkering trying to force the computer to get more units. It is now running both the native linux & the windows versions of BOINC simultaneously. The native Linux version is niced-out to a much lower priority, & should not need many units.

When it was running straight linux/wine/windows, it usually had about 8 units simultaneously either waiting to run or running or waiting to return.

But nevertheless, I think if you check its completed results, they were running in about 4000 seconds, & that is about 43 units a day, but it was restricted to receiving only 32 units a day.

Quote:
PS: one of your intel boxes is reporting results with zero CPU time. Consider updating BOINC to fix this problem.

Thanks, I'll check it. I am running the linux/wine/windows version on most boxes & I have not been able to get that working with the newer BOINCs. So it is a trade-off between running slower with the native linux version & getting occasional glitches with wine.

I might convert the Intel boxes back to native Linux, since that version was more efficient with Intel than with Athlons.

ADDMP

Bruce Allen

Moderator

Joined: 15 Oct 04

Posts: 1119

Credit: 172127663

RAC: 0

RE: But nevertheless, I

22 Jan 2006 16:54:32 UTC

Message 24375 in response to message 24374


Quote:

But nevertheless, I think if you check its completed results, they were running in about 4000 seconds, & that is about 43 units a day, but it was restricted to receiving only 32 units a day.

I've bumped up the per cpu quotas by another factor of two. Let's see if that fixes this problem.

Director, Einstein@Home

AnRM

Joined: 9 Feb 05

Posts: 213

Credit: 4346941

RAC: 0

Thanks, Bruce, for increasing

22 Jan 2006 17:47:30 UTC

Message 24376


Thanks, Bruce, for increasing the MDQ to 32.....our dual core and faster 64s will be happy again! I can also increase the E@H share on these machines back to the pre-'Albert' levels. Tweakster will be happy and warm as well!....Cheers, Rog.

Michael Roycraft

Joined: 10 Mar 05

Posts: 846

Credit: 157718

RAC: 0

RE: I've bumped up the per

22 Jan 2006 18:14:34 UTC

Message 24377 in response to message 24375


Quote:

I've bumped up the per cpu quotas by another factor of two. Let's see if that fixes this problem.

OMG, Sir, no, tell me you didn't! At least not without implementing an "abort queued work at uninstall" bit, AND returning to a replication of 4. Already, we have a huge and increasing quantity of "pendings" due to the normal numbers of project dropouts and eyes-bigger-than-their-bellies WU hoggers. Your databases will be bloated, and it will not be an unusual case to have folks waiting a month or more for pending credits to be resolved. This will be a headache of epic proportions, and even if quickly reversed, the "hangover" will last for a month.

Dr. Allen, the "too fast, not enough work" problem only affected a (relative) handful of the fastest hosts, crunching the shortest WUs, less than 3% (I would guesstimate) and was safely solved by adding a backup project with minimal share. It was at most temporary, as you said that the "shorties" have been nearly depleted. This 32/day quota will affect most of the balance of crunchers, and new complaints will increase exponentially. It ain't gonna be pretty!

Respectfully,

Michael

edited for typos

microcraft
"The arc of history is long, but it bends toward justice" - MLK

Wurgl (speak^Wc...

Joined: 11 Feb 05

Posts: 321

Credit: 140550008

RAC: 0

RE: OMG, Sir, no, tell me

22 Jan 2006 18:25:13 UTC

Message 24378 in response to message 24377


Quote:

OMG, Sir, no, tell me you didn't! At least not without implementing an "abort queued work at uninstall" bit, AND returning to a replication of 4. Already, we have a huge and increasing quantity of "pendings" due to the normal numbers of project dropouts and eyes-bigger-than-their-bellies WU hoggers.

Which do you prefer? A box which is sitting idle waiting for the next day to get more work, or pendings which bring excellent cobblestones sometime later?

I do not care about the pendings; they are as good as my money in my bank. Sometime I will get all of them. Michael, be patient, the time works for you :-)

Michael Roycraft

Joined: 10 Mar 05

Posts: 846

Credit: 157718

RAC: 0

RE: What do you like more?

22 Jan 2006 18:36:11 UTC

Message 24379 in response to message 24378


Quote:

Which do you prefer? A box which is sitting idle waiting for the next day to get more work, or pendings which bring excellent cobblestones sometime later?

Again, no excuse for an idle box, except stubbornness. Repeat the mantra - secondary project, minimal share, secondary project, minimal share. That, exactly, is what BOINC was designed to do. It doesn't require a Nostradamus to foresee the upcoming flood of unhappy participants. Where we once could see the light at the end of the tunnel (depletion of short WUs), now the light is transformed into a diesel truck hauling a double-wide trailerhome. Is it you few who are going to personally handle the whole flood of complaints? How selfish can a few people be, to cause problems for the majority instead of using the designed-in solution?

Quote:

I do not care about the pendings; they are as good as my money in my bank. Sometime I will get all of them. Michael, be patient, the time works for you :-)

I'm sorry, but I don't have time to be patient. I'm on a rather short "deadline" (bad pun) myself.

Respects,

Michael

edited - to soften the tone
edit - reference to personal difficulties deleted

microcraft
"The arc of history is long, but it bends toward justice" - MLK

AnRM

Joined: 9 Feb 05

Posts: 213

Credit: 4346941

RAC: 0

Michael, I think you should

22 Jan 2006 19:10:29 UTC

Message 24380


Michael, I think you should have more faith in the project Admins on their MDQ change. I'm sure they have done the equivalent of a 'cost/benefit' analysis on this and they realise that more and more dual core/faster boxes are coming on line and they will have to adjust the MDQ sooner or later. Yes, you may have to wait longer for a 'problem' WU but they are still 'money in the bank'. IMHO it will not solve the problem of people attaching and leaving with WUs unprocessed. The delay in validation has already increased with the shift to 14 days and the reduction of initial replication to 3. I think the MDQ has some effect but is not the critical problem....Cheers, Rog.
