How do I.....

Perle
Perle
Joined: 22 Jan 05
Posts: 47
Credit: 2114466335
RAC: 804596
Topic 194131

......get a cache of wu's on my dual X5355 system.

The current parameters let all of my other systems cache a couple of extra wu's (1 day's worth), yet the X5355 never has any wu's cached. So I end up with a core or two idle because the system gets pushed back by the "last attempt too recent" error.

Surely this has something to do with the core count.

PS: I asked this in the LHC forums as well, thought I would try a more active forum.

Mary
Mary
Joined: 2 Jun 08
Posts: 76
Credit: 634013
RAC: 0

How do I.....

I saw your post in the LHC forums as well, but I'm replying here because I don't check that site very often. Since you've apparently already been tinkering with the 'work for x number of days' parameter, I won't bother you with suggesting adjustments to that.

Okay, I know a couple of ways to do this. The first one is the easiest. If you want to stock up on, say, Einstein work, suspend the tasks running for all the other projects, leaving only your Einstein tasks. Your computer will then think it's low on work and will request more. You can do this project by project to stock up on all of them. However, there is a catch: if Einstein owes too much long-term debt to other projects, or if you already have a lot of Einstein tasks, it may not send out the work request.
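If you don't want to click through the Manager for each project, the same trick can be done from the command line with the boinccmd tool that ships with recent clients (boinc_cmd on older installs). This is just a rough sketch; the second URL is a placeholder for whichever other project you want to park:

# park the other project so the client thinks it is short of work
boinccmd --project http://some.other.project/ suspend
# ask Einstein for work straight away
boinccmd --project http://einstein.phys.uwm.edu/ update
# bring the other project back afterwards
boinccmd --project http://some.other.project/ resume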

Another method is to go into your client_state file and manually edit it (beware, this can be very, VERY dangerous if you don't know what you are doing!). My favorite is to turn the 'task duration correction factor' for a project way down so that it thinks the tasks are shorter than they really are, and thus that it has less work than it actually does. Eventually, the 'tdcf' will correct itself as you run through the workunits, so it's only a temporary trick. You can also modify your short-term and long-term project debts in this file. I'm sure there are other ways of stocking up on tasks, but I find these two to be the simplest.
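Just to illustrate what I mean (and emphatically not something to paste in blindly - exit BOINC completely first and keep a backup of the file): the bits I'm talking about live inside the <project> block for each project in client_state.xml and look roughly like this. The tag names are from memory and can differ a little between client versions, and the numbers below are made up:

<project>
    <master_url>http://einstein.phys.uwm.edu/</master_url>
    ...
    <!-- pretend tasks are shorter than they really are; the client slowly
         corrects this value itself as completed tasks come back -->
    <duration_correction_factor>0.500000</duration_correction_factor>
    <!-- the short- and long-term debts can be edited (or zeroed) here too -->
    <short_term_debt>0.000000</short_term_debt>
    <long_term_debt>0.000000</long_term_debt>
    ...
</project>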

Be forewarned that there is no real workaround for LHC@home since it limits you to a supposed max of 2 tasks per request and one request every 15 minutes (that and the server is out of WU's again as of this post).

~It only takes one bottle cap moving at 23,000 mph to ruin your whole day~

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

RE: the \"last attempt to

Quote:
the \"last attempt to recent\" error.


This isn't an error, but an informational message. It tells you that you pressed the "Update" button too soon after an earlier attempt, one that the server had already answered by asking your client to wait for several minutes. Hammering Update won't get things done any quicker.

Deferring communications is BOINC's way of pausing computers that would otherwise request work continually, each request kicking off database queries that would end with the database being very slow or even crashing. A couple of hundred computers per second making queries is OK; a couple of thousand and it soon gets to the point where the DB just can't take any more.

Hence the time-out being passed on to these computers, telling them to wait. Just like a traffic light.

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117317138176
RAC: 35901050

RE: ......get a cache of

Quote:
......get a cache of wu's on my dual X5355 system.


Simply by setting it to what you require in your preferences :-).

Quote:
The current parameters let all of my other systems cache a couple of extra wu's (1 day's worth), yet the X5355 never has any wu's cached.


Your Xeon has 8 cores and a task takes about 6 hours, so a 1 day cache should allow you to keep about 32 new tasks on hand (8 cores x 24 hours / 6 hours per task = 32). You have only 9 at the moment, so it's no wonder that you have cores running dry. Have you set any local preferences for that machine, or are you using the website preferences? If you are using website preferences only, either they are not set to a 1 day cache or there is some problem with BOINC's view of how many cores you have. It looks like BOINC thinks you only have two cores, but you must be running a lot more than that to have a RAC of over 4500. You must also be running essentially EAH only on that host to get such a high RAC even with cores running dry.

You can set local preferences just for that host by using the "Advanced" drop-down menu in BOINC Manager and selecting the "Preferences..." option. In the preferences window that opens, check the "processor usage" tab and see how many cores are allowed to be used; you should have 8. Check the "network usage" tab and see your settings for "connect every X days" and "additional work buffer Y days". If it were my machine, I would set X=0.0 and Y=3.0, which should allow you to have about 100 tasks ready to go at any time. However, you should set whatever you feel comfortable with.
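Incidentally, all that dialog really does is write a global_prefs_override.xml file in the BOINC data directory, so if you prefer you can create or edit that file by hand and then restart the client (or use "Read local prefs file" from the Advanced menu if your Manager version has it). A rough sketch with the numbers I suggested above; the tag names are from memory, so check them against your own file:

<global_preferences>
    <!-- "connect every X days" -->
    <work_buf_min_days>0.0</work_buf_min_days>
    <!-- "additional work buffer Y days" -->
    <work_buf_additional_days>3.0</work_buf_additional_days>
</global_preferences>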

If your website preferences were working properly you wouldn't need to consider local preferences for that host. If you can't immediately spot the problem with your website prefs, try forcing the issue with your local prefs to see if you can get a more reasonable supply of tasks. Please don't start fiddling with DCF or any other state-file tricks as these are really not necessary. You shouldn't need to suspend any projects either.

Quote:
So I end up with a core or two idle because the system gets pushed back by the "last attempt too recent" error.


You should only get this message if you try to force a work request more often than once a minute. Leave the "Update" button alone :-).

Quote:
Surely this has something to do with the core count.


That would be my reckoning as well. When you have enough tasks on hand, are all 8 cores crunching simultaneously?

PS: Just had another thought. Since you mentioned the LHC forum, I guess EAH may be sharing this host with LHC, perhaps with a high resource share for LHC? Because LHC doesn't have work all that often, maybe there is a big debt owed to LHC which is causing BOINC to be frugal with its stock of EAH work, just in case LHC suddenly gets some (as it has done recently). Did you get a bunch of LHC work recently? They have a 15 minute backoff period and you are only allowed two tasks at a time, so if you get impatient there I could imagine lots of "last attempt too recent" type messages :-).

The other nasty thing about LHC is that because they issue 5 copies of each workunit and only require 3, there can be lots of server-generated task aborts when a quorum is filled. Each such abort is treated like a client error and so reduces your daily task limit, which is meagre enough anyway without further unfair reduction through no fault of your own.

Cheers,
Gary.

Gundolf Jahn
Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: ...Please don't start

Message 89820 in response to message 89819

Quote:
...Please don't start fiddling with DCF or any other state-file tricks as these are really not necessary. You shouldn't need to suspend any projects either...


I agree in principle. Nonetheless, I'd at least check the entries in that section of client_state.xml, as they sometimes tend to show incongruent values.

Regards,
Gundolf

Computers aren't everything in life. (Just a little joke.)

Perle
Perle
Joined: 22 Jan 05
Posts: 47
Credit: 2114466335
RAC: 804596

Gary, you nailed it sir. The

Gary, you nailed it, sir.
The PS from your post is exactly the reason why this is occurring.

Yes, all eight cores are active, EaH and LhC are the only two projects for this system, web preferences, cache is set at 1 day, network = 0.0.

I have LhC at 80% and EaH at 20%.

The 15 minute backoff is only issued by LhC, not Einstein.
When it does request additional wu's from EaH, it will report tasks completed and request 1.0 seconds of work.

What is worse is that the 15 min LhC push back is off by 10-30 seconds; it's like my system timer runs down 10 seconds faster than the LhC timer, then sends a request for work and receives another push back.

I do not monitor this machine daily; due to the recent run of LhC work I started watching the task lists more carefully, reading the message log, and discovered this condition.

Again... I do not sit there mashing the update button!

Dagorath
Dagorath
Joined: 22 Apr 06
Posts: 146
Credit: 226423
RAC: 0

RE: Gary, you nailed it

Message 89822 in response to message 89821

Quote:

Gary, you nailed it, sir.
The PS from your post is exactly the reason why this is occurring.

Yes, all eight cores are active, EaH and LhC are the only two projects for this system, web preferences, cache is set at 1 day, network = 0.0.

I have LhC at 80% and EaH at 20%.

Since LHC doesn't have steady work, it will never be able to fulfil the 80% share you're requesting and keep the 8 cores busy. A more reasonable share would be Einstein 80% and LHC 20%, though you might find that over the long term LHC can't even meet 20% (it depends on how much work they issue in the future).

Quote:
The 15 minute backoff is only issued by LhC, not Einstein.
When it does request additional wu's from EaH, it will report tasks completed and request 1.0 seconds of work.

You should probably think about resetting the long-term debts. With your current resource share, LHC's debt is likely approaching the point where the scheduler will stop requesting Einstein tasks altogether.
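If I remember correctly, you don't even have to hand-edit client_state for that: recent clients accept a <zero_debts> option in cc_config.xml (in the BOINC data directory) that zeroes the short- and long-term debts for all projects. Check whether your client version supports it; something along these lines, followed by a client restart or a re-read of the config file:

<cc_config>
    <options>
        <!-- wipe short- and long-term debts for all projects -->
        <zero_debts>1</zero_debts>
    </options>
</cc_config>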

Quote:
What is worse is that the 15 min LhC push back is off by 10-30 seconds; it's like my system timer runs down 10 seconds faster than the LhC timer, then sends a request for work and receives another push back.

Interesting. In the Manager the backoff appears to work off a countdown timer, but in the actual code it may be waiting until a specific time of day, which might not work properly if your clock or the server's clock is wrong.

Brian Silvers
Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: I have LhC at 80% and

Message 89823 in response to message 89821

Quote:

I have LhC at 80% and EaH at 20%.

I'd agree: even if you set LHC to 20%, there's no way you could have reasonable hopes of achieving a long-term 20% allocation. Getting LHC work is all about making a request at the exact moment the work is made available. Even the 60K tasks that were just issued over the weekend didn't last very long... By the time I noticed, their website said they were "low on work", but when I requested, I got zippo...

In my opinion, LHC has all the computing power that they need at this point. I'd reallocate to at least 90% Einstein...if not 95%...

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117317138176
RAC: 35901050

RE: What is worse is that

Message 89824 in response to message 89821

Quote:
What is worse is that the 15 min LhC push back is off by 10-30 seconds; it's like my system timer runs down 10 seconds faster than the LhC timer, then sends a request for work and receives another push back.


I think you are dead right about this. I remember some time ago I had a machine whose clock would gain appreciably - a minute or two per day if I remember correctly. The BOINC client must use the clock to count down the 15 minutes and, because of the fast clock, it would do so in less than 15 minutes of true time. This leads to a server rejection of the type you describe. I'd forgotten about it until you mentioned it.
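If a drifting clock does turn out to be the culprit on your box, forcing a resync against an internet time server is an easy thing to try. The exact commands depend on your OS and setup, but roughly:

# Windows, in a command prompt (needs the Windows Time service running)
w32tm /resync
# Linux, as root, if ntpdate is installed
ntpdate pool.ntp.org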

Quote:
Again... I do not sit there mashing the update button!


Sorry, didn't mean to sound critical :-).

Cheers,
Gary.
