Issue with cache on multi GPU system

Pokey
Joined: 7 Jan 16
Posts: 14
Credit: 6,679,671,590
RAC: 4,202,218
Topic 229427

Two of my multi-GPU rigs are only getting what I call just-in-time work. They are not building any extra storage even though I have upped the storage setting in local prefs. I have added <ncpus>64</ncpus> to cc_config; read config files; read local prefs; restarted BOINC; and updated numerous times, all to no avail. What am I missing?

Harri Liljeroos
Joined: 10 Dec 05
Posts: 3,248
Credit: 2,785,439,252
RAC: 986,125

Check that your project resource share is not set to 0. Zero resource share gives you new work only when previous tasks have been finished.

[Edit] I just viewed your list of hosts and see that my first suggestion probably isn't your problem. Other things to check are that your cache sizes are something reasonable, like 0.5 + 0.5 days, and that BOINC is allowed to use enough hard disk space for all the tasks you want to have.

Pokey
Joined: 7 Jan 16
Posts: 14
Credit: 6,679,671,590
RAC: 4,202,218

Thanks Harri,

My cache sizes are set to 0.1 day + 0.01 day storage, so reasonable I think. I just want enough to get away from the delays associated with uploading, reporting, and downloading. And disk usage is set to 90%, which leaves me with approx 91 GB on the (5 GPU) rig.

Appreciate the suggestions.

mikey
Joined: 22 Jan 05
Posts: 11,340
Credit: 1,755,011,872
RAC: 11,785

Pokey wrote:

Two of my multi-GPU rigs are only getting what I call just-in-time work. They are not building any extra storage even though I have upped the storage setting in local prefs. I have added <ncpus>64</ncpus> to cc_config; read config files; read local prefs; restarted BOINC; and updated numerous times, all to no avail. What am I missing?

Are you running CPU tasks from here as well? If so, that's LONG been a problem: once the CPU cache is filled, the GPU will refuse to get tasks because 'the cache is already full'.

Pokey
Joined: 7 Jan 16
Posts: 14
Credit: 6,679,671,590
RAC: 4,202,218

Mikey, no.

No cpu tasks. 

I have reset the two problem rigs and that helped for a while, but after a bit they both reverted back to just-in-time again.

I keep going through the settings and have yet to find an explanation for some of the rigs to have caches and some not.  The work has been steady so far, so I'm trying not to obsess over it too much.  :) 

But will keep looking.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,820
Credit: 106,331,134,724
RAC: 5,988,818

Pokey wrote:
...  I keep going through the settings and have yet to find an explanation for some of the rigs to have caches and some not.

First of all, 0.1 days is a very low cache setting. It's less than 2.5 hours of work even if on_frac is 100%. Since only some machines are affected, the cause is unlikely to be in the settings; otherwise all of them would be in the same boat. It's not something you could correct by changing <ncpus>. Have you checked the value of on_frac as stored on each machine?

The easiest way to quickly check the current value for on_frac is to open a terminal session and type (at the user prompt) the command:-

grep on_frac path/to/client_state.xml

You will see something like  <on_frac>0.999524</on_frac>  which just happens to be from one of my hosts.  It means that BOINC calculates that the machine has BOINC running for 99.9524% of the time.  Just make sure you change "path/to/" to be a valid path for where your state file is stored.  Mine just happens to be /home/gary/BOINC/.
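For anyone who would rather script the check across several hosts, here is a minimal Python sketch of the same lookup. It only assumes that the state file contains a line of the form shown above; the function names `read_on_frac` and `read_on_frac_from_file` are made up for this illustration and are not part of BOINC.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a line such as <on_frac>0.999524</on_frac>
ON_FRAC_RE = re.compile(r"<on_frac>\s*([0-9.eE+-]+)\s*</on_frac>")


def read_on_frac(state_text: str) -> Optional[float]:
    """Return the first on_frac value found in the given text, or None."""
    match = ON_FRAC_RE.search(state_text)
    return float(match.group(1)) if match else None


def read_on_frac_from_file(path: str) -> Optional[float]:
    """Read a state file from disk and return its on_frac value, or None."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return read_on_frac(text)


if __name__ == "__main__":
    # Sample line taken from the post above; replace with a real file path
    # (e.g. read_on_frac_from_file("path/to/client_state.xml")) on your host.
    print(read_on_frac("<on_frac>0.999524</on_frac>"))
```

As with the grep command, "path/to/" has to be changed to wherever your state file actually lives.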

If you see a much lower value (e.g. 0.333333), then BOINC would only maintain a cache size of 2.5 hrs x 0.333333, which would be less than 50 mins. Maybe your affected hosts have crazy low on_frac values at the moment. Do you leave your machine running all the time? If you don't, you will have a low on_frac. In that case just up the work cache setting from 0.1 days to 0.5 days. That will get you 5 times as many ready-to-run tasks as you currently have.
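The arithmetic above can be sketched in a few lines of Python. This is only the rule of thumb described in this post (cache setting scaled by on_frac), not BOINC's actual work-fetch code, and the function name `effective_cache_hours` is hypothetical.

```python
def effective_cache_hours(cache_days: float, on_frac: float) -> float:
    """Hours of work kept on hand under the rule of thumb
    cache setting (in days) x 24 x on_frac."""
    return cache_days * 24.0 * on_frac


if __name__ == "__main__":
    # Compare an always-on host with one that BOINC thinks runs a third of the time.
    for on_frac in (1.0, 0.333333):
        for cache_days in (0.1, 0.5):
            minutes = effective_cache_hours(cache_days, on_frac) * 60.0
            print(f"cache={cache_days} days, on_frac={on_frac}: {minutes:.0f} min")
```

With a 0.1 day setting, a host at on_frac 1.0 works out to 2.4 hours, and a host at 0.333333 to a little under 50 minutes, which matches the figures quoted above.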

0.1 days is crazy low in any case.  If there's ever a relatively short outage, you'll likely run out of work very quickly.  What would you do if the project went down at 6.00PM one evening?  It might not get fixed until mid-morning (or later) next day.  There would be no real problem with maintaining a full 1 day setting which would allow you to survive most potential outages.
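To put a rough number on that, the sketch below turns an outage length into the cache setting needed to ride it out, using the same on_frac scaling as above. The 16 hour figure (6.00PM to about 10.00AM) is an assumption chosen to match "mid-morning", and `cache_days_needed` is a hypothetical helper, not a BOINC function.

```python
def cache_days_needed(outage_hours: float, on_frac: float = 1.0) -> float:
    """Cache setting (in days) needed to cover an outage of the given length,
    assuming work on hand = cache days x 24 x on_frac."""
    if not 0.0 < on_frac <= 1.0:
        raise ValueError("on_frac must be in the range (0, 1]")
    return outage_hours / (24.0 * on_frac)


if __name__ == "__main__":
    # Assumed outage: 6.00PM until roughly 10.00AM the next day = 16 hours.
    print(f"{cache_days_needed(16.0):.2f} days")
```

For an always-on host that comes to about two thirds of a day, so a full 1 day setting covers it while 0.1 days does not.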

Cheers,
Gary.

Pokey
Joined: 7 Jan 16
Posts: 14
Credit: 6,679,671,590
RAC: 4,202,218

Thanks GARY,  Very helpful.  That's a report line I was ignorant of.  I'll check it out.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,820
Credit: 106,331,134,724
RAC: 5,988,818

Pokey wrote:
... That's a report line I was ignorant of.

I presume you're referring to running the grep command.

Unless you're used to delving into the state file (client_state.xml) and are really familiar with how it all works, it's probably not something that the average user would even know about or consider investigating. Your description of some hosts having normal work caches whilst others were basically empty points to that particular parameter as a possible culprit. BOINC is designed to gradually fix this on its own. Since I noticed you were running Linux, I thought I'd suggest an easy way to check.

BOINC will raise the value if your computer spends more time crunching.  It is possible to correct a very low value immediately, simply by stopping BOINC and editing the value in the state file to read 1.000000.  I'm NOT recommending this unless you are very familiar with editing xml files using a plain text editor.

I have a lot of hosts and I shut down a bunch of them during the hot summer months.  When I restart them, I always immediately correct the on_frac value so they can immediately get full work caches.  Otherwise, after being shut down for a couple months, they would have values very close to zero and it would take days/weeks to get up to the desired amount of work on hand.
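For those who are comfortable with the manual edit, the substitution itself can be sketched in Python as below. This is only an illustration of the text change: it works on a string, the function name `set_on_frac` is hypothetical, and it assumes the value sits in a single `<on_frac>...</on_frac>` element as shown earlier. BOINC must be stopped first, and keeping a backup copy of the state file is strongly advised.

```python
import re

# Captures the opening and closing tags so only the value between them changes.
ON_FRAC_RE = re.compile(r"(<on_frac>)[^<]*(</on_frac>)")


def set_on_frac(state_text: str, new_value: float = 1.0) -> str:
    """Return state_text with the first on_frac value replaced by new_value,
    written with six decimal places (e.g. 1.000000)."""
    replacement = rf"\g<1>{new_value:.6f}\g<2>"
    return ON_FRAC_RE.sub(replacement, state_text, count=1)


if __name__ == "__main__":
    # Illustrative input only; a real state file contains much more than this.
    before = "<on_frac>0.012345</on_frac>"
    print(set_on_frac(before))
```

Text without an on_frac element is returned unchanged, so running it against the wrong file does nothing rather than corrupting it.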

Cheers,
Gary.
