Issue with cache on multi GPU system

Pokey

Joined: 7 Jan 16

Posts: 14

Credit: 6998346019

RAC: 2

24 Apr 2023 12:52:23 UTC

Topic 229427

Two of my multi-GPU rigs are only getting what I call just-in-time work. They are not building any extra storage even though I have upped the storage setting in local prefs. I have added <ncpus>64</ncpus> to cc_config; read config files; read local prefs; restarted BOINC; and updated numerous times, all to no avail. What am I missing?

Harri Liljeroos

Joined: 10 Dec 05

Posts: 4590

Credit: 3343141263

RAC: 1921200

24 Apr 2023 13:10:04 UTC

Message 211498

Check that your project resource share is not set to 0. Zero resource share gives you new work only when previous tasks have been finished.

[Edit] I just viewed your list of hosts and see that my first suggestion probably isn't your problem. Other things to check are that your cache sizes are something reasonable, like 0.5 + 0.5 days, and that BOINC is allowed to use enough hard disk space for all the tasks you want to have.

Pokey

Joined: 7 Jan 16

Posts: 14

Credit: 6998346019

RAC: 2

24 Apr 2023 14:44:16 UTC

Message 211506

Thanks Harri,

My cache sizes are set to 0.1 day + 0.01 day storage so, reasonable I think. I just want enough to get away from the delays associated with uploading and reporting and downloading. And disc usage is set to 90%, which leaves me with approx 91 GB on the (5 GPU) rig.

Appreciate the suggestions.

mikey

Joined: 22 Jan 05

Posts: 12886

Credit: 1884404578

RAC: 102804

25 Apr 2023 10:50:43 UTC

Message 211525

Pokey wrote:

Two of my multi-GPU rigs are only getting what I call just-in-time work. They are not building any extra storage even though I have upped the storage setting in local prefs. I have added <ncpus>64</ncpus> to cc_config; read config files; read local prefs; restarted BOINC; and updated numerous times, all to no avail. What am I missing?

Are you running CPU tasks from here as well? If so, that's LONG been a problem: once the CPU cache is filled, the GPU will refuse to get tasks because 'the cache is already full'.

Pokey

Joined: 7 Jan 16

Posts: 14

Credit: 6998346019

RAC: 2

27 Apr 2023 16:05:24 UTC

Message 211607

Mikey, no.

No cpu tasks.

I have reset the two problem rigs and that helped for a while, but after a bit they both reverted back to just-in-time again.

I keep going through the settings and have yet to find an explanation for some of the rigs to have caches and some not. The work has been steady so far, so I'm trying not to obsess over it too much. :)

But will keep looking.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119310292980

RAC: 25575811

28 Apr 2023 10:29:16 UTC

Message 211636 in response to message 211607

Pokey wrote:

... I keep going through the settings and have yet to find an explanation for some of the rigs to have caches and some not.

First of all, 0.1 days is a very low cache setting. It's less than 2.5 hours of work even if on_frac is 100%. Since only some machines are affected, the cause is unlikely to be in the settings; otherwise all of them would be in the same boat. It's not something you could correct by changing <ncpus>. Have you checked the value of on_frac as stored on each machine?

The easiest way to quickly check the current value for on_frac is to open a terminal session and type (at the user prompt) the command:-

grep on_frac path/to/client_state.xml

You will see something like <on_frac>0.999524</on_frac> which just happens to be from one of my hosts. It means that BOINC calculates that the machine has BOINC running for 99.9524% of the time. Just make sure you change "path/to/" to be a valid path for where your state file is stored. Mine just happens to be /home/gary/BOINC/.
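For anyone who would rather not use grep, the same check can be sketched in Python. This is only an illustration: the function names are made up for this example, and it assumes the value appears in the state file as a plain <on_frac>...</on_frac> element, as in the sample output above.

```python
import re
from pathlib import Path


def find_on_frac(state_text: str) -> list[float]:
    """Return every <on_frac> value found in the given state-file text."""
    matches = re.findall(r"<on_frac>([^<]+)</on_frac>", state_text)
    return [float(m.strip()) for m in matches]


def read_on_frac(state_file: str) -> list[float]:
    """Read a client_state.xml file from disk and extract its on_frac values.

    The caller supplies the path; nothing about its location is assumed here.
    """
    text = Path(state_file).read_text(encoding="utf-8", errors="replace")
    return find_on_frac(text)
```

Calling read_on_frac("path/to/client_state.xml") with a valid path should return a list containing the stored value, which can then be compared across hosts.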

If you see a much lower value (e.g. 0.333333), then BOINC would only maintain a cache size of about 2.4 hrs x 0.333333, which would be less than 50 mins. Maybe your affected hosts have crazy low on_frac values at the moment. Do you leave your machine running all the time? If you don't, you will have a low on_frac. In that case, just up the work cache setting from 0.1 days to 0.5 days. That will get you 5 times as many ready-to-run tasks as you currently have.
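The arithmetic here can be written out as a small Python sketch. It simply follows the scaling described in this post (cache setting multiplied by on_frac); the function name is invented for the example and it is not taken from BOINC's own code.

```python
def effective_cache_hours(cache_days: float, on_frac: float) -> float:
    """Hours of work kept on hand if the cache setting is scaled by on_frac."""
    return cache_days * 24.0 * on_frac


# 0.1 days with the machine always on: roughly 2.4 hours.
full_time = effective_cache_hours(0.1, 1.0)

# 0.1 days with on_frac = 0.333333: roughly 48 minutes.
one_third_minutes = effective_cache_hours(0.1, 0.333333) * 60.0

# Raising the setting to 0.5 days gives five times the work at the same on_frac.
ratio = effective_cache_hours(0.5, 0.333333) / effective_cache_hours(0.1, 0.333333)

print(f"{full_time:.1f} h, {one_third_minutes:.0f} min, x{ratio:.0f}")
```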

0.1 days is crazy low in any case. If there's ever a relatively short outage, you'll likely run out of work very quickly. What would you do if the project went down at 6.00PM one evening? It might not get fixed until mid-morning (or later) next day. There would be no real problem with maintaining a full 1 day setting which would allow you to survive most potential outages.
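To put rough numbers on that outage scenario, here is a minimal Python sketch. The dates and the 10:00 AM recovery time are assumptions chosen only to stand in for "mid-morning", the function name is hypothetical, and it assumes cached work is used up at exactly the estimated rate.

```python
from datetime import datetime


def idle_hours(outage_start: datetime, outage_end: datetime,
               cache_days: float, on_frac: float = 1.0) -> float:
    """Hours a host would sit without work during an outage."""
    outage_hours = (outage_end - outage_start).total_seconds() / 3600.0
    cached_hours = cache_days * 24.0 * on_frac
    return max(0.0, outage_hours - cached_hours)


# Hypothetical outage: down at 6:00 PM, fixed at 10:00 AM the next day (16 hours).
down = datetime(2023, 4, 28, 18, 0)
up = datetime(2023, 4, 29, 10, 0)

idle_small_cache = idle_hours(down, up, cache_days=0.1)  # most of the outage idle
idle_full_day = idle_hours(down, up, cache_days=1.0)     # cache outlasts the outage
```

Under these assumptions the 0.1 day cache leaves the host idle for well over half a day, while a 1 day cache rides out the whole outage.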

Cheers,
Gary.

Pokey

Joined: 7 Jan 16

Posts: 14

Credit: 6998346019

RAC: 2

28 Apr 2023 20:26:27 UTC

Message 211643

Thanks GARY, Very helpful. That's a report line I was ignorant of. I'll check it out.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119310292980

RAC: 25575811

29 Apr 2023 10:48:03 UTC

Message 211672 in response to message 211643

Pokey wrote:

... That's a report line I was ignorant of.

I presume you're referring to running the grep command.

Unless you're used to delving into the state file (client_state.xml) and are really familiar with how it all works, it's probably not something that the average user would even know about or consider investigating. Your description of some hosts having normal work caches whilst others were basically empty points to that particular parameter as a possible culprit. BOINC is designed to gradually fix this on its own. Since I noticed you were running Linux, I thought I'd suggest an easy way to check.

BOINC will raise the value if your computer spends more time crunching. It is possible to correct a very low value immediately, simply by stopping BOINC and editing the value in the state file to read 1.000000. I'm NOT recommending this unless you are very familiar with editing xml files using a plain text editor.
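For those who are comfortable with it, the edit described above could be sketched in Python as follows. The same caution applies: this is an illustration under assumptions, not a recommended procedure. The function names are hypothetical, the ".bak" backup name is an arbitrary choice, and it assumes the value is stored as a plain <on_frac>...</on_frac> element. BOINC must be stopped before the file is touched.

```python
import re
import shutil
from pathlib import Path


def set_on_frac(state_text: str, new_value: float = 1.0) -> str:
    """Return the state-file text with every <on_frac> value replaced."""
    replacement = f"<on_frac>{new_value:.6f}</on_frac>"
    return re.sub(r"<on_frac>[^<]*</on_frac>", replacement, state_text)


def patch_state_file(state_file: str, new_value: float = 1.0) -> None:
    """Back up the state file, then rewrite it with the new on_frac value.

    Only run this while BOINC is stopped; the path is supplied by the caller.
    """
    path = Path(state_file)
    # Keep a copy next to the original in case the edit needs to be undone.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    text = path.read_text(encoding="utf-8", errors="replace")
    path.write_text(set_on_frac(text, new_value), encoding="utf-8")
```

Keeping the text replacement separate from the file handling makes it easy to try set_on_frac on a copied snippet first and confirm that nothing else in the file changes.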

I have a lot of hosts and I shut down a bunch of them during the hot summer months. When I restart them, I always immediately correct the on_frac value so they can immediately get full work caches. Otherwise, after being shut down for a couple months, they would have values very close to zero and it would take days/weeks to get up to the desired amount of work on hand.

Cheers,
Gary.
