Maximum disk usage exceeded

ritterm
Joined: 18 Jun 08
Posts: 23
Credit: 46,657,826
RAC: 0
Topic 198067

Any idea why I might be getting "Maximum disk usage exceeded" errors on this host? I've been getting the same errors on MilkyWay@Home but haven't gotten any feedback. I switched to Einstein as a test to see if I had the same problem and it looks like maybe I do.

I've upgraded to the latest driver from AMD and am now concerned I have a hardware problem.

Thanks for any feedback.

MarkR

archae86
Joined: 6 Dec 05
Posts: 2,890
Credit: 3,548,416,095
RAC: 3,134,332

Well, the first question would be: what have you got the disk usage limits set to? And the second, how does that compare to actual disk space in use for BOINC?

First:
Look at that computer's details on your account page at Einstein, and determine which location (aka venue) it is assigned to: default, home, work, or school.

Second:
Starting from your account page, go to computing preferences, and look at the disk entries for the location your computer is assigned to.

There are three distinct entries, and you could be violating any of them. Two are "use at most" restrictions, one expressed in gigabytes and the other as a percentage. The third is a "leave free at least" restriction expressed in gigabytes.

Lastly:
The limits set from your account web page apply unless you have set a local preference on the specific host. If the web limits do not appear to be violated, check for a local preference.

Go to Boincmgr|Tools|Computing Preferences|disk and memory usage.
If you have set any limits here, they take precedence over the ones set using the web site.

Let us know what you find out.

ritterm
Joined: 18 Jun 08
Posts: 23
Credit: 46,657,826
RAC: 0

Thanks, archae86. I think everything you asked about and recommended I check follows:

I have local preferences set that are pretty wide-open compared to what I need (I think!). The host has a 1TB HDD with only about 250GB used. Local disk limits are set to:

Use at most -- 150GB (most restrictive)
Leave at least -- 0.1 GB (least restrictive)
Use at most -- 50% of total (less restrictive)

The BOINC manager shows 26 GB is used for BOINC with 124GB available and that Einstein is using less than 300MB.
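
As a rough cross-check of those figures, here is a minimal Python sketch of how the three limits could combine. The function name and the rule that the smallest of the three candidate allowances wins are assumptions made for illustration; this is a simplified model, not BOINC's actual code. The 1 TB disk is taken as 1000 GB.

```python
def boinc_disk_allowance(total_gb, free_gb, boinc_used_gb,
                         max_used_gb, min_free_gb, max_used_pct):
    """Return (allowed_gb, limiting_rule) under a simplified model.

    Hypothetical helper: assumes the tightest of the three limits applies.
    """
    candidates = {
        "use at most (GB)": max_used_gb,
        "use at most (% of total)": total_gb * max_used_pct / 100.0,
        # BOINC could grow into free space until only min_free_gb is left.
        "leave at least (GB free)": boinc_used_gb + free_gb - min_free_gb,
    }
    rule = min(candidates, key=candidates.get)
    return candidates[rule], rule


# Figures from the post: about 250 GB used in total, 26 GB of that by BOINC.
allowed, rule = boinc_disk_allowance(
    total_gb=1000, free_gb=750, boinc_used_gb=26,
    max_used_gb=150, min_free_gb=0.1, max_used_pct=50)
print(f"allowed: {allowed:.1f} GB, limited by: {rule}")
print(f"headroom: {allowed - 26:.1f} GB")
```

Under this model the 150 GB cap is the binding limit and the headroom comes out at 124 GB, which agrees with what the BOINC manager reports. So the preference limits themselves do not look violated.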

archae86
Joined: 6 Dec 05
Posts: 2,890
Credit: 3,548,416,095
RAC: 3,134,332

I think the error is thrown by boinc, not by anything to do with an individual project, so it is probably pointless to check space used specifically by Einstein, for instance.

The interplay between the local limits (which are stated to prevail), and the web site ones is mildly mysterious in detail to me.

Unless you have other hosts on other projects relying on the web site preference values, maybe to rule out that source of trouble it would be wise to make sure all three limits are very unrestrictive for all four locations as set by the Einstein web site (then be sure to hit update on boincmgr with Einstein selected).

I admit this seems unlikely to help. Perhaps someone else will come along who will have another idea.

ritterm
Joined: 18 Jun 08
Posts: 23
Credit: 46,657,826
RAC: 0

Quote:
Unless you have other hosts on other projects relying on the web site preference values, maybe to rule out that source of trouble it would be wise to make sure all three limits are very unrestrictive for all four locations as set by the Einstein web site (then be sure to hit update on boincmgr with Einstein selected)...


I have two other hosts running Einstein with similarly non-restrictive disk limitations and using the same location/preferences as the problem host and they are having no issues at all...

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1,946
Credit: 295,626,857
RAC: 921,807

Actually, I think ritterm was on the right lines in his post at Milkyway, where he posted the error details associated with the individual task.

The exit code for EXIT_DISK_LIMIT_EXCEEDED was one of a batch added in April 2012 - the ones which are still causing problems for projects, like this one, which haven't updated their web code to handle the plain-language descriptions for the new codes.

The full changeset includes the replacement of

Error code:
-#define ERR_RSC_LIMIT_EXCEEDED -177 (resource limit exceeded)

with three specific Exit codes:
+#define EXIT_DISK_LIMIT_EXCEEDED 196
+#define EXIT_TIME_LIMIT_EXCEEDED 197
+#define EXIT_MEM_LIMIT_EXCEEDED 198

All of those are to do with the individual resource limits for each task - the time limit, in particular, makes no sense as an overall BOINC preference setting.
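
For anyone reading a task page that shows only the bare number, the mapping can be written down as a small lookup. This is a sketch only: the numeric values and names are copied from the changeset quoted above, while the dictionary and the `describe_exit_code` helper are hypothetical names made up for illustration.

```python
# Values and names copied from the changeset quoted in this post.
EXIT_CODES = {
    196: "EXIT_DISK_LIMIT_EXCEEDED",
    197: "EXIT_TIME_LIMIT_EXCEEDED",
    198: "EXIT_MEM_LIMIT_EXCEEDED",
}

# The single older code that the three above replaced.
LEGACY_ERRORS = {
    -177: "ERR_RSC_LIMIT_EXCEEDED",
}


def describe_exit_code(code):
    """Return the symbolic name for a task exit code, or a fallback string."""
    if code in EXIT_CODES:
        return f"{EXIT_CODES[code]} (per-task limit)"
    if code in LEGACY_ERRORS:
        return f"{LEGACY_ERRORS[code]} (older, unspecific code)"
    return f"unknown exit code {code}"


for code in (196, 197, 198, -177):
    print(code, describe_exit_code(code))
```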

So, my next enquiry would be: what's going on in the slot directory while these tasks are running? Any task-related working files would be written there.

ritterm
Joined: 18 Jun 08
Posts: 23
Credit: 46,657,826
RAC: 0

I now think I might have a GPU hardware problem. Many of the tasks I've checked that errored out for me have been completed by other hosts without a problem. If the tasks I ran had a bad parameter, would the same task work for another host?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1,946
Credit: 295,626,857
RAC: 921,807

Quote:
I now think I might have a GPU hardware problem. Many of the tasks I've checked that errored out for me have been completed by other hosts without a problem. If the tasks I ran had a bad parameter, would the same task work for another host?


Depends on the failure mode of the GPU. If it writes very verbose error logs, you might exceed the disk bound, while users with working GPUs might not. Have a look at what it writes into the slot directory while running.
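
One way to take that look without clicking through every folder is a short script that totals each slot and names its largest file. This is a minimal sketch: `slot_usage` is a hypothetical helper, and it assumes you pass it the path of the `slots` folder inside your BOINC data directory, whose location varies by platform and installation.

```python
import os


def slot_usage(slots_dir):
    """Map each slot name to (total_bytes, largest_file_path, largest_bytes)."""
    report = {}
    for entry in sorted(os.scandir(slots_dir), key=lambda e: e.name):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only numbered slot subdirectories are of interest
        total, biggest, biggest_size = 0, None, 0
        for root, _dirs, files in os.walk(entry.path):
            for name in files:
                path = os.path.join(root, name)
                try:
                    size = os.path.getsize(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                total += size
                if size > biggest_size:
                    biggest, biggest_size = path, size
        report[entry.name] = (total, biggest, biggest_size)
    return report
```

Running it a few times while a task is active shows whether one slot is growing steadily (a runaway log, say) or whether a large file was already sitting there before the task started.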

ritterm
Joined: 18 Jun 08
Posts: 23
Credit: 46,657,826
RAC: 0

Quote:
Have a look at what it writes into the slot directory while running...


I didn't have a chance to test here to see what was going on in the slots my failing tasks were using. However... Following the suggestion of a forum post about a similar problem at another project, I checked all my host's slots directories and found two "stray" VM image files left by one of the VM projects (probably CERN's CMS-dev), each of which was over 5GB. I deleted those files and slots and that seems to have solved my problem.

I'm not sure I understand, though, why those slots presented a problem. Could BOINC have tried to use them thinking they were empty only to find a large file which exceeded the disk limit? If so, did the VM task not clean something up like it should have or is BOINC not managing the slots properly?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1,946
Credit: 295,626,857
RAC: 921,807

Quote:
Quote:
Have a look at what it writes into the slot directory while running...

I didn't have a chance to test here to see what was going on in the slots my failing tasks were using. However... Following the suggestion of a forum post about a similar problem at another project, I checked all my host's slots directories and found two "stray" VM image files left by one of the VM projects (probably CERN's CMS-dev), each of which was over 5GB. I deleted those files and slots and that seems to have solved my problem.

I'm not sure I understand, though, why those slots presented a problem. Could BOINC have tried to use them thinking they were empty only to find a large file which exceeded the disk limit? If so, did the VM task not clean something up like it should have or is BOINC not managing the slots properly?


That does sound like the most plausible explanation so far, and the size of 'over 5 GB' matches your report of 'Peak disk usage 5,741.01 MB' in the Milkyway thread.

I think it's BOINC, rather than the project supplying the VM image, which is responsible for cleaning the slot files, but I'm not sure exactly what rules, or how many rules, are supposed to be followed. Choose from:

1) After successful task completion
2) After unsuccessful task completion (crash)
3) At BOINC startup
4) Before new task startup

I have a suspicion that rules (1) and (3) are active, but I'm not sure about the others: that might be one that Jord could run past the developers?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1,946
Credit: 295,626,857
RAC: 921,807

I reported this problem to the BOINC developers, and got this reply from David Anderson:

Quote:

I looked at this and couldn't immediately see the problem.
The BOINC client deletes everything in a slot directory before using it for a new job.
If a deletion fails (e.g. because a file is in use by another app) it doesn't use
that slot directory.
I verified this by opening some Word docs in slot directories.

Notes:

* There's a "slot_debug" log flag for messages related to slot directories.
Unfortunately it doesn't print messages about failed file deletions; I'll add this.
* The "disk limit exceeded" errors refer to the per-job disk limit, not the user's
disk usage preferences; I'll change the message to clarify this.
* Apps aren't responsible for cleaning out their slot dirs; BOINC does this. It
may be that BOINC is failing to delete VM images because they're still in use by
the VirtualBox executive.

Bottom line: I'll need some more info to debug this.
If anyone is seeing this reproducibly, let me know.
Otherwise we'll release a client with more debugging output to help us investigate.

-- David
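
To make the described behaviour concrete, here is a minimal Python sketch of "empty the slot before use, and skip the slot if any deletion fails". It only mirrors the logic as stated in the reply above; the BOINC client is written in C++ and its real code will differ. The names `clean_slot` and `pick_slot` are hypothetical.

```python
import os
import shutil


def clean_slot(slot_dir):
    """Try to empty slot_dir; return True only if every entry was removed."""
    ok = True
    for entry in list(os.scandir(slot_dir)):
        try:
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.remove(entry.path)
        except OSError:
            ok = False  # e.g. the file is still held open by another process
    return ok


def pick_slot(slots_dir, count):
    """Return the first numbered slot that could be fully emptied, else None."""
    for n in range(count):
        path = os.path.join(slots_dir, str(n))
        os.makedirs(path, exist_ok=True)
        if clean_slot(path):
            return path
    return None
```

If the client behaves this way, a leftover VM image that cannot be deleted should cause the slot to be skipped rather than reused, which is why the stray files reported in this thread are puzzling.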


So, help needed.

Under what circumstances does the CMS .vdi image get left behind? Is there a difference between successful task completions and abnormal (error) exits?

Can the .vdi be deleted manually? Immediately? Later? After BOINC restart? After reboot?

Does BOINC ever clean it up by itself, say after a client restart?

And anything else you can think of.
