No work sent, claims to be short of disk space... 147 GB available

Glenn Hawley, RASC Calgary
Joined: 6 Mar 05
Posts: 48
Credit: 893110145
RAC: 349302
Topic 224274

This is from the scheduler log message. At present I have GPU work running but no CPU work (other than the CPU effort to assist the GPU)

The disk has 147 GB free at present, so the error message confuses me greatly. My "preferences" have "Use no more than 5 GB disk".

2020-12-22 03:47:35.7871 [PID=9509 ] [debug] [HOST#12796786] MSG(high) No work sent
2020-12-22 03:47:35.7871 [PID=9509 ] [debug] [HOST#12796786] MSG(high) see scheduler log messages on https://einsteinathome.org/host/12796786/log
2020-12-22 03:47:35.7871 [PID=9509 ] [debug] [HOST#12796786] MSG(high) Gamma-ray pulsar binary search #1 on GPUs needs 0.76MB more disk space. You currently have 18.32 MB available and it needs 19.07 MB.
2020-12-22 03:47:35.7871 [PID=9509 ] [debug] [HOST#12796786] MSG(high) Gravitational Wave search O2 Multi-Directional needs 81.68MB more disk space. You currently have 18.32 MB available and it needs 100.00 MB.
2020-12-22 03:47:35.7871 [PID=9509 ] [debug] [HOST#12796786] MSG(high) Gamma-ray pulsar search #5 needs 0.76MB more disk space. You currently have 18.32 MB available and it needs 19.07 MB.
2020-12-22 03:47:35.7872 [PID=9509 ] Sending reply to [HOST#12796786]: 0 results, delay req 60.00
2020-12-22 03:47:35.7872 [PID=9509 ] Scheduler ran 0.080 seconds
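The arithmetic behind those MSG(high) lines can be reproduced with a short sketch. The figures are taken from the log above; the check itself is a simplification of what the BOINC scheduler does, not the actual server code:

```python
# Simplified sketch of the scheduler's per-search disk check, using the
# figures from the log above. Not the actual BOINC server code, just the
# arithmetic the messages imply. (The log's "0.76MB" vs the 0.75 computed
# here comes from rounding of the underlying byte counts.)
def disk_check(search, needed_mb, available_mb):
    if needed_mb <= available_mb:
        return None  # enough room: work can be sent
    short = needed_mb - available_mb
    return (f"{search} needs {short:.2f}MB more disk space. "
            f"You currently have {available_mb:.2f} MB available "
            f"and it needs {needed_mb:.2f} MB.")

print(disk_check("Gamma-ray pulsar search #5", 19.07, 18.32))
```

The key point: "available" here is what is left inside BOINC's own allowance, not what is free on the physical disk.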


Glenn Hawley, RASC Calgary
Joined: 6 Mar 05
Posts: 48
Credit: 893110145
RAC: 349302

I tried a 'hard reboot' of my computer, but the problem persists.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 46997822642
RAC: 64879816

How much is available to BOINC? Check your compute preferences in BOINC Manager. BOINC can only use the space allocated to it, regardless of how much is actually free on the drive. 

_________________________________________________________________________

Eugene Stemple
Joined: 9 Feb 11
Posts: 67
Credit: 376631795
RAC: 578207

Glenn, the 147 GB is not what BOINC sees; it is going by your restriction of "use no more than 5 GB."  That is plenty, of course, for a single work unit, but E@H likes to keep "a lot" of downloaded data files around to be reused by subsequent work units without having to download them each time.  E@H is presently using 18.61 GB on my system.  Almost surely you have accumulated 4.9 GB of such data files and there's no room for any more (within the 5 GB limit you've set).

BOINC Manager has a "Disk" usage tab where you can see how much disk space is "in use" and how much is free.  [In Use is shown for each BOINC project you may have active; Free space is the total for BOINC.]  As you probably suspect, the 5 GB limit can be increased: BOINC Manager -> Options -> Computing Preferences -> Disk & Memory.

I started with a relatively low number and monitored the E@H usage, increasing the BOINC allocation whenever usage was approaching the limit, i.e. whenever "free, available to BOINC" declined to less than 1 GB.  I'm up to 22 GB now and have not needed any more for several weeks.  I have other projects, too, but they use a relatively small amount.  If you're running other BOINC projects - and no harm there - you'll need to take their disk space requirements into account also.
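Eugene's point - that "available" means available within the BOINC allowance, not free on the drive - can be sketched numerically. The preference names (`disk_max_used_gb`, `disk_max_used_pct`, `disk_min_free_gb`) are the real BOINC settings; the formula is a simplified version of the client's accounting, and the usage figures are rough guesses at Glenn's situation:

```python
def boinc_allowed_gb(disk_total_gb, disk_free_gb, boinc_used_gb,
                     max_used_gb, max_used_pct, min_free_gb):
    """Space BOINC may occupy: the most restrictive of the three limits.
    Simplified sketch; the real client does this accounting in bytes."""
    by_abs = max_used_gb                                  # "use no more than X GB"
    by_pct = disk_total_gb * max_used_pct / 100.0         # "use no more than Y% of disk"
    by_free = boinc_used_gb + disk_free_gb - min_free_gb  # "leave at least Z GB free"
    return min(by_abs, by_pct, by_free)

# Rough guess at Glenn's host: 147 GB free, ~4.98 GB already held by BOINC,
# and the "no more than 5 GB" preference in force.
allowed = boinc_allowed_gb(disk_total_gb=500, disk_free_gb=147,
                           boinc_used_gb=4.98, max_used_gb=5,
                           max_used_pct=90, min_free_gb=1)
free_to_boinc = allowed - 4.98   # ~0.02 GB, i.e. ~20 MB - the same order of
                                 # magnitude as the 18.32 MB in the log
```

With the 5 GB cap nearly filled by cached data files, the drive's 147 GB of free space never enters the picture.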

 

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225314931
RAC: 1043292

You say you have set a 5 GB limit.  The messages make it clear that is not enough.  You say you have 147 GB free.

Why not simply increase the 5--say to 10 for a start?

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 409
Credit: 10210673455
RAC: 22753288

archae86 wrote:

...

Why not simply increase the 5--say to 10 for a start?

... hmmm, because it seems like he wants to know why this would maybe solve the problem!

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 46997822642
RAC: 64879816

San-Fernando-Valley wrote:

archae86 wrote:

...

Why not simply increase the 5--say to 10 for a start?

... hmmm, because it seems like he wants to know why this would maybe solve the problem!

 

It would solve the problem because it's obvious that 5 is not enough. So increase it to 10.

 

Also, it seems that Einstein likes to hold onto old data files that might or might not be used for future tasks. My Einstein data/project folder contained 2.5GB of data with no active tasks running on the system.

 

So he could try resetting the project on that host (when all tasks are completed and reported and no tasks on the system), which will delete all the content and only re-download what it needs for the tasks that it gets.

_________________________________________________________________________

Glenn Hawley, RASC Calgary
Joined: 6 Mar 05
Posts: 48
Credit: 893110145
RAC: 349302

The 5 GB limit has been adequate for many, many moons now, but when I increased it to 50 GB, the problem went away completely.

So BOINC must be keeping staggering amounts of data (of some sort) on the disk that it wasn't keeping before.

I can probably dial back from 50 GB if other uses want more disk, but I suspect I can just leave it there.


Thank you very much.

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117675365966
RAC: 35163551

Ian&Steve C. wrote:
... it seems that Einstein likes to hold onto old data files that might or might not be used for future tasks. My Einstein data/project folder contained 2.5GB of data with no active tasks running on the system.

That's a little unfair, since the scheduler can't know whether or not you intend to run future tasks of this type.  Whilst there are quorums outstanding that require this data, the files won't be deleted, since there may be an opportunity to send you more of the same.  Once there are no tasks remaining for a particular set, the client will be advised to delete the data.  Premature deletion restricts the scheduler's choice of suitable hosts to use for further tasks (locality scheduling).
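A toy illustration of the locality scheduling Gary mentions: the scheduler prefers to send a frequency's tasks to hosts that already hold that frequency's data files, which is why premature deletion shrinks its options. All names and numbers here are invented for illustration; this is not the Einstein@Home server code:

```python
def pick_host(task_freq, hosts):
    """hosts maps host_id -> set of frequencies whose data files it holds.
    Prefer a host that already has the files, so nothing is re-downloaded."""
    with_files = [h for h, freqs in hosts.items() if task_freq in freqs]
    if with_files:
        return with_files[0]
    return None  # no match: the scheduler must pick another host and trigger downloads

# Hypothetical hosts and frequency sets:
hosts = {
    12796786: {176.90, 195.25},
    424242:   {448.85},
}
pick_host(448.85, hosts)   # the host holding the 448.85Hz files: reuse, no download
pick_host(300.00, hosts)   # None: nobody holds this set, fresh downloads needed
```

Deleting a host's data files (e.g. via a project reset) drops it out of the first, cheap branch for every frequency it used to hold.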

Ian&Steve C. wrote:
So he could try resetting the project on that host (when all tasks are completed and reported and no tasks on the system), which will delete all the content and only re-download what it needs for the tasks that it gets.

Resetting is fine if the volunteer has finished running these GW CPU tasks.  However, with storage space on modern drives being cheap and plentiful, and with the intention to keep running the same type of tasks, resetting is counter-productive.  After a reset, to fill the work cache, the scheduler is likely to send fresh tasks for a range of different frequencies.  Each one needs a large number of big data files.  Very quickly, much the same level of disk space use will occur (no real gain), at a large download cost for both the project and the volunteer.

As a classic example, the experience of the OP here is relevant.  Changing the disk space allowance from 5GB to 50GB has allowed a full work fetch to satisfy whatever the cache size was.  The same thing would happen after a reset - a bunch of fresh downloads.

I took a look at the new GW CPU tasks that were downloaded when BOINC had the new 50GB allowance.  When I looked there were 157 new tasks and on the first page of 20 tasks, I saw 18 different frequencies represented, from 176.90Hz right up to 195.25Hz.  Each of those 18 frequencies would have needed probably something like 20 or more large data files.  So instantly, a whole heap of disk space got occupied again.

I don't run CPU tasks at all, but the same things happen with GPU tasks, and my guess is that there are a couple of reasons why the scheduler chooses such a large range of different frequencies.  Firstly, since the machine no longer has an allocated frequency, it's a golden opportunity to get rid of any overdue resends.  Secondly, the scheduler uses some sort of 'fast but small cache' to deliver tasks for a given frequency rapidly, so a sudden large request could 'exhaust' a lot of these for different frequencies in order to get that big number of tasks 'out the door' as quickly as possible.

These 'fast caches' seem to be able to be 'refilled' quickly - i.e. during the 60 sec backoff between scheduler requests - so my experience has been that if I 'top up' with a large number of small requests, the scheduler will keep supplying tasks for just a single frequency - just one group of large data files - thus eliminating the excessive burst of downloading.

I have a script that implements this and I can select the work cache size step to use.  When starting a new host on GW GPU tasks, the script starts with 0.02 days (to get minimal tasks for a single frequency) and every few minutes, the size gets incremented by a selectable value that gets around 3-5 new tasks per request.  I hardly ever get multiple frequencies.  When the target cache size (eg. 1.0 days) is reached, the script quits.  The BOINC client then maintains the set cache size after that, with small numbers of tasks at a time.  After potentially many hundreds of tasks for one frequency, the scheduler may introduce a new one that is 0.05Hz higher which, at most, downloads a couple of new large data files, so no big deal.
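Gary's script itself isn't posted, but the general approach can be reproduced with the standard client tools: write `<work_buf_min_days>` into `global_prefs_override.xml` and have the running client re-read it with `boinccmd --read_global_prefs_override` (a real boinccmd option). A hypothetical minimal version, with the file path, step size, and pause interval as assumptions:

```python
import subprocess
import time

def override_xml(days):
    """Minimal global_prefs_override.xml containing only the cache setting."""
    return ("<global_preferences>\n"
            f"  <work_buf_min_days>{days:.2f}</work_buf_min_days>\n"
            "</global_preferences>\n")

def ramp_cache(path, start=0.02, step=0.05, target=1.0, pause=300):
    """Grow the work cache in small steps so each scheduler request is small
    and (per Gary's experience) stays on a single frequency."""
    days = start
    while days < target:
        with open(path, "w") as f:
            f.write(override_xml(days))
        # Ask the running client to pick up the new override file.
        subprocess.run(["boinccmd", "--read_global_prefs_override"], check=True)
        time.sleep(pause)   # let one small scheduler request happen
        days += step

# Typical Linux path (varies by distro):
# ramp_cache("/var/lib/boinc-client/global_prefs_override.xml")
```

Keeping each step small means each request asks for only a handful of tasks, so the scheduler tends to stay on one frequency instead of fanning out across many, as Gary describes.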

I set up some machines to run GW GPU tasks at a multiplicity of 4 a couple of weeks ago.  I just had a look at one that started with tasks named h1_0448.85_O2..... with issue numbers (second last field) around 2600.  It still has a few left to crunch (_16 right down to _0), so it has obviously stayed with the one group of large data files for all that time.  The next tasks in the cache are for 448.90, 448.95, and 449.00Hz.  So this machine has had hardly any data file downloads at all whilst it has completed thousands of tasks.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117675365966
RAC: 35163551

Glenn Hawley, RASC Calgary wrote:
So BOINC must be keeping staggering amounts of data (of some sort) on the disk that it wasn't keeping before.

It's the Einstein project (not BOINC) that decides what should be kept.  When data is truly finished with, the project will remove the <sticky> label that it has applied to such files and then BOINC will be free to delete them once all tasks on your computer that depend on them have been completed and returned.

The real plan is to save excessive downloading of files that are quite likely to be needed again and again for future tasks.  There are more (and probably bigger) data files now than for previous searches so you are just seeing the inevitable result of the LIGO detectors gathering larger volumes of higher quality data.  By modern standards, the 5GB limit you had imposed was rather restrictive for the type of analysis that needs to happen.  Compared to repeated downloading and the bandwidth it consumes, disk space is quite cheap.

Glenn Hawley, RASC Calgary wrote:
I can probably dial back from 50 GB if other uses want more disk, but I suspect I can just leave it there.

Please realise that by setting 50 GB you are not locking away any extra disk space and preventing your "other uses" from accessing it.  The only thing that "other uses" can't access is the space that actually has files stored on it.  That might be rather more than was really necessary, because you allowed BOINC to request a lot of work in one big hit once you removed the 5 GB restriction.  The scheduler (for reasons already listed) seems to have invoked a wide set of frequencies to fill the request.  Eventually that space will all be recovered, once the entire set of results for all those different frequencies (and all the inevitable resends, probably over several weeks) has been safely returned and filed away.

In my previous message, I wrote a longish explanation of my impression of how things work, thinking that it might be of use to you and any others wondering why there are so many large data files.

Cheers,
Gary.

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 409
Credit: 10210673455
RAC: 22753288

Ian&Steve C. wrote:

San-Fernando-Valley wrote:

archae86 wrote:

...

Why not simply increase the 5--say to 10 for a start?

... hmmm, because it seems like he wants to know why this would maybe solve the problem!

 

It would solve the problem because it's obvious that 5 is not enough. So increase it to 10.

 

...

 

Sorry to talk back, but

he wants to know why

not how!

 

Then when he/one understands the "why"

he/one can bother with the how!

 

The obvious is only obvious if the problem is understood.

For experts the obvious is always obvious.

 

Have a nice Christmas and stay safe ...
