Deleting old data manually?

Alex Vincent
Joined: 19 Feb 05
Posts: 10
Credit: 36887417
RAC: 21555
Topic 224380

I'm a bit puzzled.  A few weeks ago when I bumped up the local file storage from 4GB to 10GB, BOINC was happy and only downloaded a bit more.

Now disk usage has crept up to 6GB.  I suspect there's some cruft (specifically, old job data) that's built up and isn't being properly cleaned.

Where would the job data files be stored, and what could be candidates for manual cleanup?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110034880026
RAC: 22382842

Alex Vincent wrote:
Where would the job data files be stored, and what could be candidates for manual cleanup?

If you're participating in a gravitational wave (GW) search, either CPU or GPU, you do need to store quite a lot of large data files for an extended period.  When the files are truly finished with (no possibility of further tasks that might need them), the project will automatically issue a delete request that your BOINC client will act on.

Einstein is probably the only project that really needs to use 'locality scheduling'.  In a nutshell, you get tasks for the data you already have wherever possible.  There are potentially many thousands of tasks that can use a given 'set' of large data files.  The scheduler would like you to keep what you have until all those tasks (and the many resends for failed tasks) are completely finished.  If people prematurely delete the files, the server suffers from having to resend them, either to you or to someone else, in order to get the leftover tasks finished.
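
If you like to think in code, the basic idea can be sketched in a few lines of Python.  This is purely illustrative - my guess at the shape of it, not the real scheduler, which is far more involved:

    # Illustrative sketch only - NOT the real Einstein@Home scheduler code.
    # host_files: set of data file names the host already has.
    # Each task is assumed to carry the set of data files it needs (task.data_files).
    def pick_tasks(host_files, pending_tasks, limit):
        # Prefer tasks whose data is already on the host - zero new downloads.
        local = [t for t in pending_tasks if t.data_files <= host_files]
        # Fall back to tasks needing fresh downloads only when local work runs out.
        others = [t for t in pending_tasks if not (t.data_files <= host_files)]
        return (local + others)[:limit]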

Unfortunately, this means you need to allow BOINC to use rather more disk space than you might expect - maybe as much as 15-20GB.  On my hosts that do GW, I allocate 50GB and I've never got close to that.  Allocating a large amount like that doesn't prevent you from using that space for other purposes until BOINC needs it.  If you fill it up with other stuff, all that happens is a note in your event log to tell you that BOINC can't download further work since there's no more space available.  You can then decide what to do.

If you have finished supporting the GW searches and don't intend to download any further GW tasks, wait until all the work has been returned and then just 'reset the project' in BOINC Manager to remove all such files.  If you do that prematurely, the scheduler will just send you a big bunch of these files again, so the reduction in disk space used will be quite temporary, at the expense of all that extra traffic from the server.

Cheers,
Gary.

GWGeorge007
Joined: 8 Jan 18
Posts: 2823
Credit: 4636552195
RAC: 3641319

Gary Roberts wrote:
If you're participating in a gravitational wave (GW) search, either CPU or GPU, you do need to store quite a lot of large data files for an extended period. [...]

Hi Gary,

I presume that I have a similar problem with E@H as does Alex Vincent.  I am receiving too many 'ERRORS' for one of my applications on both of my machines.

Computer #12851564: Application Gravitational Wave search O2 Multi-Directional v2.08 () windows_x86_64

Computer #12843281: Application Gravitational Wave search O2 Multi-Directional v2.08 () x86_64-pc-linux-gnu

If I understand your response to Alex, I would be best served by leaving the E@H files alone; i.e. DO NOT manually remove them.  Is that correct?

I have reduced the applications I will be sent by un-checking, on the website, the non-GPU application for Gravitational Wave search O2 Multi-Directional, which is what my 3950X machine #12851564 is using.

I have also (I hope) reduced the amount of work fetched for ALL applications on my i7-990X machine #12843281 by changing my BOINC Manager preferences locally from 0.5 days of work and 0.25 additional days of work to 0.1 and 0.0.

Am I doing this correctly?  Or did I just screw up royally?

George

Proud member of the Old Farts Association

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 137731140
RAC: 16621

George wrote:
I presume that I have a similar problem with E@H as does Alex Vincent.  I am receiving too many 'ERRORS' for one of my applications on both of my machines. [...]

Your problem isn't disk space, but having too much work on board.  The machine can't complete it in time.  You are running multiple apps, including GPU ones, and the DCF (duration correction factor) is swinging wildly from one app to another.  The DCF is used to adjust the time estimate for each task.  Unfortunately the BOINC client only has one DCF per project, so running mixed apps means it jumps around and the client never really knows how long things will take.  This is especially true when CPU app and GPU app time estimates are mixed.
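
To make that concrete, here's a very rough Python sketch of the problem.  The real client's DCF update rules are different (and smarter), but the shared-factor effect is the same:

    # Very simplified sketch - not the actual BOINC client logic.
    dcf = 1.0  # one duration correction factor for the whole project

    def task_finished(estimated_secs, actual_secs):
        global dcf
        # Nudge the shared factor toward the last observed actual/estimate ratio.
        dcf = 0.9 * dcf + 0.1 * (actual_secs / estimated_secs)

    def runtime_estimate(raw_estimate_secs):
        # The SAME factor scales every app's estimates, CPU and GPU alike.
        return raw_estimate_secs * dcf

    for _ in range(50):              # a run of GPU tasks, ~10x faster than estimated
        task_finished(10_000, 1_000)
    print(runtime_estimate(100_000)) # a CPU task's estimate is now ~10x too low

With CPU estimates that far off, the client happily fetches far more work than the machine can actually finish before deadline.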

If you want to run the different searches, you'll need to reduce your cache to a really low value like 0.01 days, and zero for additional days.  If you are happy to concentrate on a single search, you could de-select all the other apps in the project preferences; that would allow the DCF to settle, and then the computers could work out how long tasks really take.


Alex Vincent
Joined: 19 Feb 05
Posts: 10
Credit: 36887417
RAC: 21555

To be clear, I'm not seeing errors.  I'm just concerned because upping my limit from 4GB to 10GB seemed to be enough to do 32 GW jobs simultaneously using just a little more than 4GB... and now it takes 6GB, about a month later.

I understand the message to not mess with what's under the hood.  It just seems odd that the disk space consumption, for the same number of jobs taking roughly the same amount of time, has gone up by 50%.

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 137731140
RAC: 16621

Resetting the project deletes all the files. It would then download the data files it needs to process work. The frequencies being worked on change over time so it will download more and more data files as we work through the different frequencies. Resetting the project provides a temporary solution, but a better one is to allow BOINC to use more disk space if possible.

You can see the frequency as part of the work unit name.  For example, I have:

Starting task h1_0455.35_O2C02Cl4In0__O2MDFS2_Spotlight_455.80Hz_1450_2   

which means I am processing the 455.35Hz frequency.  My next work request might want to do a different frequency, so it has to download another set of data files (unless it's already got them), which will stay on disk until either I reset the project or the project tells me they are no longer required.
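
If you want to pull those numbers out programmatically, something like this works - assuming the naming pattern shown above holds (other searches may name things differently):

    # Extract the frequencies from a GW task name of the form shown above.
    import re

    name = "h1_0455.35_O2C02Cl4In0__O2MDFS2_Spotlight_455.80Hz_1450_2"
    data_freq = float(re.match(r"h1_(\d+\.\d+)_", name).group(1))    # 455.35
    task_freq = float(re.search(r"_(\d+\.\d+)Hz_", name).group(1))   # 455.80
    print(f"data set {data_freq} Hz, task frequency {task_freq} Hz")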

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110034880026
RAC: 22382842

Alex Vincent wrote:
To be clear, I'm not seeing errors.

Of course - and apart from George, the other responders didn't think you were, either.

Alex Vincent wrote:
I'm just concerned because upping my limit from 4GB to 10GB seemed to be enough to do 32 GW jobs simultaneously using just a little more than 4GB... and now it takes 6GB, about a month later.

Tasks (or jobs, WUs, or whatever you call them) are NOT data.  If things are going well, you could get hundreds of new "jobs" without a single byte of extra data.  That's what 'locality scheduling' tries to do.  If you want to understand locality scheduling, google "locality scheduling boinc".  It's been in use at this project since day one and a *lot* was written about it in the early days.

Alex Vincent wrote:
It just seems odd that the disk space consumption, for the same number of jobs taking roughly the same amount of time, has gone up by 50%.

A 'job' is just a collection of different parameters that get fed to the science app to 'tell' the app how to crunch the data - the same data, if the frequency term in the task name doesn't change.  You never see a 'job' downloading.  The various parameters are part of the scheduler response, a quite small file each time there is a request for work.  It's called 'sched_reply_einstein.phys.uwm.edu.xml' and sits in your BOINC data directory.  You get a new one each time the scheduler responds to a request.

Even if a lot of new jobs were requested, the reply file is so small in comparison to the data files that you wouldn't notice.  You only notice if the scheduler gives you tasks for a different frequency.  A single task for a different frequency could result in the downloading of a whole batch of new data files.  This would be very obvious by all the 'downloading' entries in the event log as well as the extra disk space being consumed.
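
If you want to see the size difference for yourself, a quick sketch like this will do it.  The paths are assumptions (adjust BOINC_DIR for your own installation), and the h1_*/l1_* file-name pattern is my guess based on the task names:

    # Compare the scheduler reply with the large GW data files - rough sketch.
    import glob
    import os

    BOINC_DIR = "/var/lib/boinc-client"  # assumption: a common Linux default
    reply = os.path.join(BOINC_DIR, "sched_reply_einstein.phys.uwm.edu.xml")
    print("scheduler reply:", os.path.getsize(reply), "bytes")

    proj = os.path.join(BOINC_DIR, "projects", "einstein.phys.uwm.edu")
    data = glob.glob(os.path.join(proj, "h1_*")) + glob.glob(os.path.join(proj, "l1_*"))
    print(len(data), "data files,", sum(map(os.path.getsize, data)), "bytes in total")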

However, that is not the main reason why I'm responding.  I decided to take a look at your GW tasks list.  I notice your CPU is a Ryzen 9 3950X and that tasks (returned on Jan 1) seem to be taking around 260-280Ksecs in both CPU time and elapsed time.  Something must be wrong.  By comparison, George (who posted earlier) has a machine with the same 3950X CPU and his tasks are taking around 50Ksecs - that's 5 times faster.

Maybe you should investigate why your machine is so slow??

Cheers,
Gary.

Alex Vincent
Joined: 19 Feb 05
Posts: 10
Credit: 36887417
RAC: 21555

Gary Roberts wrote:
[...] Maybe you should investigate why your machine is so slow??

If I knew how to do that investigation, I would.  This machine is about six months old, with 32GB RAM and an NVidia GT 710 graphics card as well.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110034880026
RAC: 22382842

Alex Vincent wrote:
If I knew how to do that investigation, I would.  This machine is about six months old with 32GB RAM and a NVidia GT 710 graphics card as well.

The GPU is irrelevant since you are only running CPU tasks.  That GPU is very low end and couldn't be used for crunching anyway.

The machine would be under warranty.  If possible, you should take it to whoever sold it to you and ask them why it's performing so poorly.  If that's not possible, give it to a local (and reputable) tech for advice.  Someone who knows what they are doing should be able to find the problem and fix it in a flash.  It almost sounds like extreme thermal throttling due to overheating, but it's difficult to imagine how it could get that bad in 6 months.  Are you sure the CPU fan is actually spinning?
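
If you want a quick look yourself before handing it over, a sketch like this - using the third-party psutil package (pip install psutil); sensor support varies by OS - will show whether the clocks are collapsing under load:

    # Rough throttling check - a sketch, not a diagnosis.
    import psutil  # third-party: pip install psutil

    for _ in range(10):
        load = psutil.cpu_percent(interval=1)  # sample CPU load for one second
        freq = psutil.cpu_freq()               # may be None on some platforms
        if freq:
            print(f"{freq.current:.0f} MHz (max {freq.max:.0f} MHz), load {load:.0f}%")
    # Under sustained load a healthy CPU sits near its rated clocks;
    # large persistent drops point to thermal or power throttling.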

Cheers,
Gary.

Eugene Stemple
Joined: 9 Feb 11
Posts: 58
Credit: 271887896
RAC: 305642

@Alex.  Regarding the long run times mentioned by Gary: browsing a (randomly selected) stderr.txt log from one of your recent tasks, I am suspicious of the number of "restarting from checkpoint" entries.

Take a look, in BOINC Manager, at ->Options ->Computing Preferences ->Disk and memory, in the "Memory" part.  There is an option to "Leave non-GPU tasks in memory while suspended".  If that option is NOT check-marked, i.e. not enabled, your CPU tasks may get "discarded" from memory whenever BOINC thinks a task should be suspended.  No lasting harm done, of course, as the task will resume properly when the reason for the suspension goes away.  BUT it will resume from the last checkpoint, which might be many hundreds of seconds in the past; that amount of CPU work is lost and will be re-done, hence inflating the run time.
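
If you want to put a number on it, a quick sketch like this will count the restarts in a saved copy of a task's stderr output (the file name is just whatever you saved it as):

    # Count checkpoint restarts in a saved stderr log - each one means
    # all CPU work since the previous checkpoint was thrown away and re-done.
    with open("stderr.txt", errors="replace") as log:
        restarts = sum("restarting from checkpoint" in line.lower() for line in log)
    print(restarts, "restarts found")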

Where are the controls for BOINC suspending tasks?  ->Options ->Computing Preferences ->Computing, where you will see a whole section on "When to suspend".  I forget what the default settings are, but possibly you have chosen (intentionally or not) "Suspend when computer is in use", and depending on the other parameters in that section, your E@H CPU tasks could end up suspended by something as innocuous as moving the mouse.

Have a happy new year anyway!

Gene;

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1707
Credit: 1076192170
RAC: 1206567

What is the size of your disk?

Like most of us with several hosts, we have all different sizes from small to huge, but I have all of mine set at


(The biggest problem we have here is wingmen who load over 1000 wu's at a time and can't ever finish them on time, and I wish we had a 500 wu limit per host.)
