housekeeping, delete project files not needed

Eugene Stemple

Joined: 9 Feb 11

Posts: 67

Credit: 417090735

RAC: 422901

9 Apr 2020 18:01:15 UTC

Topic 221804

@Gary R / @Keith M

It's spring housecleaning time... I have nearly 300 each of l1_* and h1_* work unit components in the project directory, dating all the way back to January. As far as I can tell, all were completed and reported long ago. At times in the past, I have seen Event Log messages to the effect of "...file no longer needed and being deleted..." followed by a long list, typically on a work request cycle but not every time. Apparently my host has missed such an opportunity. Update Project was no help. I am now on track (with NNT, "No New Tasks") to drain the cache and do a Reset Project. Is there any better way? Would it be a "bad thing" to just manually delete those files if, for example, they're more than 30 days old? (The l1+h1 combinations are ~8 MB total, so 300 of them tie up a significant amount of space.)

And while I'm at it... I have some O1 applications hanging around from 2016 and 2017, like einstein_O1AS20, einstein_O1Spot1, einstein_O1MD1, and einstein_O1OD1. From the server status and applications pages, I take it that those apps have served their purpose and are no longer active. I have removed all references to them in the app_config.xml file. Will a Project Reset clear them out? Or is there a safe way to delete them and their corresponding "slideshow" files?

Any special precautions before the Project Reset besides waiting for all tasks to complete and report?

Happy to be here, with E@H taking the place of Seti as primary boinc project.

Gene;

MarkJ

Joined: 28 Feb 08

Posts: 437

Credit: 139002861

RAC: 0

9 Apr 2020 21:13:06 UTC

Message 176491

The scheduler does send BOINC a request to delete those at some point, assuming it’s for the current searches. I believe it’s done as they complete a frequency band.

The process you have outlined also works. The only things to be careful of are the app_config.xml and app_info.xml files, and any apps specified in app_info.xml. Later BOINC clients will preserve them through a reset.

You’ll get the slideshow files back on the next scheduler request; fortunately, they are fairly small.

Keith Myers

Joined: 11 Feb 11

Posts: 5063

Credit: 19366298467

RAC: 7898340

9 Apr 2020 21:25:55 UTC

Message 176493

If you just delete the older apps like O1AS20, the client will simply re-download them at the next scheduler connection, because they are still listed in the file references in your client_state.xml file.

The only way to clean up your older parameter sets and older applications is with a project reset.
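To see why a manually deleted file comes back, it can help to list what client_state.xml still references. The following is a minimal sketch, assuming the state file holds `<file_ref>` elements that each contain a `<file_name>` child; that layout, the function name `referenced_file_names`, and the sample data are illustrative assumptions, not taken from this thread. It only reads XML text and changes nothing.

```python
import xml.etree.ElementTree as ET
from typing import Set


def referenced_file_names(client_state_xml: str) -> Set[str]:
    """Collect every <file_name> found inside a <file_ref> element."""
    root = ET.fromstring(client_state_xml)
    names = set()
    for ref in root.iter("file_ref"):
        name = ref.findtext("file_name")
        if name:
            names.add(name.strip())
    return names


# Made-up sample for illustration only, NOT a real client_state.xml.
SAMPLE = """
<client_state>
  <app_version>
    <app_name>einstein_O1AS20</app_name>
    <file_ref><file_name>example_old_app_binary</file_name><main_program/></file_ref>
  </app_version>
  <workunit>
    <file_ref><file_name>h1_0000.00_example</file_name></file_ref>
  </workunit>
</client_state>
"""

print(sorted(referenced_file_names(SAMPLE)))
```

Any file whose name shows up in such a list is one the client still believes it needs, which is consistent with the re-download behaviour described above.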

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119973245149

RAC: 26475053

10 Apr 2020 0:00:00 UTC

Message 176497

Eugene Stemple wrote:

Happy to be here, with E@H taking the place of Seti as primary boinc project.

Hi Gene, I'm sure the Einstein folks are very happy to have you here :-).

The previous replies have confirmed the correct procedure - run your cache down and then reset the project.

The disadvantage of that is that you will throw away the good with the bad. Of course, the project will then send all the 'good' back again, so it is just a bit of unnecessary bandwidth being used - both yours and the project's. My estimation is that the project is struggling a bit at the moment with the influx of a *lot* of extra hungry, bandwidth-consuming hosts :-). For that reason, I tend to not worry too much if there is reasonable spare disk space.

In particular, with the GW searches, it's difficult to know when a bunch of large data files are no longer needed. Even if all primary tasks (the ones whose name includes a _0 or _1 extension) have been distributed, there will always be resends (_2 or higher) popping up even months later. One of the key things about locality scheduling is that when the scheduler is deciding which host to foist the resends on, your host coming along and announcing that it has at least some of the full group of large data files makes the scheduler's job so much easier and more efficient. For that reason, I like to keep *all* large data files until the scheduler decides to issue the delete directive.

Since I have a lot of hosts, sharing a single internet connection, I always try to structure things so as not to clobber my own bandwidth or add any unnecessary load on the servers.

Edit: I just had a look at one of my machines that has been running GW tasks for quite a while. It has around 6,000 large data files of the h1_nnnn.nn and l1_nnnn.nn variety, totaling about 22GB. The machine has a 128GB SSD so no shortage of space at the moment.

h1 files mean data from the Hanford observatory. l1 files mean data from the Livingston observatory. The nnnn.nn is a particular frequency. To process a single task, something like 12-16 of these data files spanning the stated and the nearby frequencies are required.

Locality scheduling means that once you have the full set of these files, there are potentially hundreds to perhaps thousands of individual tasks that could be sent to you without any further data download being needed. Of course, there will be other hosts that have been given the same full set of data so you can never hope to get more than a fraction of all the tasks tied to that data. You are in a fierce competition with other hosts. For some odd reason, that competition just got a whole lot fiercer :-).
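For anyone who wants to see how much space the h1_ and l1_ data files occupy before deciding anything, here is a minimal read-only sketch. It counts files by name prefix and totals their size; it deliberately deletes nothing, since deletion is best left to the scheduler as described above. The function name `inventory`, the 30-day cutoff, and the use of file modification time as a stand-in for "age" are assumptions made for illustration, and the demo runs against a temporary directory rather than a real project directory.

```python
import os
import tempfile
import time
from pathlib import Path


def inventory(project_dir, prefixes=("h1_", "l1_"), older_than_days=30, now=None):
    """Count matching data files, total their size, and count the 'old' ones."""
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    count = total = old = 0
    for path in Path(project_dir).iterdir():
        if path.is_file() and path.name.startswith(prefixes):
            info = path.stat()
            count += 1
            total += info.st_size
            if info.st_mtime < cutoff:  # assumption: mtime approximates file age
                old += 1
    return {"files": count, "bytes": total, "older_than_cutoff": old}


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as demo_dir:
    old_file = Path(demo_dir) / "h1_0000.00_demo"
    old_file.write_bytes(b"x" * 10)
    forty_days_ago = time.time() - 40 * 86400
    os.utime(old_file, (forty_days_ago, forty_days_ago))
    (Path(demo_dir) / "l1_0000.00_demo").write_bytes(b"x" * 20)
    (Path(demo_dir) / "notes.txt").write_bytes(b"x" * 5)  # ignored: wrong prefix
    report = inventory(demo_dir)

print(report)
```

Pointing `inventory` at a real project directory would give a quick figure to weigh against the available disk space.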

Cheers,
Gary.

Eugene Stemple

Joined: 9 Feb 11

Posts: 67

Credit: 417090735

RAC: 422901

10 Apr 2020 17:03:10 UTC

Message 176509

It seems I'm on the right track. I'll continue NNT until the cache is empty. All GPU tasks have been completed. Six FGRP5 (CPU) tasks remain, and with 5 running concurrently they'll finish today; six O2MD1 (CPU) tasks also remain, and with 2 running concurrently they'll finish sometime tomorrow. Then it's Project Reset. My app_config.xml file has sections for all the currently active apps to limit max_concurrent and to limit the GPU to 1 concurrent task; I don't have an app_info.xml file, but things seem to be running OK without it.
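As a rough illustration of the kind of app_config.xml described here, the sketch below builds one with a per-app max_concurrent limit and a GPU section. The element names (`app`, `name`, `max_concurrent`, `gpu_versions`, `gpu_usage`, `cpu_usage`) follow the BOINC client configuration format as I understand it; the app names `example_cpu_app` and `example_gpu_app` are placeholders, not real Einstein@Home app names, and the helper `build_app_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(apps):
    """Return app_config.xml text for a list of per-app settings (dicts)."""
    root = ET.Element("app_config")
    for spec in apps:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = spec["name"]
        ET.SubElement(app, "max_concurrent").text = str(spec["max_concurrent"])
        if "gpu_usage" in spec:
            gpu = ET.SubElement(app, "gpu_versions")
            # gpu_usage 1.0 means each task claims a whole GPU.
            ET.SubElement(gpu, "gpu_usage").text = str(spec["gpu_usage"])
            ET.SubElement(gpu, "cpu_usage").text = str(spec["cpu_usage"])
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


xml_text = build_app_config([
    {"name": "example_cpu_app", "max_concurrent": 5},
    {"name": "example_gpu_app", "max_concurrent": 1,
     "gpu_usage": 1.0, "cpu_usage": 1.0},
])
print(xml_text)
```

The real short app names would need to be taken from the project's applications page or from the client's own state before a file like this is used.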

Yeah, there will be some extra bandwidth consumed. But I feel it's a reasonable strategy to do this "once" and be more confident of a long-term hands-off run from a known fresh starting configuration.

I'll post something next week after the Reset, and recovery, have been completed.

Keith Myers

Joined: 11 Feb 11

Posts: 5063

Credit: 19366298467

RAC: 7898340

10 Apr 2020 19:50:14 UTC

Message 176513 in response to message 176509

There should be no issues with the project reset, but you might want to make a copy of the project app_config.xml file and save it somewhere safe OTHER than the BOINC data directory.

That way, if the reset for some reason deletes the file, you can just copy it back to the project directory afterwards and won't have to rewrite it. Make sure you remove any mention of the deprecated apps from the file, or you will receive BOINC error message reminders about an app not being found.
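Both precautions above can be scripted. This is a minimal sketch, assuming app_config.xml lists apps as `<app><name>...</name></app>` entries; the helper names `backup_app_config` and `stale_app_names`, the directory arguments, and the set of "active" apps are all hypothetical, and the demo works on temporary directories rather than a real BOINC data directory.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def backup_app_config(project_dir, backup_dir):
    """Copy app_config.xml from the project directory to a separate folder."""
    src = Path(project_dir) / "app_config.xml"
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir / src.name))


def stale_app_names(app_config_path, active_apps):
    """Return app names in app_config.xml that are not in active_apps."""
    root = ET.parse(app_config_path).getroot()
    names = [n.text.strip() for n in root.findall("app/name") if n.text]
    return [name for name in names if name not in active_apps]


# Demo with throwaway directories; app names are illustrative only.
with tempfile.TemporaryDirectory() as project_dir, \
        tempfile.TemporaryDirectory() as safe_dir:
    config_path = Path(project_dir) / "app_config.xml"
    config_path.write_text(
        "<app_config>"
        "<app><name>einstein_O1AS20</name><max_concurrent>1</max_concurrent></app>"
        "<app><name>einstein_O2MD1</name><max_concurrent>2</max_concurrent></app>"
        "</app_config>"
    )
    copy_path = backup_app_config(project_dir, safe_dir)
    copy_exists = copy_path.exists()
    stale = stale_app_names(config_path, active_apps={"einstein_O2MD1"})

print(copy_exists, stale)
```

Any name reported as stale is a candidate for removal from the file before it is copied back after the reset.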

Eugene Stemple

Joined: 9 Feb 11

Posts: 67

Credit: 417090735

RAC: 422901

12 Apr 2020 17:12:09 UTC

Message 176553

The project reset was done at 03:44 UTC on 4/12, after the cache was fully drained and reported. All went well. Yes (@Keith), I saved a copy of app_config.xml, but the project copy was NOT deleted in the reset process, although everything else was deleted, as I was expecting/hoping. Fresh copies of the apps (FGRPB1G, FGRP5, O2MD1, and O2MDF) were downloaded, and over the course of the next 8 hours or so 260 tasks were downloaded, until the work requests returned "cache full." Cache buffer limits are set for 0.4 + 0.1 days. Deadlines span out to April 26 and none appear to be at risk of time limit failures. The earliest deadline is 4/18 with 22 hours of work, so no problem there.

If anybody is curious, here are observed task run times: (hours:minutes)

FGRPB1G (GPU) 0:20

O2MDF (GPU) 0:25

FGRP5 (CPU) 5:00

O2MD1 (CPU) 12:00

Everything looks to be stable and running properly. A BRP4 application did not download; there is an entry in the app_config.xml for it, so maybe a future work request will trigger a refresh, along with tasks for it. Now I'll just sit back and let it run as designed. I might need to fine-tune the max_concurrent parameters, but that is an easy adjustment if needed.

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3608086701

RAC: 508916

13 Apr 2020 11:30:05 UTC

Message 176572

I'm guessing that if the BRP4 app didn't initially download, it never will. The only active versions of it at present are for Intel GPUs and assorted ARM platforms (e.g. Android or Raspberry Pi). The search itself has been largely on standby for the last few years, with apps still enabled only for platforms too slow to run any of the main applications (GW or Fermi).
