No more Tasks To Send for O1AS20-100I, data file caching?

Daniels_Parents
Joined: 9 Feb 05
Posts: 101
Credit: 1877689213
RAC: 0
Topic 198629

Work Generator disabled for several hours ...

I know I am a part of a story that starts long before I can remember and continues long beyond when anyone will remember me [Danny Hillis, Long Now]

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 197172382
RAC: 86192

Yes, we generated all workunits for O1AS20-100I and with the current progress the run should be finished within the next week. In the meantime you can get FGRPB1 work to keep the CPU busy. Sometime next week we are going to open the O1AS20-100F run for all CPUs again. I don't want to do this on a Friday, when I can't monitor the project and deal with any problems over the weekend.

Adam Socki
Joined: 7 Mar 16
Posts: 26
Credit: 56142933
RAC: 7223

Nice. I started Einstein a couple of months back, so this will be the first project set that I will see through to its completion.

Fun stuff!!

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5878
Credit: 118830362804
RAC: 22441599

Quote:
Yes, we generated all workunits for O1AS20-100I and with the current progress the run should be finished within the next week.


Thanks for the info - I've transitioned my hosts (about 40) that were doing this run back to FGRPB1 now that they can't get new work. Caches were starting to run a bit low :-).

When this run first started, I mentioned that I was going to attempt caching of data files and then deploying locally, rather than having every host download everything from the servers. I never really reported back on how it all went. Well it all went very well and I've ended up with over 4GB of cached files covering everything that might otherwise be downloaded multiple times. I used an NFS share on one of the crunching hosts with all the heavy lifting being done with rsync. I just love the efficiency of that utility. The host maintaining the cache (quite an old machine brought out of retirement) still was able to crunch at pretty much the expected rate despite performing its caching services.

I had to do things in a big hurry but it was a great learning experience. I decided to cache everything, including app versions, skygrids, ephemeris files, etc, in case new versions were released during the life of the run. I needed an appropriate directory structure for the cache, and a means of any particular host always knowing exactly what stuff it would need. I decided to use a very simple 'per host' control file containing (at a minimum) just three lines :-

[pre]idata or fdata -> the name of the top-level cache dir containing the data files for each run.
The same name also selected the top-level cache dir for apps (iapps or fapps).

32bit, 64bit, avx -> the name of the cache subdir containing all the appropriate app versions for a particular host.

20-30, 70-80, ... -> the name of the cache subdir containing a particular frequency range for data files.[/pre]
I only used the one range (20-99) for idata but had subdirs for each 10Hz segment for fdata. Hosts doing the 'F' run started with a single range and if the scheduler chose to give them tasks for a different range, that range would be added as an extra line so that both ranges could be automatically kept in sync with the cache from that point onwards.
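For readers wanting to try something similar, here is a minimal Python sketch of reading such a control file and turning it into the cache paths a host should keep in sync. It assumes only the layout described above (run dir, then arch, then one or more frequency ranges); the validation tables, `HostControl` and `parse_control_file` are hypothetical names, not Gary's actual script.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical validation tables built from the post's description of the file.
APPS_FOR_RUN = {"idata": "iapps", "fdata": "fapps"}  # data dir -> matching apps dir
KNOWN_ARCHES = {"32bit", "64bit", "avx"}


@dataclass
class HostControl:
    data_dir: str   # "idata" or "fdata"
    apps_dir: str   # "iapps" or "fapps", derived from data_dir
    arch: str       # app-version subdir, e.g. "64bit"
    ranges: list[str] = field(default_factory=list)  # frequency subdirs, e.g. "20-30"

    def cache_paths(self) -> list[str]:
        """Relative cache paths (apps first, then data ranges) to keep in sync."""
        return [f"{self.apps_dir}/{self.arch}"] + [
            f"{self.data_dir}/{r}" for r in self.ranges
        ]


def parse_control_file(text: str) -> HostControl:
    """Parse the three-or-more-line per-host control file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3:
        raise ValueError("need a run dir, an arch and at least one range")
    run, arch, *ranges = lines
    if run not in APPS_FOR_RUN:
        raise ValueError(f"unknown run dir: {run!r}")
    if arch not in KNOWN_ARCHES:
        raise ValueError(f"unknown arch: {arch!r}")
    for r in ranges:
        lo, sep, hi = r.partition("-")
        if not (sep and lo.isdigit() and hi.isdigit() and int(lo) < int(hi)):
            raise ValueError(f"bad frequency range: {r!r}")
    return HostControl(run, APPS_FOR_RUN[run], arch, ranges)


if __name__ == "__main__":
    # A host on the F run that the scheduler later moved to a second range.
    ctl = parse_control_file("fdata\n64bit\n20-30\n70-80\n")
    print(ctl.cache_paths())  # ['fapps/64bit', 'fdata/20-30', 'fdata/70-80']
```

Adding a range when the scheduler moves a host is then just appending one line to the file; the next sync picks it up.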

The various architecture-independent fixed files, like skygrids and ephemeris files, were all kept together in a particular cache subdir to be accessed by all hosts, independent of the 'I' or 'F' structure. The mere existence of a control file triggered the automatic syncing of these.

The O1AS cache was entirely separate from the cache I'd been using for the various FGRP runs. The control script I use would look for the existence of the above 'per host' control file and so would know which cache a given host was accessing. This has made it quite simple for me to transition these hosts back to doing FGRPB1 - just delete the control file and put the host back in a venue where FGRPB1 is allowed. I made the change last night and today I see the hosts in question have all topped up with FGRPB1 tasks while continuing to finish off the residual O1AS-I tasks.
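The "existence of the control file selects the cache" rule can be sketched as below. This is an illustration under assumptions: the shared subdir name `fixed` and the function `plan_sync` are hypothetical, and the real script's FGRP handling is not described in the post, so the sketch simply reports "fgrp" and syncs nothing O1AS-related.

```python
from __future__ import annotations

from pathlib import Path

SHARED_FIXED_DIR = "fixed"  # hypothetical name for the skygrid/ephemeris subdir


def plan_sync(control_file: Path) -> tuple[str, list[str]]:
    """Decide which cache a host uses from whether its control file exists.

    Returns ("fgrp", []) when there is no control file, otherwise ("o1as", dirs)
    where dirs always starts with the shared architecture-independent files.
    """
    if not control_file.exists():
        # No control file: the host is back on FGRP work, so only the
        # separate FGRP cache logic applies and nothing O1AS is synced.
        return "fgrp", []
    lines = [ln.strip() for ln in control_file.read_text().splitlines() if ln.strip()]
    run, arch, *ranges = lines  # raises ValueError if fewer than two lines
    apps = "iapps" if run == "idata" else "fapps"
    return "o1as", [SHARED_FIXED_DIR, f"{apps}/{arch}"] + [f"{run}/{r}" for r in ranges]
```

With this shape, moving a host back to FGRPB1 really is just deleting one file, which matches the transition described above.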

I have a particular venue where only GW CPU runs are allowed. That venue now contains just the hosts continuing with the F run. When the F run is opened up to the hosts that were previously only allowed to do I, I'll probably put at least some of them back into that venue and restore the control file (appropriately edited). It depends on how woeful they turn out to be when doing F tasks :-).

The main purpose for this message was not to comment on file caching but rather to inquire about what happens when F is completed. I would very much like to know if cached data files are likely to be used again in any future runs or not. I suspect that whatever comes next will be 'different', even if it's just the 'same' data but differently structured. If that is the case, I'll plan on clearing out the existing stuff when F finishes.

I will certainly be looking at caching as much as possible for the next run so any information that might allow me to start re-coding my control script would be much appreciated. To have information about types of files and specific names, perhaps a day or two before launch, would be very helpful. I realise there might be quite a pause between the end of F and the start of what follows so I'm just putting my hand up a bit early :-). Thanks.

Cheers,
Gary.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:

When this run first started, I mentioned that I was going to attempt caching of data files and then deploying locally, rather than having every host download everything from the servers. I never really reported back on how it all went. Well it all went very well and I've ended up with over 4GB of cached files covering everything that might otherwise be downloaded multiple times. I used an NFS share on one of the crunching hosts with all the heavy lifting being done with rsync. I just love the efficiency of that utility. The host maintaining the cache (quite an old machine brought out of retirement) still was able to crunch at pretty much the expected rate despite performing its caching services.

I had to do things in a big hurry but it was a great learning experience. I decided to cache everything, including app versions, skygrids, ephemeris files, etc, in case new versions were released during the life of the run.

Interesting approach - kind of replicated smart caching. I'm guessing the advantage is to reduce (a lot) the internet bandwidth usage. Yes, rsync is cool.

Are you using rsync to pull back the new data to the central share when it is downloaded for the first time to a host?

Are you using rsync to push the same 4GB to each host, or a subset based on the running app?

My approach was to set up a squid proxy on my firewall (*), and point boinc at that, and it caches everything. The hosts get a cached copy if it's there, if not the proxy gets a copy.

I should tweak it a bit to not keep a cached copy of certain files, like PM*.bin4 (the BRP6 data files) and templates_LATeah*.dat (FGRPB1), which I should never see again.
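In squid itself, that tweak would normally be an `acl` (for example of type `urlpath_regex`) combined with a `cache deny` line. The Python below only sketches the matching decision using the two file-name patterns from the post; `should_cache` is a hypothetical helper and the example URLs in the usage are made up.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Patterns for files that are unlikely ever to be requested again (from the post).
NO_CACHE_PATTERNS = ("PM*.bin4", "templates_LATeah*.dat")


def should_cache(url: str) -> bool:
    """True unless the requested file name matches a no-cache pattern."""
    name = urlsplit(url).path.rsplit("/", 1)[-1]  # ignore host, dirs and query
    return not any(fnmatchcase(name, pat) for pat in NO_CACHE_PATTERNS)


if __name__ == "__main__":
    print(should_cache("http://dl.example.org/data/PM0042_00031.bin4"))
    print(should_cache("http://dl.example.org/data/skygrid_example.dat"))
```

Matching on the file name only, with case-sensitive globbing, mirrors how the patterns are written in the post.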

Quote:
To have information about types of files and specific names, perhaps a day or two before launch, would be very helpful.


Or even after launch, or lunch!

Edit: to be a little more specific, I'd be interested in knowing if the web servers add an expiration time with either the Expires header or the Cache-Control: max-age directive.

(*) pfSense is wonderful.
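For context on that question: a cache works out how long it may reuse a response from these two headers, with max-age taking precedence over Expires (RFC 9111). A minimal sketch of that rule follows; it is simplified (it ignores `s-maxage`, which a shared cache such as squid checks first, and requires a Date header), and `freshness_lifetime` is a hypothetical helper. When neither header is present, squid falls back to the heuristics configured with `refresh_pattern`.

```python
from __future__ import annotations

from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict[str, str]) -> Optional[float]:
    """Explicit freshness lifetime in seconds, or None if the server gave none."""
    h = {k.lower(): v for k, v in headers.items()}
    # Cache-Control: max-age wins over Expires.
    for directive in h.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return float(value.strip())
    if "expires" in h and "date" in h:
        try:
            delta = parsedate_to_datetime(h["expires"]) - parsedate_to_datetime(h["date"])
        except (TypeError, ValueError):
            return 0.0  # an invalid date (e.g. "Expires: 0") counts as already stale
        return max(0.0, delta.total_seconds())
    return None  # no explicit expiry: the cache must use its own heuristics
```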

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5878
Credit: 118830362804
RAC: 22441599

Quote:
... I'm guessing the advantage is to reduce (a lot) the internet bandwidth usage.


That wasn't the only factor. Some time ago my monthly allowance was a big factor, but now it's actually larger than I could physically use. I'm now running the fleet in an area where the speeds are extremely low and the NBN is still a long time in the future. The distance to the local exchange is quite large and all the startups in the area are adding to the congestion. Even at 2:00 AM, download speeds hardly get above 1-2 Mbps; during business hours, it's much less than that. When O1AS started, I measured the time it took for a machine to download everything it needed for the very first task. It was something close to an hour. At that point, caching became a necessity for me. Also, I really wanted to minimise my impact on the download servers because, at times, they really seem to get bogged down. That concern was the initial impetus, even before O1AS started.
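The arithmetic behind "close to an hour" is easy to check. Assuming a sustained rate with no protocol overhead and decimal units (1 GB = 10^9 bytes), an hour at 1-2 Mbps moves only about 0.45-0.9 GB, and a 2.1 GB directory (the size quoted for the single 'I' data subdir) takes a little over 3 hours at 1.5 Mbps. `transfer_hours` is just an illustrative helper.

```python
def transfer_hours(size_bytes: float, mbps: float) -> float:
    """Hours to move size_bytes at a sustained mbps (megabits/s), no overhead."""
    return size_bytes * 8 / (mbps * 1e6) / 3600


if __name__ == "__main__":
    for rate in (1.0, 2.0):
        gb_per_hour = rate * 1e6 / 8 * 3600 / 1e9
        print(f"{rate} Mbps moves about {gb_per_hour:.2f} GB per hour")
    print(f"2.1 GB at 1.5 Mbps: {transfer_hours(2.1e9, 1.5):.1f} h")
```

Every host that fetches the same files independently pays that cost again, which is what a local cache avoids.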

Quote:
Are you using rsync to pull back the new data to the central share when it is downloaded for the first time to a host?


Yes, rsync does the job both ways.
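A two-way rsync arrangement like this could look roughly like the sketch below, which only builds the command lines. Assumptions, all hypothetical: the cache is NFS-mounted at `/mnt/boinc-cache`, the BOINC data dir is the typical Debian/Ubuntu path, and the file-name patterns used for pushing are placeholders because the post does not list them. `--ignore-existing`, `--include` and `--exclude` are standard rsync options.

```python
from __future__ import annotations

import shlex

# Hypothetical locations: an NFS-mounted cache and a typical Linux BOINC data dir.
CACHE_ROOT = "/mnt/boinc-cache"
PROJECT_DIR = "/var/lib/boinc-client/projects/einstein.phys.uwm.edu"


def rsync_pull(cache_subdir: str, project_dir: str = PROJECT_DIR) -> list[str]:
    """Cache -> host: copy every file in the cache subdir that the host lacks."""
    return ["rsync", "-a", "--ignore-existing",
            f"{CACHE_ROOT}/{cache_subdir}/", f"{project_dir}/"]


def rsync_push(cache_subdir: str, patterns: list[str],
               project_dir: str = PROJECT_DIR) -> list[str]:
    """Host -> cache: copy back new files whose names match the patterns.

    Include the wanted names, then exclude everything else (subdirs included).
    """
    filters = [f"--include={p}" for p in patterns] + ["--exclude=*"]
    return ["rsync", "-a", "--ignore-existing", *filters,
            f"{project_dir}/", f"{CACHE_ROOT}/{cache_subdir}/"]


if __name__ == "__main__":
    # Print rather than run; pass a list to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(rsync_pull("fdata/20-30")))
    print(shlex.join(rsync_push("fdata/20-30", ["example_*.dat"])))  # placeholder pattern
```

`--ignore-existing` keeps both sides from rewriting files that are already present, so a sync where nothing has changed costs little more than a directory listing.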

Quote:
Are you using rsync to push the same 4GB to each host, or a subset based on the running app?


A subset. The third (and subsequent) line(s) of the control file give the frequency range for the data the host needs. A host only syncs the frequency ranges it needs. For O1AS-F, those ranges are in 10Hz increments. Because O1AS-I was advertised to only be using a limited range of frequencies, I decided to use just a single range for all 'I' data (20-100). That single subdir has 1120 files totaling 2.1GB. Each O1AS-F subdir (20-30, 30-40, etc) has between about 90 and 150 files, depending on how many hosts are using each frequency range and how many different ranges a particular host has migrated through.
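Choosing the right 10Hz subdir for a given frequency is a one-liner, sketched here under the assumption that the 20-100 Hz band is split into half-open bins aligned at 20 Hz. How a host learns the frequency of a task (e.g. from its data file names) isn't covered in the post, so `freq_hz` is simply an input; `f_range_subdir` is a hypothetical name.

```python
import math


def f_range_subdir(freq_hz: float, width: int = 10, lo: int = 20, hi: int = 100) -> str:
    """Name of the cache subdir holding data for freq_hz, e.g. 47.3 -> '40-50'.

    Assumes [lo, hi) is split into half-open bins of `width` Hz aligned at lo.
    """
    if not lo <= freq_hz < hi:
        raise ValueError(f"{freq_hz} Hz is outside {lo}-{hi} Hz")
    start = lo + math.floor((freq_hz - lo) / width) * width
    return f"{start}-{start + width}"
```

A result not already listed in the control file would be appended as a new range line, which is the migration behaviour described above.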

Quote:
My approach was to set up a squid proxy ...


Way back, I had thought about using squid but was a bit scared of the learning curve since I'd never used it previously. I had been caching the data files for FGRP runs for some time using a simple script so I just decided to extend it for the GW stuff and to cache everything that might be repeatedly downloaded. When FGRPB1 started having template files, I built in caching of those as well but made it able to be disabled through a command line option. After the first couple of days, with thousands of files and very little reuse, I used the option to disable it. The code is still in the script but doesn't get run.

Cheers,
Gary.

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 197172382
RAC: 86192

Quote:
The main purpose for this message was not to comment on file caching but rather to inquire about what happens when F is completed. I would very much like to know if cached data files are likely to be used again in any future runs or not. I suspect that whatever comes next will be 'different', even if it's just the 'same' data but differently structured. If that is the case, I'll plan on clearing out the existing stuff when F finishes.


We are currently working on getting the next search planned and organized. Right now we aim to provide a directed search that reuses the data from the current search but will also use more data from frequencies up to 1500 Hz. Most likely there will be an F and an I run in parallel, as was the case for O1AS20-100. More will be announced once this is ready.

Quote:
to be a little more specific, I'd be interested in knowing if the web servers add an expiration time with either the Expires header or the Cache-Control: max-age directive.


Our download servers are not adding any extra headers to the data files. The scheduler determines whether a data file is needed on the client, and sends a delete request if the client needs space to download another data file so that a task can be assigned to it. With every work request, the client sends a list of the data files it already has.
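To make that exchange concrete, here is a toy model of the decision in Python. It is not BOINC's scheduler code: the function name, the disk-budget bookkeeping and the largest-first eviction policy are all assumptions made for illustration. It only shows the shape of the logic: compare the client's reported file list with what a task needs, then pick deletions until the new files fit.

```python
from __future__ import annotations


def plan_data_files(client_files: dict[str, int], needed: dict[str, int],
                    disk_budget: int, in_use: set[str]) -> tuple[list[str], list[str]]:
    """Return (files_to_send, files_to_delete) for one candidate task.

    client_files: data files the client reported, name -> size in bytes
    needed:       data files the candidate task requires, name -> size
    disk_budget:  bytes the project may use on this client
    in_use:       files still needed by tasks already queued on the client
    """
    to_send = [n for n in needed if n not in client_files]
    shortfall = sum(client_files.values()) + sum(needed[n] for n in to_send) - disk_budget
    to_delete: list[str] = []
    # Made-up eviction policy: drop the largest files nothing else needs first.
    for name, size in sorted(client_files.items(), key=lambda kv: kv[1], reverse=True):
        if shortfall <= 0:
            break
        if name not in needed and name not in in_use:
            to_delete.append(name)
            shortfall -= size
    if shortfall > 0:
        raise RuntimeError("not enough deletable space; choose a different task")
    return to_send, to_delete
```

Because the server only ever sees the client's own file list, any cache sitting between client and server is invisible to this logic.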

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4335
Credit: 252396305
RAC: 33871

Quote:
I would very much like to know if cached data files are likely to be used again in any future runs or not

Highly unlikely. The data of O1 is pretty noisy. We plan to use the best available methods to "clean" the data of artifacts that confuse our search method. The methods available have been improved during this analysis run, so we will probably produce new, better-cleaned files for the next analysis.

BM

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5878
Credit: 118830362804
RAC: 22441599

Thanks very much for the update. I look forward to the next analysis when it's ready to go.

Cheers,
Gary.

PorkyPies
Joined: 27 Apr 16
Posts: 199
Credit: 33925192
RAC: 9432

Quote:
My approach was to set up a squid proxy on my firewall (*), and point boinc at that, and it caches everything. The hosts get a cached copy if it's there, if not the proxy gets a copy.


I have a dedicated machine running squid. It's great at speeding up duplicate downloads whether they are BOINC files or O/S updates and it looks after the caching without me having to intervene.

rbpeake
Joined: 18 Jan 05
Posts: 266
Credit: 1163712797
RAC: 655503

Quote:
Quote:
I would very much like to know if cached data files are likely to be used again in any future runs or not

Highly unlikely. The data of O1 is pretty noisy. We plan to use the best available methods to "clean" the data of artifacts that confuse our search method. The methods available have been improved during this analysis run, so we will probably produce new, better-cleaned files for the next analysis.

BM


Hopefully this project will help clean the data!
https://www.zooniverse.org/projects/zooniverse/gravity-spy