The "cleanup" for the S5GC1HF run

Christoph

Joined: 25 Aug 05

Posts: 41

Credit: 5954206

RAC: 0

Ok, up to now I was unable to

18 Apr 2011 19:38:16 UTC

Message 104886

Ok, up to now I was unable to retrieve other frequencies. First I had a typo in my delete command, so it was ignored and removed. Work for the existing frequency was downloaded. Now I think I got it right, but probably I have enough work on board. That will change in four hours when one MW task completes. One Einstein task has 30 min to go, but I don't know if that will be enough for a work request.

Oh, and I re-activated my old AMD dual core Laptop. It is currently running SETI Beta. Because of the whining vent I usually use it only for testing. It got 1477.65 - 1478.20.

Greetings, Christoph

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117646462800

RAC: 35207042

RE: ... First I had a typo

19 Apr 2011 6:25:59 UTC

Message 104887 in response to message 104886

Quote:

... First I had a typo in my delete command, so it was ignored and removed.

I'm not sure what you mean by this. What exactly are you using to edit your state file? You need a plain text editor, preferably one that can handle regular expressions. You will find it easiest to construct a small script to make the whole process as automatic as possible. After you have been going for a while and you accumulate a wide range of blocks in your state file, it would be far too onerous to edit them out one at a time manually.

I'm using Linux for my hosts and the standard unix tools make it very easy to maintain the state file. I've never written any Windows batch files but if I were forced to, this is what I would do. Someone who knows how to write scripts for Windows would be able to do a much better job. What is below is just off the top of my head.

For me (with my limitations) I'd grab Notepad2 for any manual editing I needed to do. For script work, I'd use grep.exe, sed.exe, etc, so I could pretend I was using Unix :-). Google the names - I'm sure they exist. Then when I needed to remove all the delete tags in the state file, I'd just run this (uncommented) script:-

grep.exe marked client_state.xml
pause
..\BOINC\boinccmd --quit
pause
ren client_state.xml client_state.bak
sed.exe "/^.*marked.*/d" client_state.bak > client_state.xml
del client_state.bak

Firstly, (assuming the script is being run from the BOINC_Data directory), grep is run to see if there are actually any delete tags in the file. The pause will allow you to bail out if there aren't.

Secondly, if you proceed from this point, boinccmd is used to shut boinc down. Be aware that it can take a few seconds for boinc to terminate and write out the state file. The second pause is simply to allow you to wait a few seconds for this to happen. I don't know how you could do this properly in Windows but hopefully there would be a way.

Thirdly, make a backup of the state file and run it through a stream editor (sed.exe) to search for and remove all lines that contain the given regular expression (the stuff between the slashes). The 'd' is the delete command. Write the output as a new state file. If you are happy that this will all work, you can delete the backup state file. If you wanted to be cautious you could omit that step until you had restarted boinc and all was well.

Finally, depending on how you normally launch boinc, you could include that at the end of the script to make the whole thing pretty automatic.
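For anyone who would rather not depend on Windows ports of grep and sed, the same clean-up can be sketched in Python. This is only a minimal sketch, not a tested replacement for the script above: the function name strip_marked_lines and the ".bak" suffix are made up for illustration, it matches the same word "marked" that the grep and sed commands look for, and it assumes BOINC has already been shut down so that the state file is not rewritten underneath it.

```python
import shutil
from pathlib import Path


def strip_marked_lines(state_file, needle="marked"):
    """Remove every line containing `needle`; return how many were removed.

    A backup copy (same name plus ".bak") is written next to the original
    and is deliberately left in place afterwards.
    """
    state_path = Path(state_file)
    backup_path = state_path.with_name(state_path.name + ".bak")

    # Keep an untouched copy before changing anything.
    shutil.copy2(state_path, backup_path)

    lines = backup_path.read_text(encoding="utf-8").splitlines(keepends=True)
    kept = [line for line in lines if needle not in line]

    # Write the filtered content back as the new state file.
    state_path.write_text("".join(kept), encoding="utf-8")
    return len(lines) - len(kept)
```

Calling strip_marked_lines("client_state.xml") from the BOINC data directory returns the number of lines removed, so a result of 0 means there was nothing to clean. The backup is left alone on purpose, so you can delete it yourself once BOINC has been restarted and all is well.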

Quote:

Work for the existing frequency was downloaded. Now I think I got it right, but probably I have enough work on board. That will change in four hours when one MW task completes. One Einstein task has 30 min to go, but I don't know if that will be enough for a work request.

You will always get work for the existing frequency, or perhaps one above, if the existing is depleted (temporarily or otherwise), and if there are no current resends available. Your chances of getting a resend therefore very much depend on having as wide a range as possible of old frequencies in your project directory and documented by blocks in your state file. If you can double the old frequencies you have saved you will double your chances of getting a resend when you ask. You must take control of when you ask (using NNT). If you wait say around 4-8 hours before allowing a work request, there is more chance that some may have accumulated in the interim.

However the two most important things you should do are these:-

1. Use the technique I listed in my previous response to seed your state file with only the highest frequency you currently have. You can achieve this by temporarily turning off all the lower ones using manually inserted tags - just copy and paste an existing one in the appropriate places. This will get you a substantial number of new contiguous LIGO files and blocks.

2. Progressively expand the size of your work cache. You need one that's several times larger than your current one. The main reason for this is that when you start getting resends for older data, you need those tasks to stick around for several days so that the LIGO files won't be deleted if/when they become marked for deletion. If it's 6 or 8 days before these tasks get to the top of the queue, there is much more opportunity to get more of them and so lessen the risk of having to manually insert the LIGO data into the project directory and, more importantly, the blocks back into your state file. When properly set up with enough resends in your cache to properly protect your full range of data, it's basically a 10 second operation to run a script to remove (potentially hundreds of) delete tags immediately prior to doing a work fetch run.

Before you attempt to make a big increase in your work cache, take the opportunity as described above to expand your range of saved frequencies. Modifying your state file with temporary tags is a bit tedious so you should take the opportunity to do it easily with your second host - see below.

Quote:

Oh, and I re-activated my old AMD dual core Laptop. It is currently running SETI Beta. Because of the whining vent I usually use it only for testing. It got 1477.65 - 1478.20.

You really should take the opportunity of this second host to get a wider range of data for your first host's current frequency and not a totally different frequency. Here's how to do it.

* Set NNT on the AMD and let the current tasks complete and be reported.
* Stop BOINC and open the state file for editing with a good text editor.
* Remove every single block for both LIGO data and skygrid files and remove the files themselves from the project directory.
* Seed the state file with 5 blocks (4 LIGO files and 1 skygrid file) for the topmost frequency you currently have on your primary host and then save and close.
* Place copies of the 4 LIGO files and 1 skygrid file (for this same frequency) in your project directory.
* Restart BOINC, set your cache settings in the manager to 0.0/0.1 and then unset NNT.

You should get a single task for the particular frequency you have chosen and a total of 44 LIGO files and blocks for the 11 frequency bands immediately above your chosen frequency. Your intention is to harvest all of these for use on your primary host. You would be wasting the opportunity to get the maximum benefit if you didn't start with the topmost frequency that you currently had. Unless you wanted to, there is no need to keep running the noisy AMD once it had finished that single task that it got. Just set NNT to prevent more tasks.
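The arithmetic behind that figure is simply four LIGO files per frequency band. As a quick sanity check (the function name is just for illustration):

```python
def ligo_files_expected(bands, files_per_band=4):
    """Number of LIGO data files needed to cover `bands` frequency bands."""
    return bands * files_per_band


# 11 bands immediately above the seeded frequency
print(ligo_files_expected(11))  # prints 44
```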

Once you have seeded the project directory on your primary host with the new LIGO data you can make a step increase in your cache there. There is no need to seed the state file with the new blocks you have harvested - just save them somewhere for future use. Increase the "extra days" setting by say 0.5 - 0.8 days. That should get you a number of tasks that will probably cover several frequency bands. If you have other projects that you don't want to disturb, make sure you set NNT for them until you eventually revert to your usual cache size. Your life will be more difficult and you'll need to think carefully about each move if you don't want to disturb other projects. My comments from here on will assume that you don't have the complications of multiple active projects.

If you got say 10 new E@H tasks covering say 4 higher frequency bands, BOINC will notice that the 16 new LIGO files are already in place and will automatically insert the blocks in your state file and will issue "download skipped" messages for each LIGO file you already have. The next time around you can make a bigger cache size increment, say 1.0 - 1.5 extra days which will give a larger number of tasks and the opportunity to get more blocks into your state file. Eventually you could even start drawing completely new frequencies above what you had harvested on the AMD. When you have a nice total cache (say 6 - 8 days) you can set NNT and run a script to get rid of all delete tags you might have accumulated along the way. That only takes a few seconds to run and afterwards you should glance through the state file and the project directory to make sure there are no "holes" in your continuous data range. If there are you should fill them from what you've saved.
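Checking for holes can be partly automated. The sketch below takes a list of the frequencies you hold and reports any that are missing between the lowest and the highest. It assumes, purely for illustration, that data files are spaced 0.05Hz apart and that you have already extracted the frequencies as text (how you pull them out of the file names is left open). The name find_holes is made up.

```python
from decimal import Decimal


def find_holes(frequencies, step="0.05"):
    """Return the frequencies missing between the lowest and highest given.

    Frequencies are passed as strings and handled as Decimal so that
    values like 1491.85 are compared exactly, without float rounding.
    """
    step = Decimal(step)
    have = sorted({Decimal(f) for f in frequencies})
    holes = []
    # Walk each pair of neighbouring frequencies and list what lies between.
    for low, high in zip(have, have[1:]):
        missing = low + step
        while missing < high:
            holes.append(str(missing))
            missing += step
    return holes
```

For example, find_holes(["1491.80", "1491.85", "1492.00"]) reports "1491.90" and "1491.95" as the gap to fill from what you've saved.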

You should wait a decent time as mentioned previously, and then you can unset NNT. If you had waited 8 hours, BOINC will try to get 0.33 days work unless you modify your cache settings at the same time. The search for new tasks on the server will start at your lowest advertised frequency and will move progressively upwards through the bands collecting any available resends. If there were plenty, your entire work request could be filled with resends. If there were none, you will get only primary tasks. In either case you will get plenty of "server request to delete file ..." messages. So you then just set NNT, run your script to remove the tags, fill any gaps, and wait another 8 hours or so and repeat the process.
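Setting and unsetting NNT can also be driven from a script, since boinccmd has nomorework and allowmorework operations for a project. Here is a minimal sketch, assuming boinccmd is on your path. The helper names nnt_command and set_nnt are made up, and the project URL you pass in has to be whatever your own client lists for the project.

```python
import subprocess


def nnt_command(project_url, no_new_tasks, boinccmd="boinccmd"):
    """Build the boinccmd call that sets (True) or unsets (False) NNT."""
    operation = "nomorework" if no_new_tasks else "allowmorework"
    return [boinccmd, "--project", project_url, operation]


def set_nnt(project_url, no_new_tasks):
    """Run the command; raises CalledProcessError if boinccmd reports failure."""
    subprocess.run(nnt_command(project_url, no_new_tasks), check=True)
```

Building the command separately from running it means you can print it and check it by eye before letting a script loose on a live client.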

I realise that this is all quite complicated. There are further subtleties that I haven't even touched on so you have to be prepared to work things out as you go. If you have any specific questions, please ask.

Cheers,
Gary.

Christoph

Joined: 25 Aug 05

Posts: 41

Credit: 5954206

RAC: 0

Well, my typing error was

19 Apr 2011 18:40:41 UTC

Message 104888

Well, my typing error was instead of .

Now I have additional skygrid 1440 and 1500. On this machine the cache is now 6 days.

On the AMD I will increase it as you described. It got the 1480 frequency. After collecting some bands I will copy them around.

After some file shuffling I should have the necessary space on the Laptop. BOINC data is on D: where I have some 6 gig free, but C is nearly full.

Greetings, Christoph

P.S.: I apologize for my mostly short answers. Yesterday I had a free day, otherwise I'm working 'normal' hours. At the moment I'm mostly quite worn out after work.

I'm serious enough to copy all your posts into a text file so that I can reduce them to shorter step-by-step instructions, which should be enough at least for people who have read the full story and need only a little help to remember.

About scripts, I may ask a friend of mine who knows programming and Linux and Windows (I think). I did start with Java myself, but I'm still at the very beginning of it.

Greetings, Christoph

Christoph

Joined: 25 Aug 05

Posts: 41

Credit: 5954206

RAC: 0

I have caught the freq.

20 Apr 2011 4:19:57 UTC

Message 104889

I have caught the freq. 1490.60 - 1491.20. Since you wrote that you have the complete 1490 band, could you send me the respective file_infos? Then I would have that complete and could use my AMD to supplement the remaining patchwork of frequencies.

Greetings, Christoph

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117646462800

RAC: 35207042

RE: I have caught the

21 Apr 2011 3:14:50 UTC

Message 104890 in response to message 104889

Quote:

I have caught the freq. 1490.60 - 1491.20. Since you wrote that you have the complete 1490 band, could you send me the respective file_infos? Then I would have that complete and could use my AMD to supplement the remaining patchwork of frequencies.

Well, I mentioned that I have up to 1494+. I've checked exactly what I have and it's 1491.80 to 1495.70 and some more in the 1497.xx range. It's quite easy to grow the range upwards - but not downwards. As I've started transferring hosts to 149x.xx, I'll make a point of filling in from 1495.75 up to 1497.xx fairly shortly.

If you send me a PM with an email address to use, I'll attach all the ones I already have to a message to that address. When you get that email, I'd appreciate it if you would respond and send me a copy of the 1490.60 - 1491.20 set that you have. I'll grow that range up to 1491.75 and then send the extra bits to you.

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117646462800

RAC: 35207042

Latest update on the vacuum

21 Apr 2011 8:53:10 UTC

Message 104891

Latest update on the vacuum cleaners.

As previously discussed, I have 4 quads working the 1430Hz - 1440Hz frequency range and existing on a diet of essentially resends only. They have 10 day work caches and I've been able to keep them full - well up to now at least. There are certainly signs that resends for this frequency range are getting more difficult to find now but, on past experience, that will change when the new run kicks in and many hosts currently sucking up everything in sight get shifted away from the old run. At the moment I would guess the scheduler is sending resends plus the necessary LIGO data to many hosts needing a frequency change. When a host has transitioned to the new run, it will get tasks there and not be given old run resends unless it continues to "advertise" old run data availability.

I have quite a number of other hosts that are not specifically looking for resends. These hosts have been chewing on the remaining primary tasks at the very upper end of this same frequency range (which now extends through to 1440.60Hz). There are extremely few primary tasks left (for that range) and I was interested to note that there are no primary tasks (seemingly) just above 1439.95Hz and into the lower part of the 1440.xx range. A task at 1439.95 requires data from 1439.95 to 1440.55. A day ago I wasn't having any trouble getting such tasks, including 1440.00 tasks (which is why I have data up to 1440.60). This morning, the available sequence numbers for 1439.95 tasks were around _15 and diminishing rapidly. The remaining few tasks would have disappeared in a very short time after that.

For two machines, in trying to get tasks of 1440.00 and above, there were server delete requests for all bands from 1440.00 to my upper limit (1440.60). These machines are now pulling new tasks in the 149x.xx range, a planned change where there are still available tasks (sequence#s around _500 or thereabouts). So I'll need to shift all the rest of my hosts in the next few days - well before their caches run dry. A good job for the Easter break.

As previously described, it's quite simple to set up the transition. Just drop the LIGO files into the project directory and drop (at least a subset of) the blocks into the state file. The thing that takes the most time is a couple of controlled work request cycles to top up the work cache whilst observing that the correct frequency range is being used for the new tasks. It's all going smoothly so far.

In case anybody is wondering why I want all hosts feeding in the same frequency range, it's to do with the bandwidth used in downloading all the different LIGO data for randomly selected frequency ranges. A few months ago, I was using 250+ GB a month. Since Christmas, I've been using less than 30GB per month.

The volume of downloads dramatically increases as the end of a run approaches (eg like right now). This is because the scheduler has to shift clients to different frequency bands much more frequently as the small number of remaining primary tasks rapidly vanishes. As explained above, when the bulk of hosts has transitioned to the new run, those hosts specifically set up to advertise a specific old frequency range will have the field to themselves. I've done this at the end of past runs and kept a few hosts going on old data up to a month or more after a run transition occurs. I expect to be able to do the same again this time. All I have to do is keep enough work on hand to last a few days into the new run. At that time, the bulk of my hosts will be allowed to transition to the new run and the number left will depend on resend availability. That number may even be more than the current 4 doing resends.

Cheers,
Gary.

Christoph

Joined: 25 Aug 05

Posts: 41

Credit: 5954206

RAC: 0

RE: I'll grow that range up

21 Apr 2011 23:21:13 UTC

Message 104892 in response to message 104890

Quote:

I'll grow that range up to 1491.75 and then send the extra bits to you.

I'm now up to 1491.85. So, I will try to supplement my lower frequencies now.

Greetings, Christoph

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117646462800

RAC: 35207042

Here's a short one for a

23 Apr 2011 9:49:50 UTC

Message 104893

Here's a short one for a change :-)

The 1430Hz to 1440Hz range is completely devoid of primary tasks (and has been that way for a day or two now) but continues to supply lots of resends. The 4 quads working in this range continue to be able to maintain full (or overfull) caches essentially on resends alone. I have set up the 1490.xxHz frequency set on these 4 machines so that if/when they ask for work and a 1430-1440 resend is not available, they will get new tasks from the 1490.xxHz and above range. I've tested this and it works as expected.

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117646462800

RAC: 35207042

I've had commitments over

28 Apr 2011 9:51:41 UTC

Message 104894

I've had commitments over Easter so I've not had time to do much reading or posting. However, my 4 vacuum cleaners continue to maintain full caches on a resend only diet. I've also shifted the bulk of the rest of the fleet into the 1480 - 1490Hz range or the 1490 - 1500Hz range. There are good numbers of primary tasks available in these two ranges.

The 4 VCs have the range from 1430Hz to about 1442Hz covered. All 4 have full 10 day caches (resends only) as whatever primary tasks they had initially are fully completed and returned. I estimate they have taken more than 1000 resends out of the system already. I don't think it's possible for any other host to still have primary tasks for the bulk of the 1430 - 1440Hz range. So the only way a random host would be able to get a task in this range would be for the scheduler to choose it deliberately as a resend recipient and to also hand it the ~180MB data payload. So, my reason for going to the trouble to do this is something like (some number less than 1000) x 180MB of bandwidth that has been saved. That unknowable number may well be of the order of 200 - 500 so far. We aren't finished by a long shot. Things are only just getting started.
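To put rough numbers on that: with a payload of about 180MB per avoided download, 200 - 500 avoided downloads works out to roughly 36 - 90GB. A small sketch of the arithmetic, taking 1GB as 1000MB (the function name is made up for illustration):

```python
def bandwidth_saved_gb(downloads_avoided, payload_mb=180):
    """Rough bandwidth saved, in GB, for a number of avoided data downloads."""
    return downloads_avoided * payload_mb / 1000


for avoided in (200, 500):
    print(avoided, "downloads avoided ->", bandwidth_saved_gb(avoided), "GB")
```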

The next stage in this game will be a frequency shifting frenzy as more and more hosts run out of work for whatever range they are currently working in. At some point (hopefully a bit before all primary tasks are exhausted) the new run will be launched and the bulk of hosts will shift to it rather quickly. The shift will be initiated by a request for work which the scheduler cannot fill for your current frequency range. Instead of being shifted to a different 'old run' frequency, the host will be sent a task for the new run. Once shifted to the new run, there is little likelihood of getting a lot of old run tasks unless you take deliberate action. That action is to maintain and replace when necessary the old run LIGO data and to keep removing the tags that the server applies to the blocks in the state file. You will get old run tasks (plus the data payload) if the scheduler has too many resends building up and can't find a suitable host that already has the data (or part of it). I'm hoping to continue making a dent in those resend numbers.

Within a few days of the launch of the new run, most hosts will have transitioned to it. Those hosts that haven't (and which stop themselves from being transitioned by maintaining their old data as 'available') will be able to continue for quite a while on a resend diet. At this time the frequencies most likely to have plenty of resends will be those that have the highest availability of primary tasks right now. On past experience, there will be people who want to join the new action ASAP so they will be likely to abort any old run tasks in order to do so. That should create a bit of a spike in resend availability. At this moment, there are primary tasks available from about 1450 - 1500Hz. The lower frequencies are pretty much exhausted.

Cheers,
Gary.

Christoph

Joined: 25 Aug 05

Posts: 41

Credit: 5954206

RAC: 0

I can confirm that it is hard

28 Apr 2011 11:14:31 UTC

Message 104895 in response to message 104894

I can confirm that it is hard to get 1440 tasks. I tried it with only the 1440Hz file reported as available, but got nothing, only a request to delete the file, which was immediately executed since the files were not in use. I'm now gathering other ranges.

Greetings, Christoph
