Locality scheduling not working for me

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179947733
RAC: 53109

Well, if E@h "plays by the

Well, if E@h "plays by the rules" and not access to files w/o report soft link to it to BOINC client (actually it's quite possible to access files in project directory directly, w/o making soft link to them through BOINC libs)...

I looked into slot and yes, seems it "plays by the rules". 

But (!) there are only 88 files in slot, and 497 files in project dir.

Of course some of them for FGRP (but some in slot not soft links to "support data" also).

At first glance not all downloaded files really used...

To be more sure it's better to have clean (from GW tasks and support files) project dir, then download exactly 1 GW task and monitor its slot through execution. Worth to do later. If project has some misconfiguration here it quite costly for bandwidth...
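Something like this rough sketch (Python; the BOINC data dir path is an assumption for a default Linux install, adjust it for your host) could automate the slot-versus-project-dir comparison. It counts both real symlinks and BOINC's <soft_link> stub files as references:

# Rough sketch, not a definitive recipe: which project files do the running
# tasks actually reference?  Walks the BOINC slot dirs, collects real OS
# symlinks and BOINC-style <soft_link> stub files, then compares with the
# project directory contents.
import os
import re

BOINC_DIR = "/var/lib/boinc-client"          # assumed default Linux data dir
PROJECT_DIR = os.path.join(BOINC_DIR, "projects", "einstein.phys.uwm.edu")
SLOTS_DIR = os.path.join(BOINC_DIR, "slots")

referenced = set()
for root, _dirs, files in os.walk(SLOTS_DIR):
    for name in files:
        path = os.path.join(root, name)
        if os.path.islink(path):             # real OS symlink
            referenced.add(os.path.basename(os.readlink(path)))
            continue
        try:                                 # BOINC <soft_link> stub file
            with open(path, errors="ignore") as f:
                m = re.search(r"<soft_link>([^<]+)</soft_link>", f.read(4096))
            if m:
                referenced.add(os.path.basename(m.group(1).strip()))
        except OSError:
            pass

project_files = set(os.listdir(PROJECT_DIR))
unused = project_files - referenced
print(f"{len(project_files)} files in project dir, "
      f"{len(referenced & project_files)} referenced from slots, "
      f"{len(unused)} not referenced right now")

Of course a file not referenced at this moment may still belong to a task waiting in the cache, so the count is only a snapshot.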

 

Eugene Stemple
Joined: 9 Feb 11
Posts: 58
Credit: 271810041
RAC: 301965

"Raistmer says: >>I got my

"Raistmer says:

>>I got my first "BOINC will delete" lines in log.... when host ASKED for another task, but NOT FINISHED prev one yet (!!!).  "

Yes, among the dozen or so download batches I followed, I saw it happen once that the "BOINC will delete" message was received even while the downloads were in progress!  I rationalized it to my satisfaction by assuming: the server sends my host a list of data files it will send and the work units that will be using them; the data downloads begin, and take some time to complete; while that stream is in progress the server says something to the effect of "Oh, by the way, as soon as these tasks are done with this data you can delete the data."  As we have been told, "BOINC will delete..." is meant to signal a future action, not an immediate one.  Nevertheless, it looks odd to be downloading data and to see BOINC intend to delete it.

My understanding is that an individual task (workunit) uses a span of several data files, each 0.05 Hz wide, and that is why blocks of consecutive data files are downloaded, even for a single task.
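As a rough illustration of how those spans add up (the numbers here are just an example, not the project's exact formula):

# Illustrative arithmetic only (example numbers, not the project's formula):
# how many 0.05 Hz wide data files a 1.55 Hz span would need, per detector.
span_hz = 1.55            # example span width
file_width_hz = 0.05      # width of one data file, as described above
files_needed = round(span_hz / file_width_hz) + 1   # both endpoints included
print(files_needed)       # -> 32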

On the topic of data plans and bandwidth quotas -- I have a 50 GB per month limit, although unused amounts carry forward to the next month, up to a 100 GB maximum.  By sheer good luck, this month my ISP is having a system "problem" and is not accumulating my actual data usage against my quota - the ISP shows ZERO usage!  There is an active "trouble ticket" on this issue and eventually the actual usage will register as it is supposed to.

 

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179947733
RAC: 53109

Yep, BOINC uses "lazy"

Yep, BOINC uses "lazy" deletion but files marked "deleted" can be deleted at any moment, not quite fit if they would be needed for particular task execution still. 

And to download whole bunch of files to use just some of them looks little excessive... So if there is a way to attract project staff attention perhaps it would be right thing to do.

At least maybe they have some simple explanation of this...

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179947733
RAC: 53109

And again the same.

On requesting the third GW task, a new list of support files was marked as "deleted".

And the task actually being processed is still the first one. That is, the running task did not change before the first and second deletions.

The first task is computing, the second is stored in the cache. Still some new candidates for deletion...


 

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179947733
RAC: 53109

And regarding files marked for deletion:

There are 3 xml files that handle file lists for BOINC & E@h:

client_state.xml

sched_reply_einstein.phys.uwm.edu.xml

sched_request_einstein.phys.uwm.edu.xml

 

So I checked one of the files, namely h1_0931.10_O2C02Cl5In0.AN54, through all of them.

This file is still on the HDD and is soft-linked from the currently executing task.

But it was also in my first list of "for deletion" files.

So, the E@h project considers it already deleted - there is no mention of it in the request or the reply.

client_state.xml mentions it as bound to a particular task. That name also appears in the command line of the same task.

So, currently it looks like there is no locality scheduling at all. The project marks files for deletion as soon as it gets a request that lists those files, and then "forgets" about them. The BOINC client keeps track of a file until the particular task finishes, then deletes it, just as it deletes the task itself.

All that remains unclear is whether exactly the same data chunk will reappear on the same host after deletion (that is, whether locality scheduling is needed for this app or not).
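A minimal sketch of that cross-check (Python, plain text search so no assumptions about tag names; the BOINC data dir path is assumed for a default Linux install):

# Look for one data file name in the three xml files that carry file lists.
import os

BOINC_DIR = "/var/lib/boinc-client"            # assumed default Linux data dir
FILES = [
    "client_state.xml",
    "sched_reply_einstein.phys.uwm.edu.xml",
    "sched_request_einstein.phys.uwm.edu.xml",
]
NAME = "h1_0931.10_O2C02Cl5In0.AN54"           # the file checked in this post

for fname in FILES:
    path = os.path.join(BOINC_DIR, fname)
    try:
        with open(path, errors="ignore") as f:
            hits = sum(NAME in line for line in f)
    except OSError:
        hits = 0
    print(f"{fname}: {hits} mention(s) of {NAME}")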

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110031713415
RAC: 22413665

Raistmer* wrote:
... did anyone check if E@h GW app process really access all those downloaded files during task processing?

The GW GPU search has gone through a large number of relatively short-lived iterations over the last year or two.  Several people, at different times, have made the effort to document various 'behaviours' on iterations that have lasted long enough for the data to be collected and analysed.

With regard to your comment, "really access all ... files", I did spend quite a bit of time working out the relationships between different tasks and the data files required by those tasks for the search iteration that was in progress in June last year.  In this message, I showed (via a formula) how to work out the number of data files needed for a given task based on the two frequency values specified in the task name.  I produced a table showing the number of data files needed as calculated from the "delta frequency" (DF) extracted from the task name.  I did check this quite closely at the time, noting that all data files downloaded were specified by parameters in the <workunit> ... </workunit> blocks in the state file.

I have no idea if the same scheme is still being followed in the current S3a iteration of the GW GPU search.  It seemed to be in play for the relatively long duration of the recent S2 iteration.  When S2 transitioned abruptly to S3, which itself then morphed rapidly into S3a, the pattern of huge data downloads for single tasks followed almost immediately by deletion requests became evident.  There was no way my internet connection could sustain that for the crazy number of hosts I'm running, so all my GW GPU hosts immediately transitioned to GRP work only.

I suspect that the formula for the number of data files given in the above link still applies.  Also, I would be completely surprised if any data files that are downloaded are never accessed by the task for which they were downloaded.  However, it could be that there is some server-side misconfiguration that is allowing premature deletion of data so that locality scheduling is compromised in some way.

Until there is some comment from the Devs about this matter (I've asked Bernd to take a look), I won't restart any hosts on GW GPU so I won't be in a position to properly check all this.

Maybe you could do some careful experiments and let us all know the results? :-).

Cheers,
Gary.

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 137724474
RAC: 16134

Is it possible that we're going through the data so fast, now we have a GPU app, that it's doing each frequency much more quickly than before and therefore doesn't need to keep the data files around?

From looking at what it's downloading (and later deleting) on my machines, it doesn't appear to be downloading the same set of data files more than once.

 

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179947733
RAC: 53109

Yes, it's quite possible. But the current behavior effectively disables locality scheduling: the host downloads some files as "tasks" and some as "persistent" ones, not strictly tied to a particular task. But in fact they are. Those "support" files are marked for deletion as soon as the project's server communicates again.

There is not even a need to get a new GW task through that communication. The client reports its current list of files, and the server replies with a command to delete them. After this the client no longer reports their existence back to the server, and will delete them as soon as the particular task (the one that has the file listed in its workunit description) is done.

Correct locality scheduling requires the server to track the whole (or at least a quite big) set of tasks against the whole set of support files.

Maybe the set of support files needed for this particular search is SO big that the chance of getting 2 tasks that use the same chunk of support data is very low. In that case, keeping track of "who needs what" on the server is just a waste of resources. And maybe because of this the server was configured (so, not a misconfiguration but a deliberate config) to cancel "persistent" files as soon as it receives their names. That would have required fewer changes to the structure that used locality scheduling before.

Nevertheless, from the client's point of view it doesn't matter whether it's deliberate or not. The current workunit size for a GW task is not just its "workunit file" in BOINC terms, but that plus the size of all the "support" files linked to it.

Quite a big amount of data.
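To put a number on that, something like this rough sketch (Python; the slot path is an assumption, point it at the slot running the GW task) would sum the sizes of all project files a slot links to:

# Rough sketch: estimate the data volume behind one GW task by summing the
# sizes of all files its slot references (real symlinks or <soft_link> stubs).
import os
import re

SLOT = "/var/lib/boinc-client/slots/0"         # assumed path, adjust as needed

total = 0
for name in os.listdir(SLOT):
    path = os.path.join(SLOT, name)
    target = None
    if os.path.islink(path):
        target = os.path.join(SLOT, os.readlink(path))
    else:
        try:
            with open(path, errors="ignore") as f:
                m = re.search(r"<soft_link>([^<]+)</soft_link>", f.read(4096))
            if m:
                target = os.path.join(SLOT, m.group(1).strip())
        except OSError:
            pass
    if target and os.path.isfile(target):
        total += os.path.getsize(target)

print(f"~{total / 1024 / 1024:.1f} MiB of linked data behind this slot")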

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179947733
RAC: 53109

Gary Roberts wrote:

Also, I would be completely surprised if any data files that are downloaded are never accessed by the task for which they were downloaded.

Yes, yesterday I thought I had found such a case, but I later discovered the "missing" file in the last MD5 list, so it's still on the HDD (hence the edited first versions of my posts from yesterday).

Gary Roberts wrote:

So, my understanding is that the 2nd frequency term in a task name is the 'analysis frequency'.  The first frequency term in the task name indicates the 'lowest frequency large data file pair' needed for the analysis and there will be an unstated 'highest frequency large data file pair' which is a corresponding amount above the analysis frequency.

So, the DF you mentioned there isn't the full freq range the task analyses? If some higher freqs are in use, the actual span will be bigger?

 

The current GW task my host is processing:

h1_0900.15_O2C02Cl5In0__O2MD1S3_Spotlight_900.90Hz_1077_0_3

So, DF as defined will be 900.90 - 900.15 = 0.75 (Hz).

The lowest freq in the "support" file names:

900.15

The highest freq is:

901.70

901.70 - 900.90 = 0.80, a little more than DF but close.

So, indeed, the real freq span is roughly the midpoint (analysis) freq +/- the DF value.
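The same arithmetic as a small sketch (Python; the way the frequencies are pulled out of the names is my own assumption, based on the names quoted above):

import re

task = "h1_0900.15_O2C02Cl5In0__O2MD1S3_Spotlight_900.90Hz_1077_0_3"

# First frequency term and analysis frequency, parsed from the task name.
f_low = float(re.search(r"h1_(\d+\.\d+)_", task).group(1))        # 900.15
f_analysis = float(re.search(r"_(\d+\.\d+)Hz_", task).group(1))   # 900.90
df = f_analysis - f_low
print(f"DF = {f_analysis:.2f} - {f_low:.2f} = {df:.2f} Hz")       # 0.75 Hz

# Frequencies quoted from the support file names in this post:
lowest, highest = 900.15, 901.70
print(f"span above analysis freq: {highest - f_analysis:.2f} Hz")  # 0.80 Hz
print(f"full span {lowest} .. {highest} "
      f"~ analysis freq +/- DF ({f_analysis - df:.2f} .. {f_analysis + df:.2f})")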

 

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179947733
RAC: 53109

Another peculiarity of a GW task:

<wu_name>h1_0900.15_O2C02Cl5In0__O2MD1S3_Spotlight_900.90Hz_1077</wu_name>
<result_name>h1_0900.15_O2C02Cl5In0__O2MD1S3_Spotlight_900.90Hz_1077_0</result_name>
And a few others are mentioned in the soft links in the slot dir. But I can't find files with such names in the E@h project dir.

 

So, unlike SETI tasks, where each task existed as a file on the HDD, tasks here are just structures in xml files?...
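For reference, a quick way to list what the client itself records as workunits (a sketch only, plain regex rather than a proper XML parse; the path is assumed for a default Linux install):

import re

STATE = "/var/lib/boinc-client/client_state.xml"   # assumed default location

with open(STATE, errors="ignore") as f:
    text = f.read()

# Each <workunit> block in the state file carries a <name> element.
for block in re.findall(r"<workunit>.*?</workunit>", text, re.S):
    m = re.search(r"<name>([^<]+)</name>", block)
    if m:
        print(m.group(1))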

 

 
