"Unsent" WUs

Winterknight
Joined: 4 Jun 05
Posts: 1222
Credit: 312389590
RAC: 648141

It is quite possible that the frequency bands were left unrestricted because the project knew Atlas was going to be brought online: BOINC rules state that duplicate tasks cannot be sent to the same account holder, so the Atlas account, with its thousands of CPUs, would probably need lots of frequency slots.

Arion
Joined: 20 Mar 05
Posts: 147
Credit: 1626747
RAC: 0

Well, Gary did do a very good job of explaining the large frequencies and data files, etc., and I understand all that. My beef, small or large as it is, has to do with the delay in the original sending and the secondary sending for validation. And I think Gary is right that, in his mind and in the project's mind, things are working like they are supposed to.

On the vast majority of projects where validation is required, there is always a delay before credit is awarded because someone needs to validate the WU, and there is a good reason for it. It is also understandable that you might have SOME pending credits. They may even take a while to eventually catch up. That's fine. Overall, your credits awarded remain constant if things are running like they did on the last run: while you may be dropping some to pending, you are also gaining some from the stragglers to cover that difference. I can even understand that every once in a while I might get behind the ball a bit. BUT, when I process the same number of WUs every day and normally see, say, 1200 credits, and then for 5, 6, 7 days straight of crunching I see 220 credits a day, I want to know what is wrong.

When I saw that I was the first one to be assigned the work and it hadn't been sent out for validation yet, I wanted to know what happened. Gary explains it's the frequencies. Okay! But where on the last run it was sent the same day, it now takes anywhere from 3 to 10 days, and that's a problem in my eyes (if it happened only occasionally, that would be different). So where on a normal basis, like the last run, I might have to wait 14 days for a wingman to time out and have it reissued, now I might have to wait the same amount of time for it to just be issued for validation. On top of that, he may time out in 18 days now, I think Brian said. So I'm waiting maybe 4 weeks for it to validate or be reissued to a 3rd wingman, and maybe he'll finish it soon or time out in another 18 days. Now I'm waiting 2 more weeks on top of that.
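
The arithmetic in the paragraph above can be sketched roughly as follows. This is only an illustration built from the figures quoted in this thread (same-day sending and a 14-day deadline on the last run, a send delay of up to 10 days and an 18-day deadline now); the function name and the assumption that every wingman uses the full deadline are made up for the example.

```python
def worst_case_wait(send_delay_days, deadline_days, wingmen_timed_out=0):
    """Days until a result could validate if every wingman uses the full deadline.

    Each attempt costs the delay before the task is sent plus the deadline itself.
    One extra attempt is needed for every wingman that timed out.
    """
    attempts = wingmen_timed_out + 1
    return attempts * (send_delay_days + deadline_days)


# Figures taken from the post: last run vs. this run (assumed worst cases).
last_run = worst_case_wait(send_delay_days=0, deadline_days=14)
this_run = worst_case_wait(send_delay_days=10, deadline_days=18)
this_run_one_timeout = worst_case_wait(10, 18, wingmen_timed_out=1)

print(last_run, this_run, this_run_one_timeout)  # 14 28 56
```

With these assumed inputs, the 28-day figure matches the "4 weeks maybe" estimate, and a single timed-out wingman doubles it.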

So once again, unless I've completely missed something, my point is this: WHY does it take anywhere from 3 to 10 days for the WU to be sent out for validation? If the second wingman times out or has a client error, why is it then delayed another 3 to 10 days before it is sent out again? Once again, this is a change since the last run, where it almost never went past 1 day. If this is due to the large bands of frequencies and/or the limited number of hosts, then maybe it needs to be adjusted if there are so many people questioning it. Break it up into smaller frequency bands and send it to 2 hosts at the same time, and things are like they were. Even if it takes longer to do the processing, the turnaround time is reduced from what it is now. If we eliminate the 3-10 day delay and bring it back to within 1 day, maybe more will get processed and we'll go through the data sets quicker.

You know, if this were just a few WUs and it happened infrequently, it really wouldn't matter at all to me, nor, apparently, to all the others who have large and growing numbers of pending results. But it isn't just a few, or infrequent; it has become the norm.

utopia-i
Joined: 8 Nov 07
Posts: 7
Credit: 1014015
RAC: 0

not a frequent visitor to these forums so apologies if this has been explained before ... Pending Credit -

http://einsteinathome.org/workunit/42646621 ..

so in simple terms please ... how long until this WU is sent out so at least I'm in with a chance of the quorum being completed. I completed on 26 Aug 2008 7:10:31 UTC

cheers!

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

Message 85569 in response to message 85567

Quote:
If we eliminate the 3-10 day delay and bring it back to within 1 day maybe more will get processed and we'll go through the data sets quicker.

That implies that hosts are sitting idle, doing nothing...or there is an almost "HR-like" behavior by the scheduler that says "This task was done by an AMD running Linux. You have an Intel running Windows, so you can't have this task to work on because it is in a different HR class."

Seriously, increasing the performance of the Windows app will improve the situation... It has direct benefit as well as intangible benefit (how many people might come back, how many people might stop aborting tasks, etc...?)

John Clark
Joined: 4 May 07
Posts: 1087
Credit: 3143193
RAC: 0

Message 85570 in response to message 85568

Quote:

not a frequent visitor to these forums so apologies if this has been explained before ... Pending Credit -

http://einsteinathome.org/workunit/42646621 ..

so in simple terms please ... how long until this WU is sent out so at least I'm in with a chance of the quorum being completed. I completed on 26 Aug 2008 7:10:31 UTC

cheers!

From what I gather in this thread -

The answer to your question seems to be: how long is a piece of string? This is because -

1. You need to wait until you get a new wingman, which you will when the WU is resent to another cruncher.
2. Whoever the new cruncher/wingman is, the time they take to return the completed WU will depend on that computer's WU turnaround time.

I have several such WUs like this in my pending (for example this or even this).

ATM the rise in pending and waiting for wingmen seems to be fairly common. But, given time, it will sort out.

Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Message 85571 in response to message 85570

Quote:
Quote:

not a frequent visitor to these forums so apologies if this has been explained before ... Pending Credit -

http://einsteinathome.org/workunit/42646621 ..

so in simple terms please ... how long until this WU is sent out so at least I'm in with a chance of the quorum being completed. I completed on 26 Aug 2008 7:10:31 UTC

cheers!

From what I gather in this thread -

The answer to your question seems to be how long is a piece of string?. This is because -

1. You need to wait until you get a new wingman, which you will when the WU is resent to another cruncher.
2. Whoever the new cruncher/wingman is the time they take to return the completed WU will depend on that computer's WU turn around time.

I have several such WUs like this in my pending (for example this or even this).

ATM the rise in pending and waiting for wingmen seems to be fairly common. But, given time, it will sort out.

Hmmm...

Well, a from-the-hip calculation indicates that utopia-i seems to be waiting on the oldest unsent task at the moment (or close to it, judging by the create date)! ;-)

One thing to keep in mind here is that the whole idea of Locality Scheduling is to reduce the bandwidth needed to get the datapacks out in the field. Therefore, it is reasonable to assume that the lower the template frequency is, the longer it will take for a resend to actually get sent out. The reason is because an increasing majority of the host population has already worked past that frequency and thus doesn't have any of the datapacks onboard for it.

So in essence this means that the pool of hosts currently 'eligible' is reduced to 'slower' hosts (either physically or due to low resource share) which are currently working on templates lower in frequency than the one in question, or brand new hosts joining the project. IOW, the scheduler 'knows' that in all probability there are hosts which would get to that template eventually following the normal progression. Why should it force an established host which has already worked past that frequency to DL the whole datapack set just to complete a few remaining tasks for a template, when there is still over a year to go on R4?
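
A toy model may help picture the eligibility idea described above. It is only a simplified sketch, not the actual BOINC scheduler logic: the `Host` class, the `eligible_hosts` function, the host names and the `h1_1150.00_S5R4` / `l1_1150.00_S5R4` file names are all hypothetical, and "eligible" is reduced here to "already holds the needed datapacks, or is a brand new host".

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    datapacks: set = field(default_factory=set)  # data files already on disk
    is_new: bool = False  # new hosts must download some datapacks anyway


def eligible_hosts(hosts, needed):
    """Return names of hosts that could take a resend cheaply.

    A host qualifies if it already holds every needed datapack,
    or if it is brand new and has to download data in any case.
    """
    return [h.name for h in hosts if h.is_new or needed <= h.datapacks]


needed = {"h1_1094.90_S5R4", "l1_1094.90_S5R4"}
hosts = [
    # A fast host that has already worked past this frequency (hypothetical files).
    Host("fast_host", {"h1_1150.00_S5R4", "l1_1150.00_S5R4"}),
    # A slower host that still holds the datapacks in question.
    Host("slow_host", {"h1_1094.90_S5R4", "l1_1094.90_S5R4"}),
    # A host that has just joined the project.
    Host("new_host", is_new=True),
]

print(eligible_hosts(hosts, needed))  # ['slow_host', 'new_host']
```

In this simplified picture, the further the host population moves past a frequency, the fewer hosts remain in the list, which is consistent with resends for low frequencies waiting longer.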

Alinator

Slagathor
Joined: 28 Feb 05
Posts: 14
Credit: 21118749
RAC: 0

Although you can't do anything about the Unsent workunits in your own portfolio, if you want to help someone else with his (and get immediate credit applied when you do), you can induce the scheduler to make you a wingman on a workunit with an Unsent task, following this suggestion by Gary Roberts.

For example, if you want to get this workunit (the _1 of which is Unsent at the time of this writing), drop the following file_info block into your client_state.xml, mark your other datapacks as , and fetch new work:

    h1_1094.90_S5R4
    4262400.000000
    0.000000
    9dc3ab0f86bd89f8e3e3a49b2caf03db
    1


    http://einstein.phys.uwm.edu/download/172/h1_1094.90_S5R4
    http://einstein.ligo.caltech.edu/download/172/h1_1094.90_S5R4
    http://einstein.astro.gla.ac.uk/download/172/h1_1094.90_S5R4
    http://einstein.aei.mpg.de/download/172/h1_1094.90_S5R4
    http://einstein.phys.uwm.edu/download/172/h1_1094.90_S5R4
    http://einstein.ligo.caltech.edu/download/172/h1_1094.90_S5R4
    http://einstein.astro.gla.ac.uk/download/172/h1_1094.90_S5R4
    http://einstein.aei.mpg.de/download/172/h1_1094.90_S5R4

    l1_1094.90_S5R4
    3847680.000000
    0.000000
    6f49bd6a671dbe8d240b91be517c8028
    1


    http://einstein.phys.uwm.edu/download/ff/l1_1094.90_S5R4
    http://einstein.ligo.caltech.edu/download/ff/l1_1094.90_S5R4
    http://einstein.astro.gla.ac.uk/download/ff/l1_1094.90_S5R4
    http://einstein.aei.mpg.de/download/ff/l1_1094.90_S5R4
    http://einstein.phys.uwm.edu/download/ff/l1_1094.90_S5R4
    http://einstein.ligo.caltech.edu/download/ff/l1_1094.90_S5R4
    http://einstein.astro.gla.ac.uk/download/ff/l1_1094.90_S5R4
    http://einstein.aei.mpg.de/download/ff/l1_1094.90_S5R4

As Bernd might say, only run this if you're sure of what you're doing.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752967155
RAC: 1383389

Message 85573 in response to message 85572

Quote:

Although you can't do anything about the Unsent workunits in your own portfolio, if you want to help someone else with his (and get immediate credit applied when you do), you can induce the scheduler to make you a wingman on a workunit with an Unsent task, following this suggestion by Gary Roberts.

For example, if you want to get this workunit (the _1 of which is Unsent at the time of this writing), drop the following file_info block into your client_state.xml, mark your other datapacks as , and fetch new work:

    h1_1094.90_S5R4
    4262400.000000
    0.000000
    9dc3ab0f86bd89f8e3e3a49b2caf03db
    1
    
    
    http://einstein.phys.uwm.edu/download/172/h1_1094.90_S5R4
    http://einstein.ligo.caltech.edu/download/172/h1_1094.90_S5R4
    http://einstein.astro.gla.ac.uk/download/172/h1_1094.90_S5R4
    http://einstein.aei.mpg.de/download/172/h1_1094.90_S5R4
    http://einstein.phys.uwm.edu/download/172/h1_1094.90_S5R4
    http://einstein.ligo.caltech.edu/download/172/h1_1094.90_S5R4
    http://einstein.astro.gla.ac.uk/download/172/h1_1094.90_S5R4
    http://einstein.aei.mpg.de/download/172/h1_1094.90_S5R4

l1_1094.90_S5R4
3847680.000000
0.000000
6f49bd6a671dbe8d240b91be517c8028
1


http://einstein.phys.uwm.edu/download/ff/l1_1094.90_S5R4
http://einstein.ligo.caltech.edu/download/ff/l1_1094.90_S5R4
http://einstein.astro.gla.ac.uk/download/ff/l1_1094.90_S5R4
http://einstein.aei.mpg.de/download/ff/l1_1094.90_S5R4
http://einstein.phys.uwm.edu/download/ff/l1_1094.90_S5R4
http://einstein.ligo.caltech.edu/download/ff/l1_1094.90_S5R4
http://einstein.astro.gla.ac.uk/download/ff/l1_1094.90_S5R4
http://einstein.aei.mpg.de/download/ff/l1_1094.90_S5R4

As Bernd might say, only run this if you're sure of what you're doing.


The trouble with this is the MD5 checksum.

Either you have to know the checksum in advance, and let BOINC download the file: or you have to download it manually, calculate what the checksum should be, and put both into the BOINC system together.

Either way, you also have to find out which of the 400 or more download fanout subdirectories those two particular files live in.

Unless someone is prepared to host an MD5/fanout online lookup list, where we can all chip in details of any data files we happen to have access to?
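
For the manual route mentioned above (download the file yourself, then work out the checksum), a minimal sketch using Python's standard `hashlib` module could look like this. The function name `md5_and_size` and the chunk size are choices made for the example; the sketch only computes the values and does not touch client_state.xml.

```python
import hashlib
import os


def md5_and_size(path, chunk_size=1 << 20):
    """Return (md5 hex digest, size in bytes) of a local file.

    The file is read in chunks so that multi-megabyte datapacks
    do not have to be loaded into memory in one go.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)
```

Calling `md5_and_size("h1_1094.90_S5R4")` on a manually downloaded copy should give a digest and byte count that can be compared against the values posted in this thread.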

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Message 85574 in response to message 85573

Quote:
Quote:

Although you can't do anything about the Unsent workunits in your own portfolio, if you want to help someone else with his (and get immediate credit applied when you do), you can induce the scheduler to make you a wingman on a workunit with an Unsent task, following this suggestion by Gary Roberts.

For example, if you want to get this workunit (the _1 of which is Unsent at the time of this writing), drop the following file_info block into your client_state.xml, mark your other datapacks as , and fetch new work:

    h1_1094.90_S5R4
    4262400.000000
    0.000000
    9dc3ab0f86bd89f8e3e3a49b2caf03db
    1
    
    
    http://einstein.phys.uwm.edu/download/172/h1_1094.90_S5R4
    http://einstein.ligo.caltech.edu/download/172/h1_1094.90_S5R4
    http://einstein.astro.gla.ac.uk/download/172/h1_1094.90_S5R4
    http://einstein.aei.mpg.de/download/172/h1_1094.90_S5R4
    http://einstein.phys.uwm.edu/download/172/h1_1094.90_S5R4
    http://einstein.ligo.caltech.edu/download/172/h1_1094.90_S5R4
    http://einstein.astro.gla.ac.uk/download/172/h1_1094.90_S5R4
    http://einstein.aei.mpg.de/download/172/h1_1094.90_S5R4

l1_1094.90_S5R4
3847680.000000
0.000000
6f49bd6a671dbe8d240b91be517c8028
1


http://einstein.phys.uwm.edu/download/ff/l1_1094.90_S5R4
http://einstein.ligo.caltech.edu/download/ff/l1_1094.90_S5R4
http://einstein.astro.gla.ac.uk/download/ff/l1_1094.90_S5R4
http://einstein.aei.mpg.de/download/ff/l1_1094.90_S5R4
http://einstein.phys.uwm.edu/download/ff/l1_1094.90_S5R4
http://einstein.ligo.caltech.edu/download/ff/l1_1094.90_S5R4
http://einstein.astro.gla.ac.uk/download/ff/l1_1094.90_S5R4
http://einstein.aei.mpg.de/download/ff/l1_1094.90_S5R4

As Bernd might say, only run this if you're sure of what you're doing.


The trouble with this is the MD5 checksum.

Either you have to know the checksum in advance, and let BOINC download the file: or you have to download it manually, calculate what the checksum should be, and put both into the BOINC system together.

Either way, you also have to find out which of the 400 or more download fanout subdirectories those two particular files live in.

Unless someone is prepared to host an MD5/fanout online lookup list, where we can all chip in details of any data files we happen to have access to?

Hmmm...

Interesting procedure, and I don't see why it wouldn't work once you jump the MD5 hurdle...

But in all likelihood, you would end up having to manually 'scrape' the datapacks from the fanout directory, calculate the hash, and plug it all in yourself. Even if all of us hard-core crunchers teamed up to post MD5s for various datapacks somewhere, just one missing one brings you back to some extra manual 'hack-and-whack'.

I'm left with the feeling of why should I bother with that, since the whole point of a BOINC scheduler is so I don't even have to think about things like that. ;-)

Alinator

Slagathor
Joined: 28 Feb 05
Posts: 14
Credit: 21118749
RAC: 0

Message 85575 in response to message 85573

Quote:
Unless someone is prepared to host an MD5/fanout online lookup list, where we can all chip in details of any data files we happen to have access to?


I wrote a simple script to generate the file_info block, just to see if it would work. It does. Dropping the file_info block into the client_state.xml with the correct md5 and download locations causes the file to be downloaded onto a machine that doesn't already have it. It is then assigned workunits in the usual way, and adjacent datapacks are automatically downloaded as needed.

Since I doubt there will be much (if any) demand for it, I'm not going to create a web page to do this, but if anyone is interested in the file_info block for a particular datapack, just send a PM (or post here) and I'll generate it for you. Takes about 10 seconds.

Part of the key to making it work is marking the other datapacks in your client_state.xml as . Otherwise, you will (usually) continue to get workunits associated with those.
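
The script itself was not posted, so the following is only a guess at one small part of what such a generator has to do: building the download locations for a datapack. The mirror hosts and the `/download/<fanout>/<filename>` layout are copied from the URLs shown earlier in this thread; the function name `download_urls` is hypothetical, and the fanout subdirectory is taken as an input because how it is derived is not explained here.

```python
# Download mirrors as they appear in the URLs posted earlier in this thread.
MIRRORS = [
    "http://einstein.phys.uwm.edu",
    "http://einstein.ligo.caltech.edu",
    "http://einstein.astro.gla.ac.uk",
    "http://einstein.aei.mpg.de",
]


def download_urls(filename, fanout_dir):
    """Build one download URL per mirror for a datapack.

    fanout_dir is the subdirectory name (for example "172" or "ff");
    it must be known in advance, this sketch does not compute it.
    """
    return [f"{mirror}/download/{fanout_dir}/{filename}" for mirror in MIRRORS]


for url in download_urls("h1_1094.90_S5R4", "172"):
    print(url)
```

The remaining fields of the block (file size and MD5 checksum) would still have to come from a copy of the file or from someone who already has it.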
