"Unsent" units

VQ-2 Ghost

Joined: 6 May 06

Posts: 3

Credit: 49770

RAC: 0

14 Jul 2006 15:30:14 UTC

Topic 191569

Greetings,

I have 2 computers in this account; they are both happily and slowly crunching away.

What I don't understand is why some of the work that the faster computer has already crunched and returned is not being sent to any other computer(s) and remains with a status of "Unsent" for days on end.

We are all in this mostly for the science, but seeing the credit rise is also a sort of reward for the work our computers do, and it serves as an incentive to push forward.

When our computers crunch away and the credit does not rise on par with the work done, it feels more like a letdown and the level of satisfaction is not as expected.

Having said all that, I fail to see a concise and satisfactory answer in this newsgroup, maybe this time someone can shed more light on this subject.

What am I missing here that will keep me in this project?

To see 5 examples of what I am concerned about, please see the links below.

http://einsteinathome.org/workunit/10879982

http://einsteinathome.org/workunit/10833910

http://einsteinathome.org/workunit/10812345

http://einsteinathome.org/workunit/10749932

http://einsteinathome.org/workunit/10686168

Thanks!

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

"Unsent" units

14 Jul 2006 15:53:06 UTC

Message 42515

Essentially, EAH raw data is divided into a number of prepackaged "packs". That's the large data file you DL periodically, and it's sent out to a number of users. That data pack is further subdivided into a number of individual results, which are then crunched by the hosts which have that data pack onboard and returned.

What I think you are seeing here is that the initial distribution of the main pack is limited to some number of hosts. If you plow through your results quickly, you may end up having to wait for some of the other hosts working on that pack to finish the results they have and request more work out of the pack.

HTH,

Alinator

VQ-2 Ghost

Joined: 6 May 06

Posts: 3

Credit: 49770

RAC: 0

If what you think may be

14 Jul 2006 17:25:39 UTC

Message 42516 in response to message 42515

If what you think may be happening with those units is in fact the reason for the delay in sending them to be crunched by others, then the delay makes sense.

Thanks for your prompt response.

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3592309335

RAC: 646311

You probably just got stuck

15 Jul 2006 1:56:57 UTC

Message 42517

You probably just got stuck with several p3-500 class systems that take several days to do a WU, or a few noobs who only took a single WU and left. Eventually the scheduler will realize these WUs are unattended and bring more PCs onto the data.

archae86

Joined: 6 Dec 05

Posts: 3164

Credit: 7369381687

RAC: 2241919

RE: You probably just got

15 Jul 2006 2:15:34 UTC

Message 42518 in response to message 42517

Quote:

You probably just got stuck with several p3-500 class systems that take several days to do a WU, or a few noobs who only took a single WU and left. Eventually the scheduler will realize these WUs are unattended and bring more PCs onto the data.

Ummm, no Dan, not for the "unsent" phenomenon reported by VQ-2 Ghost and viewable still on some of the links supplied.

Unsent on the only other result for the WU means he is waiting for the scheduler to choose to send out the _1 result for his WU for the very first time.

to VQ-2 Ghost:

If your host prefetches results much faster than the sum of the other hosts which are currently working on the same major datafile, you'll notice you are downloading sequential results, all of the _0 flavor. This is a big clue that you are "ahead of the pack" and likely to wait a while. The next major datafile will be a whole new ballgame.
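As a rough illustration of that clue, here is a minimal sketch. It assumes only that result names end in `_0` or `_1`, as described above; the function name and the sample result names are made up for the example.

```python
def ahead_of_pack(result_names):
    """Return True if every result name ends in _0 (all first copies)."""
    # Take the text after the last underscore of each name.
    suffixes = [name.rsplit("_", 1)[-1] for name in result_names]
    return len(suffixes) > 0 and all(s == "0" for s in suffixes)


# Hypothetical result names, not taken from a real host.
recent = ["datafile_A_101_0", "datafile_A_102_0", "datafile_A_103_0"]
mixed = ["datafile_A_101_0", "datafile_A_099_1"]

print(ahead_of_pack(recent))  # True
print(ahead_of_pack(mixed))   # False
```

If the check comes back True for everything you downloaded recently, you are probably the one setting the pace on that datafile.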

The good news is that in S5 this just affects how long you wait for confirmation and credit. In S4, if you got in with a pack of low claimers, it could suppress your awarded credit for weeks' worth of work. It happened to me, in perhaps February, when one of my hosts spent a whole month processing results from one major datafile and getting low awarded credit.

I even tried suspending the project and deleting the major datafile from the project directory, hoping to get assigned a new one with middle-of-the-road partners, but the client just noticed the file was missing and downloaded it again.

This too will pass, eventually.

VQ-2 Ghost

Joined: 6 May 06

Posts: 3

Credit: 49770

RAC: 0

Thanks for some great

15 Jul 2006 3:48:41 UTC

Message 42519 in response to message 42518

Thanks for some great answers, this kind of delay is now understood!

By sheer coincidence, a crunched unit that my computer returned on July the 9th, and that was just sitting there looking pretty, has been sent to another computer for crunching since my first post on this subject.

Cheers!

Idefix

Joined: 21 Mar 05

Posts: 11

Credit: 43293

RAC: 0

Hi archae86, RE: I even

15 Jul 2006 12:27:56 UTC

Message 42520 in response to message 42518

Hi archae86,

Quote:

I even tried suspending the project and deleting the major datafile from the project directory, hoping to get assigned a new one with middle-of-the-road partners, but the client just noticed the file was missing and downloaded it again.

Did you delete the corresponding entry in the client_state.xml as well? If not, you are still reporting "I have data file x, please send me workunits for this data file."

Regards,
Carsten

archae86

Joined: 6 Dec 05

Posts: 3164

Credit: 7369381687

RAC: 2241919

RE: Did you delete the

15 Jul 2006 13:58:12 UTC

Message 42521 in response to message 42520

Quote:

Did you delete the corresponding entry in the client_state.xml as well?
Regards,
Carsten

No I did not. Back then I had neither the insight nor the courage to tamper with client_state.xml at all. Should I ever see the need to try this again, I'll take a look there.

These days I do go into client_state.xml to rebalance short-term vs. long-term debt differences to avoid anomalous prefetch behavior, and once or twice have done more ambitious things. So far I've not burned myself--though I think it is risky.

Thanks for the tip,
Peter

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3592309335

RAC: 646311

RE: RE: You probably just

15 Jul 2006 14:09:20 UTC

Message 42522 in response to message 42518

Quote:

Quote:
You probably just got stuck with several p3-500 class systems that take several days to do a WU, or a few noobs who only took a single WU and left. Eventually the scheduler will realize these WUs are unattended and bring more PCs onto the data.
Ummm, no Dan, not for the "unsent" phenomenon reported by VQ-2 Ghost and viewable still on some of the links supplied.

Unsent on the only other result for the WU means he is waiting for the scheduler to choose to send out the _1 result for his WU for the very first time.

To minimize the number of times a file is sent, only a few PCs are given it, with the expectation that they'll eventually request all the remaining work units in it. If they're all extremely slow or drop the project, some of the work units in the datafile will be stuck in the unsent state until the scheduler realizes there's a problem and sends the data to another box.
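A toy model may make this easier to picture. It is only a sketch under stated assumptions, not the real server scheduler: each work unit needs two results (_0 and _1) from two different hosts, the datafile is held by a fixed set of hosts, and each host finishes a fixed number of results per day. The function name, host names, and speeds are all hypothetical.

```python
def simulate(num_wus, speeds, days):
    """Toy model: each work unit needs results _0 and _1 from two different hosts.

    speeds maps host name -> results finished per day (hypothetical numbers).
    Returns how many work units have one result out and the other still unsent.
    """
    # holders[w] lists the hosts that have been sent a result for work unit w.
    holders = [[] for _ in range(num_wus)]
    for _ in range(days):
        for host, per_day in speeds.items():
            for _ in range(per_day):
                # Work units this host may still take a result from.
                open_wus = [w for w in range(num_wus)
                            if len(holders[w]) < 2 and host not in holders[w]]
                if not open_wus:
                    break
                # Assumed policy: finish half-done work units (send _1)
                # before starting fresh ones (send _0).
                open_wus.sort(key=lambda w: -len(holders[w]))
                holders[open_wus[0]].append(host)
    return sum(1 for h in holders if len(h) == 1)


# One fast host sharing a datafile with one slow host, over three days.
print(simulate(20, {"fast": 4, "slow": 1}, 3))  # 9
# Two evenly matched hosts keep pace with each other.
print(simulate(20, {"a": 2, "b": 2}, 3))        # 0
```

In the first run the fast host piles up work units whose partner result is still unsent, which is the situation described at the top of the thread. In the second run nothing is left waiting.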

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

RE: Hi archae86,RE: I

15 Jul 2006 15:04:17 UTC

Message 42523 in response to message 42520

Quote:

Hi archae86,
Quote:
I even tried suspending the project and deleting the major datafile from the project directory, hoping to get assigned a new one with middle-of-the-road partners, but the client just noticed the file was missing and downloaded it again.
Did you delete the corresponding entry in the client_state.xml as well? If not you are reporting "I have data file x, please send me workunits for this data file."

Regards,
Carsten

Of course that only exacerbates stalling at "Unsent" for the other hosts still working on the data pack by forcing the server scheduler to intervene and distribute it to other hosts.

I'm not sure how long it takes before the scheduler decides progress is too slow for a given data pack and sends it out again (or even how many hosts get it initially for that matter).

Alinator
