Greetings,
I have 2 computers in this account; they are both happily and slowly crunching away.
What I don't understand is why some of the work that the faster computer has already crunched and returned is not being sent to any other computer(s) and remains with a status of "Unsent" for days on end.
While we are all in this mostly for the science, seeing the credit rise is also a sort of reward for the work our computers do, and it serves as an incentive to push forward.
When our computers crunch away and the credit does not rise on par with the work done, it feels like a letdown, and the satisfaction is not what it could be.
Having said all that, I have yet to see a concise and satisfactory answer in this newsgroup; maybe this time someone can shed more light on the subject.
What am I missing here that will keep me in this project?
To see 5 examples of what I am concerned about, please see the links below.
http://einsteinathome.org/workunit/10879982
http://einsteinathome.org/workunit/10833910
http://einsteinathome.org/workunit/10812345
http://einsteinathome.org/workunit/10749932
http://einsteinathome.org/workunit/10686168
Thanks!
"Unsent" units
Essentially, EAH raw data is divided into a number of prepackaged "packs". That's the large data file you download periodically, and it's sent out to a number of users. That data pack is further subdivided into a number of individual results, which are then crunched and returned by the hosts that have that data pack onboard.
What I think you are seeing here is that the initial distribution of the main pack is limited to some number of hosts, so if you plow through your share quickly, you may end up having to wait for some of the other hosts working on that pack to finish up the results they have and request more work out of the pack.
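Here's a toy sketch of the effect, if it helps (all the numbers and names below are invented, it's just to illustrate the waiting):

    # Toy model: each WU needs a second copy (_1) sent to a different host
    # before it can validate, and copies only go to hosts that already
    # hold this data pack. The fast host has returned its _0 copies, so
    # the _1 copies sit "Unsent" until a slower pack-mate asks for work.
    unsent = ['wu%d_1' % i for i in range(10)]  # second copies waiting to go out
    requests_per_day = 2                        # work requests from the slow hosts
    days = 0
    while unsent:
        days += 1
        for _ in range(min(requests_per_day, len(unsent))):
            unsent.pop()                        # a slow host finally picks one up
    print('last _1 copy finally sent on day', days)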
HTH,
Alinator
If what you think may be
If what you think may be happening with those units is in fact the reason for the delay in sending them to be crunched by others, then the delay makes sense.
Thanks for your prompt response.
You probably just got stuck
You probably just got stuck with several P3-500 class systems that take several days to do a WU, or a few noobs who only took a single WU and left. Eventually the scheduler will realize these WUs are unattended and bring more PCs onto the data.
RE: You probably just got
Ummm, no Dan, not for the "Unsent" phenomenon reported by VQ-2 Ghost, which is still viewable on some of the links supplied.
"Unsent" on the only other result for the WU means he is waiting for the scheduler to choose to send out the _1 result for his WU for the very first time.
to VQ-2 Ghost:
If your host prefetches results much faster than all the other hosts currently working on the same major datafile combined, you'll notice you are downloading sequential results, all of the _0 flavor. This is a big clue that you are "ahead of the pack" and likely to wait a while. The next major datafile will be a whole new ballgame.
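If you want to check this against your own result list, the trailing number is easy to pull off the name (the result name below is made up, but the pattern matches what I see):

    # The digit after the last underscore is the copy number:
    # _0 is the first copy sent out, _1 the second, and so on.
    def copy_index(result_name):
        return int(result_name.rsplit('_', 1)[1])

    print(copy_index('h1_0123.4_S5R1__456_S5R1a_0'))  # prints 0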
The good news is that in S5 this only affects how long you wait for confirmation and credit. In S4, if you got in with a pack of low claimers, it could suppress your awarded credit for weeks' worth of work. It happened to me, perhaps in February, when one of my hosts spent a whole month processing results from one major datafile and getting low awarded credit.
I even tried suspending the project and deleting the major datafile from the project directory, hoping to get assigned a new one with middle-of-the-road partners, but the client just noticed the file was missing and downloaded it again.
This too will pass, eventually.
Thanks for some great
Thanks for some great answers, this kind of delay is now understood!
By sheer coincidence, a crunched unit my computer had returned on July the 9th, which was just sitting there looking pretty, has now been sent to some other computer for crunching since my first post on this subject.
Cheers!
Hi archae86, RE: I even
Hi archae86,
Did you delete the corresponding entry in client_state.xml as well? If not, you are still reporting "I have data file x, please send me workunits for this data file."
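If you want to see what the client will report, something like this rough sketch lists the file entries (untested, and the h1_ prefix is just my guess at the EAH data file naming; stop BOINC before touching anything):

    # List the <file_info> entries the client thinks it has on disk.
    import xml.etree.ElementTree as ET
    root = ET.parse('client_state.xml').getroot()
    for fi in root.iter('file_info'):
        name = fi.findtext('name', '')
        if name.startswith('h1_'):   # guessed prefix for EAH data files
            print(name)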
Regards,
Carsten
RE: Did you delete the
No I did not. Back then I had neither the insight nor the courage to tamper with client_state.xml at all. Should I ever see the need to try this again, I'll take a look there.
These days I do go into client_state.xml to rebalance short-term vs. long-term debt differences to avoid anomalous prefetch behavior, and once or twice I have done more ambitious things. So far I've not burned myself, though I think it is risky.
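For the record, the kind of thing I look at is roughly this (a sketch from memory; the field names may differ by client version, and BOINC should always be stopped first):

    # Print each project's scheduling debts from client_state.xml.
    import xml.etree.ElementTree as ET
    root = ET.parse('client_state.xml').getroot()
    for proj in root.iter('project'):
        print(proj.findtext('master_url'),
              proj.findtext('short_term_debt'),
              proj.findtext('long_term_debt'))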
Thanks for the tip,
Peter
RE: RE: You probably just
To minimize the number of times a data file is sent, only a few PCs are given it, with the expectation that they'll eventually request all the remaining workunits in it. If they're all extremely slow or drop the project, some of the workunits in the data file will be stuck in the "Unsent" state until the scheduler realizes there's a problem and sends the data to another box.
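Purely my speculation on the server side, but the decision might look something like this sketch (names and thresholds invented):

    # Hypothetical reissue rule: if a pack still has unsent results and
    # no host has asked for work from it in a while, offer the data file
    # to a fresh host on its next scheduler request.
    STALL_DAYS = 7   # invented threshold

    def should_reissue(unsent_results, days_since_last_request):
        return unsent_results > 0 and days_since_last_request > STALL_DAYS

    print(should_reissue(3, 10))   # True: time to bring another box in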
RE: Hi archae86,RE: I
Of course, that only exacerbates the stalling at "Unsent" for the other hosts still working on the data pack, by forcing the server scheduler to intervene and distribute it to yet more hosts.
I'm not sure how long it takes before the scheduler decides progress is too slow for a given data pack and sends it out again (or even how many hosts get it initially for that matter).
Alinator