Ghost WU and resending lost results

Tahoe
Joined: 9 Mar 05
Posts: 12
Credit: 23841020
RAC: 0

I can't be sure whether the problem I am experiencing is related to the scheduler changes. One of my machines is regularly inundated with work units. My post "Overcommitted" in Problems and Bug reports covers the background.

Since then I have removed the application and the BOINC directory and reinstalled BOINC 4.45.

Machine #383671 currently has 47 work units, and it appears that several old ones were resent by this lost-work-unit update. The units at the top of the list are due Aug 8 and have been preempted because the machine received a large batch of units due Aug 5, Aug 6, and Aug 7. I aborted the Aug 5 units since they would not get credit (today is Aug 5).
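The preemption described above is consistent with the client switching to earliest-deadline-first (EDF) order once deadlines look at risk. A minimal Python sketch of that ordering follows; it assumes a single CPU, and the deadlines (hours from now) and the 5-hour runtime per unit are made-up values, not data from machine 383671. It shows how a fresh batch of short-deadline units pushes an Aug 8 unit to the back of the queue, and why some of the new units cannot finish in time no matter what.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    name: str
    deadline_h: float  # hours from now until the report deadline (hypothetical)
    runtime_h: float   # estimated CPU hours to finish (hypothetical)


def edf_schedule(units):
    """Run units one after another on a single CPU in earliest-deadline-first order.

    Returns a list of (name, finish_hour, on_time) tuples in execution order.
    Units with equal deadlines keep their input order (Python's sort is stable).
    """
    clock = 0.0
    plan = []
    for wu in sorted(units, key=lambda u: u.deadline_h):
        clock += wu.runtime_h
        plan.append((wu.name, clock, clock <= wu.deadline_h))
    return plan


# Hypothetical queue on Aug 5: one unit due Aug 8 plus a fresh batch due sooner.
units = [
    WorkUnit("due_aug8", 72.0, 5.0),
    WorkUnit("due_aug5_a", 12.0, 5.0),
    WorkUnit("due_aug5_b", 12.0, 5.0),
    WorkUnit("due_aug5_c", 12.0, 5.0),
    WorkUnit("due_aug6", 36.0, 5.0),
]

for name, finish, on_time in edf_schedule(units):
    status = "ok" if on_time else "MISSES DEADLINE"
    print(f"{name:12s} finishes at {finish:5.1f} h  {status}")
```

In this toy queue the third Aug 5 unit finishes after its deadline whatever the order, which is exactly the kind of unit that can be aborted without losing credit.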

I'm open to suggestions. Since I have three other identically configured machines that operate normally, I may ghost one of their drive images onto this machine and reinstall BOINC for another attempt at a clean install.

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Message 14772 in response to message 14771

Quote:

I can't be sure if the problem I am experiencing is related to the scheduler changes or not. I have experienced an inundation of workunits on a machine that seems to have this problem with regularity. My post "Overcommitted" in Problems and Bug reports goes into the background.

Since that time I have removed the application and the boinc directory and reinstalled boinc 4.45.

Machine # 383671 has currently 47 work units and it appears that several old ones were submitted per this lost work unit update. The units at the top of the list are due Aug 8 and are preempted because the machine received a large batch of units dated Aug 5, Aug 6, and Aug 7. I aborted the 8/5 units since they would not get credit (today is Aug 5).

I'm open to suggestions. Since I have 3 other machines identical to this configuration that operate normally, I may ghost one of their drive images on to this machine and replace BOINC for another attempt at a clean install.

My first suggestion is to abort all the "extra" results, meaning whatever can't be completed in a week (actually six days now). Running 24x7 on just one project (E@H), that's 28 work units to keep. Maybe abort a few more to make sure you'll meet the deadline with the ones you keep.
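The rule of thumb above can be written out as a small calculation. In the Python sketch below, the per-unit runtime is an assumption: about 5.1 hours, which is simply what the figures imply (6 days × 24 h = 144 h, and 144 / 28 ≈ 5.1). It also assumes one CPU. The function name and its parameters are hypothetical, chosen for illustration.

```python
import math


def units_to_keep(days_left, hours_per_wu, on_fraction=1.0, share=1.0, margin_units=0):
    """Estimate how many queued work units can finish before the deadline.

    days_left:    days until the queued work is due
    hours_per_wu: estimated CPU hours per work unit (an assumption; measure your own)
    on_fraction:  fraction of the day BOINC is allowed to run (1.0 = 24x7)
    share:        fraction of CPU time this project gets (1.0 = only project)
    margin_units: extra units to give up as a safety margin
    """
    usable_hours = days_left * 24 * on_fraction * share
    keep = math.floor(usable_hours / hours_per_wu) - margin_units
    return max(keep, 0)


queued = 47
keep = units_to_keep(days_left=6, hours_per_wu=5.1)
print(f"keep {keep}, abort {queued - keep}")
```

With 47 units on the machine and 28 to keep, that means aborting about 19; a nonzero `margin_units` gives up a few more, as suggested above.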

My second suggestion is not to reinstall BOINC and to avoid resetting or detaching the project. Those actions cause the scheduler to assign a new host ID and additional work, and the old host's work will then be "resent" when the old host is merged with the new one. And the "too much work" problem continues.
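The resend step can be pictured as a comparison between the results the server has on record for a host and the results the client lists in its scheduler request. The Python below is a simplified model for illustration, not the BOINC scheduler's code: `merge_hosts` and `plan_resends` are hypothetical names, and the rule that lost results already past their deadline are not resent is an assumption.

```python
def merge_hosts(old_host, new_host):
    """Hypothetical merge: the surviving host record inherits every
    in-progress result (name -> deadline hour) from the old record."""
    merged = dict(old_host)
    merged.update(new_host)
    return merged


def plan_resends(server_in_progress, client_reported, now_h):
    """Compare the server's view with what the client says it holds.

    server_in_progress: dict of result name -> deadline (hours)
    client_reported:    set of result names listed in the scheduler request
    now_h:              current time, in the same hours
    Returns (resend, expired): lost results still worth resending, and lost
    results whose deadline has already passed (assumed not to be resent).
    """
    resend, expired = [], []
    for name in sorted(server_in_progress):
        if name in client_reported:
            continue  # the client still has it; nothing to do
        if server_in_progress[name] > now_h:
            resend.append(name)
        else:
            expired.append(name)
    return resend, expired


old = {"r01": 100.0, "r02": 100.0, "r03": 10.0}  # lost in the reinstall
new = {"r10": 150.0}                             # fetched under the new host ID
server_view = merge_hosts(old, new)
resend, expired = plan_resends(server_view, client_reported={"r10"}, now_h=20.0)
print("resend:", resend)
print("expired:", expired)
```

In this model, merging the old host record makes every result lost in the reinstall show up as "lost" in a single request, which is how a merge can bring back a whole batch of work at once.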

IF you do have to reset, detach, or reinstall, add one step to the process. As soon as you reset or reattach, select the project again and click the "No new work" button. That way it will only download one WU. Leave it that way until you merge the old host with the new one, and the scheduler will resend anything "lost". Of course, if it doesn't assign a new host, it will resend all the "lost" ones; you'll see that in the message log.

After that you can "allow new work".

Walt

Tahoe
Joined: 9 Mar 05
Posts: 12
Credit: 23841020
RAC: 0

Walt, thanks! Your advice qualifies as "sticky". I did merge the host after the last reinstall, so that is likely what caused all the lost units to come back. For now, I have the "No new work" button selected, and I will trim the work queue to an achievable level. When the queue runs down, I will see if this machine continues to overcommit.

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

Could this patch have any side effects on the upload handler?

I can still receive work with 4.19 on Linux going through a Squid proxy with a password (PW), but I cannot deliver results anymore.

resultid=7800129 has been stuck for quite some time now; the upload always gives me -127, a temporarily failed upload.

As downloads still work, it cannot be a proxy or permissions problem: all files belong to the user running BOINC, uploads worked before, and I can "wget" the upload handler reply, so nothing is blocked either.

BOINC runs with "-return_results_immediately" because it is a slow machine.

I'm not 100% sure, but Windows clients seem not to be affected (I can check that on Monday).

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

Message 14775 in response to message 14774

Interesting effect: when the next result was ready, the client uploaded the one that had been stuck with -127 without any trouble. Now the new one (resultid=8035146) keeps giving me -127.

Merry Margaret
Joined: 14 Mar 05
Posts: 7
Credit: 47111
RAC: 0

Please refer to the post at:
http://einsteinathome.org/node/189802

I have no work units for one machine.

Snake Doctor
Joined: 21 Jul 05
Posts: 71
Credit: 552724
RAC: 0

I have read this thread with interest. I have been trying to figure out what is going on with my results for some time. Things seem to go fine for a few days, and then all of a sudden I get a large "patch" of results that I have never seen on my machine and that never get completed. It is almost as if they were never sent to me at all, yet they are now in my results list. I know my machine does not request work often enough to receive that many WUs.

I have just discovered that WUs may be delivered in "packets" and that the system draws on these packets for a local supply of WUs. If this is true, could what I am seeing be the result of these packets being too large and containing WUs whose deadlines are too short for the system to complete them in the time allotted?

In any case, I would like some idea of how to prevent these large gaps of incomplete WUs that I have never seen from developing in the first place. My results are here: http://einsteinathome.org/account/tasks

Regards
Phil


We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.

nfortino
Joined: 7 Jun 05
Posts: 12
Credit: 1046710
RAC: 0

Message 14778 in response to message 14777

Quote:

...
In any case I would like to have some idea how to prevent these large gaps of incomplete WUs that I have never seen from developing in the first place. My results are here - http://einsteinathome.org/account/tasks

Regards
Phil

Try upgrading your BOINC client to 4.45 from http://boinc.berkeley.edu/download.php. This should fix the problem (details in this thread), and also download those results which you are currently missing.

Snake Doctor
Joined: 21 Jul 05
Posts: 71
Credit: 552724
RAC: 0

Message 14779 in response to message 14778

Quote:
Quote:

...
In any case I would like to have some idea how to prevent these large gaps of incomplete WUs that I have never seen from developing in the first place. My results are here - http://einsteinathome.org/account/tasks

Regards
Phil

Try upgrading your BOINC client to 4.45 from http://boinc.berkeley.edu/download.php. This should fix the problem (details in this thread), and also download those results which you are currently missing.

I have been running 4.45 since I first installed BOINC. How would I download the results I do not have? I have reset the project, I have reinstalled the BOINC software, and of course I have restarted the system and the BOINC software many times, and nothing has changed.

Is there some way to get the server to send these WUs to my machine?

I am thinking of trying the version 5 alpha of BOINC to see if it works better.

Regards
Phil


nfortino
Joined: 7 Jun 05
Posts: 12
Credit: 1046710
RAC: 0

Message 14780 in response to message 14779

Quote:

I have been running 4.45 since I first installed BOINC.
...

Are you sure about that? Forgive the arrogance, but unless I'm misreading the results page, you are using version 4.43 (for example, http://einsteinathome.org/task/8290642 under stderr out). I must retract my advice, however, as there is no version 4.45 for the Mac available for download (4.43 is the latest release version). I can't comment on what will happen with the alpha release of BOINC, but I'd stick with the regular release version if I were you, as the ghost WUs don't cause significant harm.

Nick
