I can't be sure whether the problem I am experiencing is related to the scheduler changes. One of my machines is regularly inundated with workunits. My post "Overcommitted" in Problems and Bug Reports goes into the background.
Since that time I have removed the application and the BOINC directory and reinstalled BOINC 4.45.
Machine # 383671 currently has 47 work units, and it appears that several old ones were sent as part of this lost work unit update. The units at the top of the list are due Aug 8 and are preempted because the machine received a large batch of units due Aug 5, Aug 6, and Aug 7. I aborted the 8/5 units since they would not get credit (today is Aug 5).
I'm open to suggestions. Since I have 3 other machines with an identical configuration that operate normally, I may ghost one of their drive images onto this machine and replace BOINC for another attempt at a clean install.
My first suggestion is to abort all the "extra" results, meaning whatever can't be completed in a week. Actually six days now. Running 24x7 with just one project (E@H), that's 28 workunits to keep. Maybe delete a few more to make sure you'll meet the deadline with the ones you keep.
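The arithmetic behind that estimate can be sketched roughly as follows. This is only an illustration: the figure of 5 hours per workunit is an assumption (the thread does not state it; it is simply consistent with 28 workunits in six days), and the function name and the `safety_margin` parameter are hypothetical.

```python
import math

def workunits_to_keep(days_left, hours_per_wu, hours_per_day=24.0, safety_margin=0.0):
    """Estimate how many workunits can finish before the deadline.

    safety_margin is the fraction of the available time held back as slack,
    e.g. 0.1 keeps 10% of the hours in reserve.
    """
    usable_hours = days_left * hours_per_day * (1.0 - safety_margin)
    return math.floor(usable_hours / hours_per_wu)

# Assumed: about 5 hours of crunching per workunit on this host.
print(workunits_to_keep(6, 5.0))                     # 28
print(workunits_to_keep(6, 5.0, safety_margin=0.1))  # 25
```

Holding back some slack, as in the second call, corresponds to the advice to delete a few more than the bare minimum.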
My second suggestion is not to reinstall BOINC and to avoid resetting or detaching the project. Those actions cause the scheduler to assign new host IDs and additional work, which will be "resent" when the old hosts are merged with the new one. And the "too much work" continues.
If you do have to reset/detach/reinstall, then add one step to the process. Just as soon as you reset or reattach, select the project again and click the "no new work" button. That way it'll download only one WU. Leave it that way until you merge the old host with the new one and the scheduler resends anything "lost". Of course, if it doesn't assign a new host, it'll resend all the "lost" ones; you'll see that in the message log.
After that you can "allow new work".
Walt
Walt, thanks! Your advice qualifies as "sticky". I did merge the host after the last reinstall, so that is likely what caused all the lost units to come back. For now, I have the "no new work" button selected and I will trim the work queue to an achievable level. When the queue runs down, I will see if this machine continues to overcommit.
Could this patch have any side effects on the upload handler?
I can still receive work with a 4.19 client on Linux going through a squid proxy with a password, but I cannot deliver results anymore.
resultid=7800129 has been stuck for quite some time now; the upload always gives me a -127, temporarily failed upload.
As downloads still work, it cannot be a proxy or permissions problem: all files belong to the user running BOINC, uploads did work before, and I can "wget" the upload handler reply, so nothing is blocked either.
BOINC runs with "-return_results_immediately" because it is a slow machine.
I'm not 100% sure, but Windows clients seem not to be affected (I can check that on Monday).
Interesting effect: when the next result was ready, the one that had been stuck with -127 uploaded without any trouble. Now the new one (resultid=8035146) keeps giving me -127.
Please refer to post at:
http://einsteinathome.org/node/189802
I have no work units for one machine.
I have read this thread with interest. I have been trying to figure out what is going on with my results for some time. It seems as if things go fine for a few days and then all of a sudden I get a large "patch" of results that I have never seen on my machine and that never get completed. It is almost as if they were never sent to me at all but are now in my results list. I know my machine does not request work often enough to receive that many WUs.
I have just discovered that WUs may be delivered in "packets", and that the system draws on these packets for a local supply of WUs. If this is true, I wonder if what I am seeing could be the result of these packets being too large and containing WUs whose deadlines are too short for the system to complete in the time allotted?
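One way to test that kind of hypothesis is to check whether a given batch could even finish before its deadlines if processed one unit at a time in deadline order. The sketch below is only illustrative: the workunit names, deadlines, year, and the 5-hour run time are made-up assumptions, and this is not how the BOINC client itself schedules work.

```python
from datetime import datetime, timedelta

def find_late_workunits(workunits, now, hours_per_wu):
    """Walk the queue in earliest-deadline-first order, one unit at a time.

    workunits is a list of (name, deadline) pairs. A unit that could not
    finish by its deadline is reported (a candidate to abort) and is assumed
    not to be run, so it does not delay the units behind it.
    """
    late = []
    clock = now
    for name, deadline in sorted(workunits, key=lambda wu: wu[1]):
        projected_finish = clock + timedelta(hours=hours_per_wu)
        if projected_finish > deadline:
            late.append(name)
        else:
            clock = projected_finish
    return late

# Hypothetical batch: three units due at noon, one due at midnight.
now = datetime(2005, 8, 5, 0, 0)
batch = [
    ("wu_a", datetime(2005, 8, 5, 12, 0)),
    ("wu_b", datetime(2005, 8, 5, 12, 0)),
    ("wu_c", datetime(2005, 8, 5, 12, 0)),
    ("wu_d", datetime(2005, 8, 6, 0, 0)),
]
print(find_late_workunits(batch, now, hours_per_wu=5))  # ['wu_c']
```

If a batch regularly produces a non-empty list under realistic run times, the batch is larger than the host can finish in time.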
In any case I would like to have some idea how to prevent these large gaps of incomplete WUs that I have never seen from developing in the first place. My results are here - http://einsteinathome.org/account/tasks
Regards
Phil
We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
Try upgrading your BOINC client to 4.45 from http://boinc.berkeley.edu/download.php. This should fix the problem (details in this thread), and also download those results which you are currently missing.
I have been running 4.45 since I first installed BOINC. How would I download the results I do not have? I have reset the project, I have reinstalled the BOINC software, and of course restarted the system and the BOINC software many times, and nothing has changed.
Is there some way to get the server to download these WUs?
I am thinking of trying the Ver 5 Alpha of BOINC to see if something there will work better.
Regards
Phil
Are you sure about that? Forgive the arrogance, but unless I can't read the results page properly, you are using version 4.43 (for example, see http://einsteinathome.org/task/8290642 under stderr out). I must retract my advice, however, as there is no version 4.45 available for download for the Mac (4.43 is the latest release version). I am afraid I can't comment on what will happen with the alpha release of BOINC, but I'd stick with the regular release version if I were you, as the ghost WUs don't cause significant harm.
Nick