By the way, please -do NOT- abort your uploads!
That is -NOT- the problem.
Aborting your uploads also aborts your tasks, which means that you lose all the work you did on them (not to mention your credit for them).
More importantly, it forces Einstein to reissue those tasks to someone else, thus delaying the completion of those scientific units. Every work unit is vital to the successful search of deep space and the advancement of man's knowledge of the universe.
p.s. I learned this the hard way a number of years ago, so I hope this information saves you from making the same mistakes I did.
;-)
I manually aborted the transfer on a bunch of my stuck GR tasks (100-ish), and it didn't error out most of them. It seems the upload DOES finish server side; you just don't get the receipt back to the BOINC client. It errored some of them, though. Oh well, whatever gets them off my system to unblock downloads and get GW working again.
Cherokee150 wrote: By the way, please -do NOT- abort your uploads!
Like I said, I aborted like 100+ uploads, and most of them actually reported fine. It's already uploaded server side in most cases; the client just doesn't know. Aborting them doesn't abort the whole task in this case. It just clears them so you can actually download new work again.
But everyone is free to make their own choice. I chose to do this so that my computers actually get back to work on the GW tasks. If you'd prefer to have your system sit idle for hours or days on end, that's your choice *shrug*. Arguably, this delays science results more than just reporting what you have and moving on.
I saw this issue a few weeks ago.
I tend to "stand pat" with the two systems I use to crunch, so I only check on them every few days. At times like this, I switch the box lacking work over to another project, let it download some work, then select "no new work" and check again in a few hours. My fetch cache is small, so I'm not downloading days' worth of work from the secondary project. It keeps my machines busy without dumping a ton of completed tasks that are waiting for upload. By tomorrow morning, things should be back to normal.
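For anyone who would rather script that routine than click through BOINC Manager, here is a rough command-line sketch of the same idea, assuming boinccmd is installed alongside the client. The backup project URL and the wait time are placeholders, not anything from this thread.

```python
# Rough sketch of the "let a backup project fetch a little work, then set
# No New Tasks" routine described above, driven through boinccmd.
# The project URL and the wait are placeholders.
import subprocess
import time

BACKUP_PROJECT = "https://example-backup-project.org/"  # placeholder

def boinccmd(*args: str) -> str:
    """Run a boinccmd subcommand against the local BOINC client."""
    return subprocess.run(["boinccmd", *args], check=True,
                          capture_output=True, text=True).stdout

# Let the backup project download some work (works best with a small
# cache setting, so only a modest amount comes down)...
boinccmd("--project", BACKUP_PROJECT, "allowmorework")
time.sleep(30 * 60)  # give the client a while to fetch
# ...then stop further fetching so only the tasks already on hand get run.
boinccmd("--project", BACKUP_PROJECT, "nomorework")
```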
Hope everyone has a great 2021!
Then I just need to amend that sentence to:
"I guess we may just have to keep trying until Einstein's upload process starts working again." :-)
I understand that some of us know when an abort will not lose the work, but most do not. Just as BOINC warns, many times aborting the uploads -will- kill the work. Therefore, I feel it is important to remind people reading these posts that aborting the upload might kill the work done.
I am lucky that I can keep enough tasks in my cache to work for several days before I run out of work. Most of the time the backed-off tasks eventually clear out and everything starts working again.
I only wish that I could afford machines fast enough to run out of work within a few hours. But, to those of you who can afford them, "KEEP ON TRUCKIN'!" (For those unfamiliar with that old expression, it means, basically, "Hurray for the good work you are doing, and good luck continuing your work!")
;-)
SUCCESS!!!
My earlier statement came true:
"I guess we may just have to keep trying until Einstein's upload process starts working again." :-)
All my stalled uploads just worked, and all their related tasks reported!!
Einstein corrected itself and I didn't have to abort anything.
So, it's back to looking,
at the skies,
with open eyes,
for the heavens are waiting,
their secrets to reveal,
to those who appeal.
;-)
So this is why Einstein uploads aren't happening.
Both my HP Z210 SFF Xeon with a Quadro 600 and my low-powered generic Atom D525 with an Nvidia GT 710 are backlogged with completed Einstein jobs.
Temporary, of course.
I'll shut down my systems for now to save power and try again tomorrow.
ubuntu wrote: So this is why Einstein uploads aren't happening.
Try again right now; over a dozen of mine just went up because I clicked 'retry now'.
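Clicking 'Retry Now' on each transfer gets tedious with dozens of stuck uploads, so here is a rough sketch of doing the same thing in bulk with boinccmd. The project URL is an assumption (use whatever --get_project_status reports), and the parsing of the --get_file_transfers output is deliberately loose, since its exact format can vary between client versions.

```python
# Sketch: issue a retry for every pending transfer, roughly what clicking
# "Retry Now" in BOINC Manager does for each one. PROJECT_URL is an
# assumption; check --get_project_status for the URL your client uses.
import subprocess

PROJECT_URL = "https://einsteinathome.org/"  # assumption

def boinccmd(*args: str) -> str:
    """Run a boinccmd subcommand against the local BOINC client."""
    return subprocess.run(["boinccmd", *args], check=True,
                          capture_output=True, text=True).stdout

# Pull file names out of the transfer listing. Loose parsing on purpose:
# the output format differs a little between BOINC client versions.
listing = boinccmd("--get_file_transfers")
names = [line.split("name:", 1)[1].strip()
         for line in listing.splitlines()
         if line.strip().startswith("name:")]

for name in names:
    boinccmd("--file_transfer", PROJECT_URL, name, "retry")
```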
Ian&Steve C. wrote: I manually aborted the transfer on a bunch of my stuck GR tasks (100-ish), and it didn't error out most of them. ...
There are far safer ways than this to clear the stuck uploads.
The opening message from Bernd said, "... on Sundays we get a pretty high load on our main upload server ..." in describing the cause of the problem. The temporary solution was described very briefly as, "... for now we are limiting the uploads depending on the current load."
Here's my interpretation - I know nothing more than the bald statement itself. If the load reaches a certain level, automatically disallow all uploads until the load drops. When below some threshold value, turn the uploads back on again. It's a Sunday, so it's likely just a script making the decisions automatically.
We can infer that there will be regular periods when uploads are ON before going OFF again. The problem is that every time BOINC retries and is denied, the backoff grows longer. So don't madly try updating all the time, because you'll just make things worse and end up with a much longer backoff as a result.
Just realise that the cycles in the load swings will probably be rather regular above and below whatever the set point is. If you judge the cycle right, you can get everything uploaded in that small window. I don't really know what the cycle time is, but I'm guessing it may be something like 15 to 30 mins. If I experimented a bit, it should be easy enough to find the sweet spot.
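For what it's worth, here is a toy sketch of the kind of behaviour being guessed at above: a gate that refuses uploads above one load level and allows them again below another (which would produce fairly regular ON/OFF windows), plus the client-side effect that each refused retry lengthens the backoff. Every number is invented for illustration; none of this is Einstein's actual server code.

```python
# Toy illustration only -- a guess at the server-side gating described
# above, not Einstein's actual code. Uploads are refused above one load
# level and allowed again once load falls below a lower one (hysteresis).
HIGH_WATER = 0.90   # invented threshold: stop accepting uploads above this
LOW_WATER  = 0.60   # invented threshold: start accepting again below this

def uploads_allowed(load: float, currently_allowed: bool) -> bool:
    if currently_allowed and load > HIGH_WATER:
        return False          # load too high: turn uploads OFF
    if not currently_allowed and load < LOW_WATER:
        return True           # load has dropped: turn uploads back ON
    return currently_allowed  # otherwise leave the gate as it is

# Client side of the same picture: every refused retry roughly doubles the
# wait before the next automatic attempt. The doubling and the cap here
# are approximations for illustration, not the real client values.
def next_backoff(seconds: float) -> float:
    return min(seconds * 2, 4 * 60 * 60)
```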
I don't have to worry about that because the control scripts I use to monitor all my hosts notice when uploads are stuck and force an update if necessary to correct the problem. My timezone is UTC+10, so all of this was happening on Sunday night for me, and when I checked the logs on Monday morning, all of the hosts had been stuck at various times but all were cleared by the script actions. The logs show many pages where individual hosts were stuck, but they also show the problem regularly being cleared. Purely by chance, the script was taking action at just the right time to hit the sweet spot.
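These are not the monitoring scripts described above, but a minimal sketch of the same idea, assuming boinccmd can reach the local client: if any file transfers are still pending, force a project update, which is the scripted equivalent of pressing Update in BOINC Manager and retries the stuck uploads. The project URL is a placeholder; remote hosts would also need boinccmd's --host and --passwd arguments.

```python
# Minimal sketch (not the scripts described above): check the local BOINC
# client for pending file transfers and, if any exist, force a project
# update to retry them. Assumes boinccmd is installed and can talk to the
# local client; PROJECT_URL is a placeholder.
import subprocess

PROJECT_URL = "https://einsteinathome.org/"  # assumption

def boinccmd(*args: str) -> str:
    """Run a boinccmd subcommand against the local BOINC client."""
    return subprocess.run(["boinccmd", *args], check=True,
                          capture_output=True, text=True).stdout

def has_pending_transfers() -> bool:
    # Crude check: any "name:" entry in the transfer list counts. A real
    # script would distinguish uploads from downloads and look at how long
    # each transfer has been waiting.
    return "name:" in boinccmd("--get_file_transfers")

if has_pending_transfers():
    # Equivalent to pressing "Update" on the project in BOINC Manager,
    # which also retries its pending transfers.
    boinccmd("--project", PROJECT_URL, "update")
```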
I'm sure anybody being prevented from downloading new tasks could clear the stuck uploads with just a little guessing as to when to try. Of course, this is not a satisfactory long-term fix. I do trust that Bernd will sort this out ASAP.
Cheers,
Gary.
mikey wrote: Try again right now; over a dozen of mine just went up because I clicked 'retry now'.
Yes, I've repeatedly been doing that too, with very little to no success. Thanks, though.
The moderator's latest response makes a lot of sense. I'll retry tomorrow.