All Uploads completed and reported. Back to normal, for me. :-)
According to the event log, I appear to've gotten some uploads done, but the server's apparently being hammered hard enough that after a few uploads it requests that my client back off for a few minutes. My remaining upload backlog in turn is preventing me from doing any fresh downloads.
The recovery rate seems to be slowly increasing: in the ~8 hours between when I checked last night and when I got up this morning, the upload backlog on my main PC dropped from 15 screens' worth of files to 14. During the 10 hours I was at work it dropped to 11 screens, a bit over twice as fast. Unless it continues to improve though, I'm still ~40 hours from being caught up. *sigh*
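For anyone who wants to redo that estimate with their own numbers, here is a rough sketch of the arithmetic. It assumes the backlog drains at a constant rate equal to the most recent one, and "screens" is just my own unit from the transfers tab; the function name is made up for illustration.

```python
def drain_rate(start_screens, end_screens, hours):
    """Screens of backlog cleared per hour over one observation window."""
    return (start_screens - end_screens) / hours

# Figures quoted in the post above.
overnight = drain_rate(15, 14, 8)    # overnight window
workday = drain_rate(14, 11, 10)     # while at work

# How much faster the second window was than the first.
speedup = workday / overnight

# Hours left if the remaining 11 screens keep draining at the workday rate.
eta_hours = 11 / workday

print(f"overnight: {overnight:.3f} screens/h, workday: {workday:.3f} screens/h")
print(f"speedup: {speedup:.1f}x, hours remaining: {eta_hours:.0f}")
```

The result lands a little under the ~40 hours quoted, which is close enough given how rough a "screen" is as a unit.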
Dunno if it's the server slowly recovering or just that people with short upload queues are getting them emptied and reducing the load.
Definitely continuing to improve. 12h later and I was down to a bit over 3 screens on the upload tab (maybe 250 files); current best guesstimate is that I'll get fully caught up on the uploads at some point in the day while at work.
Mostly what seems to have happened is that, while individual uploads are still erroring out semi-regularly, it takes both erroring out to make a backoff stick (if one errors and the second succeeds, the backoff is cleared), so the number of files that get uploaded in each upload-to-backoff cycle is increasing steadily.
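To put a rough number on that, here is a toy model, not how the client actually schedules transfers. It assumes uploads go out two at a time, each fails independently with the same probability, and a backoff only sticks when both uploads in a pair fail. The function name and the failure rates are hypothetical.

```python
def expected_uploads_per_cycle(p_fail):
    """Expected successful uploads before a backoff sticks (toy model).

    Assumes pairs of independent uploads, each failing with probability
    p_fail, and a cycle that ends on the first pair where both fail.
    """
    if not 0 < p_fail <= 1:
        raise ValueError("p_fail must be in (0, 1]")
    # Mean number of pairs before the terminating pair: (1 - p^2) / p^2.
    # Mean successes in a non-terminating pair: 2 * (1 - p) / (1 - p^2).
    # Their product simplifies to the expression below.
    return 2 * (1 - p_fail) / p_fail ** 2

# Illustrative failure rates only; none of these were measured.
for p in (0.9, 0.7, 0.5, 0.3):
    print(f"p_fail={p}: ~{expected_uploads_per_cycle(p):.1f} uploads per cycle")
```

Under those assumptions even a modest drop in the per-upload failure rate makes each cycle much longer, which would fit the recovery speeding up as the server load eases.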
Hopefully the continuing recovery will make more of the WU types available for download again. One of my GPUs is down to backup project tasks already, and both only have about 1 set of CPU tasks left.
Are things fully fixed now? While I was in the shower, one of my boxes cleared all of its non-individually backed-off tasks; the second cleared the last hundredish after a manual retry. In both cases all of the individually backed-off tasks also went up when I triggered retries, and the boxes started downloading S6 tasks again.
All machines but one have cleared their backlogs. I have hit the retry button on the remaining machine and it seems to be behaving as expected today. Yesterday the retry button did not work well.
Finally. All machines are clear.
Nice work guys. An excellent example of "poise under pressure".
I'm down to just 2 stubborn files...from over 60.
Free your MIND, and the rest will follow!
We set up a new server with a new filesystem, and new upload handler configuration. Apparently working better than ever before.
Validators and the rest will remain turned off until the old data has been sync'ed to the new server.
BM
Update:
We set up a new upload server and enabled uploads for the time being. So far the performance is just fine - as it should be. However, we're still transferring data between various servers which is why validation and assimilation are still not online again. We experienced some additional network problems (thanks to Murphy) which slowed down the overall progress. We still hope to be done with the full restore of the project by tomorrow. Crossing fingers...
Stay tuned,
Oliver
Einstein@Home Project
Scotty! You've just earned your pay for the week.
I ditto on “good work, guys.” All my transfers quietly took place as my computer started up this morning. Been there/done that and am happy you got through.
Professional career: consultant/organizational researcher
Alternative job: Dodge-Chrysler-Jeep car site guy
Any news on BRP work for Raspberry Pi, i.e. ARM?
RE: Scotty! You've just
+1