Thanks Bruce. Your E@H
Thanks Bruce.
Your E@H team would be better advised to screen all these S@H refugees more closely in future. I suspect an unnoticed foreign body was brought onboard and got into the air conditioning system, causing the failure.
Maybe we should quarantine all the S@H refugees until further notice... ;-)
I also wish to state that I
I also wish to state that I am highly impressed at the speed of recovery after that long of an outage. I saw very little lag before uploads were happening, reports were going through, and new work was being downloaded. I know that this wasn't the case for everyone, because of location, and I understand that getting all the servers across the world in sync takes time, but still, WOW!
This project really is on its toes, and is really well set up. Of course they have had help by watching other projects, and then doing it several times better.
Kudos! Keep up the impressive work.
Thanks to all that got
Thanks to all that got Einstein back up again!
And a special thanks to Bruce, brilliant guy... ;)
Human Stupidity Is Infinite...
RE: I also wish to state
This is actually the most impressive part of the whole saga. Based on past experience with the SETI@home servers after a long outage, one would expect to see some difficulties getting results uploaded and reported, and new work downloaded. In my personal experience, I had 80+ machines with several thousand results to upload and report, all hungry for new work.
Virtually all of these boxes needed to be "kick started" because they were all out of work and had communications deferred for intervals of up to 300 hours!!! There was no way I was going to let "nature take its course" :). So, one by one in rapid succession, I made sure each machine's stuck results were uploaded and then triggered a project update. It took several hours to do them all. I was probably helped by my time zone, as the servers had been up for an hour or two before I started. However, I can't say I ever saw an operation that needed to be retried. Every server contact was handled with little if any abnormal delay. The servers seemed to be able to cope with whatever was being thrown at them!!!
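For anyone wanting to script that sort of mass kick start rather than doing it host by host, here is a minimal sketch. It assumes the BOINC command-line tool (boinccmd, or boinc_cmd on older clients) is available and that each host allows remote GUI RPC with a known password; the host names and password shown are placeholders, not real values:

    # kick_start.py - ask a list of BOINC hosts to contact Einstein@Home now.
    # Illustrative only: host names and RPC password below are placeholders.
    import subprocess

    HOSTS = ["cruncher01", "cruncher02", "cruncher03"]  # hypothetical host names
    RPC_PASSWORD = "changeme"                           # hypothetical GUI RPC password
    PROJECT_URL = "http://einstein.phys.uwm.edu/"

    for host in HOSTS:
        # "update" clears the project's communication deferral and makes the
        # client contact the scheduler, reporting finished results and
        # requesting new work.
        subprocess.run(
            ["boinccmd", "--host", host, "--passwd", RPC_PASSWORD,
             "--project", PROJECT_URL, "update"],
            check=False,  # keep going even if one host is unreachable
        )

Depending on the client version, a stuck upload may still need a manual retry, but the forced update at least clears the long communication deferral and gets a backed-off host moving again.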
The servers are obviously well designed for the job with plenty of spare capacity for situations like this. Congratulations to all involved!!
Cheers,
Gary.
Thank you very much. I
Thank you very much.
I spent most of a day babysitting our servers after restarting the project. At one point we had about 300 machines simultaneously uploading results and downloading new work. The only real bottleneck was validation, and I was able to fix that by running five copies of the validator at the same time.
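To give a rough idea of what running several validator instances side by side involves, here is a purely illustrative sketch: the binary and application names are made up, and the --app/--mod sharding arguments follow the usual BOINC daemon convention (instance i of N), which may not match the exact Einstein@Home configuration or server version.

    # run_validators.py - illustrative only: launch N validator instances that
    # each handle a disjoint slice of the workunits so they never collide.
    import subprocess

    N = 5
    VALIDATOR = "./einstein_validator"   # hypothetical binary name
    APP = "einstein"                     # hypothetical application name

    procs = [
        # "--mod N i": this instance only touches workunits with id % N == i.
        subprocess.Popen([VALIDATOR, "--app", APP, "--mod", str(N), str(i)])
        for i in range(N)
    ]

    for p in procs:
        p.wait()  # the daemons normally run until stopped; wait keeps the launcher alive

Sharding by workunit id is what lets the copies run in parallel without two of them trying to validate the same workunit.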
We really try hard to keep the project up and running 100% of the time. Unfortunately we have still not received any project funding, although I am quite hopeful that the US National Science Foundation will provide funding for us in the future. If this happens we can hire a couple of professionals to help take care of our hardware and software, which should greatly improve our reliability and our capability to deal with unexpected problems.
Cheers,
Bruce
Director, Einstein@Home
RE: We really try hard to
Wow! Bruce - you have performed above and beyond the call of duty!
We salute you!
John, I agree with your
John, I agree with your comments regarding Kudos for Bruce...
I was unaware of the funding situation. Cash flow is always important!!