unexpected Web downtime 2019-02-20 and 21

Shawn Kwang
Moderator
Administrator
Joined: 3 Nov 15
Posts: 289
Credit: 1,097,420
RAC: 1,645
Topic 218236

On 2019-02-20, at about 1930 UTC, there was a power outage at UWM. The E@H Web site front-end went down when the power shut off, but power has since been restored. Thanks for your patience.

Update: After power was restored to UWM, the data-center which houses the E@H infrastructure had a cooling failure. As a result, we moved the servers to a new data-center, a move we had planned for one week later; we accelerated the schedule instead. This caused the long downtime during which the Web site was unavailable. However, the Web site is back up, and the project itself should have continued to run without serious interruption. Again, volunteers should not have to do anything with their clients.

Please report any problems to the Problems and Bug Reports Forum.

PS: Albert@Home, being the test server, was not moved to the new data-center. It has been shut down for over a day due to the lack of adequate cooling. Right now there is no known schedule for its return, but we hope to work on the cooling problem today.

Einstein@Home Project

Shawn Kwang
Moderator
Administrator
Joined: 3 Nov 15
Posts: 289
Credit: 1,097,420
RAC: 1,645

Re the Server Status page: It looks like the server status page is not working; it says everything is down. This is probably because the networking at UWM is not yet fully operational after the power outage and data-center migration. I believe the problem is that the server status checks themselves are not working, not that the components are down. In other words, the project is mostly operational, but the status page says otherwise.

Einstein@Home Project

Bent Vangli
Joined: 6 Apr 11
Posts: 23
Credit: 725,285,598
RAC: 0

Not quite. I have several tasks hanging in upload, claiming that the project is holding back.

Best of luck with the fix. :-)

Bent, Oslo, Norway

Dataman
Joined: 16 Feb 08
Posts: 7
Credit: 887,837,553
RAC: 20,724

Stats were not exported this morning.

Manfred Reiff
Joined: 27 Apr 18
Posts: 5
Credit: 21,616,010
RAC: 0

Me, too.

Despite the upload problems I have to keep crunching, because my last six WUs will expire tomorrow, Feb 23, at 1:53 pm.

According to the BOINC Manager, progress is at 100%. But I wonder why the Manager shows size = 790/357, 898/465, 824/391, 843/410, 787/354 and 894/461 bytes for those WUs that are still "uploading". Because of the server problems I am unable to say what size the Manager normally shows. Maybe this information will help to solve the problem(s).

Will I receive new WUs despite upload problems?

BTW, I made a hardcopy. If you are interested, I will send it to you (but it is in German, not English).

Shawn Kwang
Moderator
Administrator
Joined: 3 Nov 15
Posts: 289
Credit: 1,097,420
RAC: 1,645

Manfred Reiff wrote:

Will I receive new WUs despite upload problems?

I don't know the status of getting new WUs. Again, as pointed out, our system is not fully operational due to the lack of networking. It's high on my priority list to get things back together.

Einstein@Home Project

Keith Myers
Joined: 11 Feb 11
Posts: 828
Credit: 745,180,970
RAC: 1,332,938

Can't get any work, I assume because the schedulers can't be contacted.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,938
Credit: 199,961,025
RAC: 46,081

The connection between the project hosts is still not working. This affects not only the server status page, but also e.g. the workunit generators and some technicalities related to file upload. I set up a local workaround that should help with the most urgent things (file upload, work generation). But e.g. stats export will have to wait until the system is fully functional again.

BM

Manfred Reiff
Joined: 27 Apr 18
Posts: 5
Credit: 21,616,010
RAC: 0

Hi Shawn,

it seems that uploading WUs has been working properly for the past couple of minutes. All six of my WUs have been uploaded successfully, and I also received credits.

So I will try to crunch the remaining "old" WUs. Hopefully I will receive new WUs.

Thanks for your work!

Manfred Reiff
Joined: 27 Apr 18
Posts: 5
Credit: 21,616,010
RAC: 0

Uploading WUs has been possible again for the past 15 minutes (since 1655 UTC). I also earned new credits. The homepage is working fine.

At present I'm working on the remaining WUs (expiring Feb 23, 1:53 pm). I hope to download new WUs soon. We will see...

Shawn Kwang
Moderator
Administrator
Joined: 3 Nov 15
Posts: 289
Credit: 1,097,420
RAC: 1,645

I believe the UWM networking issue has been resolved (not by me: a colleague did the work). The result is that the connection between UWM and AEI-Hannover should be operational, and the parts of the project that Bernd mentioned should work again.

Einstein@Home Project
