Upload trouble 18th Jan 2019 - [Latest occurrence: 8th March]

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3059996081
RAC: 1937569

It seems to be happening more and more frequently, but the automatic recovery seems to be applied more and more quickly. Nothing we can do at our end.

archae86
Joined: 6 Dec 05
Posts: 3165
Credit: 7409121687
RAC: 1937302

Shawn Kwang has reported in Technical News that the project is largely up and should be functioning normally, despite the discouraging appearance of the server status page. Even so, my three systems remain in a failure-to-upload state which has persisted for many hours. For me this has lasted far longer than the recent pattern.

If the project is actually working okay, perhaps I should be doing something on my systems to recover. Observations? Suggestions?

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Uploads haven't been working for something like 20+ hours now. And I can't get new work either: "No work is available for Gamma-ray pulsar binary search #1 on GPUs".

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3059996081
RAC: 1937569

I tested my Binary Radio Pulsar uploads about three hours ago, and the error message then was the usual 504 Gateway Timeout. So I reached Hannover OK, but Hannover couldn't reach the next link in the chain. I don't think there's much we users can do about that, except periodically hit 'retry' on the uploads: once one goes, they'll all follow. Like sheep. Smile

poppageek
Joined: 13 Aug 10
Posts: 259
Credit: 2473733872
RAC: 0

Yeah me too. I REALLY need to take a dump. lol 

I have several machines about to run out of work. Cry

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3059996081
RAC: 1937569

Reading Shawn's comments in the Technical News area, I think the second one (about networking) applies.

It seems that each individual dot - each server - is OK, but the lines joining them haven't been drawn yet.

Either:

  1. the next automatic restart will enable one dot to find the other dot
  2. moving from one datacentre to another datacentre means changing the IP addressing and routing tables, and that hasn't propagated yet.

Shawn Kwang
Joined: 3 Nov 15
Posts: 289
Credit: 3228086
RAC: 1786

Richard Haselgrove wrote:
I tested my Binary Radio Pulsar uploads about three hours ago, and the error message then was the usual 504 Gateway Timeout. So I reached Hannover OK, but Hannover couldn't reach the next link in the chain. I don't think there's much we users can do about that, except periodically hit 'retry' on the uploads: once one goes, they'll all follow. Like sheep. Smile

Thanks for the info on what's not working. I'll ping Bernd and see what needs to be done.

Thanks for everyone's patience regarding this downtime: we're working on recovering.

Einstein@Home Project

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3059996081
RAC: 1937569

Bernd has seen these before, but the (still ongoing) event log messages are:


22/02/2019 14:34:23 | Einstein@Home | [http] [ID#4799] Info: Connected to einstein4.aei.uni-hannover.de (130.75.116.34) port 80 (#5395)
22/02/2019 14:34:23 | Einstein@Home | [http] [ID#4799] Sent header to server: POST /EinsteinAtHome/cgi-bin/file_upload_handler_medium HTTP/1.1
22/02/2019 14:34:23 | Einstein@Home | [http] [ID#4799] Sent header to server: Expect: 100-continue
22/02/2019 14:34:23 | Einstein@Home | [http] [ID#4799] Received header from server: HTTP/1.1 100 Continue
22/02/2019 14:34:23 | Einstein@Home | [http] [ID#4799] Info: We are completely uploaded and fine
22/02/2019 14:35:23 | Einstein@Home | [http] [ID#4799] Received header from server: HTTP/1.1 504 Gateway Time-out
22/02/2019 14:35:24 | Einstein@Home | Temporarily failed upload of p2030.20170413.G43.85-01.39.C.b3s0g0.00000_3742_0_0: transient HTTP error

So the upload reaches Hannover OK, but your transatlantic(?) link is broken.
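
For anyone who wants to check their own event log for the same pattern, here is a rough Python sketch. It assumes the log lines look like the ones pasted above ("date time | project | message"); the function names `parse_line` and `summarize_upload` are made up for this example and are not part of BOINC. It pulls out the HTTP status codes the server returned and the time between the first and the last one.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the log excerpt above:
# "DD/MM/YYYY HH:MM:SS | project | message"
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| ([^|]+) \| (.*)$")
STATUS_RE = re.compile(r"Received header from server: HTTP/1\.[01] (\d{3})")


def parse_line(line):
    """Return (timestamp, project, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%d/%m/%Y %H:%M:%S")
    return stamp, m.group(2).strip(), m.group(3)


def summarize_upload(lines):
    """Collect the HTTP status codes seen and the delay before the final one."""
    statuses = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a log line
        stamp, _project, message = parsed
        m = STATUS_RE.search(message)
        if m:
            statuses.append((stamp, int(m.group(1))))
    if not statuses:
        return None
    first_stamp = statuses[0][0]
    last_stamp, last_code = statuses[-1]
    return {
        "codes": [code for _, code in statuses],
        "final_code": last_code,
        "seconds_to_final": (last_stamp - first_stamp).total_seconds(),
    }


# Two of the lines from the excerpt above, used as sample input.
SAMPLE = """\
22/02/2019 14:34:23 | Einstein@Home | [http] [ID#4799] Received header from server: HTTP/1.1 100 Continue
22/02/2019 14:35:23 | Einstein@Home | [http] [ID#4799] Received header from server: HTTP/1.1 504 Gateway Time-out
"""

if __name__ == "__main__":
    print(summarize_upload(SAMPLE.splitlines()))
```

On the two sample lines this reports a 100 followed by a 504 sixty seconds later, which fits the picture of the front end accepting the upload and then timing out while waiting for something behind it. That reading is my interpretation, not a confirmed diagnosis.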

poppageek
Joined: 13 Aug 10
Posts: 259
Credit: 2473733872
RAC: 0

Uploads and downloads working now. Cool

Thanks!!

Shawn Kwang
Joined: 3 Nov 15
Posts: 289
Credit: 3228086
RAC: 1786

poppageek wrote:

Uploads and downloads working now. Cool

Bernd manually restarted the upload file handlers. I'm still investigating the networking between UWM and AEI Hannover, which we'll need for full operations. Thanks again for everyone's patience as we move toward recovery.

Einstein@Home Project
