Project downtime tomorrow

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250444187
RAC: 35219
Topic 197746

Einstein@Home will be shut down tomorrow morning (Wednesday, Oct 8, CEST) for some urgently needed database work. We expect this to take a couple of hours.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2956276454
RAC: 715959

Server seems to be back up (I can post here!), but I'm getting a connection error when I try to report completed tasks.

Quote:
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: Connected to einstein.phys.uwm.edu (129.89.61.70) port 80 (#5142)
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: Adding handle: conn: 0x37dfe80
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: Adding handle: send: 0
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: Adding handle: recv: 0
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: Curl_addHandleToPipeline: length: 1
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: - Conn 5142 (0x37dfe80) send_pipe: 1, recv_pipe: 0
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: POST /EinsteinAtHome_cgi/cgi HTTP/1.1
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.4.22)
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Host: einstein.phys.uwm.edu
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Accept: */*
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Accept-Language: en_GB
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Content-Length: 190700
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Expect: 100-continue
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server:
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: HTTP/1.1 404 Not Found
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: Date: Wed, 08 Oct 2014 15:29:15 GMT
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Info: Server Apache/2.2.3 (CentOS) is not blacklisted
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: Server: Apache/2.2.3 (CentOS)
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: Content-Length: 306
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: Content-Type: text/html; charset=iso-8859-1
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Info: HTTP error before end of send, stop sending
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server:
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Info: Closing connection 5142
08/10/2014 16:34:00 | Einstein@Home | Scheduler request failed: HTTP file not found
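
For anyone wanting to pull the relevant lines out of a long `[http]` debug log like the one quoted above, here is a minimal Python sketch. It assumes the log has the same line layout as the quote; the names `STATUS_RE` and `http_statuses` are made up for this example and are not part of BOINC.

```python
import re

# Matches only the HTTP status lines the client logs, e.g.
# "... Received header from server: HTTP/1.1 404 Not Found"
STATUS_RE = re.compile(r"Received header from server: HTTP/\d\.\d (\d{3}) (.*)$")


def http_statuses(log_text: str) -> list[tuple[int, str]]:
    """Return (status code, reason phrase) pairs in the order they were logged."""
    statuses = []
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            statuses.append((int(match.group(1)), match.group(2).strip()))
    return statuses


# Three lines copied from the quoted log; the Content-Length header is ignored.
sample = """\
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: HTTP/1.1 404 Not Found
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: Content-Length: 306
"""

print(http_statuses(sample))
```

On the quoted log this picks out the `100 Continue` followed by the `404 Not Found`, which is the response behind the "Scheduler request failed: HTTP file not found" message.
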

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250444187
RAC: 35219

Yep, the scheduler URL was changed.

Do you happen to know how we can instruct the clients from the project side to read the new URL from the "Master URL" (i.e. the index page)?

According to the client code the client should do this automatically after 10 consecutive failures, which may take a while.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2956276454
RAC: 715959

Yes, that worked. After a few manual updates (bypassing the 4-hour backoff each time), it found the new scheduler URL, http://einstein5.aei.uni-hannover.de/EinsteinAtHome_cgi/cgi, and we're back in business, with new work downloaded and running.

Edit - I don't think you can 'instruct' the client to do anything without it contacting the scheduler first - and once that's happened, you don't need to tell it to do anything else. Just wait, and let time (and itchy trigger fingers) do the rest.

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3519468424
RAC: 1610975

Yep, for me too - I needed to do about 4-5 Update requests.

-----

Tom*
Joined: 9 Oct 11
Posts: 54
Credit: 366729484
RAC: 0

Very painless, as it doesn't need to time out; it just gets a "file not found".
5th update gets the master file.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250444187
RAC: 35219

That was a pretty long day for us. The basic things should be working again. Some minor things (stats export, scheduler log publishing, db purging) don't work yet, but we'll fix those after getting some sleep. Tomorrow I may also give a more extensive report on what we actually did.

BM

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7220654931
RAC: 940689

Quote:
5th update gets the master file.


Perhaps there is a difference depending on whether work is being requested.

Three of my PC's that wanted work seemed to take about 5 update requests each, but my laptop, which was off all night and had work to report but none to request, logged eleven "Scheduler request failed: HTTP file not found" entries before finally doing the "Fetching scheduler list, Master file download succeeded" pair, after which the next update request succeeded.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2956276454
RAC: 715959

Quote:
Quote:
5th update gets the master file.

Perhaps there is a difference depending on whether work is being requested.

Three of my PC's that wanted work seemed to take about 5 update requests each, but my laptop, which was off all night and had work to report but none to request, logged eleven "Scheduler request failed: HTTP file not found" entries before finally doing the "Fetching scheduler list, Master file download succeeded" pair, after which the next update request succeeded.


Computers which were active during the European day / American night probably got through their first few attempts during the 'down for maintenance' period, so fewer manual updates were needed to reach the "after 10 consecutive failures" trigger that Bernd mentioned. If the machine has been off, you need to do them all yourself.
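
To put rough numbers on that, here is a small Python sketch of the arithmetic. The threshold of 10 is taken from Bernd's description of the client code and has not been checked against the source here; the function name and the example failure counts are assumptions made up for illustration.

```python
# Threshold as described by Bernd ("after 10 consecutive failures");
# an assumption for this sketch, not verified against the client source.
MASTER_FETCH_THRESHOLD = 10


def manual_updates_needed(failures_so_far: int,
                          threshold: int = MASTER_FETCH_THRESHOLD) -> int:
    """Failed scheduler requests still needed before the client re-reads the master URL."""
    return max(threshold - failures_so_far, 0)


# Hypothetical host that already failed 5 times on its own during the maintenance window.
print(manual_updates_needed(5))

# Hypothetical host that was switched off the whole time, so it starts from zero.
print(manual_updates_needed(0))
```

Under these assumptions, a host that had already failed about five times by itself would need roughly five manual updates, while one that was off would need all ten, which is in line with the counts reported above.
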

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1886
Credit: 1406974592
RAC: 1173807

I had several tasks waiting to be sent back, and after a few tries it started to work again, sending and receiving as before.

(and I am back to having all 7 hosts running again)

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6588
Credit: 316063178
RAC: 333283

Minor problem with thread marking: just reading a thread didn't mark it as read, but the "Mark all threads as read" button fixed it.

But having just tested again by reading a thread, it's fine now. Oh well .... :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
