Project downtime tomorrow


Bernd Machenschalk
Joined: 15 Oct 04
Posts: 3629
Credit: 133271062
RAC: 106872
Topic 197746

Einstein@Home will be shut down tomorrow morning (Wednesday, Oct 8, CEST) for some urgently needed database work. We expect this to take a couple of hours.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1721
Credit: 68432904
RAC: 57522


Server seems to be back up (I can post here!), but I'm getting a connection error when I try to report completed tasks.

Quote:
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: Connected to einstein.phys.uwm.edu (129.89.61.70) port 80 (#5142)
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: Adding handle: conn: 0x37dfe80
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: Adding handle: send: 0
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: Adding handle: recv: 0
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: Curl_addHandleToPipeline: length: 1
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Info: - Conn 5142 (0x37dfe80) send_pipe: 1, recv_pipe: 0
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: POST /EinsteinAtHome_cgi/cgi HTTP/1.1
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.4.22)
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Host: einstein.phys.uwm.edu
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Accept: */*
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Accept-Language: en_GB
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Content-Length: 190700
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server: Expect: 100-continue
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Sent header to server:
08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: HTTP/1.1 404 Not Found
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: Date: Wed, 08 Oct 2014 15:29:15 GMT
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Info: Server Apache/2.2.3 (CentOS) is not blacklisted
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: Server: Apache/2.2.3 (CentOS)
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: Content-Length: 306
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: Content-Type: text/html; charset=iso-8859-1
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Info: HTTP error before end of send, stop sending
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server:
08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Info: Closing connection 5142
08/10/2014 16:34:00 | Einstein@Home | Scheduler request failed: HTTP file not found
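In a long http_debug trace like this, the lines that matter are the status lines: the server first answers `100 Continue` and then `404 Not Found` for the scheduler path. A minimal sketch for pulling those out of a saved log, assuming the line layout shown in the quote above (the function name `http_statuses` and the regular expression are illustrative, not part of BOINC):

```python
import re

# Matches lines such as
# "... Received header from server: HTTP/1.1 404 Not Found"
STATUS_RE = re.compile(r"Received header from server: HTTP/\d\.\d (\d{3}) ?(.*)$")


def http_statuses(log_lines):
    """Return a (code, reason) tuple for each HTTP status line in the log."""
    statuses = []
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match:
            statuses.append((int(match.group(1)), match.group(2).strip()))
    return statuses


# Three lines copied from the quoted log; the last one is an ordinary header
# and should be ignored.
sample = [
    "08/10/2014 16:33:57 | Einstein@Home | [http] [ID#1] Received header from server: HTTP/1.1 100 Continue",
    "08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: HTTP/1.1 404 Not Found",
    "08/10/2014 16:33:59 | Einstein@Home | [http] [ID#1] Received header from server: Content-Length: 306",
]

print(http_statuses(sample))
```

A 404 on the scheduler path, rather than a timeout, suggests the web server is reachable but the scheduler is no longer at that URL.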

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 3629
Credit: 133271062
RAC: 106872

Yep, the scheduler URL was changed.

Do you happen to know how we can instruct the clients, from the project side, to read the new URL from the "Master URL" (i.e. the index page)?

According to the client code the client should do this automatically after 10 consecutive failures, which may take a while.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1721
Credit: 68432904
RAC: 57522


Yes, that worked. After a few manual updates (bypassing the 4-hour backoff each time), it found the new scheduler URL, http://einstein5.aei.uni-hannover.de/EinsteinAtHome_cgi/cgi, and we're back in business, with new work downloaded and running.

Edit - I don't think you can 'instruct' the client to do anything without it contacting the scheduler first - and once that's happened, you don't need to tell it to do anything else. Just wait, and let time (and itchy trigger fingers) do the rest.

Mumak
Joined: 26 Feb 13
Posts: 204
Credit: 227036985
RAC: 253889


Yep, for me too - I needed to do about 4-5 Update requests.

Tom__3
Tom*
Joined: 9 Oct 11
Posts: 52
Credit: 45729867
RAC: 42631


Very painless, as it doesn't need to time out; it just gets a "file not found".
5th update gets the master file.

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 3629
Credit: 133271062
RAC: 106872


That was a pretty long day for us. The basic things should be working again. Some minor things (stats export, scheduler log publishing, db purging) don't work yet, but we'll fix those after getting some sleep. Tomorrow I may also give a more extensive report on what we actually did.

BM

archae86
Joined: 6 Dec 05
Posts: 1866
Credit: 400580812
RAC: 787210


Quote:
5th update gets the master file.


Perhaps there is a difference depending on whether work is being requested.

Three of my PC's that wanted work seemed to take about 5 update requests each, but my laptop, which was off all night and had work to report but none to request, logged eleven "Scheduler request failed: HTTP file not found" entries before finally doing the "Fetching scheduler list, Master file download succeeded" pair, after which the next update request succeeded.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1721
Credit: 68432904
RAC: 57522


Quote:
Quote:
5th update gets the master file.

Perhaps there is a difference depending on whether work is being requested.

Three of my PC's that wanted work seemed to take about 5 update requests each, but my laptop, which was off all night and had work to report but none to request, logged eleven "Scheduler request failed: HTTP file not found" entries before finally doing the "Fetching scheduler list, Master file download succeeded" pair, after which the next update request succeeded.


Computers which were active during the European day / American night probably got through their first few attempts during the 'down for maintenance' period, so fewer manual updates were needed to reach the "after 10 consecutive failures" trigger that Bernd mentioned. If the machine has been off, you need to do them all yourself.
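To make the arithmetic concrete, here is a toy model of that counter. It is only a sketch: the threshold of 10 is taken from Bernd's description, while the class and function names are hypothetical and not taken from the BOINC client source. The real client's exact timing may differ slightly (archae86 counted eleven failures before the master fetch, not ten).

```python
# Assumption: the "10 consecutive failures" trigger Bernd mentions.
MASTER_FETCH_THRESHOLD = 10


class SchedulerState:
    """Toy model of a client's consecutive-failure counter (hypothetical)."""

    def __init__(self, failures_so_far=0):
        self.consecutive_failures = failures_so_far
        self.refetched_master = False

    def record_failed_request(self):
        """Count one failed scheduler request; flag a master fetch at the threshold."""
        self.consecutive_failures += 1
        if self.consecutive_failures >= MASTER_FETCH_THRESHOLD:
            self.refetched_master = True


def manual_updates_needed(failures_during_downtime):
    """Number of further failed updates before the master file is re-read."""
    state = SchedulerState(failures_during_downtime)
    updates = 0
    while not state.refetched_master:
        state.record_failed_request()
        updates += 1
    return updates


# A host that already failed 5 times while the project was down
print(manual_updates_needed(5))
# A host that was switched off during the downtime
print(manual_updates_needed(0))
```

Under these assumptions, a host that was running overnight needs only the remaining handful of updates, while a host that was off has to accumulate every failure itself.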

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1073
Credit: 289007760
RAC: 251373


I had several tasks waiting to be sent back, and after a few tries it started to work again, sending results and receiving new work.

(and I am back to having all 7 hosts running again)

Mike Hewson
Joined: 1 Dec 05
Posts: 5150
Credit: 42435366
RAC: 14819


Minor problem with thread marking: just reading a thread didn't mark it as read, but the "Mark all threads as read" button fixed it.

But having just tested again by reading a thread, it's fine now. Oh well .... :-)

Cheers, Mike.
