"Project is down" for 19 hours now

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1,303
Credit: 415,778,917
RAC: 83,432


Message 92510 in response to message 92485

4/23/2009 1:29:21 PM|Einstein@Home|Sending scheduler request: Requested by user. Requesting 1138606 seconds of work, reporting 19 completed tasks
4/23/2009 1:29:26 PM|Einstein@Home|Scheduler request succeeded: got 0 new tasks
4/23/2009 1:29:26 PM|Einstein@Home|Message from server: Project is temporarily shut down for maintenance

Well, I guess it will be OK, since this dual machine still has 10 more, as long as the due dates for all the finished ones won't be making this PC feel bad.

But I have 2 others running Astropulse, and one of them (the AMD 3200) doesn't like that for some reason. My 4th worker lives in another state hundreds of miles away, and I have to email it again to tell it why it is sitting there idle.

In fact I just did a clean install of XP Pro on the AMD machine and it lost its Einstein work, and now it has not been able to contact the server, so it has nothing but zeros in its project stats after all its years of work.

(not sure why it doesn't like astropulse but it sure is a clean machine right now)

 

paul milton
Joined: 16 Sep 05
Posts: 329
Credit: 35,825,044
RAC: 0


Message 92511 in response to message 92510

4/24/2009 10:46:31 AM|Einstein@Home|Sending scheduler request: To report completed tasks. Requesting 209682 seconds of work, reporting 3 completed tasks
4/24/2009 10:48:36 AM|Einstein@Home|Scheduler request failed: HTTP internal server error

hmm, somehow, i don't think we're quite back yet.

seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.

archae86
Joined: 6 Dec 05
Posts: 2,813
Credit: 3,227,268,516
RAC: 2,604,246


Note the update on the front Einstein page as of 14:47 UTC April 24 reads:

Quote:
Apr 24, 2009
We have completed the database maintenance and are restarting the Einstein@Home project. Hopefully this will go well, but please be patient as it may take some time before everything is working smoothly again. We are advancing the reporting deadlines for work in progress by 10 days, so that work already completed is not marked as unreturned. Because of the work on the database, contributors will NOT be able to see the status of work that was already completed in the past. However in the future, as before, contributors WILL be able to track this status. No contributor credits have been forgotten or lost.

I triggered an update request while making this post, and it failed with

Quote:
Scheduler request failed: HTTP internal server error

instead of the message:

Quote:
Message from server: Project is temporarily shut down for maintenance

Another host, which had put itself on a 24-hour wait because of repeated failures to fetch a scheduler list, was able to get a scheduler list on a forced update, but then got no work, with the message:

Quote:

Message from server: Server can't open database


A third host, also on scheduler list hold, actually got through, was assigned work, and downloaded perhaps a dozen WUs. Once initiated, the downloads went well, typically with two flowing at once at rates of perhaps 125 kbytes/sec.

So I see much more movement, and possibly my failures this morning are just symptoms of excess load during the recovery.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 454,599,660
RAC: 44,852


Hi!

Things seem to be clearing up; my hosts are able to get new work and report completed work one by one.

Now, there will be some messages that you'll see and which are related to the outage, most of them harmless:

Quote:

Completed result xxxx refused: result already reported

This means that the client is reporting a result as completed for a second time. This can happen when the server took so long marking the results as completed during the first contact that the client had already given up or lost the connection in the meantime. So the server and client are out of sync: the server has already marked the results as completed, but the client never got a confirmation, so to be on the safe side the client assumes it has to retry. No harm done: after a second try, server and client are in sync again and nothing was lost.
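To make the out-of-sync case concrete, here is a minimal toy sketch in Python. It is not BOINC code: `ToyServer`, `client_report` and the result name `result_a` are made up for illustration, and the only thing taken from the thread is the wording of the refusal message. It shows why the retry is harmless: the second report changes nothing on the server and simply lets the client clear its queue.

```python
class ToyServer:
    """Hypothetical stand-in for the scheduler; remembers reported results."""

    def __init__(self):
        self.reported = set()

    def report(self, name):
        if name in self.reported:
            # Same wording as the message seen in the client log.
            return f"Completed result {name} refused: result already reported"
        self.reported.add(name)
        return "ok"


def client_report(server, name, reply_lost=False):
    """Return the reply the client actually sees (None if it got lost)."""
    reply = server.report(name)          # server state changes either way
    return None if reply_lost else reply


server = ToyServer()
queue = ["result_a"]                     # hypothetical result name

# First contact: the server records the result, but the reply never arrives.
first = client_report(server, "result_a", reply_lost=True)
if first is None:
    print("no confirmation, will retry")

# Second contact: the refusal tells the client it can stop retrying.
second = client_report(server, "result_a")
if second == "ok" or "already reported" in second:
    queue.remove("result_a")

print(second)
print(queue)
```

After the second contact the client's queue is empty and the server still holds exactly one record of the result, which is the "in sync again" state described above.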

Quote:

Resent lost result xxxxx

This must be something similar: the client requested new work, the server even managed to generate new work and marked it as assigned to your hosts, but the message never got through to your client. When client and server communicate the next time, the server notices that the client is out of sync and resends the results. This message is a bit misleading as the term "result" suggests the outcome of the computation was lost. However, in BOINC-speak, a "result" is synonymous with a "task". The "completed result" is sent back to the server later.

I also get some timeout, HTTP and even "temporarily shut down for maintenance" messages now, but in general the scheduler seems to work OK now. As more and more clients unload cache-loads of completed results to the server, it should get better by the hour...hopefully. Keep your fingers crossed.
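The resend case can be pictured as a set difference. The sketch below is a toy model in Python, not the real scheduler logic: the function `find_lost_tasks` and the task names are hypothetical, and it assumes the server can compare what it has assigned to a host with what the client says it holds.

```python
def find_lost_tasks(assigned_on_server, known_to_client):
    """Tasks the server assigned to the host that the client never received."""
    return sorted(set(assigned_on_server) - set(known_to_client))


# Hypothetical task names; the server thinks the host has three tasks,
# but only one assignment message actually reached the client.
assigned = ["task_1", "task_2", "task_3"]
client_has = ["task_1"]

lost = find_lost_tasks(assigned, client_has)
for name in lost:
    print(f"Resent lost result {name}")
```

Nothing computed is lost here; the "lost" items are tasks that were handed out but never arrived.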

EDIT:

Another frequent message:

Quote:

Temporarily failed download of ....

With so many hosts trying to refill their work caches, the download mirrors are under a lot of stress. The BOINC client will automatically retry downloading those files; no intervention is needed.
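For anyone curious what "automatically retry" can look like, here is a rough Python sketch of retrying with a growing, randomized wait. The actual retry schedule of the BOINC client is not given in this thread, so the base delay, the cap and the function names `backoff_delays` and `download_with_retry` are all assumptions chosen for illustration. The sketch does not really sleep or download anything.

```python
import random


def backoff_delays(attempts, base=60.0, cap=3600.0, seed=0):
    """Randomized waits whose upper bound doubles per attempt, up to a cap."""
    rng = random.Random(seed)
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * 2 ** n)
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays


def download_with_retry(fetch, attempts=5):
    """Call fetch() until it succeeds; return (succeeded, tries_used)."""
    for tries, delay in enumerate(backoff_delays(attempts), start=1):
        if fetch():
            return True, tries
        # A real client would wait `delay` seconds here before trying again.
    return False, attempts


# Simulated mirror: two failed connects, then success.
outcomes = iter([False, False, True])
result = download_with_retry(lambda: next(outcomes))
print(result)
```

Spreading the retries out with some randomness keeps thousands of hosts from hitting the mirrors again at the same instant.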

CU

Bikeman

Jord
Joined: 26 Jan 05
Posts: 2,952
Credit: 5,673,662
RAC: 375


Message 92514 in response to message 92512

Quote:
typically with two flowing at once at rates of perhaps 125 kbytes/sec.


I must've been on the right side of the downloads, then... ;-)

example given:
24-Apr-09 14:39:02 Einstein@Home [file_xfer_debug] Throughput 569020 bytes/sec
24-Apr-09 14:39:10 Einstein@Home [file_xfer_debug] Throughput 617746 bytes/sec
24-Apr-09 14:39:20 Einstein@Home [file_xfer_debug] Throughput 596606 bytes/sec
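To put those three samples next to the 125 kbytes/sec figure quoted above, a quick Python calculation. It assumes 1 kbyte = 1000 bytes, and the 125 figure was itself only an estimate ("perhaps").

```python
# Throughput samples from the file_xfer_debug lines above, in bytes/sec.
samples = [569020, 617746, 596606]

mean_bytes = sum(samples) / len(samples)
mean_kb = mean_bytes / 1000            # assumption: 1 kbyte = 1000 bytes
ratio = mean_kb / 125                  # versus the quoted ~125 kbytes/sec

print(f"{mean_kb:.0f} kbytes/sec, about {ratio:.1f}x the quoted rate")
```

That works out to roughly 594 kbytes/sec, a bit under five times the rate reported earlier.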

paul milton
Joined: 16 Sep 05
Posts: 329
Credit: 35,825,044
RAC: 0


Message 92515 in response to message 92514

yeah, i think things are gonna be buggy for a while..

4/24/2009 12:27:01 PM|Einstein@Home|Sending scheduler request: To report completed tasks. Requesting 211960 seconds of work, reporting 3 completed tasks
4/24/2009 12:27:06 PM|Einstein@Home|Scheduler request succeeded: got 1 new tasks
4/24/2009 12:27:06 PM|Einstein@Home|Message from server: Completed result h1_0296.50_S5R4__86_S5R5a_1 refused: result already reported as success
4/24/2009 12:27:06 PM|Einstein@Home|Message from server: Completed result h1_0296.50_S5R4__73_S5R5a_1 refused: result already reported as success
4/24/2009 12:27:06 PM|Einstein@Home|Message from server: Completed result h1_0296.50_S5R4__56_S5R5a_0 refused: result already reported as success
4/24/2009 12:27:06 PM|Einstein@Home|Message from server: Resent lost result h1_0621.40_S5R4__429_S5R5a_1
4/24/2009 12:27:09 PM|Einstein@Home|Started download of skygrid_0630Hz_S5R5.dat
4/24/2009 12:27:09 PM|Einstein@Home|Started download of h1_0621.40_S5R4
4/24/2009 12:27:31 PM||Project communication failed: attempting access to reference site
4/24/2009 12:27:31 PM|Einstein@Home|Temporarily failed download of skygrid_0630Hz_S5R5.dat: connect() failed
4/24/2009 12:27:31 PM|Einstein@Home|Temporarily failed download of h1_0621.40_S5R4: connect() failed
4/24/2009 12:27:31 PM|Einstein@Home|Started download of l1_0621.40_S5R4
4/24/2009 12:27:31 PM|Einstein@Home|Started download of h1_0621.45_S5R4
4/24/2009 12:27:32 PM||Internet access OK - project servers may be temporarily down.
4/24/2009 12:27:52 PM||Project communication failed: attempting access to reference site
4/24/2009 12:27:52 PM|Einstein@Home|Temporarily failed download of h1_0621.45_S5R4: connect() failed
4/24/2009 12:27:52 PM|Einstein@Home|Started download of l1_0621.45_S5R4
4/24/2009 12:27:53 PM||Internet access OK - project servers may be temporarily down.
4/24/2009 12:28:06 PM|Einstein@Home|Sending scheduler request: To fetch work. Requesting 135527 seconds of work, reporting 0 completed tasks
4/24/2009 12:28:11 PM|Einstein@Home|Scheduler request succeeded: got 2 new tasks
4/24/2009 12:28:11 PM|Einstein@Home|Got server request to delete file h1_0216.30_S5R4
4/24/2009 12:28:11 PM|Einstein@Home|Got server request to delete file l1_0216.30_S5R4
snipped 18 of the above

no biggie, i'm just glad we're back :)

seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1,935
Credit: 259,925,712
RAC: 238,474


I'm finding that some of my boxes are getting work for the first time since Windows application v3.05 was released - so that has to be downloaded too, which all adds to the congestion. But it's all getting through, slowly.

J Langley
Joined: 30 Dec 05
Posts: 50
Credit: 58,338
RAC: 0


Message 92517 in response to message 92505

Quote:
Quote:
Agreed, there is nothing that says you have to use MySQL for the BOINC database.

There is: http://boinc.berkeley.edu/trac/wiki/SoftwarePrereqsUnix and http://boinc.berkeley.edu/trac/wiki/DataBase

There is nothing that says that you have to use MySQL for the science database.

Doesn't MySQL's concept of "Storage Engine" allow the default DB (InnoDB?) to be replaced by another (e.g. DB2)? Okay this doesn't remove the software cost issue, and presumably wouldn't be as fast as BOINC talking to the other DB natively, but it might be a useful alternative for some projects.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 454,599,660
RAC: 44,852


There are a couple of "storage engines" that you can use with MySQL, but you cannot just combine MySQL with any RDBMS engine like Oracle. IBM made an effort to open "DB2 for i" for MySQL, I think this is still in beta test.

As for the future of MySQL now that it will be owned by Oracle: there's an interesting development, as one of MySQL's founders has started a free and independent branch of MySQL, MariaDB, which is meant to be compatible with the "regular" MySQL. Kind of like CentOS and Red Hat Linux... So MySQL is here to stay, one way or the other.

CU
Bikeman

J Langley
Joined: 30 Dec 05
Posts: 50
Credit: 58,338
RAC: 0


Message 92519 in response to message 92518

Quote:
There are a couple of "storage engines" that you can use with MySQL, but you cannot just combine MySQL with any RDBMS engine like Oracle. IBM made an effort to open "DB2 for i" for MySQL, I think this is still in beta test.

You're right, I thought there was a production-level release, but it looks like it's still in beta. Quite a few other engines to choose from though: http://dev.mysql.com/doc/refman/5.1/en/pluggable-storage-overview.html

Still, if BOINC used ODBC (or equivalent), there wouldn't be a MySQL limitation at all...
