After running out of work I
After running out of work I get this:
9-10-2014 0:05:25 | Einstein@Home | Requesting new tasks for CPU and ATI
9-10-2014 0:05:35 | Einstein@Home | Scheduler request failed: HTTP file not found
As I am not a specialist in programming, perhaps someone can indicate where I have to change to the new scheduler URL.
It would have been more user-friendly if this information had been given before the system shutdown yesterday.
RE: perhaps one can
As mentioned in previous posts in this thread, and also referenced by a thread on the Problems and Bug Reports board, the application will accumulate about 10 failures, then automatically get a new scheduler list, after which normal function resumes.
If you are in a hurry, you can just click Update a few times. Otherwise it will fix itself in time.
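For the curious, the recovery logic described here can be sketched roughly as follows. This is an illustration of the idea only (consecutive scheduler failures eventually trigger a re-fetch of the project's master page, which lists the current scheduler URLs), not BOINC's actual implementation; the threshold constant and all names are assumptions:

```python
# Illustrative sketch of the client-side failover described above.
# Not BOINC's real code; FAILURE_THRESHOLD and all names are hypothetical.

FAILURE_THRESHOLD = 10  # the "after 10 consecutive failures" trigger

class ProjectState:
    def __init__(self, master_url, scheduler_urls):
        self.master_url = master_url          # e.g. the project home page
        self.scheduler_urls = scheduler_urls  # list learned from that page
        self.consecutive_failures = 0

def scheduler_rpc(project, do_rpc, refetch_master):
    """Try one scheduler contact; after too many failures, refresh the list."""
    for url in project.scheduler_urls:
        if do_rpc(url):  # e.g. an HTTP POST of the scheduler request
            project.consecutive_failures = 0
            return True
    project.consecutive_failures += 1
    if project.consecutive_failures >= FAILURE_THRESHOLD:
        # The "Fetching scheduler list" / "Master file download succeeded"
        # pair: re-read the master page and adopt whatever it lists now.
        project.scheduler_urls = refetch_master(project.master_url)
        project.consecutive_failures = 0
    return False
```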
It works!! I should have been
It works!! I should have been more patient.
Thank you very much, archae86.
Perhaps there is a difference
Perhaps there is a difference depending on whether work is being requested.
Three of my PCs that wanted work seemed to take about 5 update requests each, but my laptop, which was off all night and had work to report but none to request, logged eleven "Scheduler request failed: HTTP file not found" entries before finally doing the "Fetching scheduler list" / "Master file download succeeded" pair, after which the next update request succeeded.
Computers which were active during the European day / American night probably got through their first few attempts during the 'down for maintenance' period, so fewer were needed to reach the "after 10 consecutive failures" trigger that Bernd mentioned. If the machine has been off, you need to do them all yourself.
RE: RE: RE: 5th update
My primary cruncher is always on, but it wasn't asking for new work, so it may not have tried at all during the outage to report the three it had finished. I had to kick it eleven times before it downloaded the Master file.
David
Miserable old git
Patiently waiting for the asteroid with my name on it.
RE: RE: RE: 5th update
The version of BOINC matters as well. The bulk of my hosts never request work 'on their own': their cache settings are manipulated by an external script that makes sure they have up-to-date common data files before making a work request, so these controlled work requests are rather infrequent. Machines on more 'current' versions of BOINC will report work soon after completion and hence will have made a number of contacts anyway without requesting work, but those on v6 BOINCs will not have made contact; they report only about once per day when not requesting work.
I've just 'updated' the machines at home: those on v6 needed a full 12 clicks, whilst those on 7.2.42 needed just a couple. I'll have to head off shortly and attend to a much larger group at a different location. Fortunately most of them are on 7.2.42, so just completing and reporting tasks should get them out of trouble on their own.
Cheers,
Gary.
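Gary's actual control script isn't shown, but the repeated clicking can be automated along the same lines with boinccmd, the command-line tool that ships with the BOINC client. A minimal sketch, with hypothetical host names and password, using 12 attempts because that is what the v6 clients needed:

```python
#!/usr/bin/env python3
# Sketch only: nudge several BOINC hosts through repeated scheduler
# contacts so the failure counter reaches the master-file refetch.
# Host names and the RPC password are hypothetical placeholders.

import subprocess
import time

PROJECT_URL = "http://einstein.phys.uwm.edu/"
HOSTS = ["cruncher1", "cruncher2", "laptop"]  # hypothetical host names
ATTEMPTS = 12                                 # v6 clients needed about 12

for host in HOSTS:
    for _ in range(ATTEMPTS):
        # --passwd must match the host's gui_rpc_auth.cfg contents.
        subprocess.run(
            ["boinccmd", "--host", host, "--passwd", "SECRET",
             "--project", PROJECT_URL, "update"],
            check=False,
        )
        time.sleep(15)  # give the client time to finish each scheduler RPC
```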
08/10/2014 23:03:09 |
08/10/2014 23:03:09 | Einstein@Home | Scheduler request failed: HTTP file not found
I now have 13 units waiting to report. All have uploaded.
I have "No new tasks" set and 24 hours work left.
Version 7.2.47
Mike
Switched to "Allow new tasks"
Switched to "Allow new tasks" and all reported and new tasks downloaded.
The problem seems to be linked to the "No new tasks" setting.
Mike
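Whether "No new tasks" really blocks the recovery is Mike's conjecture here, but his workaround can also be applied from the command line with boinccmd. A minimal sketch, assuming a local client with no GUI RPC password:

```python
# Sketch of the workaround above via boinccmd (ships with the BOINC client).
# Assumes a local client with no GUI RPC password; adjust as needed.
import subprocess

URL = "http://einstein.phys.uwm.edu/"

# Temporarily allow new tasks so the next scheduler contact requests work,
subprocess.run(["boinccmd", "--project", URL, "allowmorework"], check=True)
subprocess.run(["boinccmd", "--project", URL, "update"], check=True)
# then, once tasks have been reported, restore the previous setting.
subprocess.run(["boinccmd", "--project", URL, "nomorework"], check=True)
```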
RE: Tomorrow I may also
We'd certainly appreciate such a report :-)
-----
Basically we have been
Basically we have been running on the spare wheel with the DB server for about a year. There were three identical servers set up @UWM, two of which had already stopped working without any clear sign of what went wrong (hardware, OS, software, whatever) or how to fix these problems. The third was (still) running our master DB. Our fingers hurt from being crossed.
The end of the S6CasA "run" and thus the absence of "locality" work for a few weeks gave us the opportunity to move the active master DB to AEI (Hannover), where we got three newer and much more powerful DB servers as part of our "fallback infrastructure" that is meant to take over when something really bad happens to the UWM side.
The actual move, however, still came a bit rushed, to avoid foreseeable difficulties next week (team challenge, vacations). Given the circumstances, all in all it went pretty smoothly and according to plan.
For reliability reasons the "scheduler" had to be moved along with the DB, which is why the scheduler URL changed. We knew that the clients should automatically adjust to that change; however, we weren't aware of how long it would take them. Currently we still see less than half the request rate on the new scheduler that we were used to on the old one. It will probably take until next week before the load on the AEI machines is remotely comparable to what we saw at UWM.
BM
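The automatic adjustment Bernd mentions works because the project's master page embeds the current scheduler URLs, so a client that re-fetches it picks up the new address. A rough sketch of that extraction step, assuming BOINC's usual tag conventions (the parsing details are illustrative, not the client's exact code):

```python
# Sketch of recovering scheduler URLs from a BOINC project's master page.
# BOINC lists them in <scheduler>...</scheduler> tags (often inside HTML
# comments) or as <link rel="boinc_scheduler" href="..."> elements.
import re
import urllib.request

def fetch_scheduler_urls(master_url):
    html = urllib.request.urlopen(master_url).read().decode("utf-8", "replace")
    urls = re.findall(r"<scheduler>\s*(\S+?)\s*</scheduler>", html)
    urls += re.findall(r'<link\s+rel="boinc_scheduler"\s+href="([^"]+)"', html)
    return urls

# e.g. fetch_scheduler_urls("http://einstein.phys.uwm.edu/")
```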
Will there be some "grace"
Will there be some "grace" time for tasks that are finished immediately before their deadline but can't be reported because of the delay in picking up the new scheduler URL? I think a lot of guys will have bunkered a lot of work for the forthcoming team challenge, and they won't be able to report their finished tasks in time if they enable network communication for the first time after the URL change right at the challenge start AND just before hitting the deadline.
Love, Michi