After running out of work I
After running out of work I get this:
9-10-2014 0:05:25 | Einstein@Home | Requesting new tasks for CPU and ATI
9-10-2014 0:05:35 | Einstein@Home | Scheduler request failed: HTTP file not found
As I am not a specialist in programming, perhaps someone can indicate where I have to change to the new scheduler URL.
It would have been more user-friendly if this information had been given before the system shutdown yesterday.
RE: perhaps one can
As mentioned in previous posts in this thread, and also referenced by a thread on the Problems and Bug Reports board, the application will accumulate about 10 failures, then automatically get a new scheduler list, after which normal function resumes.
If you are in a hurry, you can just click Update a few times. Otherwise it will fix itself in time.
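For the curious, the recovery logic described here can be sketched roughly as follows. This is an illustration of the idea only (consecutive scheduler failures eventually trigger a re-fetch of the project's master page, which lists the current scheduler URLs), not BOINC's actual implementation; the threshold constant and all names are assumptions:

```python
# Illustrative sketch of the client-side failover described above.
# Not BOINC's real code; FAILURE_THRESHOLD and all names are hypothetical.

FAILURE_THRESHOLD = 10  # the "after 10 consecutive failures" trigger

class ProjectState:
    def __init__(self, master_url, scheduler_urls):
        self.master_url = master_url          # e.g. the project home page
        self.scheduler_urls = scheduler_urls  # list learned from that page
        self.consecutive_failures = 0

def scheduler_rpc(project, do_rpc, refetch_master):
    """Try one scheduler contact; after too many failures, refresh the list."""
    for url in project.scheduler_urls:
        if do_rpc(url):  # e.g. an HTTP POST of the scheduler request
            project.consecutive_failures = 0
            return True
    project.consecutive_failures += 1
    if project.consecutive_failures >= FAILURE_THRESHOLD:
        # The "Fetching scheduler list" / "Master file download succeeded"
        # pair: re-read the master page and adopt whatever it lists now.
        project.scheduler_urls = refetch_master(project.master_url)
        project.consecutive_failures = 0
    return False
```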
It works!! I should have been
It works!! I should have been more patient.
Thank you very much, archae86.
Perhaps there is a difference
Perhaps there is a difference depending on whether work is being requested.
Three of my PCs that wanted work seemed to take about 5 update requests each, but my laptop, which was off all night and had work to report but none to request, logged eleven "Scheduler request failed: HTTP file not found" entries before finally doing the "Fetching scheduler list" / "Master file download succeeded" pair, after which the next update request succeeded.
Computers which were active during the European day / American night probably got through their first few attempts during the 'down for maintenance' period, so fewer were needed to reach the "after 10 consecutive failures" trigger that Bernd mentioned. If the machine has been off, you need to do them all yourself.
RE: RE: RE: 5th update
My primary cruncher is always on, but it wasn't asking for new work, so it may not have tried at all during the outage to report the three it had finished. I had to kick it eleven times before it downloaded the Master file.
David
Miserable old git
Patiently waiting for the asteroid with my name on it.
RE: RE: RE: 5th update
The version of BOINC matters as well. The bulk of my hosts never request work 'on their own': their cache settings are manipulated by an external script that makes sure they have up-to-date common data files before making a work request, so these controlled work requests are rather infrequent. Machines on more 'current' versions of BOINC will report work soon after completion and hence will have made a number of contacts anyway without requesting work, but those on v6 BOINCs will not have made contact; they report only about once per day when not requesting work.
I've just 'updated' the machines at home: those on v6 needed a full 12 clicks, whilst those on 7.2.42 needed just a couple. I'll have to head off shortly and attend to a much larger group at a different location. Fortunately most of them are on 7.2.42, so just completing and reporting tasks should get them out of trouble on their own.
Cheers,
Gary.
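Gary's actual control script isn't shown, but the repeated clicking can be automated along the same lines with boinccmd, the command-line tool that ships with the BOINC client. A minimal sketch, with hypothetical host names and password, using 12 attempts because that is what the v6 clients needed:

```python
#!/usr/bin/env python3
# Sketch only: nudge several BOINC hosts through repeated scheduler
# contacts so the failure counter reaches the master-file refetch.
# Host names and the RPC password are hypothetical placeholders.

import subprocess
import time

PROJECT_URL = "http://einstein.phys.uwm.edu/"
HOSTS = ["cruncher1", "cruncher2", "laptop"]  # hypothetical host names
ATTEMPTS = 12                                 # v6 clients needed about 12

for host in HOSTS:
    for _ in range(ATTEMPTS):
        # --passwd must match the host's gui_rpc_auth.cfg contents.
        subprocess.run(
            ["boinccmd", "--host", host, "--passwd", "SECRET",
             "--project", PROJECT_URL, "update"],
            check=False,
        )
        time.sleep(15)  # give the client time to finish each scheduler RPC
```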
08/10/2014 23:03:09 |
08/10/2014 23:03:09 | Einstein@Home | Scheduler request failed: HTTP file not found
I now have 13 units waiting to report. All have uploaded.
I have "No new tasks" set and 24 hours work left.
Version 7.2.47
Mike
Switched to "Allow new tasks"
Switched to "Allow new tasks" and all reported and new tasks downloaded.
The problem seems to be linked to the "No new tasks" setting.
Mike
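Whether "No new tasks" really blocks the recovery is Mike's conjecture here, but his workaround can also be applied from the command line with boinccmd. A minimal sketch, assuming a local client with no GUI RPC password:

```python
# Sketch of the workaround above via boinccmd (ships with the BOINC client).
# Assumes a local client with no GUI RPC password; adjust as needed.
import subprocess

URL = "http://einstein.phys.uwm.edu/"

# Temporarily allow new tasks so the next scheduler contact requests work,
subprocess.run(["boinccmd", "--project", URL, "allowmorework"], check=True)
subprocess.run(["boinccmd", "--project", URL, "update"], check=True)
# then, once tasks have been reported, restore the previous setting.
subprocess.run(["boinccmd", "--project", URL, "nomorework"], check=True)
```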
RE: Tomorrow I may also
We'd certainly appreciate such a report :-)
-----
Basically we have been
Basically we have been running on the spare wheel with the DB server for about a year. There were three identical servers set up @UWM, two of which had already stopped working without any clear sign of what went wrong (hardware, OS, software, whatever) or how to fix these problems. The third was (still) running our master DB. Our fingers hurt from being crossed.
The end of the S6CasA "run" and thus the absence of "locality" work for a few weeks gave us the opportunity to move the active master DB to AEI (Hannover), where we got three newer and much more powerful DB servers as part of our "fallback infrastructure" that is meant to take over when something really bad happens to the UWM side.
The actual move, however, still came a bit rushed, to avoid foreseeable difficulties next week (team challenge, vacations). Given the circumstances, all in all it went pretty smoothly and according to plan.
For reliability reasons the "scheduler" had to be moved along with the DB, which is why the scheduler URL changed. We knew that the clients should automatically adjust to that change; however, we weren't aware of how long it would take them. Currently we still see less than half the request rate on the new scheduler that we were used to on the old one. It will probably take until next week before the load on the AEI machines is remotely comparable to what we saw at UWM.
BM
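The automatic adjustment Bernd mentions works because the project's master page embeds the current scheduler URLs, so a client that re-fetches it picks up the new address. A rough sketch of that extraction step, assuming BOINC's usual tag conventions (the parsing details are illustrative, not the client's exact code):

```python
# Sketch of recovering scheduler URLs from a BOINC project's master page.
# BOINC lists them in <scheduler>...</scheduler> tags (often inside HTML
# comments) or as <link rel="boinc_scheduler" href="..."> elements.
import re
import urllib.request

def fetch_scheduler_urls(master_url):
    html = urllib.request.urlopen(master_url).read().decode("utf-8", "replace")
    urls = re.findall(r"<scheduler>\s*(\S+?)\s*</scheduler>", html)
    urls += re.findall(r'<link\s+rel="boinc_scheduler"\s+href="([^"]+)"', html)
    return urls

# e.g. fetch_scheduler_urls("http://einstein.phys.uwm.edu/")
```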
Will there be some "grace"
Will there be some "grace" time for tasks that are finished immediately before their deadline but can't be reported because of the delay in picking up the new scheduler URL? I think a lot of guys will have bunkered a lot of work for the forthcoming team challenge, and they won't be able to report their finished tasks in time if they enable network communication for the first time after the URL change right at the challenge start AND just before hitting the deadline.
Love, Michi