Scheduler / network problem for the past few days

csbyseti
Joined: 18 May 06
Posts: 8
Credit: 708101830
RAC: 429279
Topic 192223

Hello all,

I've had connection problems for the past 2-3 days (since about 19.12.2006). Sometimes the scheduler isn't reachable, or the part of the network in front of the scheduler isn't working.
I'm not the only person seeing these errors, so the problem must be on the scheduler's side.

Even when one computer gets a connection, the next one gets an error a few minutes later. Some machines have run out of work (the ones with a smaller buffer; a BOINC feature?).

22.12.2006 11:22:11|Einstein@Home|Requesting 167602 seconds of new work, and reporting 10 results
22.12.2006 11:22:21|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
22.12.2006 11:22:21|Einstein@Home|Message from server: Completed result l1_0362.0_S5R1__1_S5R1a_0 refused: successful result ALREADY reported for this work
22.12.2006 11:22:21|Einstein@Home|Message from server: Completed result h1_0343.5_S5R1__10749_S5R1a_1 refused: successful result ALREADY reported for this work
22.12.2006 11:22:21|Einstein@Home|Message from server: Completed result h1_0343.5_S5R1__10747_S5R1a_0 refused: successful result ALREADY reported for this work
22.12.2006 11:22:21|Einstein@Home|Message from server: Completed result h1_0343.5_S5R1__10746_S5R1a_0 refused: successful result ALREADY reported for this work
22.12.2006 11:22:21|Einstein@Home|Message from server: Resent lost result h1_0377.5_S5R1__21620_S5R1a_1
22.12.2006 11:22:23||Rescheduling CPU: files downloaded
22.12.2006 11:23:21|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
22.12.2006 11:23:21|Einstein@Home|Reason: To fetch work
22.12.2006 11:23:21|Einstein@Home|Requesting 161475 seconds of new work
22.12.2006 11:23:43||Network error: couldn't connect to server
22.12.2006 11:23:46|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi failed: http error
22.12.2006 11:23:46|Einstein@Home|No schedulers responded

That is one example.
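To see how often the scheduler request fails across a longer log, a small script can tally the outcomes. This is only a rough sketch: it assumes the pipe-separated `timestamp|project|message` layout seen in the log above, and the function names `parse_line` and `tally_scheduler_outcomes` are made up for illustration, not part of BOINC.

```python
from collections import Counter


def parse_line(line):
    """Split 'timestamp|project|message' into a 3-tuple; return None if malformed.

    The layout is an assumption taken from the log excerpt in this thread.
    """
    parts = line.lstrip(">").strip().split("|", 2)
    if len(parts) != 3:
        return None
    return tuple(parts)


def tally_scheduler_outcomes(lines):
    """Count 'Scheduler request to ...' lines that succeeded or failed."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip blank or unrecognised lines
        _timestamp, _project, message = parsed
        if not message.startswith("Scheduler request to"):
            continue  # 'Sending scheduler request' lines are not outcomes
        if message.endswith("succeeded"):
            counts["succeeded"] += 1
        elif "failed" in message:
            counts["failed"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "22.12.2006 11:22:21|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded",
        "22.12.2006 11:23:21|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi",
        "22.12.2006 11:23:43||Network error: couldn't connect to server",
        "22.12.2006 11:23:46|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi failed: http error",
    ]
    print(dict(tally_scheduler_outcomes(sample)))
```

Run over the saved messages from several machines, the ratio of failed to succeeded requests would give a rough idea of how intermittent the problem is.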

22.12.2006 04:51:45|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
22.12.2006 04:51:45|Einstein@Home|Reason: To fetch work
22.12.2006 04:51:45|Einstein@Home|Requesting 63885 seconds of new work, and reporting 4 results
22.12.2006 04:51:55|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
22.12.2006 04:51:55|Einstein@Home|Message from server: Resent lost result h1_0399.5_S5R1__32306_S5R1a_2
22.12.2006 04:51:55|Einstein@Home|Message from server: Resent lost result h1_0399.5_S5R1__32305_S5R1a_2
22.12.2006 04:51:55|Einstein@Home|Message from server: Resent lost result h1_0399.5_S5R1__32304_S5R1a_1
22.12.2006 04:51:55|Einstein@Home|Message from server: Resent lost result h1_1488.0_S5R1__1257_S5R1a_1
22.12.2006 04:51:57|Einstein@Home|Started download of h1_0399.5_S5R1
22.12.2006 04:51:57|Einstein@Home|Started download of grid_0400_h_T15_S5R1.dat
22.12.2006 04:52:02|Einstein@Home|Finished download of grid_0400_h_T15_S5R1.dat
22.12.2006 04:52:02|Einstein@Home|Throughput 55646 bytes/sec

Another example: lost results were resent, and the download completed without errors. All of my Einstein machines got errors, but none of my SETI machines did.
It would be nice if the problem could be solved before the long Christmas weekend.

Merry Christmas to all 'Einstein' people

christoph

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

The second part of your log is normal.

If a WU gets lost in cyberspace for whatever reason, the server can compare the information your local machine sends back in its work request with what the server thinks the local machine should have on it.

If there is a discrepancy, the server will resend the lost results. And everyone will go on their merry way :)

Note: Not all projects have this enabled.
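The comparison described above boils down to a set difference. Here is a minimal sketch of the idea, not the actual BOINC server code: the function name `find_lost_results` and both argument names are hypothetical, and the real scheduler surely checks more than this.

```python
def find_lost_results(server_assigned, client_reported):
    """Return results the server thinks the host holds but the host did not list.

    server_assigned: result names the server has on record for this host.
    client_reported: result names the host listed in its work request.
    Both names are illustrative assumptions, not BOINC identifiers.
    """
    return sorted(set(server_assigned) - set(client_reported))


if __name__ == "__main__":
    on_server = [
        "h1_0399.5_S5R1__32306_S5R1a_2",
        "h1_0399.5_S5R1__32305_S5R1a_2",
        "h1_1488.0_S5R1__1257_S5R1a_1",
    ]
    on_client = ["h1_1488.0_S5R1__1257_S5R1a_1"]
    for name in find_lost_results(on_server, on_client):
        print("Resent lost result", name)
```

Anything in the returned list is what the server would treat as lost and send again.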

As for the first part, I've seen those messages from others before. It's again a case where the local machine and server get out of sync. I'm not sure what others have done to get rid of those messages.

Personally, if I saw they got credit, I'd run down my cache of E@H and then reset the project. But before you try that, see what others have to say.

Kathryn :o)

Einstein@Home Moderator

Svenie25
Joined: 21 Mar 05
Posts: 139
Credit: 2436862
RAC: 0

I've got the same problems as Christoph. They started at the same time as the server trouble, by which I mean the partial shutdown of the scheduler. Bruce wrote something about it here. I hope it will be fixed soon.

Ananas
Joined: 22 Jan 05
Posts: 272
Credit: 2500681
RAC: 0

The messages from the scheduler seem to be random. I doubt that it is really switching between shutdown / enabled / unavailable all the time, within a few minutes.

It looks somewhat as if there were a caching firewall between the web and the scheduler server. Maybe a load balancer issue?
