4 days without work

Dimas Bastardo
Joined: 9 Feb 05
Posts: 7
Credit: 107992
RAC: 0
Topic 189591

Quote:
22/07/2005 08:59:56 a.m.|Einstein@Home|Sending scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
22/07/2005 08:59:56 a.m.|Einstein@Home|Requesting 0 seconds of work, returning 0 results
22/07/2005 08:59:57 a.m.|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded

In 4 days this is the only message I have received from Einstein. The other projects are working fine, and there have been no changes to the client (4.45) or the computers.

Any idea?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109387193469
RAC: 35925505

My guess is that you had a bunch of EAH work that was in danger of exceeding the deadline and so BOINC crunched it all first. Now the other projects are catching up and EAH is being prevented from downloading more work until the "debt has been repaid".

To confirm this you need to tell us the project mix, the resource shares and the connect to network setting that you are using. Please tell us those three things.

Also please be assured that if you give BOINC enough time and be patient, the system will sort itself out in due course and your resource shares will be honoured after a time.

Cheers,
Gary.

Dimas Bastardo
Joined: 9 Feb 05
Posts: 7
Credit: 107992
RAC: 0

First, thanks for your response.

I have three computers running BOINC in always-run mode. Two have seti, einstein, protein and climate attached, though one of those two is not requesting work from protein and climate.

I'm not using separate preferences for each project. The connection interval is set to 5 days (to cache some extra work), but I usually update manually every day. All projects have a resource share of 100.

The last machine is running only seti for now, to test the work on Linux.

Thanks again...

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

For Einstein@Home, 5 days is not a good choice. With the setting that high it is possible to get into trouble. I am not saying that this IS the problem, but one of the things that sends the later versions into a snit is having a "connect every" setting that is too large.

That being said, using update every day should not harm things. But, if you play with the numbers to get it to act "right" you will be stuck here forever. In general, the new CPU Scheduler should settle down in about a week and from there run ok. There are a couple of special cases where it gets itself into trouble that it has a hard time getting out of. But I am not the guy to talk to about this ... hopefully JM VII will see this and offer a suggestion ...

John McLeod VII
Moderator
Joined: 10 Nov 04
Posts: 547
Credit: 632255
RAC: 0

There are three causes for the 0 work request and without more information, I can't tell the difference.

The host can have too much work to complete by deadline comfortably, causing no work to be downloaded from any project. (4.45 EDF simulation has a time to completion > 0.8 * time to a deadline for some deadline, or > 4 projects on hand, or total required time frac > 0.8). (4.7x RR simulation has a time to completion > 0.9 * time to deadline for some deadline with or without the runnable resource fractions reduced to accommodate the next project to have a work request).

The project can have enough work on a host that has enough work to get through to the next connection. (More than connect time work total, and more than resource frac * connect time for the project).

The project can have used more CPU time than its share recently, and thus be blocked from downloading work. (large negative LT debt for the project).

In any case, these should go away after some time and the project should again be asked for work.
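
To make the three cases easier to compare, here is a rough Python sketch of them. It is not the BOINC client's code: the function names, the single-CPU assumption and the `debt_floor_s` cutoff are invented for illustration. Only the 0.8 factor (the 4.45 figure) and the comparisons against connect time come from the description above.

```python
from dataclasses import dataclass

DAY = 86400.0  # seconds in one day


@dataclass
class Result:
    remaining_cpu_s: float  # estimated CPU seconds still needed
    deadline_in_s: float    # seconds from now until the report deadline


def overcommitted(results, factor=0.8):
    """Case 1: run results earliest-deadline-first on one CPU and flag
    the host if any result would finish later than factor * its deadline."""
    elapsed = 0.0
    for r in sorted(results, key=lambda r: r.deadline_in_s):
        elapsed += r.remaining_cpu_s
        if elapsed > factor * r.deadline_in_s:
            return True
    return False


def has_enough_work(project_work_s, total_work_s, share_frac, connect_s):
    """Case 2: the host holds more than one connect interval of work in
    total, and this project holds more than its share of that interval."""
    return total_work_s > connect_s and project_work_s > share_frac * connect_s


def blocked_by_debt(lt_debt_s, debt_floor_s):
    """Case 3: long-term debt is below a (hypothetical) cutoff."""
    return lt_debt_s < debt_floor_s


def zero_request_reasons(results, project_work_s, total_work_s,
                         share_frac, connect_s, lt_debt_s,
                         debt_floor_s=-DAY):
    """Return every case that would explain a 0-second work request."""
    reasons = []
    if overcommitted(results):
        reasons.append("overcommitted")
    if has_enough_work(project_work_s, total_work_s, share_frac, connect_s):
        reasons.append("enough work on hand")
    if blocked_by_debt(lt_debt_s, debt_floor_s):
        reasons.append("negative long-term debt")
    return reasons
```

With a 5 day connect interval and four equal shares, `has_enough_work` needs more than 1.25 days of work from the project itself before it applies, so a host with no EAH work at all points to case 1 or case 3 rather than case 2.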

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109387193469
RAC: 35925505

Message 14380 in response to message 14377

Quote:

I'm not using separate preferences for each project, the connection interval is set for 5 days ....

The only thing you need to do to stop the original problem mentioned in your first post is reduce your 5 day connect setting. Why don't you just try it out at 1 day and sit back and watch over a week or so how beautifully it stabilizes and starts honouring your resource share with plenty of fresh work on hand from all projects capable of supplying work. Once it all settles you could then experiment and slowly increase the connect interval to say 1.5 days or so if you really must have excessive work on hand going stale :).

Cheers,
Gary.

Dimas Bastardo
Joined: 9 Feb 05
Posts: 7
Credit: 107992
RAC: 0

Well, thanks for the interest. I have just changed the connection setting to 1 day, and I will keep checking the behavior of EAH while the pending work from the other projects is crunched (2 or 3 days, I guess).

I still can't understand why this happened in the first place, because the 5-day setting was working fine for a whole month or so.

One question: doesn't the BOINC system (client, benchmarks, schedulers, general settings, etc.) ensure a reasonable routine for completing the work according to the resource shares?

Thanks...

Richard M
Joined: 11 Nov 04
Posts: 78
Credit: 221682934
RAC: 1199613

You have not clicked the "No New Work" button by mistake have you? :)

(Just guessing)
Richard

Dimas Bastardo
Joined: 9 Feb 05
Posts: 7
Credit: 107992
RAC: 0

Message 14383 in response to message 14382

Quote:

You have not clicked the "No New Work" button by mistake have you? :)

(Just guessing)
Richard

Thanks Richard, I checked that first.

Apparently the connection setting did the trick. The machine running only EAH and SAH has downloaded new work:

Quote:
22/07/2005 04:53:58 p.m.|Einstein@Home|Scheduler request to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
22/07/2005 04:53:58 p.m.|Einstein@Home|Got server request to delete file H1_1286.0
22/07/2005 04:53:59 p.m.||Using earliest-deadline-first scheduling because computer is overcommitted.
22/07/2005 04:54:00 p.m.|Einstein@Home|Started download of earth_05_09
22/07/2005 04:54:00 p.m.|Einstein@Home|Started download of sun_05_09
22/07/2005 04:56:05 p.m.|Einstein@Home|Finished download of earth_05_09
22/07/2005 04:56:05 p.m.|Einstein@Home|Throughput 21828 bytes/sec
22/07/2005 04:56:05 p.m.|Einstein@Home|Started download of Config_H_S4hA
22/07/2005 04:56:06 p.m.|Einstein@Home|Finished download of sun_05_09
22/07/2005 04:56:06 p.m.|Einstein@Home|Throughput 2171 bytes/sec
22/07/2005 04:56:06 p.m.|Einstein@Home|Finished download of Config_H_S4hA
22/07/2005 04:56:06 p.m.|Einstein@Home|Throughput 364 bytes/sec
22/07/2005 04:56:06 p.m.|Einstein@Home|Started download of w1_0873.0
22/07/2005 05:01:51 p.m.|Einstein@Home|Finished download of w1_0873.0
22/07/2005 05:01:51 p.m.|Einstein@Home|Throughput 24279 bytes/sec
22/07/2005 05:01:51 p.m.||request_reschedule_cpus: files downloaded
22/07/2005 05:01:51 p.m.||request_reschedule_cpus: files downloaded
22/07/2005 05:01:51 p.m.||request_reschedule_cpus: files downloaded
22/07/2005 05:01:51 p.m.||request_reschedule_cpus: files downloaded
22/07/2005 05:01:51 p.m.||request_reschedule_cpus: files downloaded
22/07/2005 05:01:51 p.m.||request_reschedule_cpus: files downloaded
22/07/2005 05:01:51 p.m.||request_reschedule_cpus: files downloaded
22/07/2005 05:01:51 p.m.||request_reschedule_cpus: files downloaded

Still waiting for the machine that works with 4 projects.

Thanks to all...

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109387193469
RAC: 35925505

Message 14384 in response to message 14383

Quote:


Still waiting for the machine that works with 4 projects.

Thanks to all...

OK, I can see that the machine with 4 projects did a bunch of EAH work ending at around the 15th July with no new work since then. The crunching of that EAH work was probably done at an accelerated rate so as to avoid missing deadlines and this would have left EAH with a negative LTD. As the other projects crunch and because you have now reduced your queue to 1 day, your work on hand will reduce and become more manageable and the EAH negative LTD will reduce and eventually become positive. At this point new EAH work will be downloaded and you will start cycling between projects according to your resource share. Hopefully this will only take a few more days and everything will be looking good and stable.
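
The way a negative LTD gets repaid can be illustrated with a toy model in Python. This is a simplified assumption, not the client's actual accounting: each project is credited with its share of the CPU time spent in a period and debited with what it actually used. The function name, the 96 hours of EAH-only crunching and the 8 hours per day for the other projects are invented figures.

```python
def update_debts(debts, shares, cpu_used):
    """One accounting step of a simplified long-term-debt model.

    debts:    project -> current debt in CPU hours
    shares:   project -> resource share
    cpu_used: project -> CPU hours actually used in this step
    """
    total_share = sum(shares.values())
    total_cpu = sum(cpu_used.values())
    updated = {}
    for project, debt in debts.items():
        # What the project was entitled to, minus what it really got.
        entitled = total_cpu * shares[project] / total_share
        updated[project] = debt + entitled - cpu_used.get(project, 0.0)
    return updated


projects = ["EAH", "SAH", "protein", "climate"]
shares = {p: 100 for p in projects}
debts = {p: 0.0 for p in projects}

# Phase 1: deadline pressure, EAH alone gets 96 CPU hours.
debts = update_debts(debts, shares, {"EAH": 96.0})
after_rush = dict(debts)

# Phase 2: EAH has no work; the other three split each 24 hour day.
days_to_recover = 0
while debts["EAH"] < 0:
    debts = update_debts(
        debts, shares, {"SAH": 8.0, "protein": 8.0, "climate": 8.0}
    )
    days_to_recover += 1

print(after_rush["EAH"], days_to_recover)
```

In this model the EAH debt goes negative during the rush and climbs back only as fast as its resource share allows while the other projects crunch. The recovery time it reports depends entirely on the invented inputs, so treat it as a picture of the mechanism rather than a prediction for your host.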

How much work do you have for the other three projects?

Cheers,
Gary.

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

Message 14385 in response to message 14383

Quote:
Thanks to all...


Dimas,

Just keep in mind that the 4.4x CPU scheduler does not mimic the older version. So, you can go crazy trying to force it to work that way. I don't have much experience with BOINC on slower systems, but mine have "balanced" themselves with a decent selection of work ... there are those occasional "Kodak Moments" when one project or another is missing in action, but those are rare now.

I am glad you seem to be back in production.
