Minor work request issue.

Novasen169
Joined: 14 May 06
Posts: 43
Credit: 2767204
RAC: 0
Topic 194076

This keeps happening to me:

09/12/2008 20:46:25|Einstein@Home|Computation for task h1_0776.05_S5R4__606_S5R4a_1 finished
09/12/2008 20:46:25|climateprediction.net|Restarting task hadam3h_c_89s31_2000_2000_4_2 using hadam3 version 601
09/12/2008 20:46:27|Einstein@Home|Started upload of h1_0776.05_S5R4__606_S5R4a_1_0
09/12/2008 20:46:28|Einstein@Home|Sending scheduler request: To fetch work. Requesting 13479 seconds of work, reporting 0 completed tasks
09/12/2008 20:46:33|Einstein@Home|Scheduler request completed: got 1 new tasks
09/12/2008 20:46:41|Einstein@Home|Finished upload of h1_0776.05_S5R4__606_S5R4a_1_0

With a little explanation:
I have a dual core, with Einstein running and just finishing off some climateprediction WUs. I don't like keeping a large cache, because I don't want to keep people waiting who have problems with high pending credit. For no real reason, I'm also trying to keep a low turnaround time.

So what happens is: normally it has 2 WUs running, and I want it to download a new one, say, an hour before one of them finishes. But what actually happens is... the task completes, there's no new Einstein WU (though there is debt), and it starts CPDN. Then, when it starts uploading, it also requests new work! That just seems silly, because it doesn't solve the issue of CPDN starting when it shouldn't, and it doesn't report the WU that has just finished either.
I tried varying the cache a bit, but setting it to 0.1, 0.2 or 0.3 days doesn't seem to solve this (0.1 was the default).
Any clues on how I could fix this?
Of course, it's a minor issue, it's mostly curiosity :)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118290596074
RAC: 25067535

Minor work request issue.

Quote:
Any clues on how I could fix this?

The thing that makes it difficult to give advice is that you have only mentioned two of the very large number of projects showing in the list of projects that you support. I'm assuming that perhaps the other projects are currently set to "no new tasks" although some of them do have relatively large RACs, indicating at least some recent activity.

If E@H and CPDN are the only projects with actual tasks on board, a lot will depend on the actual resource shares you have allocated to all projects that could potentially download new work, and on the debt positions of all such projects. You mention that you are "just finishing" some CPDN tasks. Can we therefore assume that you have CPDN set to NNT so that the current tasks, when finished, won't be replaced? What is the estimated time left on those CPDN tasks, and are they under any deadline pressure?

It's not entirely clear, but your description seems to suggest that you want both cores to be running E@H, apart from whatever small share is required to finish off CPDN. You want to have a spare E@H task delivered around an hour or so before it is needed. The problem, as you see it, is that BOINC doesn't want to keep a spare E@H task, preferring instead to give CPDN a run as soon as an E@H task finishes. If this is the case then the answer is probably that BOINC is simply doing what it is supposed to do - honouring the resource shares that you have set for each project. BOINC would know that it intends to run CPDN next and so it doesn't need a spare E@H task for the moment. Lowering the CPDN resource share might be a way to increase the likelihood that BOINC will keep a spare E@H task around.
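
Very roughly, that choice looks something like the following toy sketch of debt-based project selection. The shares and run times here are assumptions for illustration only, and the real BOINC client logic is considerably more involved:

# Toy sketch of debt-based project selection (illustrative only; the shares
# and hours are assumptions, not taken from the real BOINC client code).
projects = {
    "Einstein@Home": {"share": 90, "debt": 0.0},
    "CPDN":          {"share": 10, "debt": 0.0},
}

def accrue_debt(projects, cpu_seconds_by_project):
    total_share = sum(p["share"] for p in projects.values())
    total_cpu = sum(cpu_seconds_by_project.values())
    for name, p in projects.items():
        deserved = total_cpu * p["share"] / total_share  # CPU time owed by share
        received = cpu_seconds_by_project.get(name, 0.0)
        p["debt"] += deserved - received                 # positive debt = project is owed time

# Two cores run Einstein exclusively for about 7 hours, so CPDN accumulates debt.
accrue_debt(projects, {"Einstein@Home": 2 * 7 * 3600, "CPDN": 0.0})
next_up = max(projects, key=lambda name: projects[name]["debt"])
print(next_up)  # "CPDN" - so there is no shortfall that a spare E@H task would fill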

If you let us know the resource shares of active projects and estimated run times of all tasks on board, we should be able to give better advice.

Cheers,
Gary.

Novasen169
Joined: 14 May 06
Posts: 43
Credit: 2767204
RAC: 0

Hmm yea, as usual the

Hmm yeah, as usual the solution is pretty simple, it seems.

I am only running Einstein / CPDN. I tend to switch between projects a lot, but I decided to switch to Einstein alone this time. I figured it would be a waste to cancel those CPDN WUs though, so indeed, I set it to no new work.

There is no deadline pressure on those WUs, they should easily finish in time.
The resource share, though, is 89/10/1 Einstein/CPDN/Simap (Simap just while waiting for the new batch, of course). An Einstein task typically takes 7 hours for me, and from the look of it, it usually runs to completion without switching to CPDN, so it could indeed have built up debt to CPDN by then. Especially with two cores, that's 14 hours of computing, and with my setting to switch every two hours, that's 2/16 = 12.5% of the time dedicated to CPDN, which is actually pretty close to the resource share setting...
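
Spelling that arithmetic out (just a back-of-the-envelope check, not anything BOINC actually computes in this form):

# Back-of-the-envelope check of the share arithmetic above (illustrative only).
einstein_hours = 2 * 7       # one ~7-hour Einstein task on each of two cores
cpdn_slice_hours = 2         # one "switch applications every 2 hours" CPDN slice
cpdn_fraction = cpdn_slice_hours / (einstein_hours + cpdn_slice_hours)
print(f"{cpdn_fraction:.1%}")  # 12.5%, reasonably close to the 10% resource share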

I didn't think it would be that easy. Then again, isn't it a bit weird that the scheduler request keeps happening between the start and the end of the upload?
Overnight it happened twice again; there aren't even any scheduler requests in the messages that aren't between the start and end of an upload =/
It's always like this: core 1 finishes a task, switch to CPDN, start upload, scheduler request, end upload, core 2 finishes a task, switch to the new Einstein task, start upload, scheduler request, end upload, core 1 switches to Einstein, and so on all over again (of course with a few hours in between some of the steps).
It's not a problem of course; it's just that I can't explain it. Curiosity :P

Edited for flawed math.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2978490899
RAC: 784166

There is one clear and

There is one clear and consistent cause of this "request new work on task completion, before upload finishes" behaviour. It happens when the recalculation of the queue size tips BOINC below its work-fetch threshold - typically when the TDCF (task duration correction factor) is decreasing. I'm surprised to see it on Einstein, though, since the initial work estimates are too low and the TDCF should have increased, not decreased.
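
Something like this simplified picture (the numbers and the threshold are purely illustrative, not the real client internals):

# Simplified illustration of the work-fetch trigger described above
# (illustrative values; the real BOINC estimation logic is more involved).
threshold_sec = 0.1 * 86400          # a 0.1-day cache setting, i.e. ~8640 s of work wanted

def estimated_queue_sec(remaining_sec_per_task, tdcf):
    # Remaining-time estimates are scaled by the task duration correction factor.
    return sum(t * tdcf for t in remaining_sec_per_task)

# Just before a task finishes, its last few minutes still count towards the queue.
before = estimated_queue_sec([600, 9000], tdcf=1.00)   # 9600 s, above the threshold
# The instant it finishes, its remaining time vanishes and the TDCF is nudged down.
after = estimated_queue_sec([9000], tdcf=0.95)         # 8550 s, below the threshold
print(before > threshold_sec, after < threshold_sec)   # True True -> fetch work now

The report of the finished task, on the other hand, can't go out until its upload has completed, which is why the fetch and the report end up in separate scheduler requests.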

There's a discussion at "BOINC: Is it possible to increase delay between result upload and reporting?", and a trac ticket, #728. You could add your thoughts there.

mikey
Joined: 22 Jan 05
Posts: 12774
Credit: 1855440311
RAC: 1065735

RE: This keeps happening to

Quote:

This keeps happening to me:

09/12/2008 20:46:25|Einstein@Home|Computation for task h1_0776.05_S5R4__606_S5R4a_1 finished
09/12/2008 20:46:25|climateprediction.net|Restarting task hadam3h_c_89s31_2000_2000_4_2 using hadam3 version 601
09/12/2008 20:46:27|Einstein@Home|Started upload of h1_0776.05_S5R4__606_S5R4a_1_0
09/12/2008 20:46:28|Einstein@Home|Sending scheduler request: To fetch work. Requesting 13479 seconds of work, reporting 0 completed tasks
09/12/2008 20:46:33|Einstein@Home|Scheduler request completed: got 1 new tasks
09/12/2008 20:46:41|Einstein@Home|Finished upload of h1_0776.05_S5R4__606_S5R4a_1_0

With a little explanation:
I have a dual core, with Einstein running and just finishing off some climateprediction WUs. I don't like keeping a large cache, because I don't want to keep people waiting who have problems with high pending credit. For no real reason, I'm also trying to keep a low turnaround time.

So what happens is: normally it has 2 WUs running, and I want it to download a new one, say, an hour before one of them finishes. But what actually happens is... the task completes, there's no new Einstein WU (though there is debt), and it starts CPDN. Then, when it starts uploading, it also requests new work! That just seems silly, because it doesn't solve the issue of CPDN starting when it shouldn't, and it doesn't report the WU that has just finished either.
I tried varying the cache a bit, but setting it to 0.1, 0.2 or 0.3 days doesn't seem to solve this (0.1 was the default).
Any clues on how I could fix this?
Of course, it's a minor issue, it's mostly curiosity :)

One thing you aren't considering is the time the projects' servers need for your PC to talk to them. Every time you, or I, or anyone else connects to the server, it takes time away from whatever else it is doing and slows it down. With many thousands of users, or more, that can mean a lot of time. When I was on SETI, they said it made a HUGE difference when they stopped the program from connecting so many times. When they combined the steps to both receive the completed data and send the new data to us users, the servers sped right back up and they didn't need to buy new ones. When the steps were separated, i.e. two separate functions, the server could not keep up with the other things it needed to do. As you are well aware, costs can kill a project, so saving money was a big priority for the BOINC project as a whole. That is also why the "communication deferred" message now comes up after a connection.

Novasen169
Joined: 14 May 06
Posts: 43
Credit: 2767204
RAC: 0

RE: One thing you aren't

Message 88914 in response to message 88913

Quote:
One thing you aren't considering is the time the projects' servers need for your PC to talk to them. Every time you, or I, or anyone else connects to the server, it takes time away from whatever else it is doing and slows it down. With many thousands of users, or more, that can mean a lot of time. When I was on SETI, they said it made a HUGE difference when they stopped the program from connecting so many times. When they combined the steps to both receive the completed data and send the new data to us users, the servers sped right back up and they didn't need to buy new ones. When the steps were separated, i.e. two separate functions, the server could not keep up with the other things it needed to do. As you are well aware, costs can kill a project, so saving money was a big priority for the BOINC project as a whole. That is also why the "communication deferred" message now comes up after a connection.


I did consider this, actually. If it waited for the upload to finish and then requested new work, it would get the new work and report the result in the same request, wouldn't it? It would just be faster.

Quote:

There is one clear and consistent cause of this "request new work on task completion, before upload finishes" behaviour. It happens when the recalculation of the queue size tips BOINC below its work-fetch threshold - typically when the TDCF (task duration correction factor) is decreasing. I'm surprised to see it on Einstein, though, since the initial work estimates are too low and the TDCF should have increased, not decreased.

There's a discussion at "BOINC: Is it possible to increase delay between result upload and reporting?", and a trac ticket, #728. You could add your thoughts there.


Thanks! That's the answer I was looking for.

After this explanation, I expect it has something to do with a bug I have, which causes the time to completion to show as 0:01 at all times. Of course, this is no real problem (I know how long my tasks last, and it's pretty constant).

Thanks for clearing this up and increasing my understanding a little bit more :)

mikey
Joined: 22 Jan 05
Posts: 12774
Credit: 1855440311
RAC: 1065735

RE: RE: One thing you

Message 88915 in response to message 88914

Quote:
Quote:
One thing you aren't considering is the time the projects' servers need for your PC to talk to them. Every time you, or I, or anyone else connects to the server, it takes time away from whatever else it is doing and slows it down. With many thousands of users, or more, that can mean a lot of time. When I was on SETI, they said it made a HUGE difference when they stopped the program from connecting so many times. When they combined the steps to both receive the completed data and send the new data to us users, the servers sped right back up and they didn't need to buy new ones. When the steps were separated, i.e. two separate functions, the server could not keep up with the other things it needed to do. As you are well aware, costs can kill a project, so saving money was a big priority for the BOINC project as a whole. That is also why the "communication deferred" message now comes up after a connection.

I did consider this, actually. If it waited for the upload to finish and then requested new work, it would get the new work and report the result in the same request, wouldn't it? It would just be faster.

Yes, you are correct, but the point I was making is that opening and closing the port every time someone connects is the slowdown. Combining those tasks is what makes the server faster over time. Also, since there are only so many connections available to the server at any one point in time, each connection takes up valuable connection resources. Using each connection to its best advantage is the ideal thing.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2978490899
RAC: 784166

RE: RE: RE: One thing

Message 88916 in response to message 88915

Quote:
Quote:
Quote:
One thing you aren't considering is the time the projects' servers need for your PC to talk to them. Every time you, or I, or anyone else connects to the server, it takes time away from whatever else it is doing and slows it down. With many thousands of users, or more, that can mean a lot of time. When I was on SETI, they said it made a HUGE difference when they stopped the program from connecting so many times. When they combined the steps to both receive the completed data and send the new data to us users, the servers sped right back up and they didn't need to buy new ones. When the steps were separated, i.e. two separate functions, the server could not keep up with the other things it needed to do. As you are well aware, costs can kill a project, so saving money was a big priority for the BOINC project as a whole. That is also why the "communication deferred" message now comes up after a connection.

I did consider this, actually. If it waited for the upload to finish and then requested new work, it would get the new work and report the result in the same request, wouldn't it? It would just be faster.

Yes, you are correct, but the point I was making is that opening and closing the port every time someone connects is the slowdown. Combining those tasks is what makes the server faster over time. Also, since there are only so many connections available to the server at any one point in time, each connection takes up valuable connection resources. Using each connection to its best advantage is the ideal thing.


I don't think that there's any way that BOINC can combine multiple file uploads or downloads into a single 'open port' session, and I don't think there would be much to gain if they tried - IIRC, the major speed increase you refer to was when they switched from CGI to FastCGI on the server - something which the BOINC client/user isn't involved in at all.

The point at which aggregating requests comes into play is the server 'cost' of opening and closing the database connection when scheduler requests to report or request new work are made - see Rom Walton's blog. The server load decrease would come from delaying the first scheduler call (to fetch work) and aggregating it with the second (to report completed work).
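
From the client side, that aggregation amounts to a policy roughly like the sketch below. The function and field names are hypothetical; this is only an outline of the idea, not the actual BOINC client code:

# Hypothetical sketch: hold the work-fetch back until pending uploads finish,
# so reporting completed tasks and requesting new work share one scheduler RPC.
def build_scheduler_request(project, pending_uploads, completed_tasks, shortfall_sec):
    if shortfall_sec <= 0 and not completed_tasks:
        return None                    # nothing to report, nothing to fetch
    if pending_uploads:
        return None                    # defer: the report can't be sent yet anyway
    return {                           # one combined report-and-fetch request
        "project": project,
        "report": completed_tasks,     # e.g. ["h1_0776.05_S5R4__606_S5R4a_1"]
        "request_seconds": max(shortfall_sec, 0),
    }

The trade-off is that holding the fetch back delays new work slightly, which is essentially the discussion in the thread and ticket mentioned earlier.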
