BOINC Priorities

Gerry Rough
Joined: 1 Mar 05
Posts: 102
Credit: 1847066
RAC: 0

RE: Again, maybe I just

Message 90492 in response to message 90491

Quote:

Again, maybe I just don't understand what you're saying? Or seeing? You seem to be saying that if you have, for example, 4 cores, 3 projects and 10 tasks cached for each project (30 tasks total) then BOINC will eventually start and preempt each and every one of those 30 tasks but not restart a task that has status "ready to start". I am definitely not seeing that. Each project has, at most, X tasks with status "running" or "waiting to run", where X = the number of cores. I've tried it with BOINC versions 5.10.45, 6.2.15 and 6.4.3.

Since I reset all the debts to 0 at the start of this experiment, I will let the experiment run for a few more days, maybe even weeks, just to see if mounting debts somehow affect it and cause me to have greater than X started tasks per project where X = number of cores.

You're close but not quite. BOINC does restart tasks that are waiting, but there are times when it starts a new task when there is one already waiting to resume for that project. I've never noticed any pattern from which to infer a cause, so I don't know why it does this. I'm hoping someone else who knows more about BOINC can replicate this or share his own experience. The best thing I can think of is for someone to tell me how to make a screenshot and post it on this message board. That way you can see what I'm seeing.

Edit: just looked up how to make a screenshot. Will post when I can crop it off for brevity.

Edit 2: I now can make the screenshot, but don't know how to post it. Let me know, I have the cropped off shot that makes my point.



Erik
Joined: 14 Feb 06
Posts: 2815
Credit: 2645600
RAC: 0

RE: Edit 2: I now can make

Message 90493 in response to message 90492

Quote:
Edit 2: I now can make the screenshot, but don't know how to post it. Let me know, I have the cropped off shot that makes my point.

You'll need to use an image hosting service to post your screenshot (unless you have a server to use). Photobucket and ImageShack are a couple of popular free services.

Dagorath
Joined: 22 Apr 06
Posts: 146
Credit: 226423
RAC: 0

RE: RE: Again, maybe I

Message 90494 in response to message 90492

Quote:
Quote:

Again, maybe I just don't understand what you're saying? Or seeing? You seem to be saying that if you have, for example, 4 cores, 3 projects and 10 tasks cached for each project (30 tasks total) then BOINC will eventually start and preempt each and every one of those 30 tasks but not restart a task that has status "ready to start". I am definitely not seeing that. Each project has, at most, X tasks with status "running" or "waiting to run", where X = the number of cores. I've tried it with BOINC versions 5.10.45, 6.2.15 and 6.4.3.

Since I reset all the debts to 0 at the start of this experiment, I will let the experiment run for a few more days, maybe even weeks, just to see if mounting debts somehow affect it and cause me to have greater than X started tasks per project where X = number of cores.

You're close but not quite. BOINC does restart tasks that are waiting, but there are times when it starts a new task when there is one already waiting to resume for that project.

It definitely should not do that, not even sometimes, but I am wondering if you've actually seen that happen or whether you are just deducing that that is what must have happened. I mean you might walk into your computer room and see Einstein task B with deadline 6 days away and 20 minutes CPU time accumulated running while Einstein task A with deadline 5 days away and 2 hours accumulated CPU time is waiting to resume. I see that too but if I look in Messages I always see that A was running when BOINC decided to switch a core to Rosetta and since there were no other Rosetta tasks waiting to run BOINC had no other choice but to start B, a fresh one. (Sorry to bring this point up again as we already covered it in a previous post but I'm just double checking)

Bottom line is that if you haven't been suspending tasks manually after they've started then the sum of tasks running and tasks waiting to run for any project can never be more than the number of cores. If it is then there is a bug. Show us a screenshot with 5 Einstein tasks either running or waiting to run on your quad core and I'll be convinced that there is either a bug or you're suspending tasks just to make it look that way :)
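The invariant described above can be expressed as a small check. This is a hypothetical sketch, not anything from BOINC itself: the function name and the task-list representation are invented for illustration.

```python
# Sketch of the invariant: for each project, tasks that are "running"
# or "waiting to run" should never exceed the core count (absent
# manual suspends). Purely illustrative; not BOINC code.
from collections import Counter

def violates_invariant(tasks, ncores):
    """tasks: list of (project, status) pairs as shown in BOINC Manager."""
    started = Counter(
        project for project, status in tasks
        if status in ("running", "waiting to run")
    )
    # Projects whose started-task count exceeds the core count, if any.
    return {p: n for p, n in started.items() if n > ncores}

tasks = [
    ("Einstein", "running"), ("Einstein", "waiting to run"),
    ("Rosetta", "running"), ("Einstein", "ready to start"),
]
print(violates_invariant(tasks, ncores=4))  # prints {} -> no bug here
```

If the returned dict is ever non-empty on a host where no one has been suspending tasks by hand, that would be the bug to screenshot.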

Quote:
Edit 2: I now can make the screenshot, but don't know how to post it. Let me know, I have the cropped off shot that makes my point.

You have to upload the pic to a free image hosting service like ImageShack. They'll give you a URL that you can put in a link in a post here.

Gerry Rough
Joined: 1 Mar 05
Posts: 102
Credit: 1847066
RAC: 0

RE: It definitely should

Message 90495 in response to message 90494

Quote:

It definitely should not do that, not even sometimes, but I am wondering if you've actually seen that happen or whether you are just deducing that that is what must have happened. I mean you might walk into your computer room and see Einstein task B with deadline 6 days away and 20 minutes CPU time accumulated running while Einstein task A with deadline 5 days away and 2 hours accumulated CPU time is waiting to resume. I see that too but if I look in Messages I always see that A was running when BOINC decided to switch a core to Rosetta and since there were no other Rosetta tasks waiting to run BOINC had no other choice but to start B, a fresh one. (Sorry to bring this point up again as we already covered it in a previous post but I'm just double checking)

Bottom line is that if you haven't been suspending tasks manually after they've started then the sum of tasks running and tasks waiting to run for any project can never be more than the number of cores. If it is then there is a bug. Show us a screenshot with 5 Einstein tasks either running or waiting to run on your quad core and I'll be convinced that there is either a bug or you're suspending tasks just to make it look that way :)

I've never seen this happen: more WUs waiting than number of cores. The most I've ever seen is three with 4 cores.

Sent you a PM. You can post the image here for others to see if you want.



Dagorath
Joined: 22 Apr 06
Posts: 146
Credit: 226423
RAC: 0

RE: RE: It definitely

Message 90496 in response to message 90495

Quote:
Quote:

It definitely should not do that, not even sometimes, but I am wondering if you've actually seen that happen or whether you are just deducing that that is what must have happened. I mean you might walk into your computer room and see Einstein task B with deadline 6 days away and 20 minutes CPU time accumulated running while Einstein task A with deadline 5 days away and 2 hours accumulated CPU time is waiting to resume. I see that too but if I look in Messages I always see that A was running when BOINC decided to switch a core to Rosetta and since there were no other Rosetta tasks waiting to run BOINC had no other choice but to start B, a fresh one. (Sorry to bring this point up again as we already covered it in a previous post but I'm just double checking)

Bottom line is that if you haven't been suspending tasks manually after they've started then the sum of tasks running and tasks waiting to run for any project can never be more than the number of cores. If it is then there is a bug. Show us a screenshot with 5 Einstein tasks either running or waiting to run on your quad core and I'll be convinced that there is either a bug or you're suspending tasks just to make it look that way :)

I've never seen this happen: more WUs waiting than number of cores. The most I've ever seen is three with 4 cores.

Then that tells me there is probably no bug, but I guess you're saying something like "it's working as designed, but it should work better".

OK, here's your screenshot. I tried putting it on ImageShack but they insisted on down-sizing it to where it was too small to read. So it's on my own limited-bandwidth server and won't stay there for long.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109391103430
RAC: 35888137

Gerry, I've pulled up your

Message 90497

Gerry, I've pulled up your original question which can now be answered, thanks to your screenshot that Dagorath has kindly hosted.

Quote:
... why BOINC does not finish the WU it has already started on the project it is working on, rather than starting a new WU and crunching on that first.


In the screenshot, there are 4 tasks labelled A, X, Y, Z. I assume you want to know why Z was started at all when A was sitting there, available to run, at 90% complete and with only 10% (28m50s) to go?

The answer is very simple. Because Z is under more deadline pressure than A is.

The deadlines for A and Z are quite close to each other (only 20 mins apart). BOINC has estimated that both are under deadline pressure and because (at the time BOINC made the decision) Z had 100% of its crunching still ahead of it and A only had 10% to go, Z will always be more at risk of failing the deadline than A will be until both are at approximately the same distance from completion. As that point is approached, and if BOINC has a further opportunity to re-evaluate which tasks are now at most risk, I wouldn't be at all surprised to see BOINC start up the crunching of one of the tasks either above or below Z since you have a whole lot more tasks which must be under very similar deadline pressure to what Z is.
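The deadline-pressure reasoning above can be illustrated with a toy calculation. This is not BOINC's actual scheduler; the numbers are rough guesses matching the scenario described (A at 90% done with about half an hour left, Z not yet started, deadlines about 20 minutes apart).

```python
# Toy illustration (not BOINC's algorithm) of why Z gets picked over A:
# with near-identical deadlines, the task with more crunching left has
# less slack and so is judged more at risk.

def slack_hours(deadline_h, remaining_h):
    """Time to spare if the task ran alone: deadline minus work left."""
    return deadline_h - remaining_h

# A: ~10% left (roughly 0.5 h of crunching), deadline ~120 h away.
# Z: 100% left (roughly 5 h of crunching), deadline 20 min later.
slack_a = slack_hours(deadline_h=120.0, remaining_h=0.5)
slack_z = slack_hours(deadline_h=120.3, remaining_h=5.0)
assert slack_z < slack_a  # Z has less slack, so it looks more at risk
```

Once Z's remaining time drops to roughly A's, the two are on an equal footing again, which is why the scheduler may then jump to yet another task under similar pressure.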

Quote:
Should not BOINC give priority to those WUs it has already started first?


Emphatically, NO! Under these conditions (multiple tasks under deadline pressure) BOINC will always do those tasks first which are most at risk - and this is how it should be.

The proper course of action to rectify the perceived problem is to rearrange your preferences so as not to force BOINC into high priority mode all the time. You seem to have about seven active projects and you seem to be keeping several days of cache on board all the time. The scrollbar in the screenshot seems to indicate that you have about 5 pages of tasks with only one page actually visible. Why do you need so much work queued up and going stale? This is always going to make things very difficult for BOINC to manage.

If you only had one or two projects, a couple of days cache would be fair enough. With seven projects, anything more than about 0.5 days is asking for trouble. If it were my host I'd be starting at about 0.1 days for a while until everything settled down and would then start increasing it gradually while BOINC was able to manage things without having to resort to high priority.

The other thing to realise is that giving a low resource share to a project with very long-running tasks is also going to force BOINC into high priority mode. In one of your previous messages you mentioned Orbit at 5% share and tasks taking 600 hours. I know nothing about Orbit, but unless the deadline is extremely long, this sounds like a combination likely to give BOINC quite a headache :-).

Cheers,
Gary.

Gerry Rough
Joined: 1 Mar 05
Posts: 102
Credit: 1847066
RAC: 0

RE: The answer is very

Message 90498 in response to message 90497

Quote:

The answer is very simple. Because Z is under more deadline pressure than A is.

The deadlines for A and Z are quite close to each other (only 20 mins apart). BOINC has estimated that both are under deadline pressure and because (at the time BOINC made the decision) Z had 100% of its crunching still ahead of it and A only had 10% to go, Z will always be more at risk of failing the deadline than A will be until both are at approximately the same distance from completion. As that point is approached, and if BOINC has a further opportunity to re-evaluate which tasks are now at most risk, I wouldn't be at all surprised to see BOINC start up the crunching of one of the tasks either above or below Z since you have a whole lot more tasks which must be under very similar deadline pressure to what Z is.

Shoot!! I was hoping that I had stumbled upon the soft underbelly of a conspiracy gone bad, that there would be recriminations from every corner of the globe, and that through my brave but inauspicious questioning, mankind would reward me with a paragraph in macroeconomic textbooks for generations to come. Oh well! :o)

Quote:

The proper course of action to rectify the perceived problem is to rearrange your preferences so as not to force BOINC into high priority mode all the time. You seem to have about seven active projects and you seem to be keeping several days of cache on board all the time. The scrollbar in the screenshot seems to indicate that you have about 5 pages of tasks with only one page actually visible. Why do you need so much work queued up and going stale? This is always going to make things very difficult for BOINC to manage.

If you only had one or two projects, a couple of days cache would be fair enough. With seven projects, anything more than about 0.5 days is asking for trouble. If it were my host I'd be starting at about 0.1 days for a while until everything settled down and would then start increasing it gradually while BOINC was able to manage things without having to resort to high priority.

The other thing to realise is that giving a low resource share to a project with very long-running tasks is also going to force BOINC into high priority mode. In one of your previous messages you mentioned Orbit at 5% share and tasks taking 600 hours. I know nothing about Orbit, but unless the deadline is extremely long, this sounds like a combination likely to give BOINC quite a headache :-).

I think you are right. I thought that changing my preferences to 2 days instead of the default .1 days would not have the consequences that it apparently does with so many projects. I will have to change that. The reason the queue is so large is because of two projects that have lots of short WUs.

As to WCG in the screenshot, why did it download so many WUs when it should have calculated that there are too many hours of crunching for that level of resource share? I noticed that several days ago. That's odd!! I've seen BOINC do this before. Any ideas? The screenshot is here.



Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

RE: Gerry, I've pulled up

Message 90499 in response to message 90497

Quote:

Gerry, I've pulled up your original question which can now be answered, thanks to your screenshot that Dagorath has kindly hosted.

Quote:
... why BOINC does not finish the WU it has already started on the project it is working on, rather than starting a new WU and crunching on that first.

In the screenshot, there are 4 tasks labelled A, X, Y, Z. I assume you want to know why Z was started at all when A was sitting there, available to run, at 90% complete and with only 10% (28m50s) to go?

The answer is very simple. Because Z is under more deadline pressure than A is.

The only argument I have with your analysis is that I also have seen this issue and I do run with a very limited cache. Generally speaking I have only on rare occasions gone with a cache larger than 0.5 days.

I understand the simplistic calculation is that the newly started task is under greater "pressure", but in my case, this pressure does not actually exist. In other words, the simplistic calculation does not take cognizance of the fact that there is enough time to complete all the tasks cached and then some before I come within a week of the deadline on either task. In other words, there is no deadline pressure at all ...
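The "no real pressure" claim can be sanity-checked with back-of-the-envelope arithmetic. This is a hypothetical sketch with invented numbers; it idealizes away task switching, checkpointing, and other overhead.

```python
# If the whole cache can be finished well before the earliest deadline,
# there is no genuine deadline pressure. Illustrative only.

def earliest_finish_ok(tasks, ncores, margin_h=24.0):
    """tasks: list of (remaining_hours, deadline_hours_from_now).
    Returns True if running everything back-to-back across ncores
    still beats every deadline by at least margin_h hours."""
    total = sum(rem for rem, _ in tasks)
    wall_needed = total / ncores  # idealized: perfect parallelism
    return all(wall_needed + margin_h <= dl for _, dl in tasks)

# A small cache with week-long deadlines: finishes with days to spare.
tasks = [(3.0, 168.0), (5.0, 170.0), (2.5, 200.0)]  # hours
print(earliest_finish_ok(tasks, ncores=4))  # prints True -> no pressure
```

When a check like this passes, a scheduler that still preempts partly-done tasks is trading tidiness for no actual deadline benefit, which is Paul's complaint.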

Again, I agree that in the grand scheme of life it makes little difference that more models are placed at risk by having 10 tasks partially done rather than 5 completed and 5 in work, but it does clutter up one's system, fills the available RAM, and is, in general, sloppy ... If there were true pressure I would have no problem with the decision ... I simply have a problem with it when this is not the actuality.

mikey
Joined: 22 Jan 05
Posts: 11889
Credit: 1828092752
RAC: 205918

RE: I think you are right.

Message 90500 in response to message 90498

Quote:

I think you are right. I thought that changing my preferences to 2 days instead of the default .1 days would not have the consequences that it apparently does with so many projects. I will have to change that. The reason the queue is so large is because of two projects that have lots of short WUs.

As to WCG in the screenshot, why did it download so many WUs when it should have calculated that there are too many hours of crunching for that level of resource share? I noticed that several days ago. That's odd!! I've seen BOINC do this before. Any ideas? The screenshot is here.

I think if you check you will find that you 'owe' time to WCG, meaning Boinc is trying to give it the % of crunching time you have specified but it is currently out of whack so Boinc is downloading more work for that project and will work on it more than the other projects for a bit.
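The "owed time" idea can be sketched roughly. This is a simplification of BOINC's actual debt accounting; the function and numbers below are illustrative only.

```python
# Rough sketch of project "debt": a project's debt grows when it gets
# less CPU time than its resource share entitles it to, and the client
# favors fetching and running work for the highest-debt project.

def update_debts(debts, shares, cpu_given, interval):
    """debts/shares/cpu_given: dicts keyed by project name.
    interval: total CPU-seconds available in this accounting period."""
    total_share = sum(shares.values())
    for p in debts:
        entitled = interval * shares[p] / total_share
        debts[p] += entitled - cpu_given.get(p, 0.0)
    return debts

debts = {"WCG": 0.0, "Einstein": 0.0}
shares = {"WCG": 50, "Einstein": 50}
# Einstein got all the CPU time this hour, so WCG is now "owed" time:
update_debts(debts, shares, {"Einstein": 3600.0}, interval=3600.0)
print(debts)  # WCG debt positive -> BOINC fetches/runs more WCG work
```

On this model, a burst of WCG downloads is the client paying back accumulated debt, after which the shares even out again.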

mikey
Joined: 22 Jan 05
Posts: 11889
Credit: 1828092752
RAC: 205918

RE: RE: The answer is

Message 90501 in response to message 90499

Quote:
Quote:
The answer is very simple. Because Z is under more deadline pressure than A is.

The only argument I have with your analysis is that I also have seen this issue and I do run with a very limited cache. Generally speaking I have only on rare occasions gone with a cache larger than 0.5 days.

I understand the simplistic calculation is that the newly started task is under greater "pressure", but in my case, this pressure does not actually exist. In other words, the simplistic calculation does not take cognizance of the fact that there is enough time to complete all the tasks cached and then some before I come within a week of the deadline on either task. In other words, there is no deadline pressure at all ...

Again, I agree that in the grand scheme of life it makes little difference that more models are placed at risk by having 10 tasks partially done rather than 5 completed and 5 in work, but it does clutter up one's system, fills the available RAM, and is, in general, sloppy ... If there were true pressure I would have no problem with the decision ... I simply have a problem with it when this is not the actuality.

I think this is a Boinc issue, Paul. I have seen this myself in the past, I mean false readings where Boinc thinks time is an issue when it isn't. I can crunch units in about 2 to 3 hours on some systems and I usually keep a 1 day plus 0.25 day cache. Sometimes Boinc will see 6 units on my machine and think 'oh no, I can't finish them in time, I'd better go into high priority mode', and it does, and of course a few hours later I am downloading new work! Boinc is NOT perfect, despite what Dr. A and others say, and has its quirky workings.
