BOINC Priorities

Dagorath
Joined: 22 Apr 06
Posts: 146
Credit: 226,423
RAC: 0

RE: Boinc is NOT perfect,

Message 90502 in response to message 90501

Quote:
Boinc is NOT perfect, despite what Dr. A and others say, and has its quirky workings.

I have never heard or read Dr. A claim that BOINC is perfect. I have heard/read him say there may be other explanations for the quirky behavior we see, explanations other than the ones we believe are the only possible explanation. Let's not put words in Dr. A's mouth and let's not jump to conclusions about what actually happened on Gerry's computer.

Gary Roberts has offered the "it had more deadline pressure" explanation. I think there MAY be another explanation for why Gerry's task Z is crunching when his task A is not. The explanation is very simple... maybe it just looks like task A was overlooked.... perhaps A was crunching just a few minutes before Gerry took the screenshot. In other words, maybe the only reason BOINC started Z was because the debts or some other mechanism said "we need to be crunching more WCG now" but A was already crunching so there was no other choice but to start Z.

Furthermore, Gary Roberts' "it has more deadline pressure" theory predicts that Gerry's computer should, one by one, start ALL of the WCG tasks in the cache. But Gerry has told us that hasn't happened so Roberts' theory seems to fail.

So I will reiterate... if you see more than NC tasks of a project with status "running" or "waiting to run", where NC = the number of cores, then there is definitely a bug. But you CAN have up to NC tasks started, for the simple reason/mechanism I have stated above, which doesn't need a bug to explain it. In fact, Gerry has already stated that he has NEVER seen more than NC tasks started for any project. Has ANYBODY seen BOINC start more than NC tasks for a project without manual intervention? By manual intervention I mean things such as suspending tasks that have already started.
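
For anyone who wants to check their own task list against that rule, here is a minimal sketch. It assumes you have copied the project name and status of each task out of the manager by hand; the function name, the sample task list and the core counts are all made up for illustration and are not part of BOINC.

```python
from collections import Counter

# Statuses that count as "started" for the purposes of the rule above.
STARTED = {"running", "waiting to run"}

def projects_over_core_limit(tasks, num_cores):
    """Return {project: started_count} for every project that has
    more than num_cores tasks running or waiting to run."""
    started = Counter(
        project for project, status in tasks if status.lower() in STARTED
    )
    return {p: n for p, n in started.items() if n > num_cores}

# Hypothetical task list: (project, status) pairs typed in by hand.
tasks = [
    ("WCG", "Running"),
    ("WCG", "Waiting to run"),
    ("WCG", "Ready to start"),
    ("Einstein", "Running"),
]

print(projects_over_core_limit(tasks, num_cores=2))  # {}
print(projects_over_core_limit(tasks, num_cores=1))  # {'WCG': 2}
```

An empty result means nothing exceeds NC, which is the normal case described above. A non-empty result with no manual intervention would be the suspected bug.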

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,210
Credit: 43,611,564,184
RAC: 44,033,237

RE: I thought that changing

Message 90503 in response to message 90498

Quote:
I thought that changing my preferences to 2 days instead of the default .1 days would not have the consequences that it apparently does with so many projects.


I'm no expert on BOINC work fetch policy and the implementation details thereof. To me, BOINC is simply a means to an end - to keep my machine supplied with work and to return the results in a timely manner. For that reason, I tend to stay with a version that works for me and so I don't get to see and compare what happens as John McLeod VII continues to develop and refine (and sometimes completely change tack, so it seems) how the work fetch policy performs. I believe that there have been many changes in the details over time and therefore it's quite important to realise that work fetch behaviour can be considerably different depending on the particular BOINC version you are using.

I'm sure there would be versions where 7 projects and a 2-day cache setting probably would not cause the behaviour you are seeing. However the safe course of action is to keep the cache smaller rather than larger.

Quote:
As to WCG in the screenshot, why did it download so many WUs when it should have calculated that there are too many hours of crunching for that level of resource share?


The WCG tasks in total amount to about 50-60 hours of crunching. You don't list anywhere, if I recall correctly, the complete list of projects and the respective resource shares for that computer. If the share for WCG was around 30% or so, the number of tasks in your cache wouldn't seem to be unusual - particularly if any other projects weren't able to supply their full quota of work at the time those tasks were downloaded. There are simply too many unknown factors to give a rigorous answer to this question.
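
As a back-of-envelope illustration of the share argument, here is a small sketch. It assumes the cache is split strictly in proportion to resource share, which is a simplification and not necessarily what any given BOINC version does. The function name and the 4-core figure are hypothetical, since the core count of the machine isn't stated.

```python
def share_hours(resource_share, cache_days, num_cores):
    """Hours of queued work a project would hold if the cache were
    split strictly in proportion to resource share (a simplification)."""
    return resource_share * cache_days * 24 * num_cores

# A 30% share, a 2-day cache and a hypothetical 4-core machine.
print(f"{share_hours(0.30, 2, 4):.1f} hours")  # 57.6 hours
```

Under those assumptions, 50-60 hours of WCG work would be about what a 30% share could account for.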

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,210
Credit: 43,611,564,184
RAC: 44,033,237

Hi Paul, RE: The

Message 90504 in response to message 90499

Hi Paul,

Quote:
The only argument I have with your analysis is that I also have seen this issue and I do run with a very limited cache.


When you say "this issue", I presume you are referring to BOINC making the decision to run tasks at high priority when there was no need for it to do so - ie there was really no pressure on any deadline with such a small cache size. If that's the case, you should put the question to JM7 as I've no idea why BOINC would do that if you only had a 0.5 day cache. I could probably make a guess or two. For example, I've seen BOINC drop into high priority with few tasks on board and plenty of days to spare simply because the CPU efficiency has been drastically reduced - eg 0.02 instead of 0.98 or more. Tasks can "spin their wheels", ie appear to be running but not making any actual progress. If that goes on for quite a few hours, the CPU efficiency drops alarmingly.
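
To put rough numbers on the CPU efficiency effect, here is a minimal sketch. It assumes a deliberately simplified model in which the expected wall-clock time is the remaining CPU time divided by the CPU efficiency, and a task looks "at risk" when that estimate exceeds the time left before the deadline. This is not BOINC's actual scheduler code; the function names and the 5-hour and 72-hour figures are made up for illustration.

```python
def estimated_wall_hours(remaining_cpu_hours, cpu_efficiency):
    """Wall-clock hours needed if only cpu_efficiency of elapsed
    time turns into useful CPU time (simplified model)."""
    return remaining_cpu_hours / cpu_efficiency

def looks_at_risk(remaining_cpu_hours, cpu_efficiency, hours_to_deadline):
    """True if the estimate overshoots the time left before the deadline."""
    return estimated_wall_hours(remaining_cpu_hours, cpu_efficiency) > hours_to_deadline

# Hypothetical task: 5 CPU hours still to do, deadline 72 hours away.
print(looks_at_risk(5, 0.98, 72))  # False: roughly 5 hours needed
print(looks_at_risk(5, 0.02, 72))  # True: roughly 250 hours needed
```

With the efficiency at 0.02 the same task appears to need about fifty times as long, which would be enough to make a comfortable deadline look impossible.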

In my previous answer, I didn't comment on whether BOINC's decision to invoke high priority was reasonable or not. All I was saying is that given that tasks were running at high priority (rightly or wrongly) it was pretty easy to see why the new task got started in preference to the almost completed existing task. This was 100% of the essence of the original question and that is all I was trying to answer.

Cheers,
Gary.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,125
Credit: 126,945,457
RAC: 11,517

RE: For example, I've seen

Message 90505 in response to message 90504

Quote:
For example, I've seen BOINC drop into high priority with few tasks on board and plenty of days to spare simply because the CPU efficiency has been drastically reduced - eg 0.02 instead of 0.98 or more. Tasks can "spin their wheels", ie appear to be running but not making any actual progress. If that goes on for quite a few hours, the CPU efficiency drops alarmingly.


Would that be in the instance of hibernation, say?

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Gerry Rough
Joined: 1 Mar 05
Posts: 102
Credit: 1,847,066
RAC: 0

RE: The WCG tasks in total

Message 90506 in response to message 90503

Quote:
The WCG tasks in total amount to about 50-60 hours of crunching. You don't list anywhere, if I recall correctly, the complete list of projects and the respective resource shares for that computer. If the share for WCG was around 30% or so, the number of tasks in your cache wouldn't seem to be unusual - particularly if any other projects weren't able to supply their full quota of work at the time those tasks were downloaded. There are simply too many unknown factors to give a rigorous answer to this question.

I suspect mikey may have given the correct answer to the question. Perhaps my WCG got out of whack and BOINC decided to download the required WUs to make up for some debt from other projects. The resource share, by the way, is 5% for that project. It seems to have caught up since it is now running normally with no high priority tasks running.



Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5,385,205
RAC: 0

RE: Hi Paul, RE: The

Message 90507 in response to message 90504

Quote:

Hi Paul,

Quote:
The only argument I have with your analysis is that I also have seen this issue and I do run with a very limited cache.

When you say "this issue", I presume you are referring to BOINC making the decision to run tasks at high priority when there was no need for it to do so

Ok, two issues ... :)

In my case the high priority is caused by the way the internal calculations are made, and because my switch interval is 12 hours, this causes BOINC to think that the tasks are under siege even though they only take 20 minutes to run ...

I was actually referring to the starting of other tasks when there is a task that is in progress and has minutes or seconds to run to completion ...

In both cases, the problem is simplistic internal modeling and calculations that really do not reflect the multi-CPU systems. The third effect that falls out of this is when you do run "lean" as I do, the fact that you have one or more long running tasks present also throws in a huge bias ... which tends to cause the cache to "flex" more than it should and in many cases run lower than it should.

The old problem of managers wanting to put 9 women on the job and get the baby in a month ...

Again, I tend to see it more on my 8 CPU systems than the 4, but I first noticed it on the 4 CPU Dells I have (well over 3 years ago).

Last comment: as far as I know, much of what is being done in the CPU Scheduler is not JM VII's "fault", in that I don't think he is involved in the changes underway or in those made over the last few cycles.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,210
Credit: 43,611,564,184
RAC: 44,033,237

RE: Gary Roberts has

Message 90508 in response to message 90502

Quote:
Gary Roberts has offered the "it had more deadline pressure" explanation. I think there MAY be another explanation for why Gerry's task Z is crunching when his task A is not. The explanation is very simple... maybe it just looks like task A was overlooked.... perhaps A was crunching just a few minutes before Gerry took the screenshot.


That is possible but I would think less likely than the simpler explanation of BOINC doing the tasks it considers to be most at risk.

When I look at all the WCG tasks more closely, things are a bit more complicated than at first glance. Firstly, I now notice that there are a bunch of tasks with a deadline timestamp of 7:28:41AM which were all obviously downloaded in a single batch. However some are due on the 22nd and some are due exactly 1 day later on the 23rd. I know nothing about WCG but obviously they must have at least 2 different deadlines, one exactly 1 day longer than the other. I had initially overlooked this fact.

Secondly, task A has a deadline of 7:08:08PM. I had initially mistaken this for "AM" ie 20mins earlier than X but in fact it's half a day later than X or Y or Z so it's little surprise that these three are being done first since they really are under a lot more deadline pressure than A.

It seems to me that unless high priority mode is rescinded, BOINC will knock off all the "AM" tasks for the 22nd before it finally goes back to A.

Quote:
Furthermore, Gary Roberts' "it has more deadline pressure" theory predicts that Gerry's computer should, one by one, start ALL of the WCG tasks in the cache.


No, not ALL of them - only those that have a deadline on the 22nd. There seem to be 3 more of these. There are a further 7 whose deadline isn't until the 23rd. A should be completed before those 7.
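
The ordering being described is just "earliest deadline first", which can be sketched as follows. This is only an illustration of the argument, not BOINC's code. The year and month are placeholders, the names other than A, X, Y and Z are invented, and it assumes the three remaining tasks for the 22nd and the seven for the 23rd all carry the 7:28:41AM timestamp.

```python
from datetime import datetime

YEAR, MONTH = 2009, 1  # placeholders: the thread only gives days and times

def deadline(day, hour, minute, second):
    return datetime(YEAR, MONTH, day, hour, minute, second)

tasks = [("A", deadline(22, 19, 8, 8))]                       # 7:08:08PM on the 22nd
tasks += [(n, deadline(22, 7, 28, 41)) for n in ("X", "Y", "Z")]
tasks += [(f"early-{i}", deadline(22, 7, 28, 41)) for i in range(1, 4)]
tasks += [(f"late-{i}", deadline(23, 7, 28, 41)) for i in range(1, 8)]

def edf_order(tasks):
    """Task names sorted by deadline, earliest first."""
    return [name for name, due in sorted(tasks, key=lambda t: t[1])]

order = edf_order(tasks)
before_a = order[: order.index("A")]
after_a = order[order.index("A") + 1 :]
print(before_a)       # X, Y, Z and the three other tasks due on the 22nd
print(len(after_a))   # 7
```

With X, Y and Z already started, that leaves 3 more tasks ahead of A and 7 behind it, as long as high priority mode stays in force for this project.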

Quote:
But Gerry has told us that hasn't happened so Roberts' theory seems to fail.


Where did he say that? Perhaps it was in a PM? In any case my theory only predicts that 3 more will start before A is completed. Of course, if BOINC should decide that it no longer needs to use high priority mode or if high priority mode switches to a different project then things would be different yet again.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,210
Credit: 43,611,564,184
RAC: 44,033,237

RE: Would that be in the

Message 90509 in response to message 90505

Quote:
Would that be in the instance of hibernation, say?

No, nothing to do with hibernation. I'm running lots of machines in non-airconditioned space where they all tend to overheat each other, particularly at weekends. Occasionally, one or two machines in a batch will have the science app "lock up" and although the status is shown as "running" and everything otherwise appears normal, there is no further progress until BOINC is stopped and restarted. When that happens, the CPU efficiency needs to be fixed in the state file, otherwise BOINC immediately goes into high priority mode.

I've seen others report this "spinning the wheels" behaviour from time to time and my assumption is that it's an overheating problem. I rarely see the problem in winter.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,210
Credit: 43,611,564,184
RAC: 44,033,237

RE: ... The resource share,

Message 90510 in response to message 90506

Quote:
... The resource share, by the way, is 5% for that project.


Then you really did have far too many WCG tasks for that share. What is your most highly resourced project and does it ever fail to deliver tasks exactly when asked? If BOINC can't get what it wants from the main projects, it tends to immediately get what it needs from others that can supply, even though this will cause problems if the main project(s) then get work again.

Cheers,
Gary.

Gerry Rough
Joined: 1 Mar 05
Posts: 102
Credit: 1,847,066
RAC: 0

RE: RE: ... The resource

Message 90511 in response to message 90510

Quote:
Quote:
... The resource share, by the way, is 5% for that project.

Then you really did have far too many WCG tasks for that share. What is your most highly resourced project and does it ever fail to deliver tasks exactly when asked? If BOINC can't get what it wants from the main projects, it tends to immediately get what it needs from others that can supply, even though this will cause problems if the main project(s) then get work again.

Right now, my highest resource share is Lattice, which I am planning to dump in the next few months when I reach 200k (might as well make it worth my while!). I am in the process of paring back my projects to six. Nope, no project has ever failed to deliver on time. The high priority mess that you saw is fairly rare, even with my extra two days cache, which I have now changed. I have noticed in the last month or so that high priority is happening more than usual, though not as much as you might think.

When BOINC gets too much work for a given project, does it not still calculate that it is indeed overcommitted? This doesn't sound very productive.


