Work distribution

KWSN-GMC-Peeper of the Castle Anthrax
Joined: 6 Oct 05
Posts: 62
Credit: 54,755,621
RAC: 0
Topic 195635

The Einstein people really need to fix the work distribution problems. Every so often this project sends me something like 4 days work for EACH of 7 or 8 days in a row and I end up having to delete work units.
This has happened a number of times and I'm beginning to think my best solution is to just take Einstein off my list. After all, if it's not that important to them, why should it be to me/us?

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1,079
Credit: 341,280
RAC: 0

Work distribution

Quote:
The Einstein people really need to fix the work distribution problems.


Perhaps you need to fix your preferences (cache settings).

Quote:
Every so often this project sends me something like 4 days work for EACH of 7 or 8 days in a row...


Why would that be bad? For what kind of application are those tasks?

Quote:
...and I end up having to delete work units.


And why would you do that?

Regards,
Gundolf

Computers aren't everything in life. (Just a little joke)

KWSN-GMC-Peeper of the Castle Anthrax
Joined: 6 Oct 05
Posts: 62
Credit: 54,755,621
RAC: 0

My preferences are correct.

My preferences are correct. I've been doing this for quite some time and am having no problems with any of the other projects I'm running. Please feel free to explain to me what in the preferences could allow a single project to send me vastly more work than there are hours in a day.

You fail to understand the problem. It happened again today. I have received approximately 32 hours of work (projected, and the projection is actually fairly accurate on my system) for each of 8 days. It has happened before.
I've been doing this so long that I feel no need to look in on the program daily or even weekly, nor do I feel this is a race with other people to do 'more work'. I just let it run and crunch data with no regard for how much. The first time it happened, by the time I noticed, I was still crunching WUs that were already 2 weeks past due. Spending time and electricity on WUs that will not be used is, of course, not acceptable.

This only happens with E@H.

KWSN-GMC-Peeper of the Castle Anthrax
Joined: 6 Oct 05
Posts: 62
Credit: 54,755,621
RAC: 0

Oh yes...I can say this

Oh yes...I can say this problem did not begin until I put in a video card with a GPU that could be used by BOINC so I would imagine the problem is rooted there.
I actually brought this issue up on the S@H board and according to many people there it is a known problem and has been for some time.

paul milton
Joined: 16 Sep 05
Posts: 329
Credit: 35,825,044
RAC: 0

RE: Oh yes...I can say

Quote:
Oh yes...I can say this problem did not begin until I put in a video card with a GPU that could be used by BOINC so I would imagine the problem is rooted there.
I actually brought this issue up on the S@H board and according to many people there it is a known problem and has been for some time.

It's a BOINC bug; there's a thread around here somewhere about it, but I can't for the life of me find it :| There's not much the Einstein folks can do about it.

seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1,079
Credit: 341,280
RAC: 0

RE: Oh yes...I can say

Quote:
Oh yes...I can say this problem did not begin until I put in a video card with a GPU that could be used by BOINC so I would imagine the problem is rooted there.


If you know that, stop using your GPU (with Einstein), which is a preference setting ;-)

Quote:
I actually brought this issue up on the S@H board and according to many people there it is a known problem and has been for some time.


And why would that then be an Einstein problem?

Regards,
Gundolf


RandyC
Joined: 18 Jan 05
Posts: 2,663
Credit: 108,316,738
RAC: 19,989

RE: My preferences are

Quote:

My preferences are correct. I've been doing this for quite some time and am having no problems with any of the other projects I'm running. Please feel free to explain to me what in the preferences could allow a single project to send me vastly more work than there are hours in a day.

You fail to understand the problem. It happened again today. I have received approximately 32 hours of work (projected, and the projection is actually fairly accurate on my system) for each of 8 days. It has happened before.
I've been doing this so long that I feel no need to look in on the program daily or even weekly, nor do I feel this is a race with other people to do 'more work'. I just let it run and crunch data with no regard for how much. The first time it happened, by the time I noticed, I was still crunching WUs that were already 2 weeks past due. Spending time and electricity on WUs that will not be used is, of course, not acceptable.

This only happens with E@H.

I'm only running two systems on BOINC now, but I've seen issues like this on both. One of the systems has settled down and I can leave it to its own devices now. On that system, when I ran E@H (both GW and BRP [CUDA only]) side by side with Malaria (MCN), MCN went wild and downloaded way too much work. I disabled MCN and it started behaving itself.

The other system has a similar mix, but I can't trust it to behave itself. Sometimes I can open it up and it will run fine for two or three days, then it will suck down 3 weeks or more of WUs while I'm off at work and can't monitor it. My current strategy (and it only works because I'm running only two systems) is to review the cache size morning and evening and allow downloads as necessary. It will download what it thinks is up to 35 days' worth of E@H at one time. In actuality, it downloads about 2 1/2 days' worth when that happens. Obviously, for E@H, a 35-day cache would time out long before the WUs were finished. I'm just glad the system can chew through them quickly enough to avoid deadline issues.

Seti Classic Final Total: 11446 WU.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,115
Credit: 36,576,012,436
RAC: 37,747,698

RE: The Einstein people

Quote:
The Einstein people really need to fix the work distribution problems ...


You mention in a subsequent message that you are a long term participant in BOINC projects so I guess you would be aware of the differences between BOINC's responsibilities and those of the project itself. The distribution of work is clearly a BOINC responsibility. The software limitations which are causing your distress will only get remedied when fixes/enhancements are added to BOINC. Project staff don't usually get any time to get involved in that.

Participants often blame the project for BOINC's limitations. Here is a link to a previous message I posted in response to a similar type of complaint. The comments made there don't really apply to your situation since you seem to have an excess of CUDA tasks rather than CPU tasks. In fact, your tasks list doesn't really show the situation you mention. There are currently 56 tasks in total of which 22 are completed and returned and 34 are shown as 'in progress'. I can see when you started to get CUDA tasks - 7:24 on Feb 04 (UTC). You have 34 CUDA tasks in total and none have been returned and none have been aborted or have errored out. All the 22 completed tasks are CPU tasks and you don't have any 'in progress' CPU tasks. There seems to be nothing to support a claim of "4 days work for EACH of 7 or 8 days in a row ..." and "having to delete work units".

You don't have a single deleted task, nothing is anywhere near past deadline, you aren't processing GPU tasks anyway, you don't have any CPU tasks on hand, and there is no evidence to show 4 days worth of work being delivered to you on each of 7 or 8 consecutive days. You probably do have an oversupply of GPU tasks but since you haven't returned any, there is no run time available so it's hard to make an accurate judgment. When done on a CPU, the BRP3 tasks each take about one day. They should take a lot less than a day each (maybe quite a few per day) when done on a GPU+CPU. I'm basing this on the speedups that other people are claiming - I have no personal knowledge. What are they estimated at in your cache and why are none being returned?

All I can think of is that perhaps you are remembering the bad experience that various people had when there was a genuine total shortage of GPU work and BOINC's flawed response was to keep downloading extra CPU tasks every time unfulfilled GPU task requests were made, and (for some reason) you think that is happening (or about to happen) again. I don't know of any reason why there might be a new round of that previous problem. You can read about that previous problem by perusing the entire 'Scheduler went nuts' thread. This is probably the thread that Paul was referring to.

EDIT: I have re-examined your tasks list since I copied down the numbers given above. You have acquired two more tasks in the interim, another CUDA task and a BRP3 CPU task. So you now have a single CPU task and 35 GPU tasks 'in progress'. I have no idea why BOINC is continuing to request GPU work even though none is being returned. I also have no idea why BOINC is (until recently) failing to get CPU work when, in the past, CPU tasks have been very promptly crunched and returned. What resource shares do you use for all your projects? Does your GPU do work on any other project?

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1,933
Credit: 196,026,832
RAC: 918,195

RE: its a boinc bug, theres

Quote:
It's a BOINC bug; there's a thread around here somewhere about it, but I can't for the life of me find it :| There's not much the Einstein folks can do about it.


I think you might mean the 'Scheduler went nuts' thread.

People keep saying "it's a BOINC bug", but without evidence, and an explanation of the process involved, we'll never get it fixed. The first step would be to work out which part of BOINC has the problem. Does the project send people too much work, or does their computer request too much work?

In the earlier case, it did appear to be a problem at the server end: request CUDA work, get allocated CPU work, still need CUDA work, request CUDA work, get allocated CPU work.... What we needed then, but never got, was somebody to catch a server log for one of those 'request CUDA, get CPU' events. It's easy to do - just click the 'last contact' link in your computer list - but the tricky bit is finding a machine which exhibits this behaviour, and catching it in the act. Because the last contact link shows exactly that - the last contact - there's no point in clicking it hours after the event. Clicking it while the work is piling up is what would be interesting.
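The 'request CUDA, get CPU' loop described above can be shown with a toy simulation. It is a minimal sketch, not BOINC code: the Host class, the one-day CUDA target, and both server functions are hypothetical, and it assumes no tasks are crunched between scheduler contacts. Because the CUDA shortfall never shrinks, the faulty model sends another batch of CPU work on every contact.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Toy model of one host's work queues (hypothetical, not BOINC's data structures)."""
    cpu_queue_s: float = 0.0        # estimated seconds of CPU work on hand
    gpu_queue_s: float = 0.0        # estimated seconds of CUDA work on hand
    gpu_target_s: float = 86_400.0  # assumed buffer target: one day of CUDA work

    def gpu_shortfall(self) -> float:
        """Seconds of CUDA work the client would ask for to reach its target."""
        return max(0.0, self.gpu_target_s - self.gpu_queue_s)


def faulty_server(gpu_request_s: float, gpu_work_available: bool) -> tuple[float, float]:
    """Models the reported symptom: returns (cpu_seconds_sent, gpu_seconds_sent)."""
    if gpu_work_available:
        return 0.0, gpu_request_s
    # No CUDA work left, so the reply is padded with CPU tasks instead.
    return gpu_request_s, 0.0


def correct_server(gpu_request_s: float, gpu_work_available: bool) -> tuple[float, float]:
    """Expected behaviour: send nothing when the requested resource has no work."""
    if gpu_work_available:
        return 0.0, gpu_request_s
    return 0.0, 0.0


def simulate(server, contacts: int, gpu_work_available: bool = False) -> Host:
    """Run a number of scheduler contacts, assuming nothing is crunched in between."""
    host = Host()
    for _ in range(contacts):
        request_s = host.gpu_shortfall()
        if request_s <= 0.0:
            continue  # CUDA buffer is full, so no request is made
        cpu_s, gpu_s = server(request_s, gpu_work_available)
        host.cpu_queue_s += cpu_s
        host.gpu_queue_s += gpu_s
    return host


if __name__ == "__main__":
    for server in (faulty_server, correct_server):
        host = simulate(server, contacts=10)
        print(f"{server.__name__}: CPU queue {host.cpu_queue_s / 86_400:.1f} days, "
              f"CUDA queue {host.gpu_queue_s / 86_400:.1f} days")
```

Run as a script, ten contacts leave the faulty model with 10.0 days of CPU work and still no CUDA work, while the corrected model sends nothing at all. A server log caught during such a contact is what would show which of the two behaviours is really happening.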

mikey
Joined: 22 Jan 05
Posts: 5,655
Credit: 540,823,983
RAC: 128,412

RE: Oh yes...I can say

Quote:
Oh yes...I can say this problem did not begin until I put in a video card with a GPU that could be used by BOINC so I would imagine the problem is rooted there.
I actually brought this issue up on the S@H board and according to many people there it is a known problem and has been for some time.

Your problem is most likely that you use BOTH your CPU and your GPU on the same project!! BOINC, the software, currently has no clue how to separate the needs of one from the other. The GPU is crunching through units at the rate of 10 per hour while the CPU is going through units at the rate of 1 every 3 hours (all numbers made up by me), so when you need workunits BOINC gets workunits, sometimes LOTS of workunits, because after all your GPU needs them! The best thing is to choose one to crunch here and move the other to another project. There are beta versions of the BOINC software, but I do not know if any of them are any better at this yet.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1,933
Credit: 196,026,832
RAC: 918,195

RE: RE: Oh yes...I can

Quote:
Quote:
Oh yes...I can say this problem did not begin until I put in a video card with a GPU that could be used by BOINC so I would imagine the problem is rooted there.
I actually brought this issue up on the S@H board and according to many people there it is a known problem and has been for some time.

Your problem is most likely that you use BOTH your CPU and your GPU on the same project!! BOINC, the software, currently has no clue how to separate the needs of one from the other. The GPU is crunching through units at the rate of 10 per hour while the CPU is going through units at the rate of 1 every 3 hours (all numbers made up by me), so when you need workunits BOINC gets workunits, sometimes LOTS of workunits, because after all your GPU needs them! The best thing is to choose one to crunch here and move the other to another project. There are beta versions of the BOINC software, but I do not know if any of them are any better at this yet.


It's not actually as easy as that.

BOINC, such as the currently recommended v6.10.58, measures the size of your CPU queue and your GPU/CUDA queue separately, and requests the right amount of work (measured in seconds) to keep each queue topped up, independently. And the Einstein project, in particular, is good at estimating the relative speeds of your CPU and your GPU, and describing the "size" of each task and the speed of each device appropriately, so that the wild fluctuations in DCF seen on some projects when running both types of device don't happen here (or at least not so much - BOINC can still be confused by, for example, multiple graphics cards of different speeds). The newer BOINC clients, such as the v6.12.13 in final testing, are better still, but v6.10.58 isn't bad.

My new host 3868392 is running both CPU and GPU Einstein tasks, without any of the side effects you describe - but without providing any evidence of a work fetch problem into my debug log, either.
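The point that the client sizes the CPU and CUDA queues separately can be illustrated with a simplified sketch. This is not the BOINC client's actual algorithm: the Resource class, the work_requests function, and the "buffer days × instances" target are assumptions for illustration, and the real client also weighs factors such as resource share that are left out here. The sketch only shows that each resource's request depends on its own queue, so a full CPU queue does not trigger a CPU request just because the GPU is short of work.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Resource:
    """One processing resource on the host (hypothetical simplified model)."""
    name: str
    instances: int    # e.g. number of CPU cores, or number of GPUs
    queued_s: float   # estimated seconds of work already on hand for this resource


def work_requests(resources: list[Resource], buffer_days: float) -> dict[str, float]:
    """Seconds of work to request for each resource, topped up independently.

    Each resource is compared only against its own target, so a long CPU queue
    does not hide an empty GPU queue, and vice versa.
    """
    requests = {}
    for r in resources:
        target_s = buffer_days * SECONDS_PER_DAY * r.instances
        requests[r.name] = max(0.0, target_s - r.queued_s)
    return requests


if __name__ == "__main__":
    host = [
        Resource("CPU", instances=4, queued_s=4 * SECONDS_PER_DAY),  # CPU buffer already full
        Resource("CUDA", instances=1, queued_s=6 * 3_600),           # six hours of GPU work
    ]
    print(work_requests(host, buffer_days=1.0))
```

With a one-day buffer, four CPU cores already holding a day each, and six hours of CUDA work on hand, the sketch asks for no CPU work and 64,800 seconds (18 hours) of CUDA work.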
