Excessive Work Cache Size - How to screw your new Wingman!!

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: RE: RE: But,

Quote:
Quote:
Quote:

But, please go back and (re-)read Gary's opening post in this thread. The basic underlying reason for high priority running is your choice to keep a 10-day cache on a 14-day turnround project. That's tight, and as Gary demonstrated, risky and (arguably) antisocial - from the perspective of both the project as a whole, and of your individual wingmates.

I disagree with your assertion that I'm "antisocial". Until BOINC started screwing up in this manner, my 10-day caches were working out just fine.

since you say you know it's not working well anymore (and we all know DA's brickheaded scheduler gets crazier and crazier), go blame DA!

it's pretty crazy to insist on a setup which you yourself are calling "screwed up"...

What's the problem with pointing out a bug in the BOINC software?

FrankHagen
Joined: 13 Feb 08
Posts: 102
Credit: 272200
RAC: 0

RE: RE: RE: RE: But

Quote:
Quote:
Quote:
Quote:

But, please go back and (re-)read Gary's opening post in this thread. The basic underlying reason for high priority running is your choice to keep a 10-day cache on a 14-day turnround project. That's tight, and as Gary demonstrated, risky and (arguably) antisocial - from the perspective of both the project as a whole, and of your individual wingmates.

I disagree with your assertion that I'm "antisocial". Until BOINC started screwing up in this manner, my 10-day caches were working out just fine.

since you say you know it's not working well anymore (and we all know DA's brickheaded scheduler gets crazier and crazier), go blame DA!

it's pretty crazy to insist on a setup which you yourself are calling "screwed up"...

What's the problem with pointing out a bug in the BOINC software?

you should know there is one single guy who has a problem accepting that.
and of course the only thing we can do is work around those bugs on the client side.

complaining over here about bugs which can only be fixed in Berkeley is - hmmm - a waste of time...

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: RE: RE: RE: Quote

Quote:
Quote:
Quote:
Quote:
Quote:

But, please go back and (re-)read Gary's opening post in this thread. The basic underlying reason for high priority running is your choice to keep a 10-day cache on a 14-day turnround project. That's tight, and as Gary demonstrated, risky and (arguably) antisocial - from the perspective of both the project as a whole, and of your individual wingmates.

I disagree with your assertion that I'm "antisocial". Until BOINC started screwing up in this manner, my 10-day caches were working out just fine.

since you say you know it's not working well anymore (and we all know DA's brickheaded scheduler gets crazier and crazier), go blame DA!

it's pretty crazy to insist on a setup which you yourself are calling "screwed up"...

What's the problem with pointing out a bug in the BOINC software?

you should know there is one single guy who has a problem accepting that.
and of course the only thing we can do is work around those bugs on the client side.

complaining over here about bugs which can only be fixed in Berkeley is - hmmm - a waste of time...

I pointed out a problem with the software, thinking that we might have a rational discussion about it. But, it turns out that I was wrong.

Okay, fine. The next time I identify a problem with either the BOINC or Einstein software, I'll just keep my damn mouth shut.

FrankHagen
Joined: 13 Feb 08
Posts: 102
Credit: 272200
RAC: 0

RE: I pointed out a problem

Quote:

I pointed out a problem with the software, thinking that we might have a rational discussion about it. But, it turns out that I was wrong.

Okay, fine. The next time I identify a problem with either the BOINC or Einstein software, I'll just keep my damn mouth shut.

but it's pretty clear it's not the project.

you can watch the same thing happening on most of the projects which have both CPU and GPU apps.

in fact the problem is the current BOINC design, and there is little we can do about it.

there are two possible ways to work around it:

client-side micromanagement of the workflow for each and every project and host you've got - and i agree: THIS is a real PITA!

or the real workaround - splitting the CPU and GPU work into two separate sub-projects and fixing the estimated flops.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

RE: ...(and we all know

Quote:
...(and we all know DA's brickheaded scheduler gets crazier and crazier), go blame DA!


It's so easy to blame someone else for how something no longer works the way you're used to, even when the old way you're used to is highly flawed and most probably the broken one.

I always see people post things like "go blame DA", or "David Anderson is the bad guy", or "David A is to blame, he doesn't listen to us". Yet I don't see any of you who say this diving into the source code and making your own version of whatever it is that, according to you, David broke.

Nor do any of you go out onto the email lists and point out to David what's broken, where exactly, and how it should be fixed.

So why insist on blaming David, when all of you doing the blaming can just as easily be blamed for lacking the backbone to fix the problem yourselves? I don't see any fruits of your work; I don't see any BOINC clones out there with your 'fixed' code in them. You (general, plural) know it all so well, yet here you are, still using the same BOINC as the rest of us.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109922348249
RAC: 31202068

OK, enough is enough! This

OK, enough is enough!

This is not the thread or even the forum in which posts that appear to be designed to start a war about the competency or otherwise of the BOINC Devs will be tolerated. If there is any escalation, I'll delete the lot.

Please allow the thread to return to topic.

It is quite permissible to describe what you think are shortcomings in BOINC and it's quite permissible to present a counter opinion. Just leave out the (totally unnecessary) character assassination (and/or flame-bait) while you do so.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109922348249
RAC: 31202068

RE: ... For my

Quote:
... For my higher-performance machines, I generally like to keep a ten-day cache.
....
.... on the two machines that I've upgraded to CUDA-capable ....

Whilst there isn't usually a problem with a 10 day cache for "always on" machines that are being regularly monitored and observed, I must apologise for overlooking the second part of the quote. At the time, I was catching up with a number of posts and I was skimming through quite quickly until I got to Richard's post and the 'unix time format' explanation of why the 'leading zero' theory couldn't be the cause of the problem. That post grabbed my attention and I ended up responding to your very next message without going back and re-reading your original post.

For some reason, I was quite sure in my mind that you were talking about CPU only hosts, so I only bothered to list some ways that I knew from personal experience that HP mode could be triggered on such machines. I don't have any hosts with GPUs so I have no relevant experience to call on anyway.

However, from what others have said about running CPU and GPU tasks from the same project simultaneously, I think I understand why you can get away with 10 day caches on CPU-only machines but could easily have trouble on a host with a fast GPU. Someone will correct me if I'm wrong, but I believe the time taken to crunch a BRP task on a fast GPU is quite overestimated - or perhaps it is only badly overestimated in those cases where people are running under AP (the anonymous platform) but without a <flops> estimate in app_info.xml. People have said they have to increase the cache size just to get a supply of GPU tasks.

As those tasks will be crunched quickly, each one completed will result in a drop in the DCF as BOINC tries to correct the estimate. This will affect the CPU work cache (each CPU task will end up with a reduced estimate) causing more CPU work to be downloaded to fill the cache. If a CPU task then completes, the actual crunch time will be longer than the reduced estimate and there will be a single upward step in DCF to make the correction to all tasks in the cache. As a result of this, BOINC may now suddenly think that you have too many hours of work to complete safely within the deadline so HP mode is entered immediately.
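Gary's feedback loop can be sketched numerically. The following is a hedged toy model, not the actual client code: the exact DCF update rule varies between BOINC versions, but a commonly cited form makes upward corrections immediate and downward corrections gradual (roughly 10% per completed task), which is all the mechanism above needs:

```python
# Toy model of BOINC's per-project Duration Correction Factor (DCF).
# Assumption: downward corrections drift slowly, upward corrections
# jump immediately - the behaviour described in the post above.

def update_dcf(dcf, estimated, actual):
    ratio = actual / estimated
    if ratio > dcf:
        return ratio                      # one immediate upward step
    return 0.9 * dcf + 0.1 * ratio        # slow downward drift

dcf = 1.0

# A fast GPU finishes ten tasks whose estimate (10,000 s) is far too
# high compared with the real runtime (1,000 s), dragging DCF down...
for _ in range(10):
    dcf = update_dcf(dcf, estimated=10_000, actual=1_000)
print(f"after 10 GPU tasks: DCF = {dcf:.2f}")   # -> 0.41

# ...then one CPU task finishes exactly on estimate (20,000 s) and
# DCF snaps straight back up in a single step:
dcf = update_dcf(dcf, estimated=20_000, actual=20_000)
print(f"after 1 CPU task:   DCF = {dcf:.2f}")   # -> 1.00
```

With DCF dragged down to about 0.41, every CPU task in the cache looks less than half as long as it really is, so the client fetches more work; the first real CPU completion snaps DCF back to 1.0, the cache suddenly looks oversized against the deadlines, and that is exactly the trigger for high-priority mode.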

So, on your two CUDA enabled hosts, what is the estimate for a GPU task and how long does it actually take to crunch one? If this really is the cause of your problem, I believe you could fix it by using an app_info.xml file and playing with parameters there. In your state file there should be a <flops> parameter. My understanding is that the value of this parameter is supplied by the project, so you can't tweak it in the state file without it getting reset every time you get new tasks. If you were to set up an app_info.xml and run under AP, you could include a tweakable <flops> entry and fix the crunch time estimate for your setup. If you prevent DCF from oscillating, you should become stable again and avoid HP mode.
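For concreteness, an app_info.xml of the kind described might look roughly like the fragment below. Treat it strictly as a sketch: the application name, executable file name, version number and plan class are placeholders that must match the files actually installed in your project directory, and the <flops> value is the tweakable entry, chosen so that the task's FLOP count divided by it roughly equals the real crunch time on your GPU:

```xml
<app_info>
  <app>
    <name>einsteinbinary_BRP3</name>
  </app>
  <file_info>
    <name>einsteinbinary_BRP3_1.00_windows_intelx86__BRP3cuda32.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>einsteinbinary_BRP3</app_name>
    <version_num>100</version_num>
    <plan_class>BRP3cuda32</plan_class>
    <!-- The tweakable entry: a fixed speed estimate in FLOPS.
         Pick it so that task FLOPs / flops ~ actual runtime,
         which stops the estimate (and hence DCF) oscillating. -->
    <flops>1.0e11</flops>
    <avg_ncpus>0.2</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>einsteinbinary_BRP3_1.00_windows_intelx86__BRP3cuda32.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```

If a GPU task takes 1,000 s and the client believes it will take 10,000 s, raising <flops> by roughly a factor of ten brings the estimate into line; after that, completed GPU tasks no longer drag DCF down and the CPU cache estimates stay honest.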

Cheers,
Gary.

FrankHagen
Joined: 13 Feb 08
Posts: 102
Credit: 272200
RAC: 0

RE: As those tasks will be

Quote:
As those tasks will be crunched quickly, each one completed will result in a drop in the DCF as BOINC tries to correct the estimate. This will affect the CPU work cache (each CPU task will end up with a reduced estimate) causing more CPU work to be downloaded to fill the cache. If a CPU task then completes, the actual crunch time will be longer than the reduced estimate and there will be a single upward step in DCF to make the correction to all tasks in the cache. As a result of this, BOINC may now suddenly think that you have too many hours of work to complete safely within the deadline so HP mode is entered immediately.

exactly, that's what happens. the bigger the difference between CPU and GPU runtimes, the worse it gets.

mikey
Joined: 22 Jan 05
Posts: 11940
Credit: 1832221543
RAC: 213071

RE: RE: As those tasks

Quote:
Quote:
As those tasks will be crunched quickly, each one completed will result in a drop in the DCF as BOINC tries to correct the estimate. This will affect the CPU work cache (each CPU task will end up with a reduced estimate) causing more CPU work to be downloaded to fill the cache. If a CPU task then completes, the actual crunch time will be longer than the reduced estimate and there will be a single upward step in DCF to make the correction to all tasks in the cache. As a result of this, BOINC may now suddenly think that you have too many hours of work to complete safely within the deadline so HP mode is entered immediately.

exactly, that's what happens. the bigger the difference between CPU and GPU runtimes, the worse it gets.

And if the BOINC developers could come up with a way to separate those in the software, then more of us would run both the CPU and the GPU on the same project. Right now, a lot of us must crunch for project A on one machine's CPU but use a different machine's GPU, just to keep a decent sized cache. Then we reverse it on a second PC.

FrankHagen
Joined: 13 Feb 08
Posts: 102
Credit: 272200
RAC: 0

of course you can run the

of course you can run the CPU tasks inside a VM and the GPU tasks outside. the performance loss is very small...
