Excessive Work Cache Size - How to screw your new Wingman!!

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: RE: RE: But,

Quote:
Quote:
Quote:

But, please go back and (re-)read Gary's opening post in this thread. The basic underlying reason for high priority running is your choice to keep a 10-day cache on a 14-day turnround project. That's tight, and as Gary demonstrated, risky and (arguably) antisocial - from the perspective of both the project as a whole, and of your individual wingmates.

I disagree with your assertion that I'm "antisocial". Until BOINC started screwing up in this manner, my 10-day caches were working out just fine.

since you say you know it's not working well anymore (and we all know DA's brickheaded scheduler gets crazier and crazier), go blame DA!

it's pretty crazy to insist on a setup which you yourself are calling "screwed up"...

What's the problem with pointing out a bug in the BOINC software?

FrankHagen
Joined: 13 Feb 08
Posts: 102
Credit: 272200
RAC: 0

RE: RE: RE: RE: But

Quote:
Quote:
Quote:
Quote:

But, please go back and (re-)read Gary's opening post in this thread. The basic underlying reason for high priority running is your choice to keep a 10-day cache on a 14-day turnround project. That's tight, and as Gary demonstrated, risky and (arguably) antisocial - from the perspective of both the project as a whole, and of your individual wingmates.

I disagree with your assertion that I'm "antisocial". Until BOINC started screwing up in this manner, my 10-day caches were working out just fine.

since you say you know it's not working well anymore (and we all know DA's brickheaded scheduler gets crazier and crazier), go blame DA!

it's pretty crazy to insist on a setup which you yourself are calling "screwed up"...

What's the problem with pointing out a bug in the BOINC software?

you should know there is one single guy who has a problem accepting that.
and of course the only thing we can do is work around those bugs on the client side.

complaining over here about bugs which can only be fixed in Berkeley is - hmmm - a waste of time...

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: RE: RE: RE: Quote

Quote:
Quote:
Quote:
Quote:
Quote:

But, please go back and (re-)read Gary's opening post in this thread. The basic underlying reason for high priority running is your choice to keep a 10-day cache on a 14-day turnround project. That's tight, and as Gary demonstrated, risky and (arguably) antisocial - from the perspective of both the project as a whole, and of your individual wingmates.

I disagree with your assertion that I'm "antisocial". Until BOINC started screwing up in this manner, my 10-day caches were working out just fine.

since you say you know it's not working well anymore (and we all know DA's brickheaded scheduler gets crazier and crazier), go blame DA!

it's pretty crazy to insist on a setup which you yourself are calling "screwed up"...

What's the problem with pointing out a bug in the BOINC software?

you should know there is one single guy who has a problem accepting that.
and of course the only thing we can do is work around those bugs on the client side.

complaining over here about bugs which can only be fixed in Berkeley is - hmmm - a waste of time...

I pointed out a problem with the software, thinking that we might have a rational discussion about it. But, it turns out that I was wrong.

Okay, fine. The next time I identify a problem with either the BOINC or Einstein software, I'll just keep my damn mouth shut.

FrankHagen
Joined: 13 Feb 08
Posts: 102
Credit: 272200
RAC: 0

RE: I pointed out a problem

Quote:

I pointed out a problem with the software, thinking that we might have a rational discussion about it. But, it turns out that I was wrong.

Okay, fine. The next time I identify a problem with either the BOINC or Einstein software, I'll just keep my damn mouth shut.

but it's pretty clear it's not the project.

you can watch the same thing happening on most of the projects which have both CPU and GPU apps.

in fact the problem is the current BOINC design, and there is little we can do about it.

there are two possible ways to work around it:

client-side micromanagement of the workflow for each and every project and host you've got - and i agree: THIS is a real PITA!

or the real workaround - splitting the CPU and GPU work into two separate sub-projects and fixing the estimated flops.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

RE: ...(and we all know

Quote:
...(and we all know DA's brickheaded scheduler gets crazier and crazier), go blame DA!


It's so easy to blame someone else for how something no longer works the way you're used to, even when the old way you're used to is highly flawed and most probably the broken one.

I always see people post things like "go blame DA", or "David Anderson is the bad guy", or "David A is to blame, he doesn't listen to us". Yet I don't see any of you who say this diving into the source code and making your own version of whatever it is that, according to you, David broke.

Nor do any of you go out onto the email lists and point out to David what's broken, where exactly, and how it should be fixed.

So why insist on blaming David, when all of you doing the blaming can just as easily be blamed for lacking the backbone to fix the problem yourselves? I don't see any fruits of your work; I don't see any BOINC clones out there with your 'fixed' code in them. You (general, plural) know it all so well, yet here you are, still using the same BOINC as the rest of us.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109922348249
RAC: 31202068

OK, enough is enough! This

OK, enough is enough!

This is not the thread or even the forum in which posts that appear to be designed to start a war about the competency or otherwise of the BOINC Devs will be tolerated. If there is any escalation, I'll delete the lot.

Please allow the thread to return to topic.

It is quite permissible to describe what you think are shortcomings in BOINC and it's quite permissible to present a counter opinion. Just leave out the (totally unnecessary) character assassination (and/or flame-bait) while you do so.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109922348249
RAC: 31202068

RE: ... For my

Quote:
... For my higher-performance machines, I generally like to keep a ten-day cache.
....
.... on the two machines that I've upgraded to CUDA-capable ....

Whilst there isn't usually a problem with a 10 day cache for "always on" machines that are being regularly monitored and observed, I must apologise for overlooking the second part of the quote. At the time, I was catching up with a number of posts and I was skimming through quite quickly until I got to Richard's post and the 'unix time format' explanation of why the 'leading zero' theory couldn't be the cause of the problem. That post grabbed my attention and I ended up responding to your very next message without going back and re-reading your original post.

For some reason, I was quite sure in my mind that you were talking about CPU only hosts, so I only bothered to list some ways that I knew from personal experience that HP mode could be triggered on such machines. I don't have any hosts with GPUs so I have no relevant experience to call on anyway.

However, from what others have said about running CPU and GPU tasks from the same project simultaneously, I think I understand why you can get away with 10 day caches on CPU-only machines but could easily have trouble on a host with a fast GPU. Someone will correct me if I'm wrong, but I believe the time taken to crunch a BRP task on a fast GPU is quite overestimated - or perhaps it is only badly overestimated in those cases where people are running under AP (the anonymous platform) but without a <flops> estimate in app_info.xml. People have said they have to increase the cache size just to get a supply of GPU tasks.

As those tasks will be crunched quickly, each one completed will result in a drop in the DCF as BOINC tries to correct the estimate. This will affect the CPU work cache (each CPU task will end up with a reduced estimate) causing more CPU work to be downloaded to fill the cache. If a CPU task then completes, the actual crunch time will be longer than the reduced estimate and there will be a single upward step in DCF to make the correction to all tasks in the cache. As a result of this, BOINC may now suddenly think that you have too many hours of work to complete safely within the deadline so HP mode is entered immediately.
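Gary's feedback loop can be sketched numerically. The following is a hedged toy model, not the actual client code: the exact DCF update rule varies between BOINC versions, but a commonly cited form makes upward corrections immediate and downward corrections gradual (roughly 10% per completed task), which is all the mechanism above needs:

```python
# Toy model of BOINC's per-project Duration Correction Factor (DCF).
# Assumption: downward corrections drift slowly, upward corrections
# jump immediately - the behaviour described in the post above.

def update_dcf(dcf, estimated, actual):
    ratio = actual / estimated
    if ratio > dcf:
        return ratio                      # one immediate upward step
    return 0.9 * dcf + 0.1 * ratio        # slow downward drift

dcf = 1.0

# A fast GPU finishes ten tasks whose estimate (10,000 s) is far too
# high compared with the real runtime (1,000 s), dragging DCF down...
for _ in range(10):
    dcf = update_dcf(dcf, estimated=10_000, actual=1_000)
print(f"after 10 GPU tasks: DCF = {dcf:.2f}")   # -> 0.41

# ...then one CPU task finishes exactly on estimate (20,000 s) and
# DCF snaps straight back up in a single step:
dcf = update_dcf(dcf, estimated=20_000, actual=20_000)
print(f"after 1 CPU task:   DCF = {dcf:.2f}")   # -> 1.00
```

With DCF dragged down to about 0.41, every CPU task in the cache looks less than half as long as it really is, so the client fetches more work; the first real CPU completion snaps DCF back to 1.0, the cache suddenly looks oversized against the deadlines, and that is exactly the trigger for high-priority mode.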

So, on your two CUDA enabled hosts, what is the estimate for a GPU task and how long does it actually take to crunch one? If this really is the cause of your problem, I believe you could fix it by using an app_info.xml file and playing with parameters there. In your state file there should be a <flops> parameter. My understanding is that the value of this parameter is supplied by the project, so you can't tweak it in the state file without it getting reset every time you get new tasks. If you were to set up an app_info.xml and run under AP, you could include a tweakable <flops> entry and fix the crunch time estimate for your setup. If you prevent DCF from oscillating, you should become stable again and avoid HP mode.
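For concreteness, an app_info.xml of the kind described might look roughly like the fragment below. Treat it strictly as a sketch: the application name, executable file name, version number and plan class are placeholders that must match the files actually installed in your project directory, and the <flops> value is the tweakable entry, chosen so that the task's FLOP count divided by it roughly equals the real crunch time on your GPU:

```xml
<app_info>
  <app>
    <name>einsteinbinary_BRP3</name>
  </app>
  <file_info>
    <name>einsteinbinary_BRP3_1.00_windows_intelx86__BRP3cuda32.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>einsteinbinary_BRP3</app_name>
    <version_num>100</version_num>
    <plan_class>BRP3cuda32</plan_class>
    <!-- The tweakable entry: a fixed speed estimate in FLOPS.
         Pick it so that task FLOPs / flops ~ actual runtime,
         which stops the estimate (and hence DCF) oscillating. -->
    <flops>1.0e11</flops>
    <avg_ncpus>0.2</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>einsteinbinary_BRP3_1.00_windows_intelx86__BRP3cuda32.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```

If a GPU task takes 1,000 s and the client believes it will take 10,000 s, raising <flops> by roughly a factor of ten brings the estimate into line; after that, completed GPU tasks no longer drag DCF down and the CPU cache estimates stay honest.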

Cheers,
Gary.

FrankHagen
Joined: 13 Feb 08
Posts: 102
Credit: 272200
RAC: 0

RE: As those tasks will be

Quote:
As those tasks will be crunched quickly, each one completed will result in a drop in the DCF as BOINC tries to correct the estimate. This will affect the CPU work cache (each CPU task will end up with a reduced estimate) causing more CPU work to be downloaded to fill the cache. If a CPU task then completes, the actual crunch time will be longer than the reduced estimate and there will be a single upward step in DCF to make the correction to all tasks in the cache. As a result of this, BOINC may now suddenly think that you have too many hours of work to complete safely within the deadline so HP mode is entered immediately.

exactly, that's what happens. the bigger the difference between CPU and GPU runtimes, the worse it gets.

mikey
Joined: 22 Jan 05
Posts: 11940
Credit: 1832221543
RAC: 213071

RE: RE: As those tasks

Quote:
Quote:
As those tasks will be crunched quickly, each one completed will result in a drop in the DCF as BOINC tries to correct the estimate. This will affect the CPU work cache (each CPU task will end up with a reduced estimate) causing more CPU work to be downloaded to fill the cache. If a CPU task then completes, the actual crunch time will be longer than the reduced estimate and there will be a single upward step in DCF to make the correction to all tasks in the cache. As a result of this, BOINC may now suddenly think that you have too many hours of work to complete safely within the deadline so HP mode is entered immediately.

exactly, that's what happens. the bigger the difference between CPU and GPU runtimes, the worse it gets.

And if the BOINC developers could come up with a way to separate those in the software, then more of us would run both the CPU and the GPU on the same project. Right now, a lot of us must crunch for project A on one machine's CPU but use a different machine's GPU, just to keep a decent sized cache. Then we reverse it on a second PC.

FrankHagen
Joined: 13 Feb 08
Posts: 102
Credit: 272200
RAC: 0

of course you can run the

of course you can run the CPU tasks inside a VM and the GPU tasks outside. the performance loss is very small...
