Pending work passing deadline

BarryAZ
Joined: 8 May 05
Posts: 190
Credit: 320740540
RAC: 9548

RE: The combination of 2

Message 14259 in response to message 14258

Quote:


The combination of 2 days queue, sharing with seti and CC4.19 is the lethal mix that is causing you to have far too much work on hand. If you were to set your queue to between 0.5 and 1.0 days, your problems would probably vanish. This would be the quickest and easiest "fix" if you really wanted to stay with 4.19.
Please remember that the simple act of having just two projects (where one seems to be very reliable anyway) largely negates the need for large queues.

Well, I can appreciate the rationale -- though it is pretty much just that. Of course the impetus behind BOINC versus seti classic was in part the capability of running multiple projects -- an admirable concept.

It seems though that reliability got lost in that shuffle -- SETI BOINC is a relatively popular (at least in the BOINC world) project which unfortunately is only modestly reliable (at this instant they went 'offline' as far as I can see -- everything simply went dead about 30 minutes ago).

So to compensate for that sad lack of dependability, the recommendation was for larger caches (5, 6, 8 days). Also to compensate, one could add projects, which I've done with Einstein (and on one workstation I've added Climate instead). But the deal there is that notwithstanding the propaganda about multiple projects, running multiple projects adds to the complexity regarding optimal settings.

And then we have 'client version of the week' -- just to make sure that we have yet another stack of variables to complicate the settings.

I am rather surprised that a setting of 2 days and only Einstein (I had that on at least four of my computers) is evaluated as a deadly combination though.

I'm waiting for the killer evaluative code piece -- set up your computer, indicate which projects you will run on the computer, and this killer code will not only evaluate the computer configuration, but the mix of projects, the client version, the reliability of each of the projects and automatically spit out the ideal settings one should have.

Of course that would be the opinion of the 'killer eval code piece' -- but opinions are what we all rely on (smile).

BarryAZ
Joined: 8 May 05
Posts: 190
Credit: 320740540
RAC: 9548

RE: Hope some of this helps

Message 14260 in response to message 14258

Quote:
Hope some of this helps with what you decide to do. Whatever else happens, please try to stop having to ditch expired work as this is a big negative for the health of the EAH server which surely must start to buckle under the ever increasing load at some point.

I can appreciate that -- I would love to stop having to ditch expired work -- that's why I posted the query in the first place.

It would be nice to have a client setting (specific to EACH project running on the specific workstation) which would allow a tweak -- 'I'm only 90% as efficient as your benchmarks think I am'.

I was wondering whether revising my project settings to indicate I'm only running 22 hours a day might achieve that result for my Einstein work.
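
For what it's worth, the arithmetic behind that idea is easy to sketch. This assumes (and it is only an assumption, not something confirmed for any client version) that the client scales the work it asks for by the fraction of the day the machine is said to be available. The function names are made up for illustration.

```python
HOURS_PER_DAY = 24

def on_fraction(hours_available_per_day):
    """Fraction of the day the machine is declared to be crunching."""
    if not 0 < hours_available_per_day <= HOURS_PER_DAY:
        raise ValueError("hours_available_per_day must be in (0, 24]")
    return hours_available_per_day / HOURS_PER_DAY

def work_request_hours(cache_days, hours_available_per_day):
    """Hours of work requested if the request is scaled by availability.

    Hypothetical rule: cache_days * 24 * on_fraction, which simplifies
    to cache_days * hours_available_per_day.
    """
    on_fraction(hours_available_per_day)  # reuse the range check
    return cache_days * hours_available_per_day

if __name__ == "__main__":
    for hours in (24, 22):
        print(hours, "h/day ->",
              round(on_fraction(hours), 3), "of the day,",
              work_request_hours(2, hours), "hours requested for a 2 day cache")
```

On that assumption, declaring 22 hours a day comes out a little under 92%, which is in the same neighbourhood as the 90% tweak mentioned above.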

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109376352889
RAC: 36002285

RE: I am rather surprised

Message 14261 in response to message 14259

Quote:

I am rather surprised that a setting of 2 days and only Einstein (I had that on at least four of my computers) is evaluated as a deadly combination though.

I'm sorry, I really didn't make myself clear enough about the "problem" of 4.19 being in the mix. My statement was meant to highlight the fact that it was all three components, and particularly the presence of 4.19, that made the mix lethal. I'll also apologise for the overcolourful use of language in calling it "lethal".

CC4.19 has been a very stable platform (mostly) on which to run EAH. However it suffers from a particularly nasty habit of downloading far too much work for EAH. It seems to underestimate the time to finish a WU when in fact each EAH WU always takes quite a bit longer than the estimate. It seems to assume that your machine will always be running 24/7 and it seems not to notice how many projects you might be trying to run. It's a bit like "think of plenty of WUs to at least last the distance and then double that again" as its inbuilt caching policy. I once tried 3 days as the cache size and ended up with about 7-8 days actual work and nearly died of shock. This is bad enough on its own but becomes worse if you support two or more projects.

Once you know about this and budget for it, it ceases to be too much of a problem. I've got one machine in particular, a P4 2.6G HT enabled with EAH/SAH 70/30, CC4.19 and a 0.5 day cache, that has been running unattended for the last 18 days while I've been out of the country and it never missed a beat. It had a heap of failed SAH uploads, due to the problems at SAH, which have now cleared and been turned into a heap of pendings awaiting the current validation backlog. It was running an optimised seti app and it completed a bunch of work for both projects. It doesn't seem to have run out of seti work at any stage but even if it did then EAH would have taken up the slack.

I've just had a good, close look at what that 0.5 days actually buys me. I'll give you the figures per CPU and as estimated hours/actual hours.

For EAH I have on hand 30 hours/40 hours for each of the two CPUs.
For SAH I have on hand 11 hours/3 hours for each of the two CPUs.
(Notice the effect of the optimised app in reducing the SAH crunch time).

So in actual crunching time, my 0.5 day cache actually gives me the best part of a full two days work (43 hours). It's little wonder that a 2+ day cache is likely to run you into deadline problems.
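
A rough sketch of that arithmetic in Python, using only the per-CPU figures quoted above. The function and variable names are made up for illustration, and nothing here is taken from the client itself.

```python
# Per-CPU hours on hand, as quoted above: estimated vs actual crunch time.
CACHE_DAYS = 0.5
ON_HAND = {
    "EAH": {"estimated": 30, "actual": 40},
    "SAH": {"estimated": 11, "actual": 3},
}

def cache_summary(cache_days, work):
    """Compare the nominal cache size with the work actually on hand."""
    nominal = cache_days * 24  # what the setting nominally asks for
    estimated = sum(p["estimated"] for p in work.values())
    actual = sum(p["actual"] for p in work.values())
    return {
        "nominal_hours": nominal,
        "estimated_hours": estimated,
        "actual_hours": actual,
        "actual_vs_nominal": actual / nominal,
    }

summary = cache_summary(CACHE_DAYS, ON_HAND)

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value:.2f}")
```

With these figures a nominal 12 hours of cache holds 43 hours of real crunching per CPU, a bit over three and a half times the setting.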

Cheers,
Gary.

Keck_Komputers
Joined: 18 Jan 05
Posts: 376
Credit: 5744955
RAC: 0

RE: It would be nice to

Message 14262 in response to message 14260

Quote:
It would be nice to have a client setting (specific to EACH project running on the specific workstation) which would allow a tweak -- 'I'm only 90% as efficient as your benchmarks think I am'.


This is coming in the next client. Look for references to 'duration correction factor' on the boards or in the wiki for more info.
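
For illustration only, here is roughly how a correction factor of this kind could work. This is a sketch under assumptions, not the client's actual code: the function names and the simple "actual divided by estimated" rule are hypothetical. The sample figures are the per-CPU hours Gary quoted earlier in the thread (30 estimated / 40 actual for EAH, 11 estimated / 3 actual for SAH).

```python
def correction_factor(estimated_hours, actual_hours):
    """Ratio of real run time to the estimate (hypothetical rule)."""
    if estimated_hours <= 0:
        raise ValueError("estimated_hours must be positive")
    return actual_hours / estimated_hours

def corrected_estimate(raw_estimate_hours, factor):
    """Scale a raw estimate by a per-project correction factor."""
    return raw_estimate_hours * factor

# One factor per project, since each project's estimates are off differently.
factors = {
    "EAH": correction_factor(30, 40),  # above 1: estimates run too low
    "SAH": correction_factor(11, 3),   # below 1: estimates run too high
}

if __name__ == "__main__":
    for project, factor in factors.items():
        print(f"{project}: factor {factor:.2f}, "
              f"a 10 hour estimate becomes {corrected_estimate(10, factor):.1f} hours")
```

The point of keeping the factor per project is visible in the two numbers: one project needs its estimates stretched, the other needs them shrunk.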

BOINC WIKI

BOINCing since 2002/12/8

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

John, The discussion here

John,

The discussion here in some of the posts looks real good. Do you think we should capture some of it? I am not sure if I am going to be working much today or not (I just got up). I know some of it may be overtaken by events with the next major recommended release if they "chop limit" the version.

@All,

I use 4.45, and sometimes do an unscheduled "update" to flush pendings, and it works very well for me. Caveat: I have relatively fast computers (slowest is 2.8 GHz). My cache is 3 days across all 5 production projects (only 3 on the PowerMac, which can't do LHC@Home as there is no application, and where CPDN models crash). I have no WUs that get lost or overrun the deadline. As others have found, Einstein@Home's estimate is about 1 hour too low on all machines. SETI@Home's is about 2x too high.

This does give me a large population. BOINC View is telling me for my 8 computers I have 151 results pending or in-flight (2 ready to report).

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109376352889
RAC: 36002285

Paul, Much of the

Message 14264 in response to message 14263

Paul,

Much of the discussion in this thread is documenting the particular behaviour of 4.19 and is not really relevant to 4.45 and later. With each passing day, the advent of 5.xx makes it a bit futile to dwell too much in the past. My guess is that the problems of older versions like 4.13 over in SAH land make it increasingly likely that some sort of lockout of older versions can't be too far away.

I really do admire your tenacity and you continue to amaze me with your apparent capacity to "document the world", despite your handicap, but in this particular case I think you should basically ignore what hasn't already been documented about versions like 4.19 and earlier and spend your apparently boundless energy on documenting the future, particularly for the benefit of all the new recruits from classicland who will probably land here sooner or later :).

On the subject of 4.45 and later, new recruits are going to need quite a bit of handholding in coping with JM7's scheduler philosophy, particularly with things seemingly getting "worse" until the code gets things "stabilized". Even experienced people are not really "getting it" and are taking actions (through impatience) that are actually making things worse. We are going to need every bit of your energy in documenting this sort of stuff for new users.

Please accept my personal warmest regards and thanks for the great job you are doing.

Cheers,
Gary.

BarryAZ
Joined: 8 May 05
Posts: 190
Credit: 320740540
RAC: 9548

RE: RE: I am rather

Message 14265 in response to message 14261

Quote:
Quote:

I am rather surprised that a setting of 2 days and only Einstein (I had that on at least four of my computers) is evaluated as a deadly combination though.

I'm sorry, I really didn't make myself clear enough about the "problem" of 4.19 being in the mix. My statement was meant to highlight the fact that it was all three components, and particularly the presence of 4.19, that made the mix lethal. I'll also apologise for the overcolourful use of language in calling it "lethal".

CC4.19 has been a very stable platform (mostly) on which to run EAH. However it suffers from a particularly nasty habit of downloading far too much work for EAH. It seems to underestimate the time to finish a WU when in fact each EAH WU always takes quite a bit longer than the estimate. It seems to assume that your machine will always be running 24/7 and it seems not to notice how many projects you might be trying to run. It's a bit like "think of plenty of WUs to at least last the distance and then double that again" as its inbuilt caching policy. I once tried 3 days as the cache size and ended up with about 7-8 days actual work and nearly died of shock. This is bad enough on its own but becomes worse if you support two or more projects.

Once you know about this and budget for it, it ceases to be too much of a problem. I've got one machine in particular, a P4 2.6G HT enabled with EAH/SAH 70/30, CC4.19 and a 0.5 day cache, that has been running unattended for the last 18 days while I've been out of the country and it never missed a beat. It had a heap of failed SAH uploads, due to the problems at SAH, which have now cleared and been turned into a heap of pendings awaiting the current validation backlog. It was running an optimised seti app and it completed a bunch of work for both projects. It doesn't seem to have run out of seti work at any stage but even if it did then EAH would have taken up the slack.

I've just had a good, close look at what that 0.5 days actually buys me. I'll give you the figures per CPU and as estimated hours/actual hours.

For EAH I have on hand 30 hours/40 hours for each of the two CPUs.
For SAH I have on hand 11 hours/3 hours for each of the two CPUs.
(Notice the effect of the optimised app in reducing the SAH crunch time).

So in actual crunching time, my 0.5 day cache actually gives me the best part of a full two days work (43 hours). It's little wonder that a 2+ day cache is likely to run you into deadline problems.


BarryAZ
Joined: 8 May 05
Posts: 190
Credit: 320740540
RAC: 9548

RE: RE: OK -- so what I

Message 14266 in response to message 14261

Quote:
Quote:

OK -- so what I can do simply is reduce the cache on the Einstein side of things to reduce the tendency of the 4.19 client to over estimate work -- or rather, more readily 'correct' for the overestimation by only allowing for 12 hours backlog and so be able to automatically adjust twice a day. I can look into that. I was wondering though if the setting indicating only say 21 of 24 hours would do that as well.

With Seti, because of the way it downloads work, I can manually adjust for the 'too much work, too little time' problem without resetting -- I manually go in to the project and delete work to do's that are not going to complete in time. With Einstein you can't do that as work is downloaded as 'batches' of workunits that are not differentiated inside the project directory.

CC4.19 has been a very stable platform (mostly) on which to run EAH. However it suffers from a particularly nasty habit of downloading far too much work for EAH. It seems to underestimate the time to finish a WU when in fact each EAH WU always takes quite a bit longer than the estimate. It seems to assume that your machine will always be running 24/7 and it seems not to notice how many projects you might be trying to run. It's a bit like "think of plenty of WUs to at least last the distance and then double that again" as its inbuilt caching policy. I once tried 3 days as the cache size and ended up with about 7-8 days actual work and nearly died of shock. This is bad enough on its own but becomes worse if you support two or more projects.

Once you know about this and budget for it, it ceases to be too much of a problem. I've got one machine in particular, a P4 2.6G HT enabled with EAH/SAH 70/30, CC4.19 and a 0.5 day cache, that has been running unattended for the last 18 days while I've been out of the country and it never missed a beat. It had a heap of failed SAH uploads, due to the problems at SAH, which have now cleared and been turned into a heap of pendings awaiting the current validation backlog. It was running an optimised seti app and it completed a bunch of work for both projects. It doesn't seem to have run out of seti work at any stage but even if it did then EAH would have taken up the slack.


BarryAZ
Joined: 8 May 05
Posts: 190
Credit: 320740540
RAC: 9548

That's good to know -- I

Message 14267 in response to message 14262

That's good to know -- I expect that in the next month I will jump to a newer client -- though frankly, the very large number of .01 updates running loose has made me concerned as to which is actually a good choice. There is an awful lot of tweaking going on in various client versions -- and since I'm involved in two projects (plus a little toe in the water on the Climate project) I've also got to juggle the various version tweaks relative to the projects I'm running.

The client versionitis thing has the scent of an alpha rather than beta activity, let alone a production activity.

Quote:
Quote:
It would be nice to have a client setting (specific to EACH project running on the specific workstation) which would allow a tweak -- 'I'm only 90% as efficient as your benchmarks think I am'.

This is coming in the next client. Look for references to 'duration correction factor' on the boards or in the wiki for more info.


BarryAZ
Joined: 8 May 05
Posts: 190
Credit: 320740540
RAC: 9548

Paul, I suspect, based on

Message 14268 in response to message 14263

Paul, I suspect, based on input here, I might try the 4.45 client. I have four configurations running Pentium IIIs, but aside from that, my quite large farm is dominated by P4s running 2.4G and up, or XPs running XP2400 and up, so my environment is reasonably close to yours.

About the only thing that gives me pause is the discussion of a 5x client -- I suppose simply bumping up to 4.45 in the next week or so will bring me 'into the mainstream' though.

Quote:


@All,

I use 4.45, and sometimes do an unscheduled "update" to flush pendings, and it works very well for me. Caveat: I have relatively fast computers (slowest is 2.8 GHz). My cache is 3 days across all 5 production projects (only 3 on the PowerMac, which can't do LHC@Home as there is no application, and where CPDN models crash). I have no WUs that get lost or overrun the deadline. As others have found, Einstein@Home's estimate is about 1 hour too low on all machines. SETI@Home's is about 2x too high.

This does give me a large population. BOINC View is telling me for my 8 computers I have 151 results pending or in-flight (2 ready to report).

