About lack of time to finish workunits until deadline

Annika

Joined: 8 Aug 06

Posts: 720

Credit: 494410

RAC: 0

Alinator, how on earth do you

7 Dec 2007 0:50:07 UTC

Message 75809

Alinator, how on earth do you manage 90 Ksecs on a 500 MHz Katmai? You're making me kinda jealous ;-) My (or rather my dad's, but it runs on my account) 667 MHz Coppermine takes 170 Ksecs, and IIRC Coppermines should have quite a performance advantage over Katmais, shouldn't they? Weren't Katmais more or less "tuned-up P2s"? Gary, your figures fit in better with my experience.

Brian Silvers

Joined: 26 Aug 05

Posts: 772

Credit: 282700

RAC: 0

RE: Here are some

7 Dec 2007 0:50:42 UTC

Message 75810 in response to message 75808

Quote:

Here are some approximate but realistic average figures for PIII hosts:-

1. Katmai 500MHz - ~300Ksecs or ~83hrs
2. Katmai 600MHz - ~250Ksecs or ~70hrs
3. Coppermine 866MHz - ~150Ksecs or ~42hrs
4. Coppermine 1GHz - ~140Ksecs or ~39hrs
5. Tualatin 1.4GHz - ~95Ksecs or ~26hrs

I agree that 100 hours is a bit too slow for an average PIII. It's even too slow for a PIII 450, which is the slowest PIII there is. An average PIII (either of the two Coppermines in the list above) is quite suitable for EAH work as long as the box can get about 5 hours per day of actual EAH crunching.

Well, the only Pentium III in the OP's host list on BOINCstats is a Tualatin-based 1.26GHz. If it is really estimating 100 hours, I'm sticking with my guess of a bad RDCF or a slow benchmark.

No matter the estimate, completion is key. Completion by deadline is going to depend on resource allocation and uptime. If the uptime cannot be changed, then changing the resource allocation is the only thing that can be done until such time as SSE optimization comes along.

Winterknight

Joined: 4 Jun 05

Posts: 1391

Credit: 365031630

RAC: 144684

I think some of you are

7 Dec 2007 6:14:34 UTC

Message 75811

I think some of you are missing a valid point. In post 78037 it was mentioned that a host was at work and only on during work time.

In my experience, a computer with a low Einstein resource share frequently downloads the next unit 24 hrs before it starts crunching. The BOINC manager also tries to complete a unit one day plus the switching period before the deadline, so with a 14-day deadline there are in reality only 12 days to crunch the unit.
In the worst case there can be two weekends within a 12-day period, leaving only eight working days to complete the unit.
With an eight-hour working day on a computer that is also used for other duties, the total CPU time available has to be under 64 hrs, possibly quite a bit under.

Therefore, if the project really wants to use only spare CPU cycles, the units probably need to be under 50 hrs long if the 2-week deadline is to be kept in place, and under 25 hrs if we assume that we should be connected to more than one project.

Add in other variables such as public holidays, a working week of only 35 hrs, a computer that is very rarely switched on early Monday morning because there is usually an 08:45 managers' meeting I have to attend (boring), being out of the office up to one day a week, etc., and you can see why units that take over 20 CPU hrs are not welcome.
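The arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: the function name and its parameters are hypothetical (nothing here is a BOINC API), and the defaults are simply the figures quoted in this post (14-day deadline, about one day idle after download, about one day of safety margin, two weekends, 8-hour working day).

```python
def available_cpu_hours(deadline_days=14, idle_days=1, margin_days=1,
                        weekend_days=4, hours_per_day=8, projects=1):
    """Upper bound on the CPU hours one project can get before the deadline.

    All names and defaults are illustrative assumptions taken from the post.
    """
    # Days left after the download wait and the client's safety margin.
    crunch_days = deadline_days - idle_days - margin_days
    # Worst case: two full weekends fall inside that window.
    working_days = crunch_days - weekend_days
    # Equal split of the powered-on hours between attached projects.
    return working_days * hours_per_day / projects


print(available_cpu_hours())            # a single project
print(available_cpu_hours(projects=2))  # equal share between two projects
```

With the defaults this gives the 64-hour ceiling for a single project. A machine that is also doing its normal office work will deliver less than that, which is where the lower 50 hr and 25 hr figures come from.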

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

Hmmmm.... Let me double

7 Dec 2007 6:35:03 UTC

Message 75812

Hmmmm....

Let me double check my data here. I don't deliberately try to micromanage my machines into 'Deadman's Corner', but it's possible I've gotten there.

I was speaking from a current month-to-date query of the task database I log for all my hosts.

Suffice it to say that even though BOINC has made tremendous strides in refining its performance over a large spectrum of projects under a wide range of hosts, it's still possible to develop paradoxical scheduling scenarios.

That's the main reason I maintain that BOINC is not and has never been a "Fire and Forget" solution for Distributed Computing.

Alinator

Brian Silvers

Joined: 26 Aug 05

Posts: 772

Credit: 282700

RAC: 0

RE: I think some of you are

7 Dec 2007 7:15:47 UTC

Message 75813 in response to message 75811

Quote:

I think some of you are missing a valid point. In post 78037 it was mentioned that a host was at work and only on during work time.

Nope. Didn't miss it at all. I researched it and see that it has what appears to be a 55/45 split, with 55% going to SETI and 45% to Einstein. That system is living proof of BOINC doing all it can to fit into the constraints it was given by the user. As others have begun to mention, 100 hours seems long for the class of system. I have suggested too high an RDCF or a low benchmark. Perhaps metalius can go to their account and into the display for their computer and tell us what the Duration Correction Factor (down near the bottom) says for that system?

Now, if I'm right about where I think you're heading, here's the issue:

Do you want all projects to go to the idea of deadlines for the LCD (Lowest Common Denominator)? If so, what should that LCD be? Do we set it for the venerable Pentium 60 and then create workunits for every project that run in a reasonable timeframe for that processor? Extending on with this line of thought, do we assume 50% uptime or 10%? Do we assume 100% resource allocation, or 20%? Do we put in a buffer to "play it safe" in case someone has a power outage? If so, how long do we estimate an average power outage to last? What if the power outage lasts longer than our estimate? What if the system's power supply goes out? What if the video card goes out? Etc, etc, etc...

Deadlines are set here to facilitate getting the data run finished so that they can do post-processing. The only other way, IMO, to do what the project needs to do and then to conform to what you seem to want is to actually INCREASE the workunit size and go to trickles like CPDN. You then give this massive amount of work to a system that is broken up into pieces internally and let them have at it...

IMO, YMMV, etc, etc, etc...

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

Agreed, but that doesn't mean

7 Dec 2007 7:35:57 UTC

Message 75814

Agreed, but that doesn't mean that there aren't ways to manually intervene to work around the limitations of the algorithms currently in use.

You just have to be willing to commit to collecting the data you need to be able to override the default behaviour to meet the specific goals you are setting for your hosts when divergence appears.

Alinator

I should have said as well, that isn't always a trivial task! :-)

Brian Silvers

Joined: 26 Aug 05

Posts: 772

Credit: 282700

RAC: 0

RE: Agreed, but that

7 Dec 2007 7:43:06 UTC

Message 75815 in response to message 75814

Quote:

Agreed, but that doesn't mean that there aren't ways to manually intervene to work around the limitations of the algorithms currently in use.

Ya know, I assume this was in response to me, but I dunno... Are you allergic to the "Reply to this post" button? :P

Yes, I know how to "work around" some "limitations", such as keeping 6 days of work on my system instead of 3 days as my preferences state, but anyway...

@ metalius:

Sorry to be so "negative" about your situation. I/we can try to help see if the estimate is incorrect (you never said how long it actually takes to complete on your Pentium III), but at most someone may be able to talk Bernd/Bruce Allen into increasing deadlines temporarily by one week. I think there is enough "proof" that this would be helpful, and the fact that the runtimes vary so widely is a supporting argument.

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

RE: RE: Agreed, but that

7 Dec 2007 7:51:34 UTC

Message 75816 in response to message 75815

Quote:

Quote:
Agreed, but that doesn't mean that there aren't ways to manually intervene to work around the limitations of the algorithms currently in use.

Ya know, I assume this was in response to me, but I dunno... Are you allergic to the "Reply to this post" button? :P

LOL...

OK, you got me!...

Only if I'm reasonably sure I wouldn't be the next post in order. :-D

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

RE: Hmmmm.... Let me

7 Dec 2007 15:46:52 UTC

Message 75817 in response to message 75812

Quote:

Hmmmm....

Let me double check my data here...

Alinator

Ah yes... DUH... Alinator!!

Apparently I didn't enter the parameter to limit the query to only S5R3, so the average runtime figure for the Katmai which came back was for all of EAH! :-O

The correct answer for mine on S5R3 only so far is just under 520 Ksecs on average, which works out to running it about 10 and 1/2 hours a day to meet the 14-day deadline.
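As a quick sanity check of that figure, here is a minimal Python sketch. The function name is hypothetical, and it assumes the full 14 days are available for crunching, with no time lost waiting in the cache.

```python
def hours_per_day_needed(runtime_ksecs, deadline_days=14):
    """CPU hours per day needed to finish one task within the deadline.

    Illustrative helper only; assumes crunching can start immediately.
    """
    runtime_hours = runtime_ksecs * 1000 / 3600  # Ksecs -> hours
    return runtime_hours / deadline_days


print(round(hours_per_day_needed(520), 1))
```

For 520 Ksecs this comes out slightly above 10 hours a day, in line with the rough figure above.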

Note to self: What happened to applying a reality check to your calcs before posting them?? ;-)

Alinator

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5869

Credit: 114612016646

RAC: 35210555

RE: The correct answer for

7 Dec 2007 21:52:46 UTC

Message 75818 in response to message 75817

Quote:

The correct answer for mine on S5R3 only so far is just under 520 Ksecs on average....

There's still something wrong - you're possibly including the monster tasks from S5R2 as well.

I can assure you that this 500MHz Katmai of mine is taking around 300Ksecs for current tasks. It is running 24/7 but could theoretically make the deadline with just 6/7 operation (6 hours a day, 7 days a week), i.e. a 25% duty cycle.

Back on general discussion, when assessing whether or not a particular machine will make the deadline with a particular duty cycle, it is important to fully account for the lost time caused by the size of your cache. If a task sits idle for several days before crunching starts, the effective deadline is significantly reduced. Lowering the cache size can be quite effective in allowing a slow machine to make the deadline.
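To see how much the cache eats into the deadline, here is a rough Python sketch. The function and parameter names are made up for illustration and are not BOINC settings; the 300 Ksecs runtime and 14-day deadline are the figures from this post, and the idle times in the loop are arbitrary examples.

```python
def required_duty_cycle(runtime_ksecs, deadline_days=14, cache_idle_days=0):
    """Fraction of each day the host must crunch to finish before the deadline.

    cache_idle_days is how long the task waits in the cache before starting.
    All names are illustrative assumptions.
    """
    effective_days = deadline_days - cache_idle_days
    if effective_days <= 0:
        raise ValueError("task would sit in the cache past its deadline")
    runtime_days = runtime_ksecs * 1000 / 86400  # Ksecs -> days of CPU time
    return runtime_days / effective_days


# Compare the required duty cycle for a few example cache delays.
for idle in (0, 3, 6):
    print(idle, f"{required_duty_cycle(300, cache_idle_days=idle):.0%}")
```

With no idle time the 300 Ksecs task needs roughly a 25% duty cycle, and every day spent waiting in the cache pushes that requirement up.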

Cheers,
Gary.
