highFreq or lowFreq

MarkHNC

Joined: 31 Aug 12

Posts: 37

Credit: 170965842

RAC: 0

25 May 2017 1:30:35 UTC

Message 158263

I have a Xeon whose CPU in the profile on the Einstein website reads "GenuineIntel Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz [Family 6 Model 45 Stepping 7]", which I know as an E5-2670v1. CPU-Z reports it running a hair shy of 3GHz. It has 32GB 1600MHz DDR3 ECC CL11 RAM. Oddly, although the box runs Windows 10 Pro, it is listed as Windows 8.1 Pro on Einstein. Hyperthreading is on. The challenge for me is that it runs both WCG and Einstein, including 2x FGRP on a GTX 960 SSC GPU (except when it is in panic mode, like it has been for the past couple of days). To get Einstein to get a decent amount of CPU work, I had to change the resource share to Einstein=400 (80%) and WCG=100 (20%).

BOINC has only this evening started getting its estimates right about the runtime of these tuning units. This is the only one of my machines that is getting the "Hi" units. The units are generally taking between 65,000 seconds and 67,500 seconds to complete. However, I ended up aborting a bunch of work units that it wouldn't get done because it had downloaded a bunch using the bad estimate (as recently as this morning, it was still estimating ~13 hours when the units were consistently taking ~18-19 hours).

Is the client version in any way responsible for the estimate taking so long to "settle?" I'm using the WCG BOINC client, which is 7.2.47. I assume that the hyperthreading is why it is taking about double the expected range that Christian referenced?

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117693009170

RAC: 35071972

25 May 2017 3:41:38 UTC

Message 158265 in response to message 158263

MarkHNC wrote:

Is the client version in any way responsible for the estimate taking so long to "settle?"

It's not the client version. It's because you have Einstein CPU tasks that take longer than the estimate and GPU tasks that take less than their estimate. At Einstein, the client uses a duration correction factor (DCF) to refine estimates. Because GPU tasks tend to finish more quickly than their estimates, the client will reduce the DCF in an attempt to compensate, every time a GPU task finishes. The reverse happens every time a CPU task finishes. The estimates for either task type can never 'settle' because they are continually being 'pulled' in different directions. This is because there is only one DCF which all searches have to use.
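The tug-of-war described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the update rule, the `SMOOTHING` weight, and the task figures are all hypothetical and are not taken from the BOINC client source. It only shows why a single shared factor cannot settle when two task types err in opposite directions.

```python
from itertools import cycle

# Hypothetical smoothing weight; NOT the constant the real BOINC client uses.
SMOOTHING = 0.5

# (raw estimate in hours, actual runtime in hours). Illustrative numbers only.
CPU_TASK = (13.0, 18.5)  # CPU tasks overrun their estimate
GPU_TASK = (1.0, 0.5)    # GPU tasks finish early


def update_dcf(dcf, raw_estimate, actual):
    """Move the shared factor part of the way toward the value that would
    have made this task's estimate exact (assumed toy rule)."""
    target = actual / raw_estimate
    return dcf + SMOOTHING * (target - dcf)


def simulate(n_tasks, dcf=1.0):
    """Feed a repeating mix of three GPU tasks and one CPU task through the
    single shared DCF and record its value after each completion."""
    history = []
    tasks = cycle([GPU_TASK, GPU_TASK, GPU_TASK, CPU_TASK])
    for _ in range(n_tasks):
        raw_estimate, actual = next(tasks)
        dcf = update_dcf(dcf, raw_estimate, actual)
        history.append(dcf)
    return history


if __name__ == "__main__":
    for step, value in enumerate(simulate(16), start=1):
        print(f"task {step:2d}: DCF = {value:.3f}")
```

In this toy model the factor keeps swinging between a low value after a run of GPU tasks and a higher value after each CPU task, so the displayed estimate for either type is wrong most of the time.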

At the moment you have a bunch more CPU tasks that have already expired and even more that are quite close to expiry. The easiest way to help your BOINC client deal with the situation is to reduce your work cache setting to a very low value (I'd be using 0.1 days) until the backlog is cleared. You are going to have to abort more CPU tasks - those already past deadline and more that are close to deadline. If you don't drastically reduce your cache setting, BOINC will just keep requesting replacements for aborted tasks and the problem will continue. Part of the problem is the current 5 day deadline for GW tasks. When the 'non-tuning' full run gets underway the deadline will be 14 days. However you will still need to keep a sensible cache size for the mixture of CPU/GPU tasks.

Cheers,
Gary.

Sid

Joined: 17 Oct 10

Posts: 164

Credit: 971876520

RAC: 419806

25 May 2017 8:30:35 UTC

Message 158266 in response to message 158232

archae86 wrote:

Nick_43 wrote:
I would think my 8MB of cache would be considerable, maybe it isn't these days...

That's a Nehalem EP which was quite a fearsome chip when new. It is no longer new, nor even middle-aged. I've just retired my system which was running a Xeon E5620 Westmere, which is a quick redraft of Nehalem on the next generation manufacturing process (32 nm down from 45 nm for yours).

The L5640 that I am using might be a somewhat ancient processor, but with 12 virtual cores it can win by weight of numbers rather than by skill. 24 virtual cores on a two-socket motherboard can be a very cheap and relatively efficient solution, so I'm a bit reluctant to retire it.

MarkHNC

Joined: 31 Aug 12

Posts: 37

Credit: 170965842

RAC: 0

25 May 2017 9:55:37 UTC

Message 158268 in response to message 158265

Gary Roberts wrote:

MarkHNC wrote:
Is the client version in any way responsible for the estimate taking so long to "settle?"

The easiest way to help your BOINC client deal with the situation is to reduce your work cache setting to a very low value (I'd be using 0.1 days) until the backlog is cleared.

I have minimum work buffer of 0.50 days and max additional work buffer of 1.0 days. I assumed I would have to do so in order to receive larger/longer running work units (both here and at WCG). Wouldn't a tiny buffer prevent me from getting longer running units? It doesn't appear to have downloaded any new work units since it went into panic mode.

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7226118262

RAC: 1067060

25 May 2017 12:48:14 UTC

Message 158274 in response to message 158268

MarkHNC wrote:

Wouldn't a tiny buffer prevent me from getting longer running units?

No, it would not prevent you from getting work.

With current deadlines, task types, and the duration-estimating characteristics of the BOINC client, people who want to mix this particular new CPU work with GPU work on Einstein need either to set a very short queue length or to be prepared to spend a lot of time manually adjusting back and forth, in effect using a different queue length setting for GPU tasks and for CPU tasks.

Somewhat to my unpleasant surprise, the current BOINC client seems to trip into high priority mode well over a day before the task deadline, even with a CPU task load the machine could easily finish on time. This "feature", combined with the already known DCF mismatch problem between work types, really drives down the acceptable queue length.

At the moment I am making a once per day CPU task fetch procedure in which I first adjust down the queue length to 0.2 days on one machine and 0.3 days on another machine, and only then enable CPU task fetch. When new work has come on board, I first disable CPU task fetch again, then raise my queue length to my desired two days. The specific numbers are particular to my specific machines, and I agree with Gary that for a first shot 0.1 days is a really good idea, especially if you want a set-and-forget situation.

This will all get somewhat better when the project raises the deadline for these CPU tasks from the current short 5 days, but it is not going to get really good unless a brave new BOINC really handles a mix of different performing work types much better regarding run time estimation. Don't hold your breath for that one.

MarkJ

Joined: 28 Feb 08

Posts: 437

Credit: 139002861

RAC: 0

25 May 2017 14:08:28 UTC

Message 158280

Ryzen 1700 at stock clocks getting Lo only. Current estimate 5+ hours. Will see how they go. Hopefully they can get some stats out of the tuning run.

It's this host.

BOINC blog

Nick

Joined: 12 Oct 13

Posts: 27

Credit: 8949649

RAC: 0

25 May 2017 16:13:27 UTC

Message 158289

AGENTB wrote:

The X5570 also does not support AVX, so that may also be a factor.

I am running tasks that are labeled AVX.

Nick

Joined: 12 Oct 13

Posts: 27

Credit: 8949649

RAC: 0

25 May 2017 16:17:58 UTC

Message 158291

Regarding DCF

Is this why, after all these years, BOINC still can't figure out that after spending 4 hours completing 75% of a task it thinks it still needs 12 more hours to complete the remaining 25%? This has annoyed me for years, wondering why it's never been fixed.

Jim1348

Joined: 19 Jan 06

Posts: 463

Credit: 257957147

RAC: 0

25 May 2017 17:52:04 UTC

Message 158293 in response to message 158291

Nick_43 wrote:

Regarding DCF

Is this why, after all these years, BOINC still can't figure out that after spending 4 hours completing 75% of a task it thinks it still needs 12 more hours to complete the remaining 25%? This has annoyed me for years, wondering why it's never been fixed.

You can use the <fraction_done_exact/> option in an app_config.xml file to fix this, as Retvari Zoltan explains on GPUGrid.

https://einsteinathome.org/comment/reply/207795/158291?quote=1#comment-form

I don't know why this is not the default. The BOINC scheduler is beyond understanding.

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7226118262

RAC: 1067060

25 May 2017 18:55:28 UTC

Message 158296 in response to message 158293

Jim1348 wrote:

You can use the <fraction_done_exact/> option in an app_config.xml file

<snip>

I don't know why this is not the default. The BOINC scheduler is beyond understanding.

I'm relying on Retvari Zoltan's description. If true, with this option selected the time-to-go estimate displayed by BOINC completely ignores work content input provided by the project with a work unit, and also ignores any locally acquired history of behavior information, instead relying purely on the current fraction complete as provided by the application and time spent so far, as logged by BOINC. So the name of the option is deeply misleading.
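As a rough sketch of the estimate just described, assuming it really is nothing more than a linear extrapolation from elapsed time and reported fraction done (the function name is hypothetical and nothing here is taken from the BOINC source):

```python
def remaining_from_fraction(elapsed_s, fraction_done):
    """Estimate time to go purely from elapsed time and fraction complete,
    ignoring project-supplied work content and any local history.
    Assumed formula, based only on the description in this thread."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total_s = elapsed_s / fraction_done  # projected total runtime
    return total_s - elapsed_s


if __name__ == "__main__":
    # Nick's example from earlier in the thread: 4 hours elapsed, 75% done.
    hours_left = remaining_from_fraction(4 * 3600, 0.75) / 3600
    print(f"estimated time to go: {hours_left:.2f} hours")
```

For Nick's case this gives roughly an hour and 20 minutes to go rather than 12 hours, which is only as trustworthy as the fraction complete that the application reports.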

The degree to which this is better or worse than the default will be strongly application and configuration dependent. A given user may like it better or worse, depending on their personal weighting of accuracy in different situations.

If you are working with applications which give excellent fraction complete estimates in all situations of interest to you, this option may very well be greatly superior for you.

I, personally, run non-identical GPU units on a host. This option may well considerably aid time-to-completion accuracy for me, which is badly compromised both by DCF wander, depending on which GPU has recently reported work, and by the inaccurate per-GPU relative speed estimate employed. I think I'll give it a try.

But I doubt this influences the calculation done to estimate work in queue, so I doubt it will help the CPU/GPU mismatch fetching problems.
