Verification backlog

DanNeely
Joined: 4 Sep 05
Posts: 1,364
Credit: 3,562,358,667
RAC: 0

Gary Roberts wrote:

Either the estimates need to be fixed, or just one of the GPU searches needs to be automatically on by default.

If the estimates can't be fixed (I don't understand why they couldn't be) then a note warning about 2GB GPUs for the GW search should also include a warning that a volunteer can expect trouble if both GPU searches are enabled with a work cache setting above some minimal value.

 

The numbers can be tweaked; but my understanding is that, just as there's only one DCF on the client, with the server software version E@H uses there's only a single scaling factor on the server side for CPU and GPU tasks combined.  That makes it unfixable, because e.g. my two i7-47xx boxes have similar CPU speeds, but the one with a 3070 and the one with a 1070 need a ~2x difference in the CPU-GPU scaling factor to stabilize on the same DCF for both application types.

I've been told that newer versions of the software do have the ability to address the CPU vs GPU problem; but in addition to being tied to the unpopular CreditNew changes from some years back, they'd require E@H to upgrade its heavily customized server software to a newer release version.  While locality scheduling - the main reason why E@H hacked their server code up so extensively many years ago - has since been added to the core server code, at this point I'm doubtful that they ever will, unless some feature they need and can't backport is added, or a change in IT management results in a policy requiring all software on the network to be a current version, forcing the issue.

What can and should be done, IMO, to mitigate the problem is for E@H to periodically (maybe once a year) adjust the scaling factors so that the average CPU/GPU combination ends up with all applications having the same DCF.  That would at least even out the pain of trying to keep a cache of more than a few tasks, and mean that users with a typical setup get things working more or less the way they're supposed to.  The current situation is that the handful of people using decade-old GPUs get something close to what's expected, while people who upgrade their GPUs semi-regularly have increasingly awful experiences, trending towards no longer being able to run CPU and GPU work with the same client installation at all.

 

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,833
Credit: 38,737,781,826
RAC: 64,018,473

The DCF initially gets calculated from the flops estimate, and then slowly adjusts as the client accumulates actual run times from processed tasks.

 

If the flops estimate were correct for GW, the DCF would be calculated more accurately from the beginning, and you wouldn't get these wild swings when flipping between GR/GW or running both at the same time.

Currently, the GR GPU tasks are tagged with a 525,000 flops estimate, whereas the GW GPU tasks are tagged with a 144,000 flops estimate. This is what drives the DCF, and it makes your system think that GW tasks are about 3.6x faster than GR, when in reality they run at about the same speed, or only slightly faster.
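As a rough sketch of how those tags feed into the estimate (this is a simplification, not the actual BOINC client code; the update rule below only captures the well-known rise-fast/fall-slow behaviour, and the host speed is a made-up number):

```python
# Simplified sketch of BOINC's per-host Duration Correction Factor (DCF).
# The flops figures are the ones quoted in this thread; the update rule
# is an approximation of the client's behaviour, not the exact code.

def estimated_runtime(fpops_est, host_flops, dcf):
    """Runtime estimate in seconds: estimate = fpops / speed * DCF."""
    return fpops_est / host_flops * dcf

def update_dcf(dcf, fpops_est, host_flops, actual_runtime):
    """Rise fast when a task runs long, decay slowly when it runs short."""
    ratio = actual_runtime / (fpops_est / host_flops)
    if ratio > dcf:
        return ratio                    # task ran long: jump up immediately
    return 0.9 * dcf + 0.1 * ratio      # task ran short: decay slowly

GR_FPOPS = 525_000   # GR GPU flops estimate (from this thread)
GW_FPOPS = 144_000   # GW GPU flops estimate (from this thread)

# With one DCF shared by both apps, these tags make the client believe
# GW tasks are 525,000 / 144,000 ~ 3.6x faster than GR tasks:
print(f"{GR_FPOPS / GW_FPOPS:.1f}")  # prints 3.6
```

Whichever app finished most recently drags the shared DCF toward its own ratio, so the other app's estimates swing wildly.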

 

Truly, just increasing the flops estimate for GW would solve MANY problems. It's an easy fix server-side; the project admins can key in any value they like. They just need to actually adjust it.


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,142
Credit: 2,860,024,621
RAC: 1,804,572

The trouble is, DCF was designed and coded in the days when only CPUs could be used in BOINC. For any given host, all the CPU cores run at the same speed: even multi-CPU servers require that the CPUs are matched.

GPUs come in all shapes and sizes. Their speeds are not even a fixed multiple of the host CPU speed, and multiple GPUs of different speeds can be run in the same host. Only the speed of the 'best' GPU of a given class is reported to the server.

With that much variability, it is simply impossible for a single flops value per task type to cover all bases and stabilise DCF for everyone. Hence, BOINC was moved to a different system. I don't happen to like the replacement: the presenting problem could have been handled differently. A multi-valued DCF solution was coded for the client, but never extended to the server; the proposal was rejected and never distributed in any public build.
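To make the mismatch concrete, here's a toy calculation (all numbers hypothetical, chosen only to illustrate the shape of the problem): for each app, work out the DCF that would make its runtime estimate exact, and note that the two values differ, so no single shared DCF can satisfy both.

```python
# Toy example: one host, two apps whose flops estimates are wrong by
# different factors. Every number here is made up for illustration.

host_flops = 1e9  # claimed host speed in flops/sec (hypothetical)

apps = {
    # app: (rsc_fpops_est, actual_runtime_seconds) -- hypothetical values
    "GR": (525_000 * 1e6, 600.0),
    "GW": (144_000 * 1e6, 550.0),
}

ideal = {}
for name, (fpops, actual) in apps.items():
    raw_estimate = fpops / host_flops    # estimated runtime with DCF = 1
    ideal[name] = actual / raw_estimate  # the DCF that fixes this app

# GR wants a DCF near 1.14, GW wants one near 3.82; a single shared
# scalar cannot correct both apps at once.
print(ideal)
```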

It would be nice if Einstein could adopt a solution - any solution - which controlled these wild fluctuations.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,833
Credit: 38,737,781,826
RAC: 64,018,473

On a system-by-system basis, the flops estimate would be sufficient. DCF is calculated for each system individually, and within a given system, flops estimates should scale well between task types: there wouldn't be a large variance between GR/GW if the flops were normalized. It would certainly be beneficial to at least TRY to normalize them, as the driving issue now is the very large difference between the GR and GW flops values, which is bigger than any system-specific variance would be. You might not be able to get it "perfect" for every system, but they can certainly get it much closer than it currently is.

If they moved the GW GPU flops estimate from 144,000 to, say, 420,000 (80% of GR), that would be a huge help.
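The arithmetic behind that suggestion, using only the figures already quoted in this thread:

```python
# Ratios implied by the flops estimates discussed in this thread.
GR = 525_000       # current GR GPU flops estimate
GW_OLD = 144_000   # current GW GPU flops estimate
GW_NEW = 420_000   # suggested GW estimate (80% of GR)

print(GW_NEW / GR)             # 0.8  -> GW tagged at 80% of GR's work
print(round(GR / GW_OLD, 1))   # 3.6  -> implied "GW is 3.6x faster" today
print(GR / GW_NEW)             # 1.25 -> implied speed ratio after the change
```

Bringing the implied speed ratio down from 3.6x to 1.25x would put the estimates much closer to the observed "about the same speed, or only slightly faster".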


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,142
Credit: 2,860,024,621
RAC: 1,804,572

A similar variation occurs between NVidia GPU apps and BRP4 apps on intel_gpu.

lohphat
Joined: 20 Feb 05
Posts: 29
Credit: 80,718,299
RAC: 12,991

Richard Haselgrove wrote:

The trouble is, DCF was designed and coded in the days when only CPUs could be used in BOINC. For any given host, all the CPU cores run at the same speed: even multi-CPU servers require that the CPUs are matched.

 

However, newer Ryzen CPUs have a "gaming mode" in which one core is allowed to run at a faster clock speed. I personally don't use this mode, but it does exist.

lohphat
Joined: 20 Feb 05
Posts: 29
Credit: 80,718,299
RAC: 12,991

What I don't understand is that I'm also seeing results on shared WUs where my 980 Ti is doing work much faster than a 1070 board.  I'll have to go back and look at the card spec deltas but this is unexpected at first blush.

mikey
Joined: 22 Jan 05
Posts: 12,260
Credit: 1,838,178,424
RAC: 10,307

lohphat wrote:

What I don't understand is that I'm also seeing results on shared WUs where my 980 Ti is doing work much faster than a 1070 board.  I'll have to go back and look at the card spec deltas but this is unexpected at first blush.

Older Nvidia cards had better double precision ratios than newer Nvidia cards.

Harri Liljeroos
Joined: 10 Dec 05
Posts: 3,945
Credit: 3,000,478,537
RAC: 626,589

mikey wrote:

lohphat wrote:

What I don't understand is that I'm also seeing results on shared WUs where my 980 Ti is doing work much faster than a 1070 board.  I'll have to go back and look at the card spec deltas but this is unexpected at first blush.

Older Nvidia cards had better double precision ratios than newer Nvidia cards.

It might also be that the other computer is doing more than one task at a time.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,833
Credit: 38,737,781,826
RAC: 64,018,473

mikey wrote:

lohphat wrote:

What I don't understand is that I'm also seeing results on shared WUs where my 980 Ti is doing work much faster than a 1070 board.  I'll have to go back and look at the card spec deltas but this is unexpected at first blush.

Older Nvidia cards had better double precision ratios than newer Nvidia cards.

That's not the case with these models: the 1070 has higher DP performance than a 980 Ti, and both have the same 1:32 ratio to their FP32 performance.

1070 - 202 GFLOPS

980ti - 189.4 GFLOPS

 

Since GW tasks can be very CPU-limited, it's more likely that the 1070 wingman has a weak or overcommitted CPU, leading to reduced performance, or is running multiple tasks at once, as mentioned. Honestly, there could be several reasons, but DP isn't one of them.

