Verification backlog

DanNeely
Joined: 4 Sep 05
Posts: 1,364
Credit: 3,562,358,667
RAC: 0

Gary Roberts wrote:

Either the estimates need to be fixed, or just one of the GPU searches needs to be automatically on by default.

If the estimates can't be fixed (I don't understand why they couldn't be) then a note warning about 2GB GPUs for the GW search should also include a warning that a volunteer can expect trouble if both GPU searches are enabled with a work cache setting above some minimal value.

 

The numbers can be tweaked; but my understanding is that, just as there's only one DCF on the client, with the server software version E@H uses there's only a single scaling factor on the server side for CPU and GPU tasks combined.  That makes it unfixable, because e.g. my two i7-47xx boxes have similar CPU speeds, but the one with a 3070 and the one with a 1070 need a ~2x difference in the CPU-GPU scaling factor to stabilize on the same DCF for both application types.

I've been told that newer versions of the software do have the ability to address the CPU vs GPU problem; but in addition to being tied to the unpopular CreditNew changes from some years back, they'd require E@H to upgrade its heavily customized server software to a newer release version.  While locality scheduling - the main reason why E@H hacked their server code up so extensively many years ago - has since been added to the core server code, at this point I'm doubtful that they ever will, unless some feature they need and can't backport is added, or a change in IT management results in a policy requiring all software on the network to be a current version, forcing the issue.

What can and should be done, IMO, to mitigate the problem is for E@H to periodically (maybe once a year) adjust the scaling factors so that the average CPU/GPU combination ends up with all applications having the same DCF.  That would at least even out the pain of trying to keep a cache of more than a few tasks, and mean that users with a typical setup get things working more or less the way they're supposed to.  The current situation is that the handful of people using decade-old GPUs get something close to what's expected, while people who upgrade their GPUs semi-regularly have increasingly awful experiences, trending towards no longer being able to run CPU and GPU work with the same client installation at all.

 

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,833
Credit: 38,737,781,826
RAC: 64,018,473

The DCF initially gets calculated from the flops estimate, and then slowly adjusts as the client accumulates actual run times from processed tasks.

 

If the flops estimate were correct for GW, the DCF would be calculated more accurately from the beginning, and you wouldn't get these wild swings when flipping between GR/GW or running both at the same time.

Currently, the GR GPU tasks are tagged with a 525,000 flops estimate, whereas the GW GPU tasks are tagged with a 144,000 flops estimate. This is what drives the DCF, and it makes your system think that GW tasks are about 3.6x faster than GR, when in reality they run at about the same speed, or only slightly faster.
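As a rough sketch of how those tags feed into the estimate (this is a simplification, not the actual BOINC client code; the update rule below only captures the well-known rise-fast/fall-slow behaviour, and the host speed is a made-up number):

```python
# Simplified sketch of BOINC's per-host Duration Correction Factor (DCF).
# The flops figures are the ones quoted in this thread; the update rule
# is an approximation of the client's behaviour, not the exact code.

def estimated_runtime(fpops_est, host_flops, dcf):
    """Runtime estimate in seconds: estimate = fpops / speed * DCF."""
    return fpops_est / host_flops * dcf

def update_dcf(dcf, fpops_est, host_flops, actual_runtime):
    """Rise fast when a task runs long, decay slowly when it runs short."""
    ratio = actual_runtime / (fpops_est / host_flops)
    if ratio > dcf:
        return ratio                    # task ran long: jump up immediately
    return 0.9 * dcf + 0.1 * ratio      # task ran short: decay slowly

GR_FPOPS = 525_000   # GR GPU flops estimate (from this thread)
GW_FPOPS = 144_000   # GW GPU flops estimate (from this thread)

# With one DCF shared by both apps, these tags make the client believe
# GW tasks are 525,000 / 144,000 ~ 3.6x faster than GR tasks:
print(f"{GR_FPOPS / GW_FPOPS:.1f}")  # prints 3.6
```

Whichever app finished most recently drags the shared DCF toward its own ratio, so the other app's estimates swing wildly.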

 

Truly, just increasing the flops estimate for GW would solve MANY problems. It's an easy fix server-side; the project admins can key in any value they like. They just need to actually adjust it.


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,142
Credit: 2,860,024,621
RAC: 1,804,572

The trouble is, DCF was designed and coded in the days when only CPUs could be used in BOINC. For any given host, all the CPU cores run at the same speed: even multi-CPU servers require that the CPUs are matched.

GPUs come in all shapes and sizes. Their speeds are not even a fixed multiple of the host CPU speed, and multiple GPUs of different speeds can be run in the same host. Only the speed of the 'best' GPU of a given class is reported to the server.

With that much variability, it is simply impossible for a single flops value per task type to cover all bases and stabilise DCF for everyone. Hence, BOINC was moved to a different system. I don't happen to like the replacement: the presenting problem could have been handled differently. A multi-valued DCF solution was coded for the client, but never extended to the server; the proposal was rejected and never distributed in any public build.
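To make the mismatch concrete, here's a toy calculation (all numbers hypothetical, chosen only to illustrate the shape of the problem): for each app, work out the DCF that would make its runtime estimate exact, and note that the two values differ, so no single shared DCF can satisfy both.

```python
# Toy example: one host, two apps whose flops estimates are wrong by
# different factors. Every number here is made up for illustration.

host_flops = 1e9  # claimed host speed in flops/sec (hypothetical)

apps = {
    # app: (rsc_fpops_est, actual_runtime_seconds) -- hypothetical values
    "GR": (525_000 * 1e6, 600.0),
    "GW": (144_000 * 1e6, 550.0),
}

ideal = {}
for name, (fpops, actual) in apps.items():
    raw_estimate = fpops / host_flops    # estimated runtime with DCF = 1
    ideal[name] = actual / raw_estimate  # the DCF that fixes this app

# GR wants a DCF near 1.14, GW wants one near 3.82; a single shared
# scalar cannot correct both apps at once.
print(ideal)
```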

It would be nice if Einstein could adopt a solution - any solution - which controlled these wild fluctuations.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,833
Credit: 38,737,781,826
RAC: 64,018,473

On a system-by-system basis, the flops estimate would be sufficient. DCF is calculated for each system individually, and within a given system, flops estimates should scale well between task types: there wouldn't be a large variance between GR/GW if the flops were normalized. It would certainly be beneficial to at least TRY to normalize them, as the driving issue now is the very large difference between the GR and GW flops values, which is bigger than any system-specific variance would be. You might not be able to get it "perfect" for every system, but they can certainly get it much closer than it currently is.

If they moved the GW GPU flops estimate from 144,000 to, say, 420,000 (80% of GR), that would be a huge help.
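The arithmetic behind that suggestion, using only the figures already quoted in this thread:

```python
# Ratios implied by the flops estimates discussed in this thread.
GR = 525_000       # current GR GPU flops estimate
GW_OLD = 144_000   # current GW GPU flops estimate
GW_NEW = 420_000   # suggested GW estimate (80% of GR)

print(GW_NEW / GR)             # 0.8  -> GW tagged at 80% of GR's work
print(round(GR / GW_OLD, 1))   # 3.6  -> implied "GW is 3.6x faster" today
print(GR / GW_NEW)             # 1.25 -> implied speed ratio after the change
```

Bringing the implied speed ratio down from 3.6x to 1.25x would put the estimates much closer to the observed "about the same speed, or only slightly faster".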


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,142
Credit: 2,860,024,621
RAC: 1,804,572

A similar variation occurs between NVidia GPU apps and BRP4 apps on intel_gpu.

lohphat
Joined: 20 Feb 05
Posts: 29
Credit: 80,718,299
RAC: 12,991

Richard Haselgrove wrote:

The trouble is, DCF was designed and coded in the days when only CPUs could be used in BOINC. For any given host, all the CPU cores run at the same speed: even multi-CPU servers require that the CPUs are matched.

 

However, newer Ryzen CPUs have a "gaming mode" in which one core is allowed to run at a faster clock speed. I personally don't use this mode, but it does exist.

lohphat
Joined: 20 Feb 05
Posts: 29
Credit: 80,718,299
RAC: 12,991

What I don't understand is that I'm also seeing results on shared WUs where my 980 Ti is doing work much faster than a 1070 board.  I'll have to go back and look at the card spec deltas but this is unexpected at first blush.

mikey
Joined: 22 Jan 05
Posts: 12,260
Credit: 1,838,178,424
RAC: 10,307

lohphat wrote:

What I don't understand is that I'm also seeing results on shared WUs where my 980 Ti is doing work much faster than a 1070 board.  I'll have to go back and look at the card spec deltas but this is unexpected at first blush.

Older Nvidia cards had better double precision ratios than newer Nvidia cards.

Harri Liljeroos
Joined: 10 Dec 05
Posts: 3,945
Credit: 3,000,478,537
RAC: 626,589

mikey wrote:

lohphat wrote:

What I don't understand is that I'm also seeing results on shared WUs where my 980 Ti is doing work much faster than a 1070 board.  I'll have to go back and look at the card spec deltas but this is unexpected at first blush.

Older Nvidia cards had better double precision ratios than newer Nvidia cards.

It might also be that the other computer is doing more than one task at a time.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,833
Credit: 38,737,781,826
RAC: 64,018,473

mikey wrote:

lohphat wrote:

What I don't understand is that I'm also seeing results on shared WUs where my 980 Ti is doing work much faster than a 1070 board.  I'll have to go back and look at the card spec deltas but this is unexpected at first blush.

Older Nvidia cards had better double precision ratios than newer Nvidia cards.

That's not the case with these models: the 1070 has higher DP performance than a 980 Ti, and both have the same 1:32 ratio to their FP32 performance.

1070 - 202 GFLOPS

980ti - 189.4 GFLOPS

 

Since GW tasks can be very CPU-limited, it's more likely that the 1070 wingman has a weak or overcommitted CPU, leading to reduced performance, or is running multiple tasks at once, as mentioned. Honestly, there could be several reasons, but DP isn't one of them.

