Radeon Vega

ikeke1
Joined: 6 Oct 17
Posts: 35
Credit: 24340991
RAC: 0

I'm cleaning my 36h WU queue

I'm cleaning out the 36h WU queue I've built up (semi-)unintentionally - should start 2WU crunching in around 12 hours' time.

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3292974409
RAC: 1510323

Why do you need to cleanup

Why do you need to clean up the queue? You can switch to 2 WUs via app_config.xml immediately.

-----

ikeke1
Joined: 6 Oct 17
Posts: 35
Credit: 24340991
RAC: 0

Dammit. What and where do I

Dammit. I was under the impression I had to activate it via https://einsteinathome.org/account/prefs/project

What do I have to add, and where?

<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

In the project folder?

 edit: seems to be working. (Y)

edit x2: preliminary numbers

WUs concurrently    3WU     2WU     1WU
AVG power (W)       165,3   126,6   111,3
AVG GPU load (%)    91,7    68,9    57,0
PPD vs 1WU (%)      146     135     100
W vs 1WU (%)        149     114     100

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109407401243
RAC: 35320935

ikeke1

ikeke1 wrote:
<max_concurrent>2</max_concurrent>
<gpu_versions>
  <gpu_usage>0.5</gpu_usage>
  <cpu_usage>0.5</cpu_usage>

You don't really need the max_concurrent line if your intention is just to have the appropriate number of GPU tasks running as per the gpu_usage setting.

Also, you can get a fairly immediate change in concurrency without using app_config.xml at all. If you make a change in the website setting, it is communicated to the client through the downloading of new work. So, after making a website change, just temporarily increase your work cache size sufficiently to trigger a new task. An 'update' on its own is not sufficient. You have to get new work, and then the change applies to all tasks on board. However, local changes always trump website changes, so once you have an app_config.xml, website changes are ignored.
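As a minimal sketch (just reusing the app name and the gpu_usage/cpu_usage values from your file above, and leaving out max_concurrent - adjust to taste), the whole file could be as simple as:

<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

With no max_concurrent, the client works out the concurrency from gpu_usage alone (0.5 of a GPU per task, so 2 tasks per GPU).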

The project default for supporting CPU cores is one per GPU task instance. This is needed for nvidia but not to this extent for AMD. When you set it as above (and have left BOINC's %cores setting at the default 100%, and have not prevented CPU work from being sent) you would have 2 GPU tasks running with 1 CPU core 'reserved' for support. For the purpose of producing your table of comparative results, I imagine you may have no CPU tasks on board, so the cpu_usage would be irrelevant as all cores would be available for support anyway. If this is the case it would be useful to state it for the benefit of people looking at the results and perhaps making a wrong assumption.

ikeke1 wrote:

preliminary numbers

 

WUs concurrently    3WU     2WU     1WU
AVG power (W)       165,3   126,6   111,3
AVG GPU load (%)    91,7    68,9    57,0
PPD vs 1WU (%)      146     135     100
W vs 1WU (%)        149     114     100

 

Thank you very much for posting this.  It's a nice way to see the interplay between output achieved and power used to achieve it.   The following comments are observations which you may certainly already understand.  There is no intention to criticise.  In fact, I really hope you are intending to refine what you have called 'preliminary' results.  It is potentially very useful information.

Because you have been able to produce these figures so quickly, they must be based on a very limited number of results. Just be aware that there can be a bit of variation in crunch time from task to task, so you need quite a few to get a decent average. There may be similar variations in power used from task to task as well.

Even more importantly, please understand that tasks represent the use of different parameters as applied to a particular data file.  The data file (e.g. LATeah0043L.dat currently) is evident from the task name and it does change fairly frequently.  A couple of days ago it was LATeah0042L.dat.  At the moment there would be a number of resend tasks for the previous data file being issued.  My impression is that there can be a small difference in crunch time attributable to the data file a particular result was based on.

There is also possible variation based on the frequency term. For example, a task named LATeah0042L_44.0_.... might take a different time than one named LATeah0042L_1012.0_.... Finally, at very low frequencies - 4.0, 12.0, 20.0 ... - some of the tasks run considerably faster (like 50-100% faster) than others at the same frequency. These are known as 'short ends' because there is less data to crunch. The upshot of all this is that 'short ends' should be totally excluded and the remaining results averaged over a sufficient sample size to remove most of the potential variation.

Finally, when concurrent tasks are running, you should try to stagger the starting point of each instance.  At the start, there is a lot of activity with loading stuff into GPU memory and at the end (%done stops at 89.997%) there is a followup stage where single precision crunching is complete and the 10 most likely candidate signals are being re-evaluated in double precision and a 'toplist' is created.  It might make a bit of a difference if the initial startup and the final followup stages don't happen to coincide with each other on multiple tasks.  It's reasonably easy to achieve suitable spacing between tasks and that tends to persist for quite a while once achieved :-).

I look forward to seeing the preliminary results updated once you have the chance to accumulate more data :-).

 

 

Cheers,
Gary.

ikeke1
Joined: 6 Oct 17
Posts: 35
Credit: 24340991
RAC: 0

To keep it more or less

To keep it more or less repeatable, here's how I did it.

1WU - no app_config.xml

2WU
<gpu_usage>0.5</gpu_usage>
<cpu_usage>1</cpu_usage>

3WU
<gpu_usage>0.33</gpu_usage>
<cpu_usage>1</cpu_usage>
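For example, the full 3WU file (a sketch only - same structure as the file posted earlier, with max_concurrent left out as Gary suggested) would be:

<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>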

The 2WU and 3WU runs each had 45 minutes of "warmup" beforehand, running the same number of WUs concurrently.

45 minute run.

[Graphs: Power consumption and GPU load - 1WU / 2WU / 3WU]

[Graphs: Fan speed - 1WU / 2WU / 3WU]

[Graphs: GPU and HBM2 frequency - 1WU / 2WU / 3WU]

[Graphs: Temperatures - 1WU / 2WU / 3WU]

 

 

ikeke1
Joined: 6 Oct 17
Posts: 35
Credit: 24340991
RAC: 0

I'm keeping it on 2WU

 

I'm keeping it at 2WU concurrency for the time being, as it's most efficient at that (per the table below, roughly 1.34x the PPD for 1.14x the power, i.e. about 1.18x PPD per watt). 3WU causes a steep rise in power consumption and fan speed (1.45x PPD for 1.46x power, so essentially no gain per watt) with only a marginal improvement in points per day compared to 2WU.

PS! WUs were more or less from the same batch for the 2/3WU concurrency test (most were LATeah0042L_1188.* with some LATeah0042L_1172.*/LATeah0041L_1164.*), as I was still going through my previously downloaded ~12h buffer.

 

Stats from the data

                                     3WU      2WU      1WU
AVG power (W)                        162,4    126,6    111,3
AVG GPU load (%)                     90,3     68,9     57,0
AVG fan speed (RPM)                  2689,4   2128,0   1834,7
Hypothetical PPD multiplier vs 1WU   1,45     1,34     1,00
W multiplier vs 1WU                  1,46     1,14     1,00

Gavin
Joined: 21 Sep 10
Posts: 191
Credit: 40643554483
RAC: 1457140

You're making a pretty

You're making a pretty compelling argument for undervolting, but it's worth noting that undervolting is as variable as overclocking and not all chips will respond the same way - unless, as you appear to be, you're lucky :-)

Would you mind sharing details of the brand/model of card you have and the driver version you are using? Do you also have full-system wall power consumption figures from a plug-in meter?
How much effort would it be (assuming you're willing) to re-run all your testing but with 'out of the box' voltages and memory clock, for a same-card/system comparison?

My own previous expedition into undervolting was nowhere near as successful as yours, and my impression from this thread is that Mumak didn't do as well either!

Both my Vega 64s are from Sapphire and are air cooled; one is on driver version 17.9.1 and the other is now on 17.10.1. The card on the 17.10 driver is now cooler and capable of maintaining higher boost clocks than the card on 17.9, but GPU-only power draw is still in the order of 250 W average on each machine as reported by GPU-Z (stock clocks and voltages but with a +25% power limit).

Gav.

ikeke1
Joined: 6 Oct 17
Posts: 35
Credit: 24340991
RAC: 0

It's an MSI Vega64 "black"

It's an MSI Vega64 "black" reference card; I'm running the 17.9.3 WHQL x64 driver on the latest Win10 x64 build.

Total power consumption from the wall:

@GPU default - 60W idle, 360W load
@GPU tweaked - 60W idle, 260W load

Will do a quick 2WU run at all-stock settings (will modify fan speed though; it will throttle like hell otherwise) and report back.

edit: from the wall (with two modifications - fan speed 400-4900 and temperature limits of 70C and 60C), under 2WU load it's 280-300W above system idle.

I'm using a Seasonic SS-660XP2 Platinum PSU.

Also, to make sure - undervolting alone won't work; you have to add a power limit and lower the temperatures. With an undervolt only, you remove GPU die overvoltage as a possible limitation on clock stability (increased temperature and power consumption). If you modify the fan curve and temperature limits to keep the whole package below the thermal threshold (it seems to be below 70C), then you get clock stability and lower power consumption. Now, if you also add a power limit to the mix, you give the package as a whole everything it needs to keep optimal clock speed at optimal temperature with minimal power consumption.

With power state undervolt + power limit increase + thermal limits decrease + fan curve changes you literally get a whole new Vega cake ;)

Or at least that's what seems to be happening.

Take into account that it's a bit like feeding numbers into a black box (AMD's gatekeeper inside the GPU juggles all these limitations to generate the best possible outcome it can, depending on the GPU die, memory, thermal, power etc. parts of the equation) - you just have to try n+1 times to reach equilibrium, until something changes and you have to start again :)

 

ikeke1
Joined: 6 Oct 17
Posts: 35
Credit: 24340991
RAC: 0

45 minute comparison between

45 minute comparison between "stock" (default MHz/voltage, with fan 400-4900 and temperature 70C/60C) and the GPU undervolted + 50% power limit increase and memory overclock + undervolt.

2WU concurrent crunching.

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3292974409
RAC: 1510323

Thanks! Can you please post a

Thanks! Can you please post a similar comparison graph for GPU voltage?

-----
