Ryzen SMT & GPU computing time

Ouiche
Joined: 13 May 17
Posts: 7
Credit: 22584610
RAC: 0
Topic 207758

Hi,

I started crunching Einstein@Home a few days ago (GPU and CPU), tested a bunch of settings for my Ryzen, and noticed the following with SMT (AMD's hyperthreading):

-With SMT on, a GPU task takes approx. 810-830 s.
-With SMT off, a GPU task takes approx. 700-730 s.
(All current tasks are "Gamma-ray pulsar binary search #1 on GPUs v1.20 (FGRPopencl1K-nvidia) windows_x86_64")

-With SMT on, a CPU task takes approx. 24000-26000 s (~3000 s per task on average with 6 physical cores available)
-With SMT off, a CPU task takes approx. 18000 s (~2100 s per task on average with 12 logical cores available)
(All current tasks are "Continuous Gravitational Wave search Galactic Center Tuning lowFreq v1.02 (AVX) windows_x86_64")

I cap the CPU at 80% to keep the machine responsive, as I work on it at the same time, and I run only 1 task on the GPU for the same reason. There is a little variation in the time tasks take to finish, but the results are reproducible. Note that I'm running Win10 x64 without any kind of optimization.

With SMT on, I get ~50% more tasks done on the CPU (not bad!), but lose ~12% on the GPU. At the current credit rates (3450 cr for a GPU task and 1000 cr for a CPU one):

-SMT off would get 120 GPU and 28 CPU tasks done per day on average: ~442k cr
-SMT on would get 104 GPU and 41 CPU tasks done per day on average: ~400k cr
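
The daily totals above can be sanity-checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `daily_credit` is made up for illustration, and the task counts and credit values are the ones quoted in this post.

```python
# Credit per task as quoted in the post (the project may change these).
GPU_CREDIT = 3450
CPU_CREDIT = 1000


def daily_credit(gpu_tasks, cpu_tasks):
    """Total credit per day for the given daily task counts."""
    return gpu_tasks * GPU_CREDIT + cpu_tasks * CPU_CREDIT


smt_off = daily_credit(gpu_tasks=120, cpu_tasks=28)
smt_on = daily_credit(gpu_tasks=104, cpu_tasks=41)

print(f"SMT off: {smt_off} cr/day")
print(f"SMT on:  {smt_on} cr/day")
print(f"Difference: {smt_off - smt_on} cr/day")
```

With these inputs the totals come out at 442,000 and 399,800 cr per day, a gap of roughly 42k cr.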

The difference is not huge, but I have only one GPU. With a multi-GPU cruncher running CPU and GPU tasks at the same time (e.g. 4x 1070 or 1080), could the credit difference be bigger with SMT or hyperthreading off? (Losing ~12% on each card could quickly add up to millions of credits lost, and the points gained from the extra CPU tasks could not close the gap.) Do Intel CPUs react the same way?

Credits aside, are the GPU tasks more important than the CPU ones?

Note: I'm a beginner with BOINC. If you spot something wrong or some flawed logic, feel free to correct me; I'll be happy to learn.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Very interesting. At the moment, the Gravity Wave work is being done only on the CPUs, and that is the big gun here. The GPU work is nice too, but I expect it is less of a big deal. I have been thinking that a 16 (real) core Ryzen machine might be nice, instead of trying to contend with 32 virtual cores, if the price is reasonable and I can dissipate the power. I might then dispense with the GPU entirely.

Filipe
Joined: 10 Mar 05
Posts: 186
Credit: 408633011
RAC: 322734

@OUICHE

The current GPU work at Einstein needs 1 CPU core for each GPU task to run at full speed.

All the work, whether for CPU or GPU, is worth doing. For FGRP units, the science done is the same; it is simply processed much faster on a GPU.

You can increase your daily output with this:

- Leave SMT on, but set up BOINC so that it uses at most 75% of your CPUs (9 logical cores available for CPU tasks), 100% of the time.

- That leaves 3 logical cores available to support your single GPU work unit without slowing it down.

Try this configuration and tell us how much it improves your daily credit.

Also, with 3 logical cores available, I don't think you will need to cap the CPU at 80% anymore.

Ouiche
Joined: 13 May 17
Posts: 7
Credit: 22584610
RAC: 0

@FILIPE

The CPU time is already set to 100%, and I limited BOINC to use 80% of the CPUs (I did not explain that correctly; the event log shows "maxCPUs used: 12" and no CPU time limit). There are 4 logical cores left (well, 80% of 16 is not 12, so... at least 3 virtual cores). The CPU has some power to spare (I'm doing desktop tasks most of the time, nothing power hungry).

I'll change to 75% and see how it goes for a few days.

The performance loss may be caused by the GPU task using a virtual core on a physical core that is also used by another virtual core crunching something, but I don't know. I'll try to clear two virtual cores with Process Lasso and force the GPU task onto them later in the week; that should tell me whether the performance loss is caused by a virtual core or whether there is some kind of architecture bottleneck.
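
For anyone who would rather script the pinning than use Process Lasso: Windows expresses CPU affinity as a bitmask with one bit per logical processor. Below is a minimal sketch of building such a mask, assuming purely for illustration that logical cores 15 and 16 (numbered from 1, out of 16) are the ones being reserved. The helper name `affinity_mask` is hypothetical and not part of BOINC or Process Lasso.

```python
def affinity_mask(cores_one_based):
    """Build an affinity bitmask from 1-based logical core numbers.

    Bit 0 of the mask corresponds to core 1, bit 1 to core 2, and so on.
    """
    mask = 0
    for core in cores_one_based:
        if core < 1:
            raise ValueError("core numbers are 1-based in this sketch")
        mask |= 1 << (core - 1)
    return mask


# Assumption: the last two logical cores of a 16-thread CPU are reserved.
gpu_mask = affinity_mask([15, 16])
cpu_mask = affinity_mask(range(1, 15))  # the 14 cores left for everything else

print(hex(gpu_mask))
print(hex(cpu_mask))
```

The two masks are complementary (0xc000 and 0x3fff). A hexadecimal mask of this kind is what `start /affinity` in cmd.exe accepts; whether a BOINC GPU task keeps an affinity set that way has not been checked here.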

Keith Myers
Joined: 11 Feb 11
Posts: 4981
Credit: 18799304569
RAC: 7840094

I only crunch SETI CPU tasks. I use the dual GTX 1070s for SETI, Einstein and MilkyWay. I find that setting affinity for SETI CPU tasks to physical cores works best on Ryzen. I run with SMT on and let the HT cores support the GPU tasks at 1 HT core per GPU task. I do see an increase in compute times for CPU tasks that occasionally get run on an HT core.

Ouiche
Joined: 13 May 17
Posts: 7
Credit: 22584610
RAC: 0

I unloaded a physical core with Process Lasso by preventing all software from using cores 15 & 16, then forced the GPU project onto cores 15 & 16... and it seems to mitigate the loss: I get GPU tasks in the 750-760 s range with SMT activated and 11 CPU tasks running (same batch).

It seems to cut a minute of calculation per GPU task, and I don't have to change the number of cores used by BOINC. Points-wise, I don't think it's going to change much compared to SMT deactivated: I'll lose a few GPU tasks a day (theoretically 7, assuming all tasks take exactly the same time to crunch... which is not the case) without having to sacrifice CPU tasks.
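
The "theoretically 7" figure can be approximated from the run times quoted in this thread. The sketch below assumes the midpoint of each reported range (715 s with SMT off, 755 s with SMT on and a dedicated core) and a GPU that is busy 24 hours a day. The function name `tasks_per_day` is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tasks_per_day(seconds_per_task):
    """Tasks completed per day if one task runs at a time, back to back."""
    return SECONDS_PER_DAY / seconds_per_task


# Midpoints of the ranges reported in the thread (an assumption of this sketch).
smt_off_seconds = (700 + 730) / 2   # SMT off
pinned_seconds = (750 + 760) / 2    # SMT on, GPU task on a dedicated core

lost_per_day = tasks_per_day(smt_off_seconds) - tasks_per_day(pinned_seconds)
print(f"GPU tasks lost per day: {lost_per_day:.1f}")
```

With midpoints the loss works out to between 6 and 7 GPU tasks a day; picking other points within the reported ranges shifts the result by a task or two.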

The small performance loss compared with SMT deactivated is to be expected: even with a physical core fully dedicated to the task, cache and bandwidth have to be shared by 16 cores instead of 8.

Ouiche wrote:

-With SMT on, a CPU task takes approx. 24000-26000 s (~3000 s per task on average with 6 physical cores available)
-With SMT off, a CPU task takes approx. 18000 s (~2100 s per task on average with 12 logical cores available)
(All current tasks are "Continuous Gravitational Wave search Galactic Center Tuning lowFreq v1.02 (AVX) windows_x86_64")

I just realized I swapped two things while writing the post, and I can't edit it. I should sleep more! The logic and the rest of the post stand:

-41 CPU tasks a day with SMT on and 12 logical cores available (an average of one task every 2100 s)
-28.8 CPU tasks a day with SMT off and 6 physical cores available (an average of one task every 3000 s)

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3441636540
RAC: 4163240

I'm wondering how much of the GPU time difference due to SMT on/off comes after the 90% completion mark, when GPU load is 0% and it's all CPU.

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

What version of AGESA have you got on it? I understand they're beta testing 1.0.0.6. The Asus site says the latest BIOS for their X370-Pro has 1.0.0.4a. So basically they are fine-tuning the microcode.

Update: Apparently 1.0.0.6 is focused on supporting memory speeds above 2666 MHz, but you never know what else they might have fixed. There is no mention of what 1.0.0.5 has/had.

Keith Myers
Joined: 11 Feb 11
Posts: 4981
Credit: 18799304569
RAC: 7840094

We Prime X370 users are still waiting for the update. ASUS actually skipped AGESA 1.0.0.5 and went straight to 1.0.0.6. The CH6 users got the AGESA 1.0.0.6 beta BIOSes 9943 and 9945 on Friday. Almost everyone trying them this weekend has been really happy. If the pattern follows the last update, the Prime should get an updated BIOS in about a week.

Ouiche
Joined: 13 May 17
Posts: 7
Credit: 22584610
RAC: 0

MarkJ wrote:

What version of AGESA have you got on it? I understand they're beta testing 1.0.0.6. The Asus site says the latest BIOS for their X370-Pro has 1.0.0.4a. So basically they are fine-tuning the microcode.

Update: Apparently 1.0.0.6 is focused on supporting memory speeds above 2666 MHz, but you never know what else they might have fixed. There is no mention of what 1.0.0.5 has/had.

Got the 1.0.0.4a BIOS; memory is stuck at 2666 MHz (it's a dual-rank kit). It's not much of a problem: somehow dual-rank memory gets better performance than single-rank at the same frequency/timings on Ryzen, and that mitigates the performance loss.

Reports on the 1.0.0.6 BIOS look very good, with single-rank kits getting over 3200 MHz and dual-rank kits reaching 3200 MHz. Can't wait to get my hands on it (not available for my motherboard yet).
