HT vs. Non-HT with Ryzen 7 2700x

Cruncher-American
Joined: 24 Mar 05
Posts: 71
Credit: 5689877728
RAC: 3598947
Topic 225282

Tried out a test machine (Ralph) with an 8c/16t Ryzen 7 2700x. 

In both cases I capped BOINC at 90% of cores and ran only CPU work.

I ran 7 WUs simultaneously with HT off and 13 with HT on.

I had some 1.08 and 2.09 in each case.

HT OFF - all wus ran around 3-3.5 hours.

HT ON - all wus ran around 7-7.5 hours.

This seems odd to me, as when I tried HT vs. non-HT in SETI, I saw a much smaller difference in relative runtime. (I was also running 6 threads of GPU work at that time.)

Am I supposed to see a 2 to 1 time ratio? In the Einstein case it suggests no advantage at all to using ht. Is that an artifact of Ryzen? (My seti machines were dual xeon E5-2680v2, so 40 threads per system).

Just curious...
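
For what it's worth, the runtimes quoted above can be turned into throughput, which is the number that actually answers the "is HT worth it" question. This is only a rough sketch: it assumes the midpoint of each runtime range (3.25 h and 7.25 h) and ignores the mix of 1.08 and 2.09 tasks. The function name is made up for illustration.

```python
def tasks_per_hour(concurrent_tasks, hours_per_task):
    """Steady-state throughput when `concurrent_tasks` run side by side."""
    return concurrent_tasks / hours_per_task

# Midpoints of the runtime ranges reported in the post (an assumption).
ht_off = tasks_per_hour(7, 3.25)   # 7 tasks, about 3-3.5 hours each
ht_on = tasks_per_hour(13, 7.25)   # 13 tasks, about 7-7.5 hours each

print(f"HT off: {ht_off:.2f} tasks/hour")
print(f"HT on:  {ht_on:.2f} tasks/hour")
print(f"HT on relative to HT off: {ht_on / ht_off:.2f}")
```

With those midpoints, HT on comes out somewhat below HT off in tasks per hour, so on these numbers HT is a net loss rather than merely "no advantage".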

 

mikey
Joined: 22 Jan 05
Posts: 12849
Credit: 1884328953
RAC: 399483

The problem is with the cache size. With HT on, both the HT thread and the real CPU core share the same cache; if the tasks don't fit inside the cache, it swaps out to the hard drive, making it run much slower. They talk about it a lot over at PrimeGrid with their very different tasks.

Cruncher-American
Joined: 24 Mar 05
Posts: 71
Credit: 5689877728
RAC: 3598947


Ah, that seems to make some sense; thanks, Mikey.

Ralph has a 1 TB NVMe device for its drive; wouldn't that help alleviate the swapping problem?

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4105
Credit: 48968504948
RAC: 33227456


Cruncher-American wrote:

Ah, that seems to make some sense; thanks, Mikey.

Ralph has a 1 TB NVMe device for its drive; wouldn't that help alleviate the swapping problem?

No, that doesn't matter. He's talking about CPU-level cache. That 2700X only has 16 MB of L3 cache. When it "swaps", it spills over into system memory, which is much slower (and the NVMe drive is slower still).

 

You can't change the amount of CPU cache. It's built into the CPU die.
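
To put a rough number on the cache pressure described above, here is a back-of-the-envelope sketch. It assumes the 16 MB of L3 is split evenly between running tasks, which real hardware does not do, so treat the output as an order-of-magnitude guide only. The function name is hypothetical.

```python
L3_CACHE_MB = 16  # total L3 quoted in the post above

def l3_share_mb(concurrent_tasks, l3_mb=L3_CACHE_MB):
    """Naive even split of L3 across running tasks (a simplifying assumption)."""
    return l3_mb / concurrent_tasks

# Task counts from the original HT off / HT on test.
for tasks in (7, 13):
    print(f"{tasks} tasks: ~{l3_share_mb(tasks):.2f} MB of L3 each")
```

Going from 7 to 13 tasks nearly halves the cache available per task, which fits the idea that the working set no longer stays in L3 and spills to system memory.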


Tom M
Joined: 2 Feb 06
Posts: 6739
Credit: 9715845526
RAC: 2383167


Cruncher-American wrote:

Am I supposed to see a 2 to 1 time ratio? In the Einstein case it suggests no advantage at all to using ht. Is that an artifact of Ryzen? (My seti machines were dual xeon E5-2680v2, so 40 threads per system).

Just curious...

I have restricted my 3950X (16c/32t) to around 8 Gravity Wave threads, because if I run many more than that the processing time more than doubles. I also discovered that the resource monitor launched from Task Manager was reporting that each task was using 3 CPU threads.

I also have fewer than the maximum number of World Community Grid tasks processing on that CPU, because if I constrain them somewhat I can get "full production" from the tasks I am running for each project.

I have every reason to believe I am running into CPU cache limits, because I can start the HD running like mad if I raise the number of threads beyond something like 10 Gravity Wave tasks.

I haven't been processing Gamma-Ray CPU tasks because the gpu version runs much faster and "pays" much more.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).

Tom M
Joined: 2 Feb 06
Posts: 6739
Credit: 9715845526
RAC: 2383167


That got me curious. I have a test rig set up that I was using to see how well the MB handled 3 GPUs plugged directly into it (not very well), but it also has an AMD 2700X installed with the stock cooler. I was just about to "dismount" the CPU/RAM/MB, but...

So I have just re-installed an RX 580 and booted it again with a profile of zero resources, and used an app_config.xml file to limit the CPU to 1 GR task. I am also running 1 GR task on the GPU.
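
For anyone who wants to repeat this, a minimal sketch of generating such an app_config.xml is below. The `<app>`, `<name>` and `<max_concurrent>` elements are standard BOINC app_config.xml elements, but the app name `hsgamma_FGRP5` is an assumption here: check the `<name>` entries in your own client_state.xml for the exact short name of the Gamma-Ray CPU app. The helper function is made up for illustration.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name, max_concurrent):
    """Return app_config.xml text limiting one app to `max_concurrent` tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")

# "hsgamma_FGRP5" is an assumed app name; verify it before using the file.
xml_text = build_app_config("hsgamma_FGRP5", 1)
print(xml_text)
```

The resulting text would be saved as app_config.xml in the project's folder inside the BOINC data directory, after which the client has to re-read its config files (or be restarted) for the limit to apply.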

I am curious to see how fast that CPU can crunch a GR CPU task when it has nearly all of the CPU cache available.

Yes it has SMT (aka: HT) enabled.

Like you said "Just curious"....

Tom M

 

 


Keith Myers
Joined: 11 Feb 11
Posts: 5052
Credit: 19098798538
RAC: 6111429


Should be negligible, if any difference.  GR hardly uses the cpu at all.  It is almost entirely gpu constrained till the 89.9997% point and then switches back to the cpu for the final result scoring.

Takes 5 seconds or less on my 3950X.

 

Tom M
Joined: 2 Feb 06
Posts: 6739
Credit: 9715845526
RAC: 2383167


Keith Myers wrote:

Should be negligible, if any difference.  GR hardly uses the cpu at all.  It is almost entirely gpu constrained till the 89.9997% point and then switches back to the cpu for the final result scoring.

Takes 5 seconds or less on my 3950X.

Keith,

I am confused.  Since when would a cpu task use "any" gpu in the processing?

I can report that the first GR CPU task took 1 hour and about 27 minutes. This was under an Ubuntu/All-In-One setup. I just looked, and CA is running Windows 10 on the 2700X.

I thought that Windows was coming within 10% of Linux in performance on this project? So one re-test for CA is: how fast does a single Gamma-Ray task process on the CPU under Win10? I had forgotten that my daily driver is a Win10/AMD 2700X box. I will get it set up to run this too.

Tom M


Keith Myers
Joined: 11 Feb 11
Posts: 5052
Credit: 19098798538
RAC: 6111429


Never mind. I am confused. You stated you installed an RX580, so I thought you were talking about what to expect on a GR GPU task.

 

Tom M
Joined: 2 Feb 06
Posts: 6739
Credit: 9715845526
RAC: 2383167


Keith Myers wrote:

Never mind.  I am confused.  You stated you installed an RX580, thus were talking about what to expect on a GR gpu task.

Another example of my "less than clear" writing in the prior post.  Sorry!

I have my daily driver up and running a single GR CPU task (plus 1 task on the GPU) and will report results later. Since it is a Win 10 box running an AMD 2700X, the results should be within yelling distance of what CA would get if he ran one GR task.

It will take more experimentation to figure out how many CPU tasks we can run on a 2700x before the processing speed begins to fall.

Tom M


Keith Myers
Joined: 11 Feb 11
Posts: 5052
Credit: 19098798538
RAC: 6111429


Well I am not running any gpu tasks on my 2700X.  Just 14 Universe cpu tasks.

CPU_time = Run_time or within 2 seconds of each other.  Any more trips the overcommit situation and the CPU_time starts to fall dramatically behind the run_time.

But that is not with GR tasks.  But I did run GR tasks about a year ago and I thought I could run about the same amount or maybe two less since I also was running 3 gpu tasks at the same time.

But the differences stretched out to around 1-3 minutes between the timers if I remember correctly.
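
The CPU_time versus run_time comparison described above can be sketched as a simple check. The 2-second tolerance comes from the post; the function names and the sample timings are hypothetical, chosen only to illustrate a healthy task and one lagging by a couple of minutes.

```python
def cpu_efficiency(cpu_time_s, run_time_s):
    """Fraction of wall-clock time the task actually spent on the CPU."""
    return cpu_time_s / run_time_s

def looks_overcommitted(cpu_time_s, run_time_s, tolerance_s=2.0):
    """True when CPU time lags run time by more than `tolerance_s` seconds."""
    return (run_time_s - cpu_time_s) > tolerance_s

# Hypothetical timings in seconds: (cpu_time, run_time).
samples = {
    "healthy": (5219.0, 5220.0),   # within 2 s of each other
    "lagging": (5100.0, 5220.0),   # 2 minutes behind
}

for label, (cpu_s, run_s) in samples.items():
    print(label, f"{cpu_efficiency(cpu_s, run_s):.3f}", looks_overcommitted(cpu_s, run_s))
```

Applied to the CPU time and run time columns of a host's task list, a check like this makes it easier to spot the task count at which the CPU starts to be overcommitted.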

 
