AVX512 support?

darkclown
Joined: 27 Sep 06
Posts: 5
Credit: 236,903
RAC: 0
Topic 219455

It's tough to tell, but do any of the CPU apps here take advantage of AVX-512? I've got one system with 2x AVX-512 units, and on other projects it tears things up compared to FMA3/AVX2.

Robert
Joined: 5 Nov 05
Posts: 47
Credit: 305,475,781
RAC: 16,167

I have wondered the same for a while.

Short Answer: I did not see a significant speedup for v2.09 of the Gravity Wave application using AVX-512 hardware.

Detail: I have been running an Intel i7-9700 with AVX2 hardware for a while and had a good baseline for the range of times. Because it is currently summer, and given the RAM requirements of the current version, I was running 4 GW tasks at a time.

I recently upgraded to an Intel i7-11700 with AVX-512 hardware. Run times decreased by 15% across the range when running the same 4 GW tasks as before. I don't have any way to actually tell if the code utilized AVX-512 instructions, but I did not see the drastic speedup often seen in AVX-512 benchmarks.
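One part of that question can be checked directly: whether the operating system reports AVX-512 support for the CPU at all. The following is a minimal sketch, assuming a Linux host that exposes feature flags in /proc/cpuinfo; the helper names `avx512_flags` and `read_cpuinfo` are made up for illustration. It only shows what the hardware offers, not whether the GW application actually executes AVX-512 instructions.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "flags : fpu vme ... avx2 avx512f ..."
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(flags)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; return an empty string if it is unavailable."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    found = avx512_flags(read_cpuinfo())
    print("AVX-512 flags:", ", ".join(found) if found else "none reported")
```

An empty result means either the CPU lacks AVX-512 or the platform does not expose x86-style flags in that file.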

I believe 3 things can account for the 15% speedup: 1) 4.6 GHz versus 4.8 GHz CPU speeds, 2) 12 MB versus 16 MB of L3 cache, and 3) general instruction performance improvements across 2 generations of CPU.
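As a rough sanity check on those numbers, the clock difference can be separated from the rest of the gain. This is only a sketch: it assumes throughput scales linearly with clock speed and that the contributing factors multiply, neither of which is measured here. The 15%, 4.6 GHz and 4.8 GHz figures are the ones quoted above.

```python
def speedup_from_time_reduction(fraction_saved):
    """Convert a run-time reduction (0.15 = 15% shorter) into a speedup factor."""
    return 1.0 / (1.0 - fraction_saved)


observed = speedup_from_time_reduction(0.15)  # about 1.18x overall
clock_only = 4.8 / 4.6                        # about 1.04x from clock speed alone
residual = observed / clock_only              # about 1.13x left for cache and core changes

print(f"observed {observed:.3f}x, clock {clock_only:.3f}x, residual {residual:.3f}x")
```

Under those assumptions, roughly 13% remains to be explained by the larger L3 cache and the newer core design, which is well short of what wide use of 512-bit vectors would suggest.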

 

Exard3k
Joined: 25 Jul 21
Posts: 50
Credit: 34,028,838
RAC: 13,961

The installed base of CPUs that support AVX512 is very low. It is probably not worth the extensive code adaptation necessary. Intel has also ditched AVX512 for their upcoming CPU generation, and no AMD CPUs support it.

Oh and you should notice an increase in thermals and throttling down of your AVX512 CPU when these instructions are actually run. AVX512 is very demanding.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 921
Credit: 6,780,180,510
RAC: 17,542,402

Exard3k wrote:

Oh and you should notice an increase in thermals and throttling down of your AVX512 CPU when these instructions are actually run.

I remember reading some study where AVX2 was actually faster than AVX512 due to the heavy CPU throttling needed when running AVX512.

Exard3k
Joined: 25 Jul 21
Posts: 50
Credit: 34,028,838
RAC: 13,961

Ian&Steve C. wrote:

I remember reading some study where AVX2 was actually faster than AVX512 due to the heavy CPU throttling needed when running AVX512

 

There are certainly applications out there that use AVX512, and I've seen some Phoronix benchmarks where W3175 Xeons (28-core flagship) outperform 64-core Zen2 Threadrippers. AVX512 is no joke if the application is built for it. But both supported CPUs and optimized applications are niche.

Intel retains the instructions on their server CPUs (only big cores there; the Atom cores can't handle it), and AMD plans to include them in future Epyc generations and maybe TR Pro.

 

DanNeely
Joined: 4 Sep 05
Posts: 1,358
Credit: 2,361,151,889
RAC: 3,095,482

Ian&Steve C. wrote:

Exard3k wrote:

Oh and you should notice an increase in thermals and throttling down of your AVX512 CPU when these instructions are actually run.

I remember reading some study where AVX2 was actually faster than AVX512 due to the heavy CPU throttling needed when running AVX512

 

That shouldn't be the case unless something is wrong, and I don't recall seeing any results where it was (maybe if you've got a weird mix of normal and AVX code, or are thrashing the memory subsystem?). AVX-512 code normally runs at a bit more than two-thirds the clock speed of normal code but is 2x as fast per clock as 256-bit AVX, for a net speedup of around 50%. Because CPUs are more efficient at lower clock rates, the 50% reduction in power needed to stay within the thermal envelope should result in a decent bit less than a 50% drop in clock rate.
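The arithmetic behind that estimate can be sketched as a product of two ratios. This is an idealized model, assuming fully vectorized, compute-bound code; the 0.75 clock ratio is just an illustrative value for "a bit more than two-thirds", not a measurement, and the function names are made up for this example.

```python
def net_speedup(clock_ratio, width_ratio):
    """Throughput relative to AVX2: clock penalty times the gain in vector width."""
    return clock_ratio * width_ratio


def break_even_clock_ratio(width_ratio):
    """Clock ratio below which the wider vectors stop paying off."""
    return 1.0 / width_ratio


print(net_speedup(2 / 3, 2.0))      # about 1.33x at exactly two-thirds clock
print(net_speedup(0.75, 2.0))       # 1.5x at three-quarters clock
print(break_even_clock_ratio(2.0))  # 0.5: the clock must halve before AVX2 wins
```

In this model AVX2 only comes out ahead if the AVX-512 clock falls below half the AVX2 clock, or if the code is not actually limited by vector throughput.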

Exard3k
Joined: 25 Jul 21
Posts: 50
Credit: 34,028,838
RAC: 13,961

As much as I'd like to have AVX512 (it's not simple to work with, looking at some rather intimidating tables), I'd prefer to see AVX2 on FGRP tasks first. The GW app is 5 years newer and already has AVX2 built in, according to the wiki and some looks at the runtimes.

But I'd really like to see a full-blown Zen4 EPYC Genoa crunching 1,000,000 RAC on its own in 2023. That is totally possible with good AVX512 code.

petri33
Joined: 4 Mar 20
Posts: 69
Credit: 1,356,201,563
RAC: 6,714,263

I have taken a look at the GPU code (OpenCL) of this project. I do not know the order or the frequency of the GPU kernel calls, but I do know what the kernels do.

To make use of AVX512, or whatever instruction set or new GPU, the processor or the compiler must be really good at 32-bit floating-point arithmetic, parallelisation of the code, and efficient memory access with both reads and writes of stride 6*4 or 8*4 bytes, and there must be an exceptionally good FFT library.

Not even a hand-coded cache management scheme can address the issues of a bad choice of data arrangement. As an exercise of memory and thought: how would you write an array of structs or a struct of arrays full of elements like {int32 x; int32 y; int32 z;} ... most efficiently when the writes or reads are done in parallel?

The array of structs is bad, bad and bad... It always leads to a strided memory access pattern (you touch only every n-th memory location, thus losing both memory and cache bandwidth).
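The strided-access point can be made concrete by counting cache lines. The sketch below is not taken from the project's code: it models the {int32 x; int32 y; int32 z;} element from the exercise, assumes a 64-byte cache line and a buffer starting at address 0, and counts how many lines are touched when parallel lanes read only the x field.

```python
CACHE_LINE = 64  # bytes per cache line (a common size, assumed here)
FIELD_SIZE = 4   # bytes per int32 field
FIELDS = 3       # x, y, z


def lines_touched(addresses, line_size=CACHE_LINE):
    """Count the distinct cache lines covered by a list of byte addresses."""
    return len({addr // line_size for addr in addresses})


def aos_x_addresses(n):
    """Array of structs: x of element i sits at i * 12, so reads are strided."""
    return [i * FIELDS * FIELD_SIZE for i in range(n)]


def soa_x_addresses(n):
    """Struct of arrays: all x values are contiguous, 4 bytes apart."""
    return [i * FIELD_SIZE for i in range(n)]


n = 1024
print("AoS lines:", lines_touched(aos_x_addresses(n)))  # 192
print("SoA lines:", lines_touched(soa_x_addresses(n)))  # 64
```

For the same 1024 values, the array-of-structs layout pulls in three times as many cache lines, and two-thirds of every line fetched is data the read does not use.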

--

petri33

 
