Improvements in the code of the clients

Tom M
Joined: 2 Feb 06
Posts: 6365
Credit: 9363004317
RAC: 16889910
Topic 225756

A BOINC friend on another thread noted the following:

Btw,

Besides twiddling constantly with your cooling system, you could ask the (official) developers to take a look at the hardware support for PTX atomic global add f32, at least on NVIDIA hardware.

That cut away 5 seconds on the first try.

The next piece of advice is to take a look at the access pattern of twiddle_dee: that would shave off some 50 seconds on NVIDIA and 17% on some ATI model I know of.

Keep on crunching!
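
The first tip presumably concerns float atomic adds. OpenCL 1.2 has no built-in float atomic add, so kernels commonly emulate one with a compare-and-swap (CAS) loop over the value's bit pattern, while NVIDIA hardware has a native instruction for it (PTX atom.global.add.f32). The thread does not show the app's kernel, so the Python model below is only a sketch under stated assumptions: the app uses the CAS-loop idiom, and a simplified lockstep scheduler has all contending threads read the same old value, with only one CAS winning per round. Cell, cas_loop_add and native_add are hypothetical names.

```python
from __future__ import annotations

import struct

# Simplified model (an assumption, not the app's kernel): compare a CAS-loop
# emulated float atomic add with a native hardware float atomic add.


def f32_to_bits(x: float) -> int:
    """Reinterpret a float32 value as its 32-bit integer bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    """Reinterpret a 32-bit integer bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


class Cell:
    """Hypothetical stand-in for one 32-bit word in global memory."""

    def __init__(self, value: float = 0.0) -> None:
        self.bits = f32_to_bits(value)

    def atomic_cmpxchg(self, expected: int, desired: int) -> int:
        """Integer CAS: store desired only if the word equals expected.
        Returns the old value, like OpenCL's atomic_cmpxchg."""
        old = self.bits
        if old == expected:
            self.bits = desired
        return old


def cas_loop_add(cell: Cell, addends: list[float]) -> int:
    """Emulated float atomic add under a lockstep model: each round, every
    pending thread loads the same old value and tries one CAS; only the
    first succeeds and the rest retry. Returns the total CAS attempts.
    Assumes every addend changes the stored value (e.g. nonzero)."""
    pending = list(addends)
    attempts = 0
    while pending:
        expected = cell.bits  # all pending threads read the same old value
        losers = []
        for a in pending:
            desired = f32_to_bits(bits_to_f32(expected) + a)
            attempts += 1
            if cell.atomic_cmpxchg(expected, desired) != expected:
                losers.append(a)  # another thread won this round: retry
        pending = losers
    return attempts


def native_add(cell: Cell, addends: list[float]) -> int:
    """Native float atomic add (what atom.global.add.f32 provides):
    one atomic operation per thread, no retries. Returns the op count."""
    for a in addends:
        cell.bits = f32_to_bits(bits_to_f32(cell.bits) + a)
    return len(addends)
```

With 32 contending threads each adding 1.0, this model counts 528 CAS attempts (32 + 31 + ... + 1) versus 32 native atomics, and both end at 32.0. Real contention on a GPU is less tidy, so treat this only as the shape of the cost.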

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

petri33
Joined: 4 Mar 20
Posts: 123
Credit: 3890765819
RAC: 6190135

Twiddle dee is defined as __constant float2[3][256], and each thread accesses it at a different location, which results in serialized access on NVIDIA hardware. SLOW!

Please replace the word __constant with global.

See: https://einsteinathome.org/fi/workunit/565663876
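
petri33's explanation can be illustrated with a rough Python model. NVIDIA's CUDA documentation describes constant memory as serving a warp at full speed only when all threads read the same address; otherwise the request is split into one request per distinct address. Cached global loads are instead grouped by memory segment; the model below assumes 32-byte sectors and that the table stays in cache. The per-lane index patterns and helper names are hypothetical, since the thread does not show how the app indexes twiddle_dee. As a side note, declaring the table global at program scope requires OpenCL C 2.0 (OpenCL C 1.2 only allows program-scope variables in the __constant address space), which matches the driver requirement raised later in this thread.

```python
# Simplified warp-level model (an assumption, not a profiler measurement).
WARP_SIZE = 32       # threads per NVIDIA warp
FLOAT2_BYTES = 8     # sizeof(float2)
ROW_LEN = 256        # twiddle_dee is float2[3][256]
SECTOR_BYTES = 32    # assumed global-memory transaction granularity


def byte_offset(row: int, col: int) -> int:
    """Byte offset of twiddle_dee[row][col] in a row-major float2[3][256]."""
    assert 0 <= row < 3 and 0 <= col < ROW_LEN
    return (row * ROW_LEN + col) * FLOAT2_BYTES


def warp_offsets(index_of_lane) -> list:
    """Byte offsets read by one warp; index_of_lane maps lane -> (row, col)."""
    return [byte_offset(*index_of_lane(lane)) for lane in range(WARP_SIZE)]


def constant_requests(offsets) -> int:
    """__constant: one serialized request per distinct address in the warp."""
    return len(set(offsets))


def global_requests(offsets) -> int:
    """__global (cached): one transaction per distinct 32-byte sector."""
    return len({off // SECTOR_BYTES for off in offsets})


# Hypothetical access patterns for one warp.
patterns = {
    "uniform ([0][0] for all lanes)": lambda lane: (0, 0),
    "per-lane ([0][lane])": lambda lane: (0, lane),
    "strided ([1][8*lane])": lambda lane: (1, 8 * lane),
}
for name, fn in patterns.items():
    offs = warp_offsets(fn)
    print(f"{name}: constant={constant_requests(offs)}, "
          f"global={global_requests(offs)}")
```

Under these assumptions the per-lane pattern costs 32 serialized constant-memory requests versus 8 global sectors, while the strided pattern costs 32 either way. That is why the access pattern itself matters as well as the address space.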

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3926
Credit: 45505592642
RAC: 63155528

petri33 wrote:

Twiddle dee is defined as __constant float2[3][256], and each thread accesses it at a different location, which results in serialized access on NVIDIA hardware. SLOW!

Please replace the word __constant with global.

See: https://einsteinathome.org/fi/workunit/565663876

This is the biggest thing holding Nvidia back, for sure. It's a kind of artificial limit that keeps Nvidia cards from reaching their full potential, at least modern ones (Pascal through Ampere). It's the sole reason Nvidia has long underperformed at Einstein. But that's changing ;)

With the above changes:

Pascal can speed up processing ~40-60%

Turing can speed up processing ~65%

Ampere can speed up processing ~100-110%

It requires OpenCL 2.0+ drivers, though; conveniently, Nvidia has shipped OpenCL 3.0 in its drivers since the 465 branch. Getting OpenCL 2.0 working on AMD/Linux with older cards is more of a challenge, but not impossible; newer cards have better ROCm support than older ones. Nvidia really has a better handle on its drivers than AMD does.

I wonder if this fix could be applied to the Windows app as well?

_________________________________________________________________________

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4307
Credit: 249769890
RAC: 34281

I passed that on to our GPU App developer.

BM

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3926
Credit: 45505592642
RAC: 63155528

Thanks Bernd. Please keep us updated if this makes it into a new app.

Tom M
Joined: 2 Feb 06
Posts: 6365
Credit: 9363004317
RAC: 16889910

Bernd Machenschalk wrote:

I passed that on to our GPU App developer.

Thank you.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2942747792
RAC: 716760

I've just downloaded a new version 1.01 (GW-opencl-nvidia) (beta test) for the Gravitational Wave search O3 All-Sky #1 (O3AS), deployed about an hour and a half ago. Could that be related?

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3926
Credit: 45505592642
RAC: 63155528

Possibly. But in my testing, the mentioned changes have little effect on the GW app, since it's so heavily CPU-bound.

They need to update the Gamma Ray apps primarily.

Also, if they implement the changes, you'll need OpenCL 2.0-compatible drivers to make use of them. You'll get errors otherwise.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2942747792
RAC: 716760

I noticed the new app first on a Windows machine, which has "device version OpenCL 1.2 CUDA". There's a separate one for Linux, which I've also downloaded.

The first tasks specified to use the new app will reach the head of the cache while I'm out at dinner. I'll see what sort of a mess we have when I get back.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3926
Credit: 45505592642
RAC: 63155528

For Nvidia, you need drivers from the 465 or 470 branch. Those include OpenCL 3.0 on both Windows and Linux.

On AMD, I believe the Windows drivers support OpenCL 2.0 for the cards that support it, but on Linux it's a little more complicated. The AMDGPU-Pro drivers only support OpenCL 2.0 on Vega and newer. Older cards, like the Polaris-based RX 500 series GPUs, only get OpenCL 1.2 from the AMDGPU-Pro drivers, even though the hardware supports OpenCL 2.0. It's an issue/limitation of the ROCm runtime included with the AMD driver installer. You'll only get OpenCL 2.0 support (at least enough to work with this code change) if you do the full ROCm install.
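
Richard's Windows host, mentioned above, reports "device version OpenCL 1.2 CUDA", which falls short of the 2.0 requirement. Below is a minimal sketch for checking such strings, assuming the format the OpenCL specification defines for CL_DEVICE_VERSION ("OpenCL <major>.<minor> <vendor-specific information>"). The helper names are hypothetical and not part of BOINC or the Einstein apps. A version number is only a first filter, since OpenCL 3.0 made many 2.0 features optional.

```python
import re

# Hypothetical helper: CL_DEVICE_VERSION strings have the form
# "OpenCL <major>.<minor> <vendor-specific information>" per the OpenCL spec.
_VERSION_RE = re.compile(r"OpenCL (\d+)\.(\d+)")


def parse_device_version(version: str) -> tuple:
    """Return (major, minor) from an OpenCL device version string."""
    m = _VERSION_RE.match(version.strip())
    if m is None:
        raise ValueError(f"not an OpenCL device version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def meets_opencl(version: str, required: tuple = (2, 0)) -> bool:
    """True if the reported device version is at least `required`.
    Only a first filter: OpenCL 3.0 made many 2.0 features optional."""
    return parse_device_version(version) >= required


for v in ["OpenCL 1.2 CUDA", "OpenCL 3.0 CUDA"]:
    print(v, "->", meets_opencl(v))
```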

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2942747792
RAC: 716760

Returned from dinner: the new app's tasks are running well, and seemingly quicker than before on both Windows and Linux.

Windows driver version (easiest to check) is 452.06.
