Improvements in the code of the clients

Tom M
Joined: 2 Feb 06
Posts: 1,147
Credit: 2,140,059,706
RAC: 4,571,233
Topic 225756

A BOINC friend on another thread noted the following:

Btw,

Besides constantly twiddling with your cooling system, you could ask the (official) developers to take a look at the hardware support for the PTX atomic global add f32, at least on NVIDIA hardware.

That cut away 5 seconds on the first try.

The next advice is to take a look at the access pattern of twiddle_dee: that would shave off some 50 seconds on NVIDIA and 17% on some ATI model I know of.

Keep on crunching!

Over the hill?  What hill?  I don't REMEMBER any hill...
A Proud member of the O.F.A. (I've forgotten what that stands for.... ;)

petri33
Joined: 4 Mar 20
Posts: 69
Credit: 1,319,101,808
RAC: 6,589,870

Twiddle dee is defined as __constant float2[3][256], and each thread accesses it at a different location, which results in serialized access on NVIDIA hardware. SLOW!

Please replace the __constant qualifier with __global.

See: https://einsteinathome.org/fi/workunit/565663876
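
To illustrate the point about serialized access, here is a toy Python model. It is only a sketch, not a hardware simulator: it assumes the constant cache can broadcast one distinct address to a warp per pass, so a warp needs as many passes as it has distinct addresses. The function name constant_passes and both access patterns are made up for illustration and are not taken from the Einstein@Home code.

```python
# Simplified model (an assumption, not a hardware simulator): the constant
# cache serves one distinct address per pass, so a warp needs as many passes
# as it has distinct addresses among its threads.
WARP_SIZE = 32  # threads per NVIDIA warp


def constant_passes(indices, warp_size=WARP_SIZE):
    """Return the serialized pass count for each warp of thread indices."""
    passes = []
    for start in range(0, len(indices), warp_size):
        warp = indices[start:start + warp_size]
        passes.append(len(set(warp)))
    return passes


# Hypothetical access patterns into one 256-entry row of the table.
uniform = [7] * 64                              # every thread reads entry 7
scattered = [(t * 5) % 256 for t in range(64)]  # every thread reads its own entry

print(constant_passes(uniform))    # [1, 1]
print(constant_passes(scattered))  # [32, 32]
```

In this model the scattered pattern costs 32 passes per warp instead of 1, which is why a table indexed differently by each thread tends to be a poor fit for __constant memory.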

Ian&Steve C.
Joined: 19 Jan 20
Posts: 913
Credit: 6,735,359,849
RAC: 24,042,608

petri33 wrote:

Twiddle dee is defined as __constant float2[3][256], and each thread accesses it at a different location, which results in serialized access on NVIDIA hardware. SLOW!

Please replace the __constant qualifier with __global.

See: https://einsteinathome.org/fi/workunit/565663876

This is the biggest thing holding Nvidia back for sure. It's a kind of artificial limit that keeps Nvidia cards from operating at their full potential, at least modern ones (Pascal through Ampere). It's the sole reason that Nvidia has long under-performed at Einstein. But that's changing ;)

With the above changes:

Pascal can speed up processing by ~40-60%
Turing can speed up processing by ~65%
Ampere can speed up processing by ~100-110%

It requires OpenCL 2.0+ drivers though. Conveniently enough, Nvidia has bumped its drivers up to OpenCL 3.0 as of the 465 driver branch. Getting OpenCL 2.0 working on AMD/Linux with older cards is a bit more of a challenge, but not impossible. Newer cards have better ROCm support than the older cards. Nvidia really has a better handle on their drivers than AMD does.

I wonder if this fix could be applied to the Windows app as well?

 

_____________________________________________

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,031
Credit: 218,234,762
RAC: 49,706

I passed that on to our GPU App developer.

BM

Ian&Steve C.
Joined: 19 Jan 20
Posts: 913
Credit: 6,735,359,849
RAC: 24,042,608

Thanks Bernd. Please keep us updated if this makes it into a new app.

_____________________________________________

Tom M
Joined: 2 Feb 06
Posts: 1,147
Credit: 2,140,059,706
RAC: 4,571,233

Bernd Machenschalk wrote:

I passed that on to our GPU App developer.

Thank you.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,041
Credit: 746,106,017
RAC: 1,141,914

I've just downloaded a new version 1.01 (GW-opencl-nvidia) (beta test) for the Gravitational Wave search O3 All-Sky #1 (O3AS) search - deployed about an hour and a half ago. Could that be related?

Ian&Steve C.
Joined: 19 Jan 20
Posts: 913
Credit: 6,735,359,849
RAC: 24,042,608

Possibly. But in my testing the mentioned changes have little effect on the GW app since it’s so heavily CPU bound.

They need to update the Gamma Ray apps primarily.

Also, if they implement the changes, you’ll need OpenCL 2.0 compatible drivers to make use of them. You’ll get errors otherwise.

_____________________________________________

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,041
Credit: 746,106,017
RAC: 1,141,914

I noticed the new app first on a Windows machine, which has "device version OpenCL 1.2 CUDA". There's a separate one for Linux, which I've also downloaded.

The first tasks specified to use the new app will reach the head of the cache while I'm out at dinner. I'll see what sort of a mess we have when I get back.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 913
Credit: 6,735,359,849
RAC: 24,042,608

For Nvidia, you need drivers from the 465 or 470 branch. Those include OpenCL 3.0 on both Windows and Linux.

On AMD, I believe the Windows drivers support OpenCL 2.0 for the cards that support it, but on Linux it’s a little more complicated. The AMDGPU-Pro drivers only support OpenCL 2.0 on Vega and newer. Older cards like the Polaris-based RX500 series GPUs will only get OpenCL 1.2 from the AMDGPU-Pro drivers even though the hardware supports OpenCL 2.0. It’s an issue/limitation with the ROCm runtime included with the AMD driver installer. You’ll only get OpenCL 2.0 support (at least enough to work with this code change) if you do the full ROCm install.
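
For anyone wanting to check a host quickly, here is a small Python sketch that parses a "device version" string such as the "OpenCL 1.2 CUDA" one reported in BOINC's startup messages. It relies only on the standard layout "OpenCL <major>.<minor> <vendor-specific info>"; the helper names opencl_version and supports_opencl_2 are made up for illustration. Treat the result as a hint rather than a guarantee, since OpenCL 3.0 makes many 2.x features optional and the version number alone does not prove a given feature is present.

```python
import re


def opencl_version(device_version):
    """Parse 'OpenCL <major>.<minor> <vendor info>' into (major, minor)."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", device_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {device_version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_opencl_2(device_version):
    """True if the reported version is at least OpenCL 2.0."""
    return opencl_version(device_version) >= (2, 0)


print(supports_opencl_2("OpenCL 1.2 CUDA"))  # False
print(supports_opencl_2("OpenCL 3.0 CUDA"))  # True
```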

_____________________________________________

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,041
Credit: 746,106,017
RAC: 1,141,914

Returned from dinner - new app tasks are running well, and seemingly quicker than before on both Windows and Linux.

Windows driver version (easiest to check) is 452.06
