Improvements in the code of the clients

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3955

Credit: 46843312642

RAC: 64382729

12 Aug 2021 14:12:55 UTC

Message 188193

Thanks Bernd!

_________________________________________________________________________

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3955

Credit: 46843312642

RAC: 64382729

12 Aug 2021 19:03:05 UTC

Message 188200 in response to message 188192

Bernd Machenschalk wrote:

Thanks, probably missed that. I'll take another look next week.

Don't forget about the change to "twiddles" as well, not just twiddle_dee. There are three main things being changed in the code I sent you: the change to __global for both twiddles and twiddle_dee, and the change from lds[64][64] to lds[64][65] that I mentioned in my last post. twiddle_dee was already addressed in v1.25/1.26, but according to petri, twiddles should be changed too. It's in the bottom section of the code I sent over.

change

__constant float2 twiddles[
to

__global float2 twiddles[
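
The thread does not say why lds[64][64] becomes lds[64][65]. A common reason for padding a local-memory tile by one column is to avoid bank conflicts when neighbouring threads walk down a column, so here is a rough sketch of that effect. The hardware numbers (32 banks, one 4-byte word per element, 32 threads reading the same column) are assumptions for illustration and are not taken from the Einstein@Home code; conflict_degree is a made-up helper name.

```python
from collections import Counter


def conflict_degree(row_len, n_threads=32, n_banks=32, words_per_elem=1):
    """Worst-case number of threads that land on the same memory bank.

    Models thread t reading element [t][0] of a 2D local array whose rows
    hold row_len elements. All hardware parameters are assumed values for
    illustration, not values taken from the application.
    """
    hits = Counter()
    for t in range(n_threads):
        word_index = t * row_len * words_per_elem  # first word of row t
        hits[word_index % n_banks] += 1            # bank that word lives in
    return max(hits.values())


if __name__ == "__main__":
    print("lds[64][64]:", conflict_degree(64))  # every row starts in bank 0
    print("lds[64][65]:", conflict_degree(65))  # rows start in different banks
```

Under these assumptions the unpadded layout sends all 32 threads to one bank, while the padded layout spreads them over 32 different banks. Real GPUs differ in bank count and word size, so treat this only as a model of the idea.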

_________________________________________________________________________

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250487444

RAC: 34717

17 Aug 2021 13:30:00 UTC

Message 188280

Thanks!

Have another go with 1.27.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250487444

RAC: 34717

17 Aug 2021 14:38:18 UTC

Message 188281

Hm - clFFT has its own clBuildProgram() calls with its own options - I think I'll have to patch these, too.

Try 1.28.
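
Which options were patched is not stated in the thread. As a hedged illustration only: program-scope __global variables are an OpenCL C 2.0 feature, so one plausible candidate is the standard -cl-std= build option. Assuming that, a helper that normalises an options string before it is handed to clBuildProgram() might look like the sketch below. force_cl_std is a hypothetical name and is not part of clFFT or the Einstein@Home application.

```python
def force_cl_std(options, std="CL2.0"):
    """Return build options containing exactly one -cl-std= flag set to std.

    Assumption: the options are a plain whitespace-separated string.
    Quoted arguments (for example include paths with spaces) are not handled.
    """
    # Drop any existing language-version flag, keep everything else in order.
    kept = [tok for tok in options.split() if not tok.startswith("-cl-std=")]
    kept.append("-cl-std=" + std)
    return " ".join(kept)


if __name__ == "__main__":
    print(force_cl_std("-cl-mad-enable -cl-std=CL1.2"))
```

Every place that builds a program needs the same treatment, which matches the point above that a library with its own clBuildProgram() calls has to be patched separately.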

DF1DX

Joined: 14 Aug 10

Posts: 105

Credit: 3858556854

RAC: 4944657

18 Aug 2021 7:51:03 UTC

Message 188301

Good improvement. The runtime dropped from around 28 to about 17 minutes on my old 1050 Ti! (1 WU, Linux Mint, Driver 490.57).

Does this change also work on AMD cards?

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3955

Credit: 46843312642

RAC: 64382729

18 Aug 2021 9:07:37 UTC

Message 188303

v1.28 works well. I see similar behavior and runtimes with v1.28 as with our manual code injection.

DF1DX wrote:

Does this change also work on AMD cards?

It should work, but in our tests the speed improvement wasn't as dramatic as with Nvidia cards: maybe ~20% or less. We really only tested Polaris and Navi (not "big" Navi) cards, so the improvement on other architectures is unknown. There might need to be some other changes in the AMD app to make it work; we had to tweak the code injection to get it working with AMD. Just remember that, if this is implemented, you will need OpenCL 2.0 drivers. Many people have been running their cards with the legacy (OpenCL 1.2) install because it was easy and it works, but these new techniques only work with OpenCL 2.0.

Right now I think the project admins have only changed the Nvidia apps.
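
For anyone unsure whether their driver meets the OpenCL 2.0 requirement mentioned above: the device version string reported by the driver (CL_DEVICE_VERSION, also shown by tools such as clinfo) starts with "OpenCL <major>.<minor>". A small sketch for checking such a string follows. The function names and the sample string are made up for illustration.

```python
import re


def parse_opencl_version(version_string):
    """Extract (major, minor) from a CL_DEVICE_VERSION-style string."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("not an OpenCL version string: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def reports_opencl_2_or_later(version_string):
    """True if the reported device version is at least 2.0."""
    return parse_opencl_version(version_string) >= (2, 0)


if __name__ == "__main__":
    # Illustrative string only, not output from a real machine.
    print(reports_opencl_2_or_later("OpenCL 1.2 legacy-driver"))
```

A version check is only a first filter: OpenCL 3.0 makes several 2.x features optional, so a driver reporting 3.0 does not automatically support everything an OpenCL 2.0 app needs.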

_________________________________________________________________________

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250487444

RAC: 34717

27 Aug 2021 10:17:27 UTC

Message 188520

I added app (Beta Test) versions for AMD/ATI w. OpenCL 2.0.

bozz4science

Joined: 4 May 20

Posts: 15

Credit: 67643923

RAC: 2621

27 Aug 2021 13:01:13 UTC

Message 188526

Thank you all for the tremendous contributions! The speed-up is greatly appreciated :) It is weird to think that such small, clever architectural changes in the code can help NVIDIA cards perform so much more efficiently. And most of that (to the best of my understanding) is thanks to a different population of the arrays. Love to see NVIDIA cards getting more competitive on E@H!

But reading through this thread, I got a bit confused. Who is now to thank for this code review and awesome ideas? Where did you exchange ideas?

Cheers

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3955

Credit: 46843312642

RAC: 64382729

27 Aug 2021 18:27:15 UTC

Message 188536 in response to message 188520

Bernd Machenschalk wrote:

I added app (Beta Test) versions for AMD/ATI w. OpenCL 2.0.

Thanks, Bernd.

At what point do these apps come out of beta testing and get released for general use? What are the criteria you’re looking for?

_________________________________________________________________________

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3955

Credit: 46843312642

RAC: 64382729

27 Aug 2021 18:50:23 UTC

Message 188537 in response to message 188526

bozz4science wrote:

Thank you all for the tremendous contributions! The speed-up is greatly appreciated :) It is weird to think that such small, clever architectural changes in the code can help NVIDIA cards perform so much more efficiently. And most of that (to the best of my understanding) is thanks to a different population of the arrays. Love to see NVIDIA cards getting more competitive on E@H!

But reading through this thread, I got a bit confused. Who is now to thank for this code review and awesome ideas? Where did you exchange ideas?

Cheers

User petri33 took it upon himself to examine the code used in the Einstein apps. He, I, and several others had a hunch that some inefficiencies in the code were really holding Nvidia back. The performance difference between comparable Nvidia and AMD GPUs was too great to be chalked up to “AMD is just better at this”.

Since the OpenCL code in the application is in plain text, it’s easier to see what the app is doing. Additionally, you can dump the Nvidia compute cache to see what OpenCL code was compiled at runtime.

petri has vast knowledge and experience writing and optimizing applications for Nvidia GPUs for signal analysis. He wrote the custom Linux application that dominated over on SETI (4-5x faster than the project-provided apps).

petri devised a way to inject code into the application in real time. This allowed fast and easy testing of code changes without the need to modify or recompile the application. Put simply, it looks for certain sections of code and swaps them out for better sections of code on the fly. Using this method he looked for “low-hanging fruit” changes that would have a big impact. That’s what was done here. No changes to the Einstein code, just optimizations for better memory access on Nvidia by using some different types of arrays. A few others and I tested that these changes do in fact work and provide faster run times.
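
The thread does not describe how the injection hooks into the running app, so the sketch below only models the "look for a section and swap it" step on a kernel source string. The kernel fragment, the element types, and the helper name apply_patches are all hypothetical; only the three search/replace targets are modelled on the changes named earlier in the thread.

```python
# Hypothetical kernel fragment; the real Einstein@Home source is not shown here.
SAMPLE_SOURCE = """
__constant float2 twiddles[N] = { /* ... */ };
__constant float2 twiddle_dee[N] = { /* ... */ };
__kernel void fft(__global float2 *data) {
    __local float2 lds[64][64];
}
"""

# (search text, replacement) pairs modelled on the changes named in the thread.
PATCHES = [
    ("__constant float2 twiddles[", "__global float2 twiddles["),
    ("__constant float2 twiddle_dee[", "__global float2 twiddle_dee["),
    ("lds[64][64]", "lds[64][65]"),
]


def apply_patches(source, patches):
    """Return (patched_source, list of search strings that were not found)."""
    missing = []
    for old, new in patches:
        if old in source:
            source = source.replace(old, new)
        else:
            missing.append(old)  # report it so a silent no-op is noticed
    return source, missing


if __name__ == "__main__":
    patched, missing = apply_patches(SAMPLE_SOURCE, PATCHES)
    print(patched)
    print("not found:", missing)
```

Reporting the patterns that were not found matters with this kind of approach: if a new app version changes the source text, the swap would otherwise fail quietly and the app would just run the unoptimized code.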

I then contacted Bernd via PM and sent him the code with some short explanations of what was done and what changes needed to be made. Bernd made the changes and incorporated them into the application.

petri is still working on it, looking for more optimizations that can be done. But this recent change will probably be the biggest jump in performance for Nvidia, with maybe small iterative improvements going forward. The biggest limiting factor for him is time: he’s a busy guy and is doing this in his spare time just for fun.

So petri is responsible for the idea and for figuring out exactly how to implement it. I helped him verify findings by testing the code on my systems on several different types of GPUs. A few other members of our team also tested on various platforms and GPUs to sort out any bugs or issues. And then Bernd rolled it all into a more official application that can be used on Linux/Windows/Mac for appropriate GPUs.

Most of the credit goes to petri, I think, since he found the issues. The project devs weren’t looking for this, and I don’t blame them, since they probably don’t have enough time to dedicate to things like this. But we all certainly thank Bernd for his effort in digesting the info we gave him and figuring out how to apply it in the official applications (since the implementation is vastly different, even if the outcome is similar).

I hope that clarifies everyone’s role in this.

_________________________________________________________________________
