New Improved Gravitational Wave App - Discussion

Tom M
Joined: 2 Feb 06
Posts: 6432
Credit: 9562271214
RAC: 9744370

My understanding is that the faster the CPU processes an All Sky GW task, the less time the whole task takes.

So I have shut off the SMT on this system.

The peak CPU clock speed hasn't gone up much, if at all.

I am expecting the average total runtime of the tasks to drop and the predicted RAC to increase.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

Keith Myers
Joined: 11 Feb 11
Posts: 4963
Credit: 18702853724
RAC: 6283086

Tom, your understanding of SMT is a bit lacking. In modern processors, SMT for the most part does not cause any loss of performance from contention for the CPU's architectural resources.

As you noticed, you did not gain any clock frequency when you turned SMT off. The faster the clocks, the faster the computation completes. Only with higher clocks would the work returned, and with it the RAC, go up.

If anything, your RAC will drop because you are only doing half the work you were doing before.
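Keith's point can be put as simple arithmetic: throughput is the number of tasks running at once divided by how long each one takes. A minimal sketch in Python, assuming every concurrent task earns the same credit and nothing else changes; the function names and the 64-task/1200-second figures are hypothetical, not measurements from this thread.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(concurrent_tasks: int, runtime_seconds: float) -> float:
    """Completed tasks per day, assuming every slot finishes one task
    every `runtime_seconds` in steady state."""
    return concurrent_tasks * SECONDS_PER_DAY / runtime_seconds


def break_even_runtime(old_tasks: int, old_runtime: float, new_tasks: int) -> float:
    """Per-task runtime needed with `new_tasks` slots to match the old throughput."""
    return old_runtime * new_tasks / old_tasks


if __name__ == "__main__":
    # Hypothetical example: halving concurrent tasks from 64 to 32 (e.g. SMT off).
    print(tasks_per_day(64, 1200))           # 4608.0
    print(break_even_runtime(64, 1200, 32))  # 600.0
```

Halving the number of concurrent tasks means each task has to finish in half the time just to keep the same throughput, which is a lot to ask when the clocks didn't go up.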

Tom M
Joined: 2 Feb 06
Posts: 6432
Credit: 9562271214
RAC: 9744370

Keith,

I understand your point. I believe that trying this will not slow down the processing of individual tasks, because I believe the CPU portion of the tasks is not coded to be multi-threaded.

I will note that the 7601 HPC recommendations also suggest turning off SMT.

My All Sky GW tasks appear to spend about 50 percent of their clock time running purely on the CPU.
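With about half of the wall-clock time spent purely on the CPU, Amdahl's law gives an upper bound on what faster CPU processing can buy. A minimal sketch, assuming the 50 percent figure above and that the GPU-bound half doesn't change at all; `amdahl_runtime` and the example numbers are illustrative.

```python
def amdahl_runtime(total_runtime: float, cpu_fraction: float, cpu_speedup: float) -> float:
    """Task runtime after speeding up only its CPU-bound fraction (Amdahl's law).

    total_runtime: current wall-clock time of the task
    cpu_fraction:  share of that time spent purely on the CPU (0..1)
    cpu_speedup:   how many times faster the CPU-bound part becomes
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    # The non-CPU part is assumed unchanged; only the CPU part shrinks.
    return total_runtime * ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


if __name__ == "__main__":
    # Hypothetical 1000 s task with half its time on the CPU.
    print(amdahl_runtime(1000, 0.5, 1.1))  # ~954.5
    print(amdahl_runtime(1000, 0.5, 2.0))  # 750.0
```

In this model even an infinitely fast CPU would only halve the runtime, and a 10% faster CPU trims it by only about 4.5%.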

Tom M

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46637272642
RAC: 64192848

Tom, you should switch to the 1.08/1.15 CUDA app. In my opinion, it does much better than the OpenCL app.

1.07 is the Nvidia OpenCL app. This uses the CPU for the recalc sections.
1.08 was the first release of the Nvidia CUDA app. This also uses the CPU for the recalc.
1.14 is the working release of the Nvidia CUDA app with GPU-based recalc, so it does not rely on the CPU as much. This app is now the default (non-beta).
1.15 is the same binary file as 1.08. 1.15 is in the beta channel, so if you select beta, you will get this one. It is identical to 1.08 in every way.
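For quick reference, the version list above can be written down as a lookup table. This only restates the list in Python; the `AppVersion` fields and the `version_sent` helper are hypothetical names for illustration and have nothing to do with the project's actual scheduler code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppVersion:
    api: str        # GPU API used for the main search
    recalc_on: str  # where the recalc stages run
    note: str       # anything else stated in the list


# Restatement of the list above. 1.15 is the same binary as 1.08.
VERSIONS = {
    "1.07": AppVersion(api="OpenCL", recalc_on="CPU", note="Nvidia OpenCL app"),
    "1.08": AppVersion(api="CUDA", recalc_on="CPU", note="first Nvidia CUDA release"),
    "1.14": AppVersion(api="CUDA", recalc_on="GPU", note="default (non-beta)"),
    "1.15": AppVersion(api="CUDA", recalc_on="CPU", note="beta channel; same binary as 1.08"),
}


def version_sent(beta_enabled: bool) -> str:
    """Version the scheduler should send for the given beta setting, per this post."""
    return "1.15" if beta_enabled else "1.14"


if __name__ == "__main__":
    gpu_recalc = [v for v, info in VERSIONS.items() if info.recalc_on == "GPU"]
    print(gpu_recalc)          # ['1.14']
    print(version_sent(True))  # 1.15
```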

I like the 1.14 app (CUDA, GPU-based recalc) for my Volta cards. Maybe it's the wide memory bus that makes this app faster than 1.08/1.15 here.

I like the 1.08/1.15 app for my 3080 Tis; on that system it is consistently faster than 1.14.

But either app should be faster than the 1.07 app you're using now.

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 4963
Credit: 18702853724
RAC: 6283086

Ian, how did you persuade the scheduler to send you the 1.14 app?  Setting beta gets you the 1.15 app.

I don't remember ever testing the GPU-recalc 1.14 app, and I wanted to see how it compares to 1.08/1.15.

[Edit] Never mind, I found your post about the team package containing it.

Keith Myers
Joined: 11 Feb 11
Posts: 4963
Credit: 18702853724
RAC: 6283086

I looked at the output of both the 1.08 and 1.14 apps, and they both state that recalc takes place on the CPU.

I thought the 1.14 app was supposed to do the recalc stages on the GPU?

Did the devs not change the code that produces the output log to indicate that the recalcs are taking place on the GPU?

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46637272642
RAC: 64192848

Keith Myers wrote:

Ian, how did you persuade the scheduler to send you the 1.14 app?  Setting beta gets you the 1.15 app.

I don't remember ever testing for the gpu-recalc 1.14 app and wanted to see how it compares to 1.08/1.15.

[Edit] Nevermind I found your post about the team package containing it.



With beta/test applications selected, you should be sent 1.15.

With no beta/test selected, you should be sent 1.14; this is the default now. But yeah, you can just edit your app_info and move the new file over.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46637272642
RAC: 64192848

Keith Myers wrote:

I looked at the output of both the 1.08 and 1.14 apps and they both state that recalc takes place on the cpu.

I thought the 1.14 app was supposed to do the recalc stages on the gpu?

Did the devs not change the code that produces the output log to indicate the recalcs are taking place on the gpu?



1.08 app output: https://einsteinathome.org/task/1598957736

Quote:
...
2024-04-06 17:13:56.0437 (3474657) [normal]: Search FstatMethod used: 'ResampGPU'
2024-04-06 17:13:56.0437 (3474657) [normal]: Recalc FstatMethod used: 'DemodSSE'
...



1.14 output: https://einsteinathome.org/task/1598820483

Quote:
...
2024-04-06 21:10:16.7971 (362925) [normal]: Search FstatMethod used: 'ResampGPU'
2024-04-06 21:10:16.7971 (362925) [normal]: Recalc FstatMethod used: 'DemodGPU'
...



1.14 is definitely using the GPU. There's still a bit of GPU activity during the recalc portions [37.5-50.0%] and [87.5-100%]. IMO the outputs differ between the two apps.
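Since these two lines are easy to miss among the rest of a task's output, a small script can pull them out. A sketch in Python, assuming the line format matches the excerpts quoted above; treating any method name ending in `GPU` as GPU-based is a heuristic drawn from these two samples, not a documented naming rule, and the function names are hypothetical.

```python
import re

# Matches the lines quoted above, e.g.
# ... [normal]: Recalc FstatMethod used: 'DemodGPU'
FSTAT_RE = re.compile(r"(Search|Recalc) FstatMethod used: '([^']+)'")


def fstat_methods(log_text: str) -> dict[str, str]:
    """Map 'Search'/'Recalc' to the FstatMethod name reported in a task's log."""
    return {m.group(1): m.group(2) for m in FSTAT_RE.finditer(log_text)}


def recalc_on_gpu(log_text: str) -> bool:
    """Heuristic: treat a Recalc method whose name ends in 'GPU' as GPU-based."""
    return fstat_methods(log_text).get("Recalc", "").endswith("GPU")


if __name__ == "__main__":
    sample = (
        "2024-04-06 21:10:16.7971 (362925) [normal]: Search FstatMethod used: 'ResampGPU'\n"
        "2024-04-06 21:10:16.7971 (362925) [normal]: Recalc FstatMethod used: 'DemodGPU'\n"
    )
    print(fstat_methods(sample))  # {'Search': 'ResampGPU', 'Recalc': 'DemodGPU'}
    print(recalc_on_gpu(sample))  # True
```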

Keith Myers
Joined: 11 Feb 11
Posts: 4963
Credit: 18702853724
RAC: 6283086

I guess I was looking at the text stating how much memory was used on the CPU. I didn't notice the part you quoted.

I saw the GPU's power draw drop to about half its normal wattage at the beginning and end of the task.

GPU recalc may be a tad slower than CPU recalc on my 3090. Not enough tasks have completed yet to state definitively that it's slower; maybe 20-30 seconds slower on average. It's hard to tell because of the task variability.
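One way to judge whether a 20-30 second difference is real or just task variability is to compare the gap in mean runtime with its standard error. A sketch using Welch's t statistic and Python's standard library; it assumes the runtimes are independent and that the two sets of tasks are comparable, which varying work units may not fully satisfy. The function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for the difference in mean runtime (a minus b).

    Positive when the tasks in `a` took longer on average. Each sample
    needs at least two runtimes and some spread.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two runtimes in each sample")
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    if se == 0:
        raise ValueError("both samples have zero spread")
    return (mean(a) - mean(b)) / se
```

Feed it the runtimes of completed 1.14 tasks as `a` and 1.08 tasks as `b`. As a rough rule of thumb, with a decent number of tasks, a |t| well below 2 means the gap is still within the task-to-task noise.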

I just edited the 1.14 app into my app_info to change over. I'm only trying it out on this daily driver; so far I'm not seeing enough benefit to justify swapping every host over yet.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46637272642
RAC: 64192848

Yeah, it probably depends a lot on your specific GPU/CPU combo.

My theory is that overall memory bus width and/or bandwidth might be a factor in which app is better. I don't have much real evidence for that, other than some passing comments from Bernd about random memory access patterns and looking at the specs of things.

The EPYC's 8-channel DDR4 memory is a wide pipe (512-bit), but the HBM links of the Titan V and V100 are wider still (3072-bit and 4096-bit, respectively), and on those systems the GPU-recalc app is better. The 3080 Ti, on the other hand, has a relatively small 384-bit bus, and my host with 3080 Tis (same 64-core EPYC CPU as the Titan V systems) does better with the CPU-recalc 1.08 app. It wasn't even close: from what I remember, 1.14 was considerably slower on that host, something like 30% slower.

It might be interesting to see how the 1.14 app compares with the 1.08 app on similar cards on a platform with a more consumer-grade CPU that has only dual-channel memory. I forget which of your hosts is the daily driver. If it's one of the 7950X systems, the fast DDR5 with the fast 7950X might be closing the gap.
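The CPU-side widths quoted above follow from channel count times 64 bits per DDR channel, and peak bandwidth also depends on the per-pin data rate, which is why bus width and bandwidth aren't the same thing. A small Python sketch; the function names are illustrative, and the DDR4-3200 rate in the example is an assumption, since the post doesn't say what memory speed the EPYC hosts run.

```python
def ddr_bus_width_bits(channels: int, bits_per_channel: int = 64) -> int:
    """Total CPU memory bus width: each DDR channel is 64 bits wide."""
    return channels * bits_per_channel


def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gtps: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    Bytes per transfer (bus width / 8) times billions of transfers per second.
    """
    return bus_width_bits / 8 * data_rate_gtps


if __name__ == "__main__":
    print(ddr_bus_width_bits(8))  # 512, the 8-channel EPYC figure above
    print(ddr_bus_width_bits(2))  # 128, a dual-channel desktop
    # Assumed DDR4-3200 (3.2 GT/s); the thread doesn't state the memory speed.
    print(peak_bandwidth_gbs(512, 3.2))  # 204.8
```

Because HBM, GDDR and DDR run at very different per-pin data rates, a wider bus doesn't automatically mean more bandwidth, so both numbers are worth checking when comparing hosts.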
