CUDA application for the O3ASHF search

Eugene Stemple
Joined: 9 Feb 11
Posts: 67
Credit: 361876849
RAC: 539952

Comparative times: low-end GTX 1660 Ti, moderately loaded Ryzen 7 5700X;

averages for 10 work units, run time (seconds) +/- std. dev.

Version  App     Run time (s)  +/- std. dev.
1.07     OpenCL  1203.1        +/- 13.6
1.08     cuda    1157.5        +/- 20.6
1.11     cuda    1160.2        +/- 10.3
1.14     cuda    --  work units in cache...

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3923
Credit: 45273422642
RAC: 63328298

1.14 seems considerably slower than 1.08, at least on my GTX 1060 / E5-2697A v4 system.

Maybe the opposite is true for systems with slow CPUs and faster GPUs.


JohnDK
Joined: 25 Jun 10
Posts: 115
Credit: 2510300478
RAC: 2116060

On my 2080 Super running x2 WUs + MPS, 1.08 took around 16 mins, 1.14 takes 22-23 mins.

Ben Scott
Joined: 30 Mar 20
Posts: 53
Credit: 1482821992
RAC: 3817318

On my RTX 3080 the GW tasks are taking about a third longer with 1.14 than with 1.08.

Eugene Stemple
Joined: 9 Feb 11
Posts: 67
Credit: 361876849
RAC: 539952

... addendum to msg #223070 ...

1.14  cuda      1111.3  +/- 7.3
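To put the averages reported in this thread on a common scale, here is a minimal Python sketch that computes each version's run time relative to 1.08 and a standard error of the mean. The helper names `relative_change` and `standard_error` are illustrative only, and the standard error assumes the 10 work units per version are independent samples, which the posts do not state.

```python
import math

# Mean run time (s) and standard deviation over 10 work units,
# copied from the figures posted in this thread (GTX 1660 Ti host).
results = {
    "1.07 OpenCL": (1203.1, 13.6),
    "1.08 cuda": (1157.5, 20.6),
    "1.11 cuda": (1160.2, 10.3),
    "1.14 cuda": (1111.3, 7.3),
}
N_WORK_UNITS = 10


def relative_change(new_mean, base_mean):
    """Percentage change of new_mean relative to base_mean (negative = faster)."""
    return 100.0 * (new_mean - base_mean) / base_mean


def standard_error(std_dev, n=N_WORK_UNITS):
    """Standard error of the mean, assuming n independent samples."""
    return std_dev / math.sqrt(n)


base_mean, _ = results["1.08 cuda"]
for name, (mean, std_dev) in results.items():
    print(
        f"{name:12s} {mean:8.1f} s  "
        f"SEM {standard_error(std_dev):4.1f} s  "
        f"{relative_change(mean, base_mean):+5.1f}% vs 1.08"
    )
```

On this host the numbers work out to 1.14 being roughly 4% faster than 1.08, in contrast to the slowdowns reported on other systems earlier in the thread.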

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3923
Credit: 45273422642
RAC: 63328298

Bernd, is the "recalc on GPU" step at the midpoint and end of 1.14 tasks an FP64 load?


Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4307
Credit: 249644124
RAC: 34386

Could it be that 1.14 is slower than 1.08 (only) if you run multiple instances/tasks in parallel?

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4307
Credit: 249644124
RAC: 34386

Ian&Steve C. wrote:

Bernd, is the "recalc on GPU" step at the midpoint and end of 1.14 tasks an FP64 load?

Actually, the new tasks with reduced memory consumption run two "old"-style tasks one after the other. Each old-style task has a "Recalc" step at the end, and the progress counter reserves 25% for that. You should also be able to track the switch in the stderr of the respective task ("finished main analysis"). So there are indeed two "Recalc"s, near 50% and 100% of a "new" task.

BM
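The progress layout described above can be sketched as a small lookup: a "new" task is two old-style halves of 50% each, and the last 25% of each half is the recalc step. This is only an illustration of that arithmetic; the function name `phase` and the handling of exact boundary values are assumptions, not something taken from the application.

```python
HALF_SPAN = 50.0         # each old-style task covers half of the new task's progress
RECALC_FRACTION = 0.25   # share of each old-style task reserved for "Recalc"


def phase(progress_percent):
    """Map overall progress (0-100) to (which old-style task, which step)."""
    if not 0.0 <= progress_percent <= 100.0:
        raise ValueError("progress must be between 0 and 100")
    half = 1 if progress_percent < HALF_SPAN else 2
    # Fraction of the way through the current old-style task.
    within = (progress_percent - (half - 1) * HALF_SPAN) / HALF_SPAN
    step = "recalc" if within >= 1.0 - RECALC_FRACTION else "main analysis"
    return half, step


for p in (10.0, 40.0, 60.0, 90.0):
    print(p, phase(p))
```

Under these assumptions the recalc windows fall at 37.5-50% and 87.5-100% of the new task.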

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3923
Credit: 45273422642
RAC: 63328298

yes i can see recalc from 37.5-50% and 87.5-100%.

the question was more about precision used during that step. is it double precision (FP64) during those times?


Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4307
Credit: 249644124
RAC: 34386

The core operation is single precision (derived from the SSE version). I'm not entirely sure there aren't a few double-precision calculations outside the loop, but these shouldn't matter much. What makes this part slow is the rather random memory access, which causes much more delay on a GPU than on a CPU.

BM
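As a rough illustration of why random memory access hurts on a GPU, the following Python sketch counts how many memory segments one warp would touch for a contiguous versus a scattered access pattern. It is a simplified model, not the application's code: the 128-byte segment size is an assumption about the memory transaction granularity, the array length and random seed are arbitrary, and `segments_touched` is a made-up helper.

```python
import random

WARP_SIZE = 32         # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128    # assumed size of one memory transaction (model parameter)
ELEMENT_BYTES = 4      # one single-precision float
ARRAY_LENGTH = 1_000_000  # arbitrary array size for the scattered case


def segments_touched(indices):
    """Number of distinct memory segments hit by one warp's element indices."""
    return len({(i * ELEMENT_BYTES) // SEGMENT_BYTES for i in indices})


# Contiguous pattern: thread k reads element k.
coalesced = list(range(WARP_SIZE))

# Scattered pattern: each thread reads a pseudo-random element.
rng = random.Random(0)
scattered = [rng.randrange(ARRAY_LENGTH) for _ in range(WARP_SIZE)]

print("contiguous:", segments_touched(coalesced), "segment(s)")
print("scattered: ", segments_touched(scattered), "segment(s)")
```

In this model the contiguous warp is served from a single segment, while the scattered warp can need up to one segment per thread, so far more memory traffic is spent on the same number of useful floats.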
