CUDA application for the O3ASHF search

Eugene Stemple
Joined: 9 Feb 11
Posts: 67
Credit: 374569149
RAC: 547260

Comparative times: low-end GTX 1660 Ti, moderately loaded Ryzen 7 5700X;

averages for 10 work units, run time (seconds) +/- std. dev.

Version  App     Run time (s)
1.07     OpenCL  1203.1 +/- 13.6
1.08     cuda    1157.5 +/- 20.6
1.11     cuda    1160.2 +/- 10.3
1.14     cuda    --  work units in cache...

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3952
Credit: 46787692642
RAC: 64190731

1.14 seems considerably slower than 1.08

at least on my GTX1060/E5-2697Av4 system.

maybe the opposite is true for systems with slow CPUs and faster GPUs.

_________________________________________________________________________

JohnDK
Joined: 25 Jun 10
Posts: 116
Credit: 2561880478
RAC: 2371712

On my 2080 Super running x2 WUs + MPS, 1.08 took around 16 mins, 1.14 takes 22-23 mins.

Ben Scott
Joined: 30 Mar 20
Posts: 53
Credit: 1596753131
RAC: 4978092

On my RTX 3080 the GW tasks are taking about a third longer with 1.14 vs 1.08.

Eugene Stemple
Joined: 9 Feb 11
Posts: 67
Credit: 374569149
RAC: 547260

... addendum to msg #223070 ...

1.14  cuda      1111.3  +/- 7.3
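
For a quick comparison, the relative change on this one host can be worked out from the averages posted in this thread. The snippet below is only a sketch: `percent_change` is a hypothetical helper, the numbers are the posted means for the GTX 1660 Ti machine, and other posters in the thread report the opposite trend on their hardware.

```python
# Mean run times in seconds, copied from the averages posted in this thread
# for the GTX 1660 Ti / Ryzen 7 5700X host (10 work units each).
mean_runtime = {
    "1.07 OpenCL": 1203.1,
    "1.08 cuda": 1157.5,
    "1.11 cuda": 1160.2,
    "1.14 cuda": 1111.3,
}


def percent_change(new, old):
    """Relative change of `new` versus `old` in percent (negative means faster)."""
    return (new - old) / old * 100.0


# Compare 1.14 against every earlier version measured on the same host.
latest = mean_runtime["1.14 cuda"]
changes = {
    name: percent_change(latest, seconds)
    for name, seconds in mean_runtime.items()
    if name != "1.14 cuda"
}

for name, change in changes.items():
    print(f"1.14 cuda vs {name}: {change:+.1f}%")
```

The posted standard deviations (7.3 to 20.6 seconds) are smaller than the differences between these means, but each mean comes from only 10 work units on a single machine.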

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3952
Credit: 46787692642
RAC: 64190731

Bernd, is the recalc on GPU step for the midpoint and end of 1.14 a FP64 load?

_________________________________________________________________________

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250462137
RAC: 35142

Could it be that 1.14 is slower than 1.08 (only) if you run multiple instances/tasks in parallel?

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250462137
RAC: 35142

Ian&Steve C. wrote:

Bernd, is the recalc on GPU step for the midpoint and end of 1.14 a FP64 load?

Actually, the new tasks with reduced memory consumption run two "old"-style tasks one after the other. Each old-style task has a "Recalc" step at the end; the progress counter reserves 25% for that. You should also be able to track the switch in the stderr of the respective task ("finished main analysis"). So there are indeed two "Recalc"s, near 50% and 100% of a "new" task.

BM
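
The progress arithmetic described above can be sketched as follows. This is not the application's code, only an illustration under two assumptions: the two old-style tasks each take an equal half of the overall progress bar, and the 25% recalc reservation applies within each half. The names `recalc_windows` and `phase` are hypothetical.

```python
def recalc_windows(recalc_share=0.25, subtasks=2):
    """Overall-progress ranges (in percent) that fall into a recalc step.

    Assumes equal-sized sub-tasks, each reserving `recalc_share` of its
    own progress for the recalc step at its end.
    """
    span = 100.0 / subtasks
    return [
        ((i + 1) * span - recalc_share * span, (i + 1) * span)
        for i in range(subtasks)
    ]


def phase(progress_percent, recalc_share=0.25, subtasks=2):
    """Return (sub-task number, phase name) for an overall progress value."""
    if not 0.0 <= progress_percent <= 100.0:
        raise ValueError("progress must be between 0 and 100")
    span = 100.0 / subtasks
    # Clamp so that exactly 100% still belongs to the last sub-task.
    index = min(int(progress_percent // span), subtasks - 1)
    within = (progress_percent - index * span) / span  # 0..1 inside the sub-task
    name = "recalc" if within >= 1.0 - recalc_share else "main analysis"
    return index + 1, name


print(recalc_windows())
print(phase(40.0), phase(90.0))
```

Under these assumptions each recalc step covers 12.5% of the overall bar, which gives windows of 37.5-50% and 87.5-100%.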

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3952
Credit: 46787692642
RAC: 64190731

yes i can see recalc from 37.5-50% and 87.5-100%.

the question was more about precision used during that step. is it double precision (FP64) during those times?

_________________________________________________________________________

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250462137
RAC: 35142

The core operation is single precision (derived from the SSE version). I'm not entirely sure there aren't a few double-precision calcs outside the loop, but these shouldn't matter much. What makes this part slow is the rather random memory access, which causes much more delay on GPU than on CPU.

BM
