CUDA application for the O3ASHF search

Eugene Stemple
Joined: 9 Feb 11
Posts: 67
Credit: 372992020
RAC: 543367

Comparative times: low-end GTX 1660 Ti, moderately loaded Ryzen 7 5700X;

averages for 10 work units, run time in seconds (mean +/- std. dev.):

Version  App      Run time (s)
1.07     OpenCL   1203.1 +/- 13.6
1.08     cuda     1157.5 +/- 20.6
1.11     cuda     1160.2 +/- 10.3
1.14     cuda     --  work units in cache...
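
For anyone who wants to produce the same kind of summary from their own task list, here is a minimal sketch. It assumes the sample standard deviation is wanted (the post does not say which variant was used), and the function names `summarize` and `relative_change` are made up for this example. The per-task times in the list are placeholders, not measured data; only the two averages in the last line come from the figures above.

```python
import statistics

def summarize(run_times):
    """Return (mean, sample standard deviation) for run times in seconds."""
    return statistics.mean(run_times), statistics.stdev(run_times)

def relative_change(baseline, other):
    """Fractional change of `other` relative to `baseline` (negative = faster)."""
    return (other - baseline) / baseline

# Hypothetical per-task run times in seconds (placeholders, not measured data).
example_times = [1150.0, 1160.0, 1170.0, 1140.0, 1180.0]
mean, std = summarize(example_times)
print(f"{mean:.1f} +/- {std:.1f}")

# Averages quoted above: 1.07 OpenCL (1203.1 s) vs 1.08 cuda (1157.5 s).
print(f"{relative_change(1203.1, 1157.5):+.1%}")
```

With only 10 work units per version, differences smaller than the quoted standard deviations (such as 1.08 vs 1.11) are probably within the noise.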

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46606452642
RAC: 64202133

1.14 seems considerably slower than 1.08

at least on my GTX1060/E5-2697Av4 system.

maybe the opposite is true for systems with slow CPUs and faster GPUs.

_________________________________________________________________________

JohnDK
Joined: 25 Jun 10
Posts: 116
Credit: 2555160478
RAC: 2363301

On my 2080 Super running x2 WUs + MPS, 1.08 took around 16 mins, 1.14 takes 22-23 mins.

Ben Scott
Joined: 30 Mar 20
Posts: 53
Credit: 1582230072
RAC: 4918785

on my RTX 3080 the GW tasks are taking about a third longer with 1.14 vs 1.08.

Eugene Stemple
Joined: 9 Feb 11
Posts: 67
Credit: 372992020
RAC: 543367

... addendum to msg #223070 ...

1.14  cuda      1111.3  +/- 7.3

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46606452642
RAC: 64202133

Bernd, is the recalc on GPU step for the midpoint and end of 1.14 a FP64 load?

_________________________________________________________________________

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250362713
RAC: 35188

Could it be that 1.14 is slower than 1.08 (only) if you run multiple instances/tasks in parallel?

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250362713
RAC: 35188

Ian&Steve C. wrote:

Bernd, is the recalc on GPU step for the midpoint and end of 1.14 a FP64 load?

Actually, the new tasks with reduced memory consumption run two "old"-style tasks one after the other. Each old-style task has a "Recalc" step at the end, and the progress counter reserves 25% for that. You should also be able to track the switch in the stderr of the respective task ("finished main analysis"). So there are indeed two "Recalc"s, near 50% and 100% of a "new" task.
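
As a rough illustration of that progress layout, here is a minimal sketch. It assumes the two old-style tasks each get an equal half of the progress bar and that "reserves 25%" means the last quarter of each half, which would put the recalc windows at about 37.5-50% and 87.5-100%. The function `phase_at` and its parameters are hypothetical and not part of the application.

```python
def phase_at(progress_percent, recalc_share=0.25, subtasks=2):
    """Map overall progress (0-100) to (sub-task number, phase name).

    Assumption: every sub-task owns an equal slice of the progress bar and
    the last `recalc_share` of each slice is the "Recalc" step.
    """
    if not 0 <= progress_percent <= 100:
        raise ValueError("progress must be between 0 and 100")
    slice_width = 100 / subtasks
    # Clamp so that exactly 100% still belongs to the last sub-task.
    index = min(int(progress_percent // slice_width), subtasks - 1)
    within = (progress_percent - index * slice_width) / slice_width
    phase = "recalc" if within >= 1 - recalc_share else "main analysis"
    return index + 1, phase

for p in (10, 40, 60, 90):
    print(p, phase_at(p))
```

The stderr marker quoted above remains the more reliable indicator; the percentages only follow from the assumed equal split.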

BM

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46606452642
RAC: 64202133

yes i can see recalc from 37.5-50% and 87.5-100%.

the question was more about precision used during that step. is it double precision (FP64) during those times?

_________________________________________________________________________

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250362713
RAC: 35188

The core operation is single precision (derived from the SSE version). I'm not entirely sure there aren't a few double precision calcs outside the loop, but these shouldn't matter much. What makes this part slow is the rather random memory access, which causes much more delay on GPU than on CPU.
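
One common reason random access hurts on a GPU is that the loads of a group of threads are served in fixed-size memory segments, so scattered addresses need many more transactions than contiguous ones. The sketch below is a simplified model of that effect only, not a description of what this application actually does. The 32-thread group, 4-byte elements and 128-byte segments are assumed typical values, and `segments_touched` is a made-up helper.

```python
import random

def segments_touched(indices, element_bytes=4, segment_bytes=128):
    """Count distinct memory segments hit by one group of threads.

    Simplified model: each thread loads one element, and every distinct
    segment costs one memory transaction.
    """
    return len({(i * element_bytes) // segment_bytes for i in indices})

GROUP = 32  # assumed number of threads issuing loads together

# Neighbouring threads read neighbouring elements.
contiguous = list(range(GROUP))

# Each thread reads from an unrelated position in a large array.
rng = random.Random(0)
scattered = [rng.randrange(1_000_000) for _ in range(GROUP)]

print("contiguous:", segments_touched(contiguous))
print("scattered: ", segments_touched(scattered))
```

In this model the contiguous pattern fits in a single segment, while the scattered one can need up to one segment per thread.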

BM
