CUDA application for the O3ASHF search

Eugene Stemple

Joined: 9 Feb 11

Posts: 67

Credit: 401814317

RAC: 373107

7 Mar 2024 15:33:12 UTC

Message 223070

Comparative times: low-end GTX 1660 Ti, moderately loaded Ryzen 7 5700X;
averages for 10 work units, run time (seconds) +/- std. dev.

1.07 OpenCL  1203.1 +/- 13.6
1.08 cuda    1157.5 +/- 20.6
1.11 cuda    1160.2 +/- 10.3
1.14 cuda    -- work units in cache...

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4116

Credit: 49144240845

RAC: 32190491

7 Mar 2024 17:05:00 UTC

Message 223071

1.14 seems considerably slower than 1.08, at least on my GTX 1060 / E5-2697A v4 system.

Maybe the opposite is true for systems with slow CPUs and faster GPUs.

JohnDK

Joined: 25 Jun 10

Posts: 120

Credit: 2637247332

RAC: 965666

7 Mar 2024 19:41:06 UTC

Message 223073

On my 2080 Super running x2 WUs + MPS, 1.08 took around 16 mins, 1.14 takes 22-23 mins.

Ben Scott

Joined: 30 Mar 20

Posts: 54

Credit: 1798595159

RAC: 2645692

7 Mar 2024 20:07:17 UTC

Message 223074

On my RTX 3080 the GW tasks are taking about a third longer with 1.14 than with 1.08.

Eugene Stemple

Joined: 9 Feb 11

Posts: 67

Credit: 401814317

RAC: 373107

8 Mar 2024 3:12:55 UTC

Message 223082

... addendum to msg #223070 ...

1.14 cuda 1111.3 +/- 7.3
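
To put the four averages posted for the GTX 1660 Ti side by side, here is a small sketch that expresses each mean run time as a percentage change relative to 1.08. The numbers are copied from the two posts; the helper name `percent_change` is only illustrative, and the standard deviations are left out, so this says nothing about statistical significance.

```python
# Mean run times in seconds, copied from the posts above (GTX 1660 Ti, 10 work units each).
mean_runtime_s = {
    "1.07 OpenCL": 1203.1,
    "1.08 cuda": 1157.5,
    "1.11 cuda": 1160.2,
    "1.14 cuda": 1111.3,
}


def percent_change(new: float, baseline: float) -> float:
    """Signed change of `new` relative to `baseline` in percent (negative means faster)."""
    return (new - baseline) / baseline * 100.0


baseline = mean_runtime_s["1.08 cuda"]
for version, seconds in mean_runtime_s.items():
    change = percent_change(seconds, baseline)
    print(f"{version:12s} {seconds:7.1f} s  {change:+5.1f}% vs 1.08")
```

On this particular card 1.14 comes out roughly 4% faster than 1.08, which is the opposite of what the other posters report for their systems.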

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4116

Credit: 49144240845

RAC: 32190491

8 Mar 2024 20:25:53 UTC

Message 223099

Bernd, is the recalc on GPU step for the midpoint and end of 1.14 a FP64 load?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4346

Credit: 252831334

RAC: 41033

13 Mar 2024 12:40:00 UTC

Message 223223

Could it be that 1.14 is slower than 1.08 (only) if you run multiple instances/tasks in parallel?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4346

Credit: 252831334

RAC: 41033

13 Mar 2024 12:46:04 UTC

Message 223224 in response to message 223099

Ian&Steve C. wrote:

Bernd, is the recalc on GPU step for the midpoint and end of 1.14 a FP64 load?

Actually, the new tasks with reduced memory consumption run two "old"-style tasks one after the other. Each old-style task has a "Recalc" step at the end; the progress counter reserves 25% for that. You should also be able to track the switch in the stderr of the respective task ("finished main analysis"). So there are indeed two "Recalc"s near 50% and 100% of a "new" task.
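
The progress layout described here can be worked out with a few lines of arithmetic. This is only a sketch: it assumes the 25% reserved for "Recalc" refers to each old-style task's own progress range, and that the two old-style tasks each occupy exactly half of the new task's progress bar. The function name `recalc_windows` is made up for illustration and is not part of the application.

```python
def recalc_windows(n_subtasks: int = 2, recalc_share: float = 0.25):
    """Return (start, end) progress percentages of the Recalc step of each sub-task.

    Assumes every old-style sub-task gets an equal slice of the overall
    progress bar and spends the last `recalc_share` of its slice in Recalc.
    """
    span = 100.0 / n_subtasks  # progress range covered by one old-style task
    windows = []
    for i in range(n_subtasks):
        end = (i + 1) * span
        start = end - recalc_share * span
        windows.append((start, end))
    return windows


print(recalc_windows())  # [(37.5, 50.0), (87.5, 100.0)]
```

Under these assumptions the two Recalc steps fall at 37.5-50% and 87.5-100% of a new task, which agrees with the ranges reported in the follow-up reply.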

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4116

Credit: 49144240845

RAC: 32190491

13 Mar 2024 13:46:47 UTC

Message 223227

Yes, I can see the recalc from 37.5-50% and 87.5-100%.

The question was more about the precision used during that step. Is it double precision (FP64) during those times?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4346

Credit: 252831334

RAC: 41033

13 Mar 2024 14:18:00 UTC

Message 223229

The core operation is single precision (derived from the SSE version). I'm not entirely sure there aren't a few double-precision calculations outside the loop, but these shouldn't matter much. What makes this part slow is the rather random memory access, which causes much more delay on a GPU than on a CPU.
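
As a rough illustration of why scattered reads are expensive on a GPU, the sketch below counts how many distinct memory segments one warp of 32 threads touches when reading single-precision values. It is a deliberately simplified model, not the application's actual access pattern: the 128-byte segment size, the array size, and the function name `segments_touched` are all assumptions made for the example, and caches are ignored.

```python
import random

WARP_SIZE = 32       # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128  # assumed size of one memory transaction (simplified model)
ELEM_BYTES = 4       # one single-precision float


def segments_touched(indices):
    """Count the distinct memory segments hit when each thread reads one float32."""
    return len({(i * ELEM_BYTES) // SEGMENT_BYTES for i in indices})


# Neighbouring threads read neighbouring elements: everything fits in one segment.
sequential = list(range(WARP_SIZE))

# Each thread reads an arbitrary element of a hypothetical 2**20-element array.
rng = random.Random(0)
scattered = [rng.randrange(1 << 20) for _ in range(WARP_SIZE)]

print("sequential:", segments_touched(sequential))
print("scattered: ", segments_touched(scattered))
```

In this model the sequential warp needs a single transaction, while the scattered warp needs close to one transaction per thread, so the same amount of useful data costs many times more memory traffic.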
