CUDA application for the O3ASHF search

Eugene Stemple

Joined: 9 Feb 11

Posts: 67

Credit: 396497672

RAC: 400603

7 Mar 2024 15:33:12 UTC

Message 223070

Comparative times: low-end GTX 1660 Ti, moderately loaded Ryzen 7 5700X;

averages for 10 work units, run time (seconds) +/- std.dev

1.07 OpenCL 1203.1 +/- 13.6

1.08 cuda 1157.5 +/- 20.6

1.11 cuda 1160.2 +/- 10.3

1.14 cuda -- work units in cache...

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4081

Credit: 48691889570

RAC: 34777151

7 Mar 2024 17:05:00 UTC

Message 223071

1.14 seems considerably slower than 1.08

at least on my GTX1060/E5-2697Av4 system.

maybe the opposite is true for systems with slow CPUs and faster GPUs.

JohnDK

Joined: 25 Jun 10

Posts: 120

Credit: 2623174106

RAC: 562512

7 Mar 2024 19:41:06 UTC

Message 223073

On my 2080 Super running x2 WUs + MPS, 1.08 took around 16 mins, 1.14 takes 22-23 mins.
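Because two tasks run side by side here, the per-task time alone understates throughput. The sketch below converts the times to tasks per hour. It assumes the quoted times are wall-clock time per task with both tasks running concurrently, and it takes 22.5 min as the midpoint of the 22-23 min range. Both are assumptions, not figures from the post, and `tasks_per_hour` is a hypothetical helper.

```python
def tasks_per_hour(minutes_per_task: float, concurrent: int) -> float:
    """Completed tasks per hour when `concurrent` tasks run side by side,
    each taking `minutes_per_task` of wall-clock time (assumed model)."""
    return concurrent * 60.0 / minutes_per_task


old = tasks_per_hour(16.0, 2)  # 1.08: ~16 min per task, 2 at once
new = tasks_per_hour(22.5, 2)  # 1.14: 22-23 min, midpoint 22.5 assumed
print(f"1.08: {old:.2f} tasks/h, 1.14: {new:.2f} tasks/h")
print(f"throughput change: {(new / old - 1) * 100:+.1f}%")
```

Both versions run at x2, so the concurrency factor cancels. The throughput drop (roughly 29% at the 22.5 min midpoint) is therefore just the ratio of the per-task times.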

Ben Scott

Joined: 30 Mar 20

Posts: 54

Credit: 1762462979

RAC: 2931245

7 Mar 2024 20:07:17 UTC

Message 223074

on my RTX 3080 the GW tasks are taking about a third longer with 1.14 vs 1.08.

Eugene Stemple

Joined: 9 Feb 11

Posts: 67

Credit: 396497672

RAC: 400603

8 Mar 2024 3:12:55 UTC

Message 223082

... addendum to msg #223070 ...

1.14 cuda 1111.3 +/- 7.3
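This addendum completes the averages in msg #223070. One rough way to tell whether a difference exceeds run-to-run noise is to compare it with the standard error of the difference in means. The sketch assumes each +/- figure is the sample standard deviation over 10 independent work units. The `compare` helper is hypothetical.

```python
import math

N = 10  # work units per average, as stated in msg #223070

# version -> (mean run time in s, standard deviation in s)
results = {
    "1.07 OpenCL": (1203.1, 13.6),
    "1.08 cuda": (1157.5, 20.6),
    "1.11 cuda": (1160.2, 10.3),
    "1.14 cuda": (1111.3, 7.3),
}


def compare(base: str, other: str, n: int = N) -> tuple[float, float]:
    """Return (% change in run time of `other` vs `base`,
    difference of means in units of its standard error).
    Assumes independent samples and that +/- is a sample std. dev."""
    m0, s0 = results[base]
    m1, s1 = results[other]
    pct = (m1 - m0) / m0 * 100.0
    se = math.sqrt(s0**2 / n + s1**2 / n)
    return pct, (m1 - m0) / se


for name in results:
    if name != "1.08 cuda":
        pct, z = compare("1.08 cuda", name)
        print(f"{name} vs 1.08 cuda: {pct:+.1f}% run time ({z:+.1f} SE)")
```

Under those assumptions, 1.14 comes out about 4% faster than 1.08 on this host, several standard errors apart, while 1.08 and 1.11 are indistinguishable. This is a single host, so it does not by itself contradict the slowdowns reported on other GPUs.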

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4081

Credit: 48691889570

RAC: 34777151

8 Mar 2024 20:25:53 UTC

Message 223099

Bernd, is the recalc on GPU step for the midpoint and end of 1.14 a FP64 load?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4332

Credit: 252259945

RAC: 35005

13 Mar 2024 12:40:00 UTC

Message 223223

Could it be that 1.14 is slower than 1.08 (only) if you run multiple instances/tasks in parallel?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4332

Credit: 252259945

RAC: 35005

13 Mar 2024 12:46:04 UTC

Message 223224 in response to message 223099

Ian&Steve C. wrote:

Bernd, is the recalc on GPU step for the midpoint and end of 1.14 a FP64 load?

Actually, the new tasks with reduced memory consumption run two "old"-style tasks one after the other. Each old-style task has a "Recalc" step at the end, and the progress counter reserves 25% for it. You should also be able to track the switch in the stderr of the respective task ("finished main analysis"). So there are indeed two "Recalc"s, near 50% and 100% of a "new" task.
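Where the Recalc windows land on the overall progress bar follows from simple arithmetic. The sketch below assumes two things: each old-style task accounts for an equal share of the new task's progress, and the 25% is measured within each old-style task's own range. Neither split is stated explicitly above. `recalc_windows` is a hypothetical helper.

```python
def recalc_windows(subtasks: int = 2,
                   recalc_share: float = 0.25) -> list[tuple[float, float]]:
    """Overall-progress intervals (in %) spent in Recalc when a task runs
    `subtasks` old-style tasks back to back, each reserving the last
    `recalc_share` of its own progress range for Recalc (assumed model)."""
    span = 100.0 / subtasks  # progress range per old-style task
    windows = []
    for i in range(subtasks):
        end = (i + 1) * span
        start = end - recalc_share * span
        windows.append((start, end))
    return windows


print(recalc_windows())  # [(37.5, 50.0), (87.5, 100.0)]
```

These windows match the 37.5-50% and 87.5-100% ranges reported further down the thread.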

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4081

Credit: 48691889570

RAC: 34777151

13 Mar 2024 13:46:47 UTC

Message 223227

yes, I can see the recalc from 37.5-50% and 87.5-100%.

the question was more about the precision used during that step: is it double precision (FP64) during those times?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4332

Credit: 252259945

RAC: 35005

13 Mar 2024 14:18:00 UTC

Message 223229

The core operation is single precision (derived from the SSE version). I'm not entirely sure there aren't a few double-precision calculations outside the loop, but these shouldn't matter much. What makes this part slow is the rather random memory access, which causes much more delay on a GPU than on a CPU.
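One common way to see why scattered access hurts a GPU is memory coalescing. When the 32 threads of a warp load neighbouring addresses, the hardware can serve them with very few memory transactions. Scattered addresses need roughly one transaction per thread. The sketch below is a deliberately simplified model that counts aligned 128-byte segments touched by one warp's float loads. Real hardware granularity (e.g. 32-byte sectors on recent NVIDIA GPUs) and caching differ, and this is not code from the application.

```python
import random

SEGMENT_BYTES = 128  # simplified: one transaction per aligned 128-byte segment
WARP_SIZE = 32       # threads that issue a load together on NVIDIA GPUs
ELEM_BYTES = 4       # single-precision float


def segments_touched(indices: list[int]) -> int:
    """Distinct aligned 128-byte segments hit by a warp's float loads."""
    return len({(i * ELEM_BYTES) // SEGMENT_BYTES for i in indices})


sequential = list(range(WARP_SIZE))  # thread t reads a[t]
rng = random.Random(0)
scattered = [rng.randrange(1_000_000) for _ in range(WARP_SIZE)]  # random gather

print("sequential:", segments_touched(sequential))  # 1 segment
print("scattered: ", segments_touched(scattered))
```

In this model the sequential warp needs one segment and the random gather needs close to 32. A CPU core working through the same scattered reads leans on large caches and out-of-order execution instead of a wide, coalesced memory bus.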