Comparative times: low-end GTX 1660Ti, moderately loaded Ryzen 7 5700X;
averages for 10 work units, run time (seconds) +/- std.dev
1.07 OpenCL 1203.1 +/- 13.6
1.08 cuda 1157.5 +/- 20.6
1.11 cuda 1160.2 +/- 10.3
1.14 cuda -- work units in cache...
1.14 seems considerably slower than 1.08, at least on my GTX1060/E5-2697Av4 system.
Maybe the opposite is true for systems with slow CPUs and faster GPUs.
_________________________________________________________________________
On my 2080 Super running x2 WUs + MPS, 1.08 took around 16 mins, 1.14 takes 22-23 mins.
On my RTX 3080 the GW tasks are taking about a third longer with 1.14 vs 1.08.
... addendum to msg #223070 ...
1.14 cuda 1111.3 +/- 7.3
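To put the reported averages side by side, here is a small sketch that expresses each mean run time as a percentage change against 1.08. The numbers are copied from the posts above; the function name and the choice of 1.08 as the baseline are just for illustration, and the standard deviations are ignored here.

```python
# Mean run times in seconds, as reported above (GTX 1660Ti, 10 work units each).
mean_runtime_s = {
    "1.07 OpenCL": 1203.1,
    "1.08 cuda": 1157.5,
    "1.11 cuda": 1160.2,
    "1.14 cuda": 1111.3,
}


def percent_vs_baseline(runtimes, baseline):
    """Return each version's run time change relative to the baseline, in percent."""
    base = runtimes[baseline]
    return {name: 100.0 * (t - base) / base for name, t in runtimes.items()}


# Hypothetical choice: use 1.08 as the reference version.
changes = percent_vs_baseline(mean_runtime_s, "1.08 cuda")
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}% vs 1.08 cuda")
```

On these single-task numbers 1.14 comes out roughly 4% shorter than 1.08, which is the opposite direction from the x2 reports earlier in the thread.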
Bernd, is the recalc on GPU step for the midpoint and end of 1.14 a FP64 load?
_________________________________________________________________________
Could it be that 1.14 is slower than 1.08 (only) if you run multiple instances/tasks in parallel?
BM
Ian&Steve C. wrote: Bernd, is the recalc on GPU step for the midpoint and end of 1.14 a FP64 load?
Actually, the new tasks with reduced memory consumption run two "old"-style tasks one after the other. Each old-style task has a "Recalc" step at the end; the progress counter reserves 25% for that. You should also be able to track the switch in the stderr of the respective task ("finished main analysis"). So there are indeed two "Recalc"s, near 50% and 100% of a "new" task.
BM
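The progress layout described above can be worked out with a few lines of arithmetic. This is only a sketch: it assumes each old-style task maps linearly onto an equal share of the overall progress counter and that the 25% applies to each sub-task's share. The function and parameter names are made up for illustration and are not taken from the application.

```python
def recalc_windows(num_subtasks=2, recalc_share=0.25):
    """Return (start, end) overall-progress percentages of each Recalc step.

    Assumes every sub-task gets an equal slice of the 0-100% counter and
    spends the last `recalc_share` of its slice in the Recalc step.
    """
    span = 100.0 / num_subtasks  # slice of overall progress per sub-task
    windows = []
    for i in range(num_subtasks):
        end = (i + 1) * span
        start = end - recalc_share * span
        windows.append((start, end))
    return windows


print(recalc_windows())  # [(37.5, 50.0), (87.5, 100.0)]
```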
yes i can see recalc from 37.5-50% and 87.5-100%.
the question was more about precision used during that step. is it double precision (FP64) during those times?
_________________________________________________________________________
The core operation is single precision (derived from the SSE version). I'm not entirely sure there aren't a few double precision calcs outside the loop, but these shouldn't matter much. What makes this part slow is the rather random memory access, which causes much more delay on GPU than on CPU.
BM
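As a rough illustration of why random access hurts on a GPU, here is a toy model that counts how many memory segments one group of threads has to fetch. It is not the application's code: the warp size of 32, the 128-byte segment and the array size are assumed typical values, and all names are hypothetical.

```python
import random

WARP_SIZE = 32        # threads that issue their loads together (assumed)
SEGMENT_BYTES = 128   # size of one memory transaction (assumed)
ELEMENT_BYTES = 4     # one single-precision float


def segments_touched(indices):
    """Count the distinct memory segments needed to serve one warp's loads."""
    return len({(i * ELEMENT_BYTES) // SEGMENT_BYTES for i in indices})


# Neighbouring threads read neighbouring floats: everything fits in one segment.
contiguous = list(range(WARP_SIZE))

# Each thread reads from an unrelated position in a large array.
rng = random.Random(0)
scattered = rng.sample(range(1_000_000), WARP_SIZE)

print("contiguous:", segments_touched(contiguous))
print("scattered: ", segments_touched(scattered))
```

In this model the contiguous case needs a single segment, while the scattered case needs close to one segment per thread, so far more data is moved for the same number of useful floats.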