Did the host with "high power gpus" that you were using run a single task per gpu?
Sure. Running multiple tasks on the same GPU makes no sense for us, as our cluster scheduler can't handle that. And indeed we're using Linux and the CUDA version of the app.
Hm.... on my AMD Pro W7600 run times went up significantly. I run 3 tasks in parallel and am going to try 2 tasks in parallel, but I don't see a GPU utilisation problem: the card is drawing 124 watts, while with the old O3 tasks it was not going above 115 watts. CPU usage is significantly lower, as the system now consumes 202 instead of the previous 214 watts, and it shows in the CPU time per task:
GPU time (s) / CPU time (s) / Credits / Task
2,002 / 244 / 10,000 / All-Sky Gravitational Wave search on O3 v1.07 () windows_x86_64
8,025 / 124 / 4,000 / All-Sky Gravitational Wave search on O3 v1.07 () windows_x86_64
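For what it's worth, the two rows above imply a large drop in credit per GPU-second. A quick sketch, using only the figures copied from the table (the task labels are mine):

```python
# Credit rate per GPU-second for the two tasks listed above.
# GPU seconds and credits are copied from the table; the rest is arithmetic.
tasks = {
    "old O3 task": {"gpu_s": 2002, "credits": 10000},
    "new task":    {"gpu_s": 8025, "credits": 4000},
}

for name, t in tasks.items():
    rate = t["credits"] / t["gpu_s"]  # credits per GPU-second
    print(f"{name}: {rate:.2f} credits/GPU-second")
```

Roughly a factor of ten apart, which matches the credit discussion further down the thread.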
The total CPU recalc time has indeed dropped significantly. I was able to hand-time the CPU recalc steps on an old high-frequency work unit before they ran out. This was on the i9-14900KS. I ran both tests with the same number of tasks running, same loads, etc...
Old (high freq): mid-work-unit CPU recalc was 76 seconds and the end-of-work-unit CPU recalc was 69 seconds, for a total CPU time of 145 seconds.
New (low freq): total CPU recalc time = 32 seconds.
I am confident that one of our old Xeons would not see the same percentage reduction in time, but it is significantly less across the board.
Bernd, is the CPU recalc being done differently, or just simpler/smaller math to calculate? Or both? Just curious.
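The timings quoted above work out to roughly a 78% reduction in CPU recalc time. Pure arithmetic on the hand-timed figures, nothing else assumed:

```python
# Percentage reduction in CPU recalc time, old vs new work units,
# using the hand-timed figures quoted above.
old_s = 76 + 69   # mid + end-of-work-unit recalc, old high-freq WU (145 s total)
new_s = 32        # total recalc, new low-freq WU

reduction = (old_s - new_s) / old_s * 100
print(f"{reduction:.1f}% less CPU recalc time")
```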
I'm seeing run-times a bit differently.
Following is per host: HF run time, Bu run time, ratio (Bu/HF), concurrency, GPU, app version, MPS
ora: 2300, 4000, 1.74, x2, 3060ti, 1.07, n/a
tha: 1150, 3290, 2.86, x2, 3070, 1.14, mps 70%
tli: 1070, 2610, 2.44, x2, 3070ti, 1.14, mps 70%
del: 1100, 2630, 2.39, x2, 3070ti, 1.14, mps 70%
tcu: 1000, 2580, 2.58, x3, 3080ti, 1.14, mps 55%
Run times are eyeball estimates from 8 hrs ago. GPUs are running with the same config for both HF and Bu WUs.
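The ratio column can be reproduced directly from the two run times, and with the concurrency it also gives a rough throughput. A sketch using only the figures listed above (I'm assuming the run times are wall-clock seconds, which the post doesn't state explicitly):

```python
# Bu/HF run-time ratio per host, from the figures listed above.
# Tuples: (HF run time, Bu run time, concurrent tasks); times assumed to be seconds.
hosts = {
    "ora": (2300, 4000, 2),
    "tha": (1150, 3290, 2),
    "tli": (1070, 2610, 2),
    "del": (1100, 2630, 2),
    "tcu": (1000, 2580, 3),
}

for name, (hf, bu, conc) in hosts.items():
    ratio = bu / hf
    per_day = conc * 86400 / bu  # rough Bu WUs/day at that concurrency
    print(f"{name}: ratio {ratio:.2f}, ~{per_day:.0f} Bu WUs/day at x{conc}")
```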
With the last parameters, I was able to calculate around 450-500 WUs/day on the NV 4090 (with 2+2 WUs in parallel and offset for the CPU calculation).
Currently, with a longer runtime (and only 2 units in parallel make sense), it looks more like 100-110 WUs/day.
The estimate of 5-6 months for the planned 150-250 Hz therefore seems a little too optimistic to me.
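The throughput figures above follow from per-WU run time and the number of units in parallel. A minimal sketch; the per-WU run times here are hypothetical values back-calculated to land in the quoted ranges, not measurements:

```python
# WUs/day from per-WU run time and number of WUs in parallel:
# throughput = n_parallel * 86400 / runtime_seconds.
def wus_per_day(runtime_s: float, n_parallel: int) -> float:
    return n_parallel * 86400 / runtime_s

# Hypothetical per-WU run times chosen to match the quoted ranges:
print(wus_per_day(730, 4))    # ~473 WUs/day, like the earlier 2+2 setup
print(wus_per_day(1650, 2))   # ~105 WUs/day, like the current longer-runtime setup
```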
OK, the validator is running for the new tasks. I got 1000 credits per valid task.
Harri Liljeroos wrote: OK, the validator is running for the new tasks. I got 1000 credits per valid task.
Wondering why I got 4000 credits ?
The new tasks got me 4000 credits also.
The first WUs were issued with the credit of the high-freq run; there were a few in between (3000?); the last generated ones should give 4000.
BM
Here's the host:
https://einsteinathome.org/host/13157119