First question: does the Run Time for a work unit include the CPU time, or is CPU time in addition to the Run Time?
I ask after looking at this work unit: https://einsteinathome.org/workunit/390479635
My 11+ year old Pentium E2200 clocked at 2.93GHz with an RX 570 obviously crunched it faster than a 1050 Ti, but that 1050 Ti is attached to a much more modern 3.4GHz Xeon and yet racked up an incredible amount of CPU run time. Which leads to question number two: what gives with the difference in CPU Run Time?
GPUs run the application but sometimes need the CPU to run more complex calculations. Think of it like fact-checking. So Run time may be much longer than CPU time. ATI cards tend to be better at scientific calculation than Nvidia, so they tend to be faster.
At least that is how I think of it...lol
Run time would be better labelled "elapsed time": in most cases (in which nothing suspended the task during execution) it is just the difference in wall-clock time from beginning to end.
CPU time measures how much time a CPU core was dedicated to the task, so it can't be more than Run time. For the current Einstein Nvidia application it is generally very nearly equal to elapsed time (run time), and if it is not, you are robbing your GPU of performance.
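To make the distinction concrete, here is a minimal C++ sketch (my own illustration, assuming a POSIX system where std::clock reports process CPU time) of a task that works for a while and then just waits, as a stand-in for waiting on a GPU:

    #include <chrono>
    #include <ctime>
    #include <iostream>
    #include <thread>

    int main() {
        auto wall_start = std::chrono::steady_clock::now();
        std::clock_t cpu_start = std::clock();

        // Phase 1: genuine CPU work -- advances both clocks.
        volatile double sink = 0.0;
        for (long i = 0; i < 200000000L; ++i) sink = sink + i;

        // Phase 2: idle waiting (stand-in for "waiting on the GPU")
        // -- advances elapsed time only.
        std::this_thread::sleep_for(std::chrono::seconds(2));

        double cpu_s = double(std::clock() - cpu_start) / CLOCKS_PER_SEC;
        double wall_s = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - wall_start).count();

        std::cout << "Run (elapsed) time: " << wall_s << " s\n";  // work + ~2 s
        std::cout << "CPU time:           " << cpu_s << " s\n";   // work only
    }

The gap between the two numbers is exactly the waiting. A polling support task never waits, which is why its CPU time climbs to match elapsed time.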
The reason is that the means of communication used for the current Einstein Nvidia application is not interrupt-driven, as the CUDA-compiled applications were, but rather a polling loop. The support task on the CPU incessantly asks the GPU, "Do you need something from me?" Most of the time the answer is "no" and the CPU effort is wasted, but when the answer is "yes", GPU output suffers unless the polling learns that on the first possible cycle and acts on it at maximum feasible speed.
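Here is a rough sketch of the two styles, purely my own illustration and not the project's actual code; the flag and handler names are hypothetical:

    #include <atomic>
    #include <condition_variable>
    #include <mutex>

    std::atomic<bool> gpu_needs_service{false};  // hypothetical flag the driver sets

    // Polling style (the current Nvidia app, per the description above):
    // the support thread asks "do you need something from me?" in a tight loop.
    // CPU time ~= elapsed time, but a request is seen on the first possible cycle.
    void support_thread_polling() {
        for (;;) {
            if (gpu_needs_service.exchange(false)) {
                // service_gpu_request();  // hypothetical handler
            }
            // no sleep: a whole core is burned keeping latency minimal
        }
    }

    // Interrupt/event style (as with the old CUDA-compiled apps):
    // the thread sleeps until notified. CPU time stays tiny, but every
    // wake-up adds scheduling latency to the GPU's service request.
    std::mutex m;
    std::condition_variable cv;

    void support_thread_blocking() {
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [] { return gpu_needs_service.load(); });
            gpu_needs_service = false;
            // service_gpu_request();  // hypothetical handler
        }
    }

That trade-off, latency versus wasted cycles, is the whole story behind the huge CPU time you saw on the Xeon.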
Getting maximum GPU output at Einstein from an Nvidia card with the current application means ensuring that the support CPU thread is running very nearly all the time. This is one of several reasons that running higher multiplicities (X2, X3, ...) is far less attractive at Einstein for current cards and the current application than it was a few years ago.
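For reference, multiplicity is set per application in BOINC's app_config.xml. A sketch for X2, i.e. two tasks sharing one GPU; the app name hsgamma_FGRPB1G is just an example, substitute whichever application you actually run:

    <app_config>
      <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
          <gpu_usage>0.5</gpu_usage>  <!-- each task claims half a GPU: two run at once -->
          <cpu_usage>1.0</cpu_usage>  <!-- budget a full core per support task -->
        </gpu_versions>
      </app>
    </app_config>

With the polling behaviour described above, note that X2 also means two support threads each wanting a core to themselves.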
If you have a CPU with lots of cores and only one GPU, you may like the results of using Process Lasso to elevate the priority of the GPU support task and to cut back the number of CPU tasks. This can both get a rapid response to your GPU's service requests and leave enough cores unused by BOINC to service your personal interactive use. If you have fewer CPU cores and more than one GPU, it becomes a juggling act between getting a lot of GPU performance and not ruining your interactive experience.
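If you are curious what Process Lasso is doing under the hood, the same one-shot priority bump can be made against the Win32 API directly; this is only a sketch, and Process Lasso's advantage is reapplying the setting automatically every time the task restarts:

    #include <windows.h>
    #include <cstdio>
    #include <cstdlib>

    // Usage: boost_priority <pid>
    // Raises the priority class of the given process,
    // e.g. the GPU support task's PID from Task Manager.
    int main(int argc, char** argv) {
        if (argc < 2) {
            std::fprintf(stderr, "usage: %s <pid>\n", argv[0]);
            return 1;
        }
        DWORD pid = static_cast<DWORD>(std::strtoul(argv[1], nullptr, 10));
        HANDLE proc = OpenProcess(PROCESS_SET_INFORMATION, FALSE, pid);
        if (!proc) {
            std::fprintf(stderr, "OpenProcess failed: %lu\n", GetLastError());
            return 1;
        }
        if (!SetPriorityClass(proc, ABOVE_NORMAL_PRIORITY_CLASS))
            std::fprintf(stderr, "SetPriorityClass failed: %lu\n", GetLastError());
        CloseHandle(proc);
        return 0;
    }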