Why is there no CUDA support for this application?
Thanks
Gravitational Wave S6 LineVeto CUDA support?
It's a fair question. The answer is that not all algorithms are equally well suited for running on CUDA cards, or GPUs in general.
First, you want to do all or almost all computations in single precision (4-byte floating-point representation), because consumer NVIDIA graphics cards are much slower in double precision than in single precision. This is no problem for the Gravitational Wave search app.
Second, you need to rewrite the software so that the computation that runs as one serial thread on the CPU is broken down into thousands of threads running in parallel, each doing similar operations on different chunks of the data. That is the really hard part. You cannot just run 1000 or so individual jobs in parallel (as you would on a hypothetical 1000-core PC), because everything has to fit into the 1 GB or so of on-board memory close to the GPU to be really fast. So it has to be ONE GW task, done by thousands of threads at a time. Not trivial at all.
Another reason is that we already have plans for a successor method to the current code, and that method, once it is ready, will be easier to port to a GPU. So we are somewhat reluctant to invest further effort, with an uncertain outcome, into CUDA-fying the current code, as it might be replaced in the not-so-distant future.
Hope this answers your question,
Cheers
HB
Thanks HB, for the lengthy explanation. That answers my question :)