Hello,
Can someone be kind enough to explain to me why Einstein's WUs, at only 280,000 GFLOPS, take such a long time to complete?
I am running an ACERIBO GPU and need to run 2 WUs at once to reach about 90% GPU usage.
Tasks are completed in about 1 hour.
I am not an IT specialist, but how come on other projects, which I don't want to mention here, 13,400,000 GFLOPS WUs are completed in less than 30 or 45 minutes (depending on whether CUDA or OpenCL is used)?
Are "optimised apps" available ?
Thank You
Kind Regards
Phil1966
GPU use optimisation ?
It would help if you mentioned the name of the unmentionable project; otherwise a comparison is difficult.
Could it be because the unmentionable projects do simple integer calculations that can be performed entirely on the GPU, while E@H does complicated floating point maths (and thus qualifies for the FL in FLOP) that can't be done entirely on the GPU and therefore requires a lot of CPU as well as GPU?
Who knows?
RE: It would help if you
I think the old nail took a pretty big whack right on its head with that explanation! Some projects can get a lot of their work units into the much faster GPU memory to crunch, others can't, and those that can't suffer by comparison. BUT they are still much faster than projects whose units don't use the GPU at all!
RE: It would help if you
...
Fortunately, some teammates who have been crunching for over 10 years have given me more detailed and constructive explanations. As far as I know, there are not 25 GPU projects. But most other projects offer optimisations, so crunchers who invest in new hardware can increase their participation.
Thank You anyway.
NB: First and last question on this forum.
Phil, don't let a single
Phil, don't let a single answer be representative of an entire forum or project.
The problem with GPUs is that they're not very flexible. Hence simple algorithms without many memory accesses, and with practically no CPU intervention, can achieve the best hardware utilization. By this I don't mean the percentage shown in monitoring utilities, but rather the throughput in terms of GFLOPS achieved. The project you're comparing against very likely belongs in this category. The problem with such projects is that... well, not many real-world problems can be tackled this way. That's why some of them work on completely arbitrary "problems".
Einstein, on the other hand, uses some sophisticated and complex calculations. The BRP GPU tasks can be limited by GPU memory bandwidth, and PCIe bandwidth and CPU performance also matter. As far as I understand, those algorithms are already optimized very well. For nVidia GPUs there may be some room for improvement left with newer compilers, but those do not yet work with Einstein's cross-platform compilation scheme due to some bug(s). Currently no user-supplied apps are available, but the source code is. I don't know if anyone has already tried to achieve anything better than what the project delivers.
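To illustrate the bandwidth point, here is a rough back-of-the-envelope roofline sketch in Python. The peak figures are made-up placeholders, not measurements of any real card or of the BRP app:

# Rough roofline-model sketch: attainable throughput is capped either by
# the GPU's peak compute rate or by how fast data can be streamed from
# GPU memory. All figures below are illustrative placeholders.

PEAK_GFLOPS = 1500.0      # hypothetical peak single-precision rate, GFLOP/s
PEAK_BANDWIDTH = 150.0    # hypothetical GPU memory bandwidth, GB/s

def attainable_gflops(flops_per_byte):
    """Attainable GFLOP/s for a kernel with the given arithmetic
    intensity (floating point ops performed per byte moved)."""
    return min(PEAK_GFLOPS, flops_per_byte * PEAK_BANDWIDTH)

# A compute-heavy kernel with few memory accesses (high intensity)
# can approach the peak rate:
print(attainable_gflops(50.0))   # -> 1500.0 GFLOP/s (compute bound)

# A kernel that moves lots of data per operation (low intensity),
# as a bandwidth-limited task would, is capped far below peak even
# though a monitoring utility may still show high "GPU usage":
print(attainable_gflops(0.5))    # -> 75.0 GFLOP/s (bandwidth bound)

The exact numbers don't matter; the point is that the ratio of compute to memory traffic, not the "GPU usage" percentage, decides how close a task gets to peak GFLOPS.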
And finally, flop counting itself can be problematic. Suffice it to say that there are different ways to do it, and sometimes one has to rely on estimates (which could be way off).
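For what it's worth, the numbers quoted in this thread can be turned into effective throughput with simple arithmetic. A minimal sketch, assuming both figures are BOINC-style estimates of total floating point operations per task (which is how I read them):

# Effective throughput = estimated total FLOPs per task / runtime.
# Task sizes are the figures quoted in this thread; note they are
# project-supplied *estimates*, which is exactly the caveat above.

def effective_gflops(total_gflop, runtime_seconds):
    """Average GFLOP/s implied by a task's flop estimate and runtime."""
    return total_gflop / runtime_seconds

# Einstein@Home: 280,000 GFLOP in ~1 hour, two tasks at once
print(2 * effective_gflops(280_000, 3600))       # ~156 GFLOP/s combined

# The other project: 13,400,000 GFLOP in 30-45 minutes
print(effective_gflops(13_400_000, 45 * 60))     # ~4963 GFLOP/s
print(effective_gflops(13_400_000, 30 * 60))     # ~7444 GFLOP/s

If both estimates were to be believed, the other project would be getting roughly 30 to 50 times the throughput out of the same card, which seems implausible; more likely at least one of the flop estimates is simply way off.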
Edit: does that match what your team colleague told you? If not, I'm certainly interested to hear his ideas.
MrS
Scanning for our furry friends since Jan 2002