GPU use optimisation?

[AF>Amis des Lapins] Phil1966
Joined: 25 Jun 13
Posts: 4
Credit: 361,281,496
RAC: 25
Topic 197758

Hello,

Could someone kindly explain to me why Einstein's WUs, at only 280,000 GFLOP each, take so long to complete?

I am running Arecibo GPU tasks, and I need to run 2 WUs at once to reach a GPU load of about 90%.

Tasks are completed in about 1 hour.

I am not an IT specialist, but how come other projects, which I'd rather not name here, complete 13,400,000 GFLOP WUs in under 30 to 45 minutes (depending on whether CUDA or OpenCL is used)?
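The sizes quoted in this post are per-task operation counts (GFLOP), not rates. As a rough sketch, they can be turned into implied throughput, assuming the projects' FLOP estimates are accurate and comparable, and assuming the other project runs one task at a time (the post does not say). The function name `effective_gflops` is purely illustrative.

```python
def effective_gflops(gflop_per_task: float, minutes_per_task: float,
                     concurrent_tasks: int = 1) -> float:
    """Implied aggregate throughput in GFLOP/s when `concurrent_tasks` tasks
    run side by side and each one finishes in `minutes_per_task` minutes."""
    seconds = minutes_per_task * 60.0
    return concurrent_tasks * gflop_per_task / seconds


# Einstein@Home figures from the post: 280,000 GFLOP per task,
# about 60 minutes per task, 2 tasks running at once.
einstein = effective_gflops(280_000, 60, concurrent_tasks=2)

# Other project: 13,400,000 GFLOP in 30 to 45 minutes.
# ASSUMPTION: one task at a time (not stated in the post).
other_fast = effective_gflops(13_400_000, 30)
other_slow = effective_gflops(13_400_000, 45)

print(f"Einstein@Home: {einstein:,.0f} GFLOP/s")
print(f"Other project: {other_slow:,.0f} to {other_fast:,.0f} GFLOP/s")
```

Under those assumptions the implied rates differ by a factor of roughly 30 to 50, which is the gap the question is about.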

Are "optimised apps" available?

Thank you

Kind Regards

Phil1966

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1,714,373,961
RAC: 0

It would help if you mentioned the name of the unmentionable project; otherwise a comparison is difficult.

Could it be because the unmentionable project does simple integer calculations that can be performed entirely on the GPU, while E@H does complicated floating-point math (and thus qualifies for the FL in FLOP) that can't be done entirely on the GPU and therefore requires a lot of CPU as well as GPU?

Who knows?
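A toy overlap model shows why GPU load stays low when part of each task runs on the CPU, and why running two tasks at once helps. It is an illustration only, not how BOINC or Einstein@Home actually schedule work: it assumes each task keeps the GPU busy for a fixed fraction of its run time and that the GPU-busy periods of concurrent tasks never overlap. The 0.45 fraction and the name `gpu_busy_fraction` are hypothetical, chosen only so that two tasks land at the roughly 90% load reported in the first post.

```python
def gpu_busy_fraction(busy_fraction_per_task: float, concurrent_tasks: int) -> float:
    """Toy model: each task keeps the GPU busy for `busy_fraction_per_task` of
    its wall-clock time and spends the rest on CPU work or PCIe transfers.
    ASSUMPTION: the GPU-busy periods of concurrent tasks never overlap, so
    they simply add up, capped at 100 %."""
    return min(1.0, busy_fraction_per_task * concurrent_tasks)


# ASSUMPTION: 0.45 is a hypothetical value, not a measured one.
for k in (1, 2, 3):
    print(f"{k} task(s) at once: GPU busy {gpu_busy_fraction(0.45, k):.0%}")
```

In practice each task also slows down somewhat when sharing the GPU, so the real gain from a second task is smaller than this model suggests.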

mikey
Joined: 22 Jan 05
Posts: 12,731
Credit: 1,839,130,849
RAC: 3,542

I think Logforme's explanation gave the old nail a pretty big whack right on its head!! Some projects can get a lot of their units into the much faster GPU memory to crunch; others can't, and those that can't suffer by comparison. BUT they are still much faster than projects whose units don't use the GPU at all!

[AF>Amis des Lapins] Phil1966
Joined: 25 Jun 13
Posts: 4
Credit: 361,281,496
RAC: 25

Fortunately, some teammates who have been crunching for over 10 years have given me more detailed and constructive explanations. As far as I know, there aren't 25 "GPU projects". But most other projects offer optimizations, so crunchers who invest in new hardware can increase their contribution.

Thank you anyway.

NB: First and last question on this forum.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 580,198,838
RAC: 171,905

Phil, don't let a single answer be representative of an entire forum or project.

The problem with GPUs is that they're not very flexible. Hence simple algorithms with few memory accesses and practically no CPU intervention achieve the best hardware utilization. By this I don't mean the percentage shown in monitoring utilities, but rather the throughput achieved in GFLOPS. The project you're comparing with very likely belongs in this category. The problem with such projects is that... well, not very many real-world problems can be tackled this way. That's why some of them work on completely arbitrary "problems".

Einstein, on the other hand, uses some sophisticated and complex calculations. BRP GPU tasks can be limited by GPU memory bandwidth, and PCIe bandwidth and CPU performance also matter. As far as I understand, those algorithms are already very well optimized. For nVidia GPUs there may be some room for improvement left with newer compilers, but due to some bug(s) these do not yet work with Einstein's cross-platform compilation scheme. Currently no user-supplied apps are available, but the source code is. I don't know whether anyone has already tried to do better than what the project delivers.
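The point about memory bandwidth can be illustrated with the roofline model, a standard first-order estimate of attainable GPU performance. The sketch is generic and not derived from the BRP code: the GPU figures and the name `attainable_gflops` are hypothetical, and it ignores the PCIe and CPU costs mentioned in this post. It shows that a kernel doing few operations per byte loaded is capped far below the card's peak, however busy the card looks in a monitoring tool.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline bound: a kernel can run no faster than the GPU's arithmetic
    peak, nor faster than memory can deliver operands
    (bandwidth in GB/s times arithmetic intensity in FLOP per byte)."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# ASSUMPTION: hypothetical GPU with a 4,000 GFLOP/s peak and 200 GB/s of
# memory bandwidth; these are not figures for any specific card.
PEAK_GFLOPS = 4000.0
BANDWIDTH_GB_S = 200.0

for intensity in (0.25, 2.0, 20.0, 100.0):
    bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, intensity)
    limit = "memory-bound" if bound < PEAK_GFLOPS else "compute-bound"
    print(f"{intensity:6.2f} FLOP/byte -> at most {bound:6.0f} GFLOP/s ({limit})")
```

Simple arithmetic-heavy workloads sit at the right-hand end of this curve; workloads that stream large arrays through memory sit at the left.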

And finally, FLOP counting itself can be problematic. Suffice it to say that there are different ways to do it, and sometimes one has to rely on estimates (which could be way off).
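As a small illustration of how counting conventions alone change the number, here is a sketch that tallies the same dot product three ways. It is a generic example, not how BOINC or any project computes its estimates, and `dot_product_flop_counts` is a hypothetical name: the point is only that identical work can be reported with counts differing by a factor of two, before any estimation error is added.

```python
def dot_product_flop_counts(n: int) -> dict[str, int]:
    """FLOPs attributed to the same n-element dot product under three
    common bookkeeping conventions."""
    return {
        # n multiplications followed by n - 1 additions
        "separate multiply and add": n + (n - 1),
        # n fused multiply-adds (accumulator starting at 0), each booked as 2
        "FMA counted as 2 FLOPs": 2 * n,
        # the same n fused multiply-adds, each booked as a single operation
        "FMA counted as 1 operation": n,
    }


for convention, flops in dot_product_flop_counts(1_000_000).items():
    print(f"{convention:>28}: {flops:,}")
```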

Edit: does that match what your teammates told you? If not, I'm certainly interested to hear their ideas.

MrS

Scanning for our furry friends since Jan 2002
