Hi all on this Easter Monday! :)
I installed a new 24/7 BOINC-only machine with an Opteron 2.6 GHz (AMD, single core) CPU and an NVIDIA 9800 GTX+ (1 GB RAM, Win XP SP3, BOINC 6.10.58, newest Detonator drivers). So I'm running my first Binary Radio Pulsar searches on a GPU, but it uses only 55% GPU load with at most 52% CPU load. Do I need to change any settings to get 70+% on the GPU? The CPU has no other projects running.
Thanks for any help :)
DSKAG Austria Research Team: [LINK]http://www.research.dskag.at[/LINK]
Low GPU Load
Hi!
Volunteers have experimented with customized app_info.xml files to run more than one task in parallel on a single GPU, leading to longer runtimes per task but higher GPU utilization and increased overall throughput. See this thread:
http://einsteinathome.org/node/195553
Make sure to read to the end of the thread (or read it in reverse order :-) because some of the app_info.xml files posted at the beginning of the thread are now outdated due to newer app versions.
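Roughly, the relevant bit of such an app_info.xml looks like the sketch below (the app name, file name and version number are just placeholders here, not the real BRP3 entries; take working files from the linked thread). The <count>0.5</count> line is what tells BOINC that a task needs half a GPU, i.e. that two tasks may share one card:
[code]
<app_info>
  <app>
    <name>einsteinbinary_BRP3</name>               <!-- placeholder app name -->
  </app>
  <file_info>
    <name>einsteinbinary_BRP3_cuda.exe</name>      <!-- placeholder file name -->
    <executable/>
  </file_info>
  <app_version>
    <app_name>einsteinbinary_BRP3</app_name>
    <version_num>100</version_num>                 <!-- placeholder version -->
    <coproc>
      <type>CUDA</type>
      <count>0.5</count>   <!-- 0.5 GPUs per task = 2 tasks per GPU -->
    </coproc>
    <file_ref>
      <file_name>einsteinbinary_BRP3_cuda.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
[/code]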
HB
Ok thx, then it does not work
OK thanks, then it does not work for me because this graphics card has only 512 MB of RAM, so it can't run 2 WUs at the same time :/
DSKAG Austria Research Team: [LINK]http://www.research.dskag.at[/LINK]
RE: Volunteers have
Of course this is MacGyvering things once again..
In fact I'm pretty sure it's mainly because the number of threads running on the GPU is simply too low.
I remember seeing this with every project moving into GPU development, and it took quite some effort to get the apps to run efficiently.
Do you know that paper?
http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf
Yup, I Know this
Yup, I know this presentation; it's very interesting, as it states that some of the performance recommendations written in the NVIDIA documentation are not the full truth and should sometimes be ignored to get better performance.
As for the BRP3 app, a considerable part of the work is done in NVIDIA's own FFT library, cuFFT, which is also discussed in that paper.
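For illustration, a toy sketch (not the actual BRP3 code) of what driving cuFFT looks like from the app's side: the library is a black box behind a plan, and about the only knobs the app holds are the transform length and how many transforms it batches per call - the kernel launch configuration inside is cuFFT's business:
[code]
#include <cuda_runtime.h>
#include <cufft.h>

// Toy sketch, not BRP3 code: batch several transforms into one plan so the
// library has enough parallel work; error checking omitted for brevity.
void run_batched_ffts(cufftComplex *d_data, int fft_size, int batch)
{
    cufftHandle plan;
    cufftPlan1d(&plan, fft_size, CUFFT_C2C, batch);      // one plan covers 'batch' FFTs
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);   // in-place forward transforms
    cudaDeviceSynchronize();                             // wait for the GPU to finish
    cufftDestroy(plan);
}
[/code]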
I've always wanted to play around with some of the ideas given in this paper but haven't had the time yet. But given the fact that the cuFFT part takes a considerable share, the potential for optimization without re-writing the FFT part (noooooooooooo!) is somewhat limited.
Having said that, there sure remains some room for optimization, and just like the GW app, there will be improvements with each iteration.
Actually, the paper that you reference states the exact opposite: sometimes you get better performance by using FEWER threads!
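To make the "fewer threads" point concrete, here is a toy sketch (nothing to do with the actual BRP3 kernels): the second kernel launches half as many threads but gives each one two independent elements, so latency can be hidden by instruction-level parallelism instead of sheer thread count - one of the tricks the paper describes:
[code]
// Toy illustration of Volkov's point, not BRP3 code.

// One element per thread: latency hiding relies on thread count alone.
__global__ void scale_one(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Two independent elements per thread: half the threads, but each thread
// has more independent work in flight (instruction-level parallelism).
__global__ void scale_two(float *x, float a, int n)
{
    int i = 2 * (blockIdx.x * blockDim.x + threadIdx.x);
    if (i < n)     x[i]     *= a;
    if (i + 1 < n) x[i + 1] *= a;
}
[/code]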
HB
RE: Yup, I Know this
And the bottom line is: it's hard to code for optimal performance on all those different architectures out there.
The app needs to check what's available and adapt itself.
Right - no go!
But giving CUDA 4.0 a try (which is available with the 270.xx drivers) should be worthwhile.
At least NVIDIA says there are many performance improvements..
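On the "check what's available and adapt" idea, a minimal sketch of what that could look like (hypothetical, not what the app actually does) - query the device properties and derive a launch configuration from them:
[code]
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical sketch: pick a threads-per-block value from the device's
// reported properties. A real app would rather benchmark a few candidates.
int pick_block_size(int device)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    printf("GPU: %s, compute capability %d.%d, %d multiprocessors\n",
           prop.name, prop.major, prop.minor, prop.multiProcessorCount);

    // Crude heuristic (an assumption, not a recommendation):
    // smaller blocks for pre-Fermi cards with smaller register files.
    return (prop.major < 2) ? 128 : 256;
}
[/code]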
to bump this up.. i
To bump this up..
I remember that dnetc (in fact the distributed.net client) had implemented an option for a manual override like this:
"Core selection:
This option determines core selection. Auto-select is usually best since
it allows the client to pick other cores as they become available. Please
let distributed.net know if you find the client auto-selecting a core that
manual benchmarking shows to be less than optimal.
Cores marked as 'n/a' are not applicable to your particular cpu/os.
RC5-72:
0) CUDA 1-pipe 64-thd
1) CUDA 1-pipe 128-thd
2) CUDA 1-pipe 256-thd
3) CUDA 2-pipe 64-thd
4) CUDA 2-pipe 128-thd
5) CUDA 2-pipe 256-thd
6) CUDA 4-pipe 64-thd
7) CUDA 4-pipe 128-thd
8) CUDA 4-pipe 256-thd
9) CUDA 1-pipe 64-thd busy wait
10) CUDA 1-pipe 64-thd sleep 100us
11) CUDA 1-pipe 64-thd sleep dynamic"
On my GTS 250 the automatic selection went for option 0, but option 10 worked a lot better:
~140 Mkeys/sec at 10% CPU load with option 0, compared to ~200 Mkeys/sec at 1% CPU load with option 10.
Option 9 was even faster, but took a complete CPU core for that.
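Something like that would be easy enough to bolt onto a CUDA app in principle; a hypothetical sketch (the BRP3 app offers nothing of the sort, and the variable name is made up) of letting the user override the threads-per-block the way dnetc lets you pick a core:
[code]
#include <cuda_runtime.h>
#include <stdlib.h>

// Stand-in for whatever the app really computes.
__global__ void work_kernel(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

// Hypothetical manual override in the spirit of dnetc's core selection:
// take the threads-per-block from an environment variable, else use a default.
void launch_work(float *d_x, int n)
{
    int threads = 256;                             // default
    const char *env = getenv("BRP_CUDA_THREADS");  // made-up variable name
    if (env && atoi(env) > 0)
        threads = atoi(env);

    int blocks = (n + threads - 1) / threads;
    work_kernel<<<blocks, threads>>>(d_x, n);
    cudaDeviceSynchronize();
}
[/code]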