Letting cuda WUs use a full thread/core

Rechenkuenstler

Joined: 22 Aug 10

Posts: 138

Credit: 102567115

RAC: 0

If you want to run BRP3 Tasks

25 Jun 2011 8:47:54 UTC

Message 105705 in response to message 105704

If you want to run BRP3 tasks on your CPU, why don't you simply use the CPU version of the app? Just go to the Einstein@Home settings and switch on the following parameter:

Run CPU versions of applications for which GPU versions are available

In this case you get BRP3 tasks which run completely on the CPU, and you can run the BRP3 GPU version in parallel. Then it is a good thing that the GPU version uses only 0.20 CPUs.

dunx

Joined: 13 Aug 10

Posts: 119

Credit: 53470527

RAC: 0

I run four GPU WU's on 2x

25 Jun 2011 11:43:34 UTC

Message 105706

I run four GPU WUs on 2x GTX 460s whilst allowing 6x CPU tasks, and this gives me:

66 minutes per WU on the GTX 460s
and 95% CPU utilisation, average.

Thus I can surf the web and play video without compromise.

Maybe I will get to try the alternative operating system apps at some point on one of my PCs?

dunx

P.S. i7-960 @ 4 GHz (using 75% of the CPUs for 100% of the time)

FrankHagen

Joined: 13 Feb 08

Posts: 102

Credit: 272200

RAC: 0

RE: I did a few experiments

25 Jun 2011 11:48:12 UTC

Message 105707 in response to message 105704

Quote:

I did a few experiments with the CPU priority of BRP3 (changing the nice value for BRP3), but did not see any noticeable improvement at all...

Ubuntu 10.04, GTS250, Athlon2 4core, gpu driver 270.29.
4 CPU units and 2 BRP3 units at a time...

this is not enough - limit BOINC to using 75% of your CPUs and watch the GPU tasks improve in speed.

Rechenkuenstler

Joined: 22 Aug 10

Posts: 138

Credit: 102567115

RAC: 0

RE: I run four GPU WU's on

25 Jun 2011 12:44:38 UTC

Message 105708 in response to message 105706

Quote:

I run four GPU WUs on 2x GTX 460s whilst allowing 6x CPU tasks, and this gives me:

66 minutes per WU on the GTX 460s
and 95% CPU utilisation, average.

Thus I can surf the web and play video without compromise.

Maybe I will get to try the alternative operating system apps at some point on one of my PCs?

dunx

P.S. i7-960 @ 4 GHz (using 75% of the CPUs for 100% of the time)

So do I. Same machine.

Dirk

Joined: 4 Jun 08

Posts: 35

Credit: 88264743

RAC: 0

RE: If you want to run BRP3

26 Jun 2011 9:58:54 UTC

Message 105709 in response to message 105705

Quote:

If you want to run BRP3 tasks on your CPU, why don't you simply use the CPU version of the app? Just go to the Einstein@Home settings and switch on the following parameter:

Run CPU versions of applications for which GPU versions are available

In this case you get BRP3 tasks which run completely on the CPU, and you can run the BRP3 GPU version in parallel. Then it is a good thing that the GPU version uses only 0.20 CPUs.

I never said I wanted to run the CPU apps. What I want is to see how much I can squeeze out of my GPU and for that I'd like to test out if GPU usage goes up if the GPU can be fed by a full thread per cuda WU. Right now each cuda task takes 5% of the CPU while running 4 tasks on my single GPU. I'd like that to be higher, as in 12.5% to see if it increases the GPU usage. Why? Because even while running 4 BRP cuda tasks at once the GPU usage maxes out at maybe 84% and only if I don't run any other CPU tasks.
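For reference, the percentages above can be checked with a little arithmetic. This is only a sketch: it assumes the i7 870 exposes 8 logical CPUs (4 cores with hyperthreading on), so that one fully used thread shows up as 12.5% of total CPU. The function and variable names are made up for illustration.

```python
# Assumption: 8 logical CPUs (i7 870, 4 cores with hyperthreading enabled).
LOGICAL_CPUS = 8
GPU_TASKS = 4  # BRP cuda tasks running at once on the single GPU


def full_thread_share(logical_cpus: int) -> float:
    """Percent of total CPU that one fully used logical CPU represents."""
    return 100.0 / logical_cpus


current_per_task = 5.0                             # percent per task, as observed
wanted_per_task = full_thread_share(LOGICAL_CPUS)  # one whole thread per task

current_total = GPU_TASKS * current_per_task       # total CPU spent feeding the GPU now
wanted_total = GPU_TASKS * wanted_per_task         # total if every task had its own thread

print(f"one full thread = {wanted_per_task}% of the CPU")
print(f"now: {current_total}% total, with a thread per task: {wanted_total}% total")
```

Under that assumption, four tasks with a full thread each would occupy half of the CPU, compared with the 20% they take now.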

Haven't found time to fiddle around much with my Ubuntu install yet. Will hopefully be able to squeeze that in next week. In the meantime I ran some short tests at night to see how overclocking the GPU improved performance. For this I let it run without any CPU tasks running alongside it. Haven't tried it without overclocking though, because I started with a slight overclock.

GTX 480 running 4 BRP cuda tasks
Core clock 750 MHz
Shader clock 1500 MHz
Memory clock 1900 MHz
Average completion time 91 minutes

GTX 480 running 4 BRP cuda tasks
Core clock 800 MHz
Shader clock 1600 MHz
Memory clock 2000 MHz
Average completion time 88 minutes

The CPU (i7 870 @ 3.40 GHz) ran at 20% for this.
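To put the two runs side by side, here is a rough calculation of how much the clocks went up versus how much the completion time went down. It is a sketch only: it assumes the "average completion time" is per task with 4 tasks running concurrently, and the helper names are invented for this example.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def tasks_per_day(minutes_per_task: float, concurrent: int = 4) -> float:
    """Tasks finished in 24 hours if `concurrent` tasks run side by side.

    Assumption: the quoted completion time is per task while 4 run at once.
    """
    return concurrent * 24 * 60 / minutes_per_task


core_gain = percent_change(750, 800)     # core clock, MHz
mem_gain = percent_change(1900, 2000)    # memory clock, MHz
time_saved = -percent_change(91, 88)     # positive means faster

print(f"core clock  +{core_gain:.1f}%")
print(f"memory clock +{mem_gain:.1f}%")
print(f"completion time -{time_saved:.1f}%")
print(f"tasks/day: {tasks_per_day(91):.1f} -> {tasks_per_day(88):.1f}")
```

With these numbers the completion time improves by roughly half as much as the core clock was raised, which would fit the idea that something other than raw GPU speed is also limiting throughput, although two short runs are not enough to prove it.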

I'll keep an eye on my results to see if any tasks fail to validate because of the slightly higher overclock.

PS I also wanted to do a run with hyperthreading off but it seemed to make the PC a bit unresponsive at random, maybe it was just a fluke though. GPU usage also didn't seem to be higher than with hyperthreading on, maybe a 1% increase but I wasn't sure.

PPS nice stats with those SLI GTX 460 cards guys, if only I had a 2nd GTX 480 (and my PSU could handle it, not sure if 850 watt is enough for that)

Jord

Joined: 26 Jan 05

Posts: 2952

Credit: 5779100

RAC: 0

You do know that all that the

26 Jun 2011 10:11:17 UTC

Message 105710 in response to message 105709

You do know that all the CPU does is translate the actual task from whatever language it is in into kernels that the GPU understands and transport those kernels to the GPU, then, after the GPU is done with them, transport the data back, translate it into something humans can understand and write it to disk?

That's all the CPU does in this. It doesn't do any of the calculations.
That is why the CPU load is so minimal: the GPU does all the hard work, all the calculations.

If you want more CPU load, you either use the CPUs with their own science application, or use a lesser CPU, or ask Einstein to please, please use Nvidia's next thing to keep CUDA alive: CUDA on x86 CPUs.

Dirk

Joined: 4 Jun 08

Posts: 35

Credit: 88264743

RAC: 0

RE: You do know that all

26 Jun 2011 10:22:53 UTC

Message 105711 in response to message 105710

Quote:

You do know that all the CPU does is translate the actual task from whatever language it is in into kernels that the GPU understands and transport those kernels to the GPU, then, after the GPU is done with them, transport the data back, translate it into something humans can understand and write it to disk?

That's all the CPU does in this. It doesn't do any of the calculations.
That is why the CPU load is so minimal: the GPU does all the hard work, all the calculations.

If you want more CPU load, you either use the CPUs with their own science application, or use a lesser CPU, or ask Einstein to please, please use Nvidia's next thing to keep CUDA alive: CUDA on x86 CPUs.

Yes... I do know that. But dedicating a full thread to feeding the GPU can increase GPU usage a bit (I want to find out by how much with the Linux app), and, just as importantly, it can make the GPU usage more stable. Right now the GPU often has to wait for the CPU to feed it because it's bottlenecked at 5% per cuda WU (is the GPU usage graph I posted before still visible?). Take GPUGRID for example: you can run their tasks fine with or without a variable named swan_sync, but what that variable does is dedicate a thread to the GPU. This helps keep the GPU fed at all times and makes the WU run faster. Without it, the GPU has to fight for CPU cycles and often has to wait a bit. Also keep in mind that I run not just 1 cuda WU on my GPU but 4, which causes a lot of traffic between my CPU and GPU.

Like I've said many times, I just want to see how far I can push my GPU.

FrankHagen

Joined: 13 Feb 08

Posts: 102

Credit: 272200

RAC: 0

RE: Yes... I do know that.

26 Jun 2011 14:54:35 UTC

Message 105712 in response to message 105711

Quote:

Yes... I do know that. But dedicating a full thread to feeding the GPU can increase GPU usage a bit (I want to find out by how much with the Linux app), and, just as importantly, it can make the GPU usage more stable.

you cannot compare the cuda apps of different projects. some use really simple math, like collatz; some have gone through a long development and need only very little CPU support, like PPS-sieve on PrimeGrid. (i remember running the first test apps there and it was anything but that.)
some cause heavy screen lag, like GCW-sieve on PrimeGrid. DistrRtgen needs more CPU cycles to run than others....

in the end you'll see that each and every one has its own story.

btw.: i did run some BRP3cuda32nv270 WUs on my GTS250 during the weekend - about 4,100 secs per WU with a reserved CPU core of a 3.3 GHz PHENOM-II..

Dirk

Joined: 4 Jun 08

Posts: 35

Credit: 88264743

RAC: 0

True, you can't compare them

26 Jun 2011 15:53:26 UTC

Message 105713

True, you can't compare them, but I only used that as an example. Also, my graph of GPU usage shows quite well that the CPU is bottlenecking the GPU. Note that if I don't let CPU tasks run and don't use the PC (like when I'm sleeping), the GPU usage is very stable. I'd just like it to be that stable all the time by giving it a full CPU thread (I want to find out if it helps, anyway). This is probably too much, but maybe it can act as a bit of a buffer to prevent the GPU from being bottlenecked because other running processes are getting their grubby paws on CPU cycles and the GPU has to draw a ticket and wait in line.

I wish CPU usage could be negligible for BRP cuda apps but it obviously isn't, it constantly needs to supply the GPU with what it is supposed to be doing and if it can't do that the GPU sits idle for a short time, but these short idle moments add up over the course of a day.
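As a rough illustration of how those short idle moments add up, the 84% peak GPU usage mentioned earlier in the thread can be turned into idle hours per day. This is a simplification: it treats the reported usage figure as the fraction of time the GPU is busy, which a monitoring tool may not measure exactly that way. The function name is invented for this sketch.

```python
def idle_hours_per_day(gpu_usage_percent: float) -> float:
    """Hours per day the GPU sits idle, treating usage as the busy-time fraction."""
    return (100.0 - gpu_usage_percent) / 100.0 * 24.0


# 84% is the peak usage reported while running 4 BRP cuda tasks.
print(f"{idle_hours_per_day(84):.2f} idle hours per day at 84% usage")
print(f"{idle_hours_per_day(95):.2f} idle hours per day at 95% usage")
```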

Jord

Joined: 26 Jan 05

Posts: 2952

Credit: 5779100

RAC: 0

Then do what was told earlier

26 Jun 2011 16:56:34 UTC

Message 105714 in response to message 105713

Then do what you were told earlier in this thread and tell BOINC to use all but one CPU. With one less CPU doing science, that one CPU will be able to feed the GPU exclusively. One CPU is all you need: you do not need as many CPUs as you have GPUs, and you do not need as many CPUs as you have tasks to feed to the one GPU that you have.

By telling BOINC to use all but one CPU, the one free CPU will automatically be used by intensive programs such as the GPU science programs as they run at slightly higher than low priority, to keep the GPU fed.
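BOINC's processor limit is entered as a percentage, so "all but one CPU" has to be converted. The sketch below shows the arithmetic. It assumes 8 logical CPUs (an i7 with hyperthreading) and that BOINC rounds the resulting CPU count down but never below one; both are assumptions for illustration, and the function names are hypothetical.

```python
import math


def pct_for_free_cpus(logical_cpus: int, free: int = 1) -> float:
    """Percentage to enter so that `free` logical CPUs are left unused."""
    return 100.0 * (logical_cpus - free) / logical_cpus


def cpus_used(logical_cpus: int, pct: float) -> int:
    """CPUs available for CPU tasks at a given percentage.

    Assumption: the count is rounded down, with a minimum of one.
    """
    return max(1, math.floor(logical_cpus * pct / 100.0))


print(pct_for_free_cpus(8))   # percentage that leaves one of 8 CPUs free
print(cpus_used(8, 75))       # the 75% setting mentioned earlier in the thread
print(cpus_used(4, 75))       # the same setting on a quad core
```

On 8 logical CPUs, 75% leaves two CPUs free and matches the "6x CPU tasks" reported earlier, while leaving exactly one free would need 87.5%.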

Yes, it's very much possible that if you let your anti-virus check the system constantly, that this CPU will be used for that as well. Or that other very CPU intensive programs running on your system find it a good time to start using this CPU, such as the multiple Windows indexing programs. But that's for you to figure out, before you can do a complete run as you'd want.

Do know that you're using both BOINC and the GPU in a way that neither was intended or programmed to be used. Any weird artifacts are very probably due to quirks in your own system or to your unusual use of it.
