BRP3 CUDA

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 758
Credit: 152,365,684
RAC: 116,919

Don't know. But what about HT on and 7 CPU threads, i.e. reserving one thread for the GPU?

MrS

Scanning for our furry friends since Jan 2002

astro-marwil
Joined: 28 May 05
Posts: 427
Credit: 149,250,114
RAC: 4,878

The following is only of interest to people who like to squeeze the most out of their gpu.

Quote:
If you start the freeware tool MSI Afterburner you can observe the effect of changing the priority from "Less than Normal" to "High". The graph becomes less cluttered and the mean gpu load increases. This still holds after the latest changes in the software, but the effect is markedly less pronounced.

Here are the results of some measurements I carried out after the latest software changes, using MSI Afterburner and the Task Manager. In Afterburner the sampling interval was left at 1000 ms and "Log history to file" was activated. Then, every 2 minutes and accurate to 1 second, I toggled the "Priority" of the BRP3 process by hand in the Task Manager between "Less than Normal" and "High". At the end, the data from the log file were sorted by priority level into separate columns. (If you like, all of this amounts to a hand-crafted digital lock-in detector. But I will not do this again soon. It's simply toooo stupid! And in the near future I will not have time to program this.)
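The sorting step described above could be automated roughly as follows. This is only a sketch under stated assumptions: one gpu load sample per second, a switch exactly every 120 samples, and a run that starts at "Less than Normal" (the post does not say which level came first). The function names and the demo data are made up for illustration and are not the measured values.

```python
from statistics import mean, stdev

def split_by_priority(samples, block_len=120,
                      labels=("Less than Normal", "High")):
    """Assign each 1-second sample to the priority assumed active in its block."""
    columns = {label: [] for label in labels}
    for i, load in enumerate(samples):
        # Blocks alternate: 0..119 -> labels[0], 120..239 -> labels[1], ...
        label = labels[(i // block_len) % len(labels)]
        columns[label].append(load)
    return columns

def summarize(columns):
    """Mean and sample standard deviation of the gpu load per priority."""
    return {label: (mean(v), stdev(v)) for label, v in columns.items()}

# Synthetic demo data (NOT the values from the post):
# 120 s at the lower priority followed by 120 s at the higher one.
demo = [70, 68, 72] * 40 + [73, 74, 72] * 40
cols = split_by_priority(demo)
stats = summarize(cols)
for label, (m, s) in stats.items():
    print(f"{label}: {m:.1f} +/- {s:.1f}")
```

Toggling by hand is only accurate to about a second, so in practice one might drop the first and last sample of every block before averaging.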

It can clearly be seen in the graph that the lower priority has, on average, the lower gpu load. It is somewhat difficult to see in the graph, but during "Less than Normal" priority there was no gpu load for 38 sec, at 15.4 min for 5 sec and at 38.4 min for 32 sec consecutively. No download or other activity was observed during this time. During "High" priority the lowest gpu load was 10% for 2 sec and 11% for 1 sec. The corresponding probability density curves quantify this.

The main lobe for "High" priority is clearly narrower and sits at a higher gpu load than that for "Less than Normal". This also shows in the cumulative probability density curves.
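For anyone who wants to build such curves from their own log, here is a minimal sketch of a normalized histogram and its running sum. The 5% bin width, the function names and the demo samples are assumptions for illustration; the original curves may have been built differently.

```python
from collections import Counter

def density(samples, bin_width=5):
    """Fraction of samples per bin, keyed by the lower bin edge in % gpu load."""
    counts = Counter((int(s) // bin_width) * bin_width for s in samples)
    n = len(samples)
    return {edge: counts[edge] / n for edge in sorted(counts)}

def cumulative(dens):
    """Running sum of the density, i.e. the fraction of samples up to each bin."""
    total = 0.0
    out = {}
    for edge, p in dens.items():
        total += p
        out[edge] = total
    return out

# Made-up samples, only to show the shape of the output
samples = [0, 10, 70, 72, 74, 76]
dens = density(samples)
cum = cumulative(dens)
```

A narrow lobe shows up as a few bins holding most of the probability, and the dropouts to zero load appear as a small step at the very start of the cumulative curve.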
In the next step, each interval and the complete data set for each priority level were evaluated statistically.

From the overall statistics we get

cpu priority ________ gpu load [%]
"High" ______________ 73.0 +/- 4.1
"Less than Normal" __ 70.3 +/- 9.7
Relative to the lower value, the "High" priority has a 3.7% higher gpu load.
This advantage is not easily seen in the total run time of the files, as it is only about double the scatter of the run time on my machine (19400 +/- 332 s, i.e. +/- 1.7%, derived from the last 20 measurements). So the advantage seems to be about equal to the cpu load of about 4% caused by the Firefox browser.
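The comparison between the gain and the run-time scatter can be checked with a few lines. The numbers are the rounded values quoted in the post; the variable names are made up here. From the rounded means the gain comes out at about 3.8%, close to the 3.7% quoted, which was presumably computed from unrounded data.

```python
high, low = 73.0, 70.3            # mean gpu load [%] per priority
run_time, run_sd = 19400, 332     # run time and its scatter [s]

relative_gain = (high - low) / low * 100      # gain of "High" over the lower value [%]
relative_scatter = run_sd / run_time * 100    # scatter of the run time [%]
ratio = relative_gain / relative_scatter      # how far the gain stands out

print(f"gain {relative_gain:.1f}%, scatter {relative_scatter:.1f}%, ratio {ratio:.1f}")
```

With a ratio of only about two, a single run time says little about the priority setting; the effect becomes visible only when averaging over several runs.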
To make the "High" priority permanent for the BRP3 files I installed Process Tamer. In the configuration menu, click in the last column, "Explicit Rule", on "Force Above Normal" and then on "OK". Measured with Process Explorer over a 2-minute period, the mean cpu load of Process Tamer was only 0.036%. It is mainly active when a new BRP3 file starts.
I also tried the "Realtime" priority, but then the laptop sometimes stops responding to my input for up to 10 seconds. If you hit the keyboard impatiently during this time, you get a burst of activity on the screen once it wakes up.
Using this principle it seems possible to optimize your system. At the moment I am missing a tool to record the cpu usage in a log file.
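A small logger along these lines might fill that gap. It is only a sketch: `log_usage` and the column names are made up here, and the function that actually reads the cpu load is passed in by the caller. If the third-party package psutil is installed, its `cpu_percent` function could serve as that reader, but this is an assumption and not something tested on the set-up described here.

```python
import csv
import time

def log_usage(read_cpu_percent, path, samples, interval=1.0):
    """Write `samples` readings of read_cpu_percent() to a CSV file,
    one row per reading, `interval` seconds apart."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["unix_time", "cpu_percent"])
        for i in range(samples):
            writer.writerow([f"{time.time():.0f}", read_cpu_percent()])
            if i < samples - 1:
                time.sleep(interval)  # no wait after the last sample
```

A call such as `log_usage(reader, "cpu_log.csv", samples=2400)` with the default 1-second interval would cover a 40-minute run and give a file that can be sorted by priority in the same way as the Afterburner log.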

I hope this will help someone.
Kind regards

Stef
Joined: 8 Mar 05
Posts: 180
Credit: 49,247,694
RAC: 59,873

Thanks for the analysis and the beautiful graphs, astro-marwil.
For me, I think it's not worth installing Process Tamer for these few percent.

I wonder if it's possible to change the CUDA code so that it's not that dependent on CPU timing.

DanNeely
Joined: 4 Sep 05
Posts: 1,281
Credit: 1,407,087,522
RAC: 1,232,028

Quote:

I wonder if it's possible to change the CUDA code so that it's not that dependent on CPU timing.

Probably. The BRP app has been switching functions from CPU to GPU one at a time instead of doing it all at once. It still uses a lot of CPU vs other GPU apps which indicates that there's probably room for improvement.

Grutte Pier [Wa Oars]~MAB The Frisian
Joined: 18 Jan 10
Posts: 47
Credit: 1,640,778
RAC: 0

@ astro-marwil: "At the moment I am missing a tool to record the cpu usage in a log file"

Doesn't PT create a log file?

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 779
Credit: 25,160,422
RAC: 0

In case you missed it...
Linux CUDA application released!

Best,
Oliver

 


Einstein@Home Project

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 18,689,922
RAC: 60,614

Quote:

In case you missed it...
Linux CUDA application released!

Best,
Oliver


Yeeehaw!
Just got one, let's see how it works :D

Edith says:
You should update this thread;)

Greetings from Sänger

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 738,419,435
RAC: 0

Quote:

In case you missed it...
Linux CUDA application released!

Best,
Oliver

That is great news. I have an extra card I can fire up in Linux now and give it a try this evening. Thanks a lot!

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 18,689,922
RAC: 60,614

Quote:
Yeeehaw!
Just got one, let's see how it works :D


It worked quite smoothly and fast:
[pre]
216133931  91659302  19 Jan 2011 15:57:31 UTC  19 Jan 2011 19:42:22 UTC  Completed, waiting for validation   5,350.93   5,064.44   38.87  pending  Binary Radio Pulsar Search v1.06 (BRP3cuda32fullCPU)
215968545  91589987  18 Jan 2011 5:23:15 UTC   19 Jan 2011 1:35:23 UTC   Completed, waiting for validation  43,199.66  40,509.12  310.89  pending  Binary Radio Pulsar Search v1.04
[/pre]
43,200 : 5,350 = 8.1
So about eight times as fast as on the CPU alone, while blocking a whole core on my quad.
The set-up here is a GT240 with driver 260.19.29; the OS is Ubuntu 10.04 LTS.
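The same arithmetic with the unrounded run times from the two tasks listed above, as a quick check. The variable names are made up, and the tasks-per-day figures simply assume the card or the core does nothing else.

```python
cpu_run_time = 43199.66   # s, v1.04 task on the CPU only
gpu_run_time = 5350.93    # s, v1.06 CUDA task

speedup = cpu_run_time / gpu_run_time
tasks_per_day_cpu = 86400 / cpu_run_time   # one CPU core, running around the clock
tasks_per_day_gpu = 86400 / gpu_run_time   # GPU plus the one core it occupies

print(f"speedup: {speedup:.1f}x")
print(f"tasks per day: {tasks_per_day_cpu:.1f} (CPU) vs {tasks_per_day_gpu:.1f} (CUDA)")
```

This compares just one task of each kind, so it is an indication rather than a benchmark.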

Greetings from Sänger

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 738,419,435
RAC: 0

With my Q6600 @ 3.0 GHz, GTX 295, driver 260.19.29, and the new Linux BRP3 CUDA app, I am seeing the following completion times most recently:

4262 sec
4257 sec

The CPU usage for the CUDA apps is 99.7%.

Normally it would take the CPU by itself between 72K and 84K seconds to complete a BRP3 work unit.
