CUDA and OpenCL Benchmarks

Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

GTX690: 3x2800s. with 4 free cores, 3x3300s. with 2 or 3 free cores and 3x3600s with just 1 core free...
(I mean 3 tasks on each GPU, that is 6 tasks in total, as this is a dual-GPU card.)
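
A rough sketch of what those times mean as daily throughput. It assumes that "3x2800s." means each batch of 3 concurrent tasks on a GPU finishes in about 2800 seconds, and that both GPUs stay busy around the clock; the function name is only illustrative.

```python
SECONDS_PER_DAY = 86_400

def tasks_per_day(concurrent_tasks, seconds_per_batch):
    """Tasks finished per day if `concurrent_tasks` complete every `seconds_per_batch`."""
    return concurrent_tasks * SECONDS_PER_DAY / seconds_per_batch

# Times reported in the post for the GTX690 (2 GPUs x 3 tasks = 6 concurrent tasks).
gtx690_runs = {
    "4 free cores": 2800,
    "2 or 3 free cores": 3300,
    "1 free core": 3600,
}

for label, seconds in gtx690_runs.items():
    print(f"{label}: {tasks_per_day(6, seconds):.1f} tasks/day")
```

Under those assumptions, going from 1 free core to 4 free cores is worth roughly 40 extra tasks per day on this card.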

I was not able to test other numbers because this host only does Einstein when SETI has no work, so I had no time to test changing the utilization factors...
I have another host with a GTX680, and the times were the same, also doing 3 tasks with 2 free cores. But it finished those Einstein tasks before I noticed it was running them, so I have no idea if the times could be better with extra CPU cores reserved...

astrocrab
Joined: 28 Jan 08
Posts: 208
Credit: 429202534
RAC: 0

Quote:
GTX690: 3x2800s. with 4 free cores, 3x3300s. with 2 or 3 free cores and 3x3600s with just 1 core free...


that's why i don't use cpu crunching at all: gpu time "costs" much much more in "flops"

Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

Quote:
Quote:
GTX690: 3x2800s. with 4 free cores, 3x3300s. with 2 or 3 free cores and 3x3600s with just 1 core free...

that's why i don't use cpu crunching at all: gpu time "costs" much much more in "flops"


Yes, but it also depends on the relative speeds of the GPUs and CPUs. On my main Einstein host, which has 2 560 Tis, reserving more than 2 cores doesn't improve the GPU times enough to compensate for the loss from the unused CPU cores... (at least not RAC-wise). And even though I really want one of those certificates for discovering a pulsar, I also want to help with the gravitational wave fishing, which is an exclusively CPU task...

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

Quote:
Yes, but it also depends on the relative speeds of the GPUs and CPUs. On my main Einstein host, which has 2 560 Tis, reserving more than 2 cores doesn't improve the GPU times enough to compensate for the loss from the unused CPU cores... (at least not RAC-wise). And even though I really want one of those certificates for discovering a pulsar, I also want to help with the gravitational wave fishing, which is an exclusively CPU task...


on the contrary, my dual GTX 560 Ti is powered by a 6-core CPU, and i still see a substantial increase in BRP task efficiency when going from 2 free CPU cores to 3 free CPU cores...so much so that it more than makes up for any lost potential CPU task RAC/PPD. you have to remember it also depends on the projects you're running. pretty much anything run on a GPU is going to outperform the CPU in the flops department. but if i'm also running a project like LHC@Home SixTrack on the CPU at the same time, then the GPUs are going to outperform the CPU in both the flops department and the RAC/PPD department. in other words, if i were that concerned w/ RAC/PPD, i would have to run a CPU project that awards massive credits for each CPU task completed and validated in order to say that it's actually worth it to sacrifice GPU performance for it. i'd rather it be the other way around - that is, i'd rather sacrifice CPU task allocation for better GPU task efficiency.

don't get me wrong, i see where you're coming from, and i like to contribute to the gravitational wave search as much as i like to search for pulsars...so when i'm actually running both BRP (GPU) and GW (CPU) applications, i run them on separate machines. now for someone who only has one machine, compromises will have to be made and efficiency sacrificed.

Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

Quote:
on the contrary, my dual GTX 560 Ti is powered by a 6-core CPU, and i still see a substantial increase in BRP task efficiency when going from 2 free CPU cores to 3 free CPU cores...so much so that it more than makes up for any lost potential CPU task RAC/PPD. you have to remember it also depends on the projects you're running. pretty much anything run on a GPU is going to outperform the CPU in the flops department.


In that host I crunch only for Einstein and I use it also for my normal work.
My numbers were that going from 2 free cores to 3 free cores, the GPU times went down by no more than 1 minute, which means that the 2 GPUs doing 2 tasks each would be able to do only one more BRP task per day, a gain of 500 credits, whereas each CPU core does 4 S6LV1 tasks per day, which gives around 1000 credits per day.
The CPU is an i7-860, and the motherboard sets the PCIe slots to x8 when there are 2 GPUs; of course, the PCIe is version 2.0.
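
The trade-off above can be written out as a small calculation. This is only a sketch using the round figures from this post (500 credits per BRP task, about 1000 credits per CPU core per day); the constant and function names are made up for illustration.

```python
BRP_CREDITS_PER_TASK = 500        # from the post: one extra BRP task = 500 credits
CPU_CREDITS_PER_CORE_DAY = 1000   # from the post: 4 S6LV1 tasks per core per day, about 1000 credits

def net_daily_credit_change(extra_gpu_tasks_per_day, cores_freed):
    """Credits gained from faster GPU tasks minus credits lost from idle CPU cores."""
    gain = extra_gpu_tasks_per_day * BRP_CREDITS_PER_TASK
    loss = cores_freed * CPU_CREDITS_PER_CORE_DAY
    return gain - loss

# Freeing a third core buys one extra BRP task per day on this host.
print(net_daily_credit_change(extra_gpu_tasks_per_day=1, cores_freed=1))  # -500
```

A negative result means the freed core costs more credits than it gains, which is the case on this host.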

In my other Einstein host, which is an i7-2600 with 2 GT430s, each GPU needs 2.5 hours when I give it 2 free cores, and they can't go faster no matter how many more cores I give them.

Anyway, my point was that the benefit of reserving cores depends on several system-wide factors, so the best combination of free cores could be very different even for hosts that have the same model and number of GPUs.

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1695
Credit: 1043047637
RAC: 1369441

Well, I found out that running CUDA X4 on my Athlon II X4 630 processor and GeForce GTX 550 Ti (1GB) was not a good idea.

It runs X2 @ approx. 54mins but when I tried X4 it started taking over 8 hours to finish.

And now it doesn't want to switch back to X2 (.5)

I finally just had to suspend about 150 tasks so it will only run X2 at the regular time of 54mins

Funny thing is I have a GeForce GTX 550 Ti (2GB) with PhenomII X3 720 having no problem running X4 at the regular time of approx. 51mins

So I hope it switches back to .5 before long so I don't have to check it every 54mins to start another pair of tasks.

For some reason my GeForce 660Ti didn't switch over to .25 and is still running cuda X2 at 44mins
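
For reference, a quick sketch of how the X2/X4 settings relate to the utilization factor and to throughput. It assumes the factor is simply 1 divided by the number of tasks per GPU (X2 = 0.5, X4 = 0.25, as used above) and that the quoted times are per task; 480 minutes stands in for "over 8 hours", so the X4 figure is an upper bound.

```python
def utilization_factor(tasks_per_gpu):
    """GPU utilization factor for running `tasks_per_gpu` tasks at once (X2 -> 0.5)."""
    return 1 / tasks_per_gpu

def tasks_per_hour(tasks_per_gpu, minutes_per_task):
    """Tasks completed per hour when `tasks_per_gpu` tasks run concurrently."""
    return tasks_per_gpu * 60 / minutes_per_task

# GTX 550 Ti (1GB): X2 at about 54 minutes versus X4 at more than 8 hours.
print(utilization_factor(2), round(tasks_per_hour(2, 54), 2))
print(utilization_factor(4), round(tasks_per_hour(4, 480), 2))
```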

Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

Quote:
...
And now it doesn't want to switch back to X2 (.5)
...
For some reason my GeForce 660Ti didn't switch over to .25 and is still running cuda X2 at 44mins


Your first host won't switch back until it receives new BRP units. As the times went from 1 hour to 8 hours, your cache is currently overcommitted, and BOINC is not going to ask for more work for a while.
If you want to speed this up, you need to exit BOINC completely and then edit the client_state.xml file. In this file, locate all the occurrences (there should be only one) of 0.25 and change them to 0.50 (that is, find the current factor and change it to the desired one).

Just don't do an automatic replace, because if you do something wrong in this file you can lose all the assigned tasks or even leave BOINC unusable. Be sure to use a plain text editor. But don't be afraid to edit it; just be sure BOINC is not running and be careful not to make a typo.

EDIT: Obviously, if you set a factor in this file that differs from the one on your Einstein settings page, it will be overridden as soon as the host gets new work, so first change the Einstein preferences on the website.
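
If you would rather locate the lines before opening the editor, something like the following read-only sketch could help. It only lists the lines that contain the factor text and changes nothing; the function name is made up, and it assumes the factor appears in the file as plain text such as 0.25 (the exact formatting in your file may differ, so check what it prints).

```python
def find_factor_lines(path, factor_text):
    """Return (line_number, line) pairs for lines containing `factor_text`.

    The file is opened read-only; nothing is modified.
    """
    hits = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if factor_text in line:
                hits.append((number, line.rstrip("\n")))
    return hits

# Example call (run it only after BOINC has been shut down):
# for number, line in find_factor_lines("client_state.xml", "0.25"):
#     print(number, line)
```

Review each reported line by hand and edit only the one that belongs to the Einstein GPU application.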

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2694028
RAC: 0

On my i7-2600K host @4.7GHz, using a PCI-E 2.0 x8 connection, my factory-overclocked GTX460 running the 310.33 drivers does v1.32 BRP4 CUDA tasks in around 1,550 seconds, half of what was reported earlier for a GTX460.

Claggy

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:
that's why i don't use cpu crunching at all: gpu time "costs" much much more in "flops"

This post (and a few others getting good results from other Fermis) started me thinking: would my puny i3/GTX460/GTX460 results improve if I removed the 100% CPU crunching load entirely?

So I tried....

before (PCIe x16, 768MB)
GTX 460 -> 1x3000, 2x4800
after
GTX 460 -> 1x1600, 2x2900

before (PCIe x4, 768MB)
GTX 460 -> 1x4700, 2x8400
after
GTX 460 -> 1x2870, 2x5750

... and the answer is - oh yes.
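
For anyone who wants the ratios, here is a small sketch that turns the numbers above into speedups. It assumes "NxT" means N concurrent tasks taking T seconds each; the dictionary layout is just for illustration.

```python
# (slot, concurrent tasks) -> (seconds before, seconds after removing the CPU load)
results = {
    ("PCIe x16", 1): (3000, 1600),
    ("PCIe x16", 2): (4800, 2900),
    ("PCIe x4", 1): (4700, 2870),
    ("PCIe x4", 2): (8400, 5750),
}

def speedup(before, after):
    """How many times faster the run became."""
    return before / after

for (slot, tasks), (before, after) in results.items():
    effective = after / tasks  # seconds of wall time per finished task
    print(f"{slot}, {tasks} task(s): {effective:.0f} s per task, "
          f"{speedup(before, after):.2f}x faster than before")
```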

Now I'm thinking what next...

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1695
Credit: 1043047637
RAC: 1369441

Quote:
Quote:
...
And now it doesn't want to switch back to X2 (.5)
...
For some reason my GeForce 660Ti didn't switch over to .25 and is still running cuda X2 at 44mins

Your first host won't switch back until it receives new BRP units. As the times went from 1 hour to 8 hours, your cache is currently overcommitted, and BOINC is not going to ask for more work for a while.
If you want to speed this up, you need to exit BOINC completely and then edit the client_state.xml file. In this file, locate all the occurrences (there should be only one) of 0.25 and change them to 0.50 (that is, find the current factor and change it to the desired one).

Just don't do an automatic replace, because if you do something wrong in this file you can lose all the assigned tasks or even leave BOINC unusable. Be sure to use a plain text editor. But don't be afraid to edit it; just be sure BOINC is not running and be careful not to make a typo.

EDIT: Obviously, if you set a factor in this file that differs from the one on your Einstein settings page, it will be overridden as soon as the host gets new work, so first change the Einstein preferences on the website.

Thanks Horacio, I will give that a try as soon as I get a few minutes.

Of course, setting the preferences back to 0.5 was the first thing I did last night. It hasn't fixed the one host yet, but the other 550Ti did switch back to 0.5 when I sent in its finished tasks.
