CUDA and OpenCL Benchmarks

Horacio

Joined: 3 Oct 11

Posts: 205

Credit: 80557243

RAC: 0

GTX690: 3x2800s. with 4 free

22 Nov 2012 17:45:49 UTC

Message 110209

GTX690: 3x2800s. with 4 free cores, 3x3300s. with 2 or 3 free cores and 3x3600s with just 1 core free...
(I mean 3 tasks on each GPU, that is 6 tasks in total, as this is a dual-GPU card.)

I was not able to test other numbers because this host only runs Einstein when SETI has no work, and I had no time to test changing the utilization factors...
I have another host with a GTX680, and the times were the same, also doing 3 tasks with 2 free cores. But it finished those Einstein tasks before I noticed it was running them, so I have no idea whether the times could be better with extra CPU cores reserved...
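To put the GTX690 figures above in terms of daily throughput, here is a rough sketch. It assumes each reported time is the wall-clock time for one batch of 3 concurrent tasks per GPU, and that both GPUs behave the same; the function name is just illustrative.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(seconds_per_batch: float, tasks_per_gpu: int, gpus: int) -> float:
    """Tasks finished per day when every GPU runs `tasks_per_gpu` tasks at once."""
    return gpus * tasks_per_gpu * SECONDS_PER_DAY / seconds_per_batch


# Times reported in the post: GTX690 (2 GPUs), 3 tasks per GPU.
reported = {"4 free cores": 2800, "2 or 3 free cores": 3300, "1 free core": 3600}

throughput = {label: tasks_per_day(t, tasks_per_gpu=3, gpus=2)
              for label, t in reported.items()}

for label, rate in throughput.items():
    print(f"{label}: about {rate:.0f} tasks/day")
```

Under those assumptions, the gap between 1 free core and 4 free cores is roughly 40 tasks per day on this card.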

astrocrab

Joined: 28 Jan 08

Posts: 208

Credit: 429202534

RAC: 0

RE: GTX690: 3x2800s. with 4

22 Nov 2012 18:50:26 UTC

Message 110210 in response to message 110209

Quote:

GTX690: 3x2800s. with 4 free cores, 3x3300s. with 2 or 3 free cores and 3x3600s with just 1 core free...

that's why i don't use cpu crunching at all: gpu time "costs" much much more in "flops"

Horacio

Joined: 3 Oct 11

Posts: 205

Credit: 80557243

RAC: 0

RE: RE: GTX690: 3x2800s.

23 Nov 2012 4:39:47 UTC

Message 110211 in response to message 110210

Quote:

Quote:
GTX690: 3x2800s. with 4 free cores, 3x3300s. with 2 or 3 free cores and 3x3600s with just 1 core free...

that's why i don't use cpu crunching at all: gpu time "costs" much much more in "flops"

Yes, but it also depends on the relative speeds of the GPUs and CPUs. On my main Einstein host, which has 2 560 Tis, reserving more than 2 cores doesn't improve the GPU times enough to compensate for the loss from the unused CPU cores... (at least not RAC-wise). And even though I really want one of those certificates for discovering a pulsar, I also want to help with the gravitational wave fishing, which is an exclusively CPU task...

Sunny129

Joined: 5 Dec 05

Posts: 162

Credit: 160342159

RAC: 0

RE: Yes, but it also

23 Nov 2012 14:55:29 UTC

Message 110212 in response to message 110211

Quote:

Yes, but it also depends on the relative speeds of the GPUs and CPUs. On my main Einstein host, which has 2 560 Tis, reserving more than 2 cores doesn't improve the GPU times enough to compensate for the loss from the unused CPU cores... (at least not RAC-wise). And even though I really want one of those certificates for discovering a pulsar, I also want to help with the gravitational wave fishing, which is an exclusively CPU task...

on the contrary, my dual GTX 560 Ti is powered by a 6-core CPU, and i still see a substantial increase in BRP task efficiency when going from 2 free CPU cores to 3 free CPU cores...so much so that it more than makes up for any lost potential CPU task RAC/PPD. you have to remember it also depends on the projects you're running. pretty much anything run on a GPU is going to outperform the CPU in the flops department. but if i'm also running a project like LHC@Home SixTrack on the CPU at the same time, then the GPUs are going to outperform the CPU in both the flops department and the RAC/PPD department. in other words, if i were that concerned w/ RAC/PPD, i would have to run a CPU project that awards massive credits for each CPU task completed and validated in order to say that it's actually worth it to sacrifice GPU performance for it. i'd rather it be the other way around - that is, i'd rather sacrifice CPU task allocation for better GPU task efficiency.

don't get me wrong, i see where you're coming from, and i like to contribute to the gravitational wave search as much as i like to search for pulsars...so when i'm actually running both BRP (GPU) and GW (CPU) applications, i run them on separate machines. now for someone who only has one machine, compromises will have to be made and efficiency sacrificed.

Horacio

Joined: 3 Oct 11

Posts: 205

Credit: 80557243

RAC: 0

RE: on the contrary, my

23 Nov 2012 15:29:24 UTC

Message 110213 in response to message 110212

Quote:

on the contrary, my dual GTX 560 Ti is powered by a 6-core CPU, and i still see a substantial increase in BRP task efficiency when going from 2 free CPU cores to 3 free CPU cores...so much so that it more than makes up for any lost potential CPU task RAC/PPD. you have to remember it also depends on the projects you're running. pretty much anything run on a GPU is going to outperform the CPU in the flops department.

On that host I crunch only for Einstein, and I also use it for my normal work.
My numbers were that going from 2 free cores to 3 free cores, the GPU times went down by no more than 1 minute, which means that the 2 GPUs doing 2 tasks each will be able to do only one more BRP task per day, a gain of 500 credits. But each CPU core does 4 S6LV1 tasks per day, which gives around 1000 credits per day.
The CPU is an i7-860, and the motherboard sets the PCIe slots to x8 when there are 2 GPUs; of course, the PCIe is version 2.0.
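The trade-off described above can be written out as a small calculation. This is only a sketch using the numbers from this post; the function name is made up, and the 250 credits per S6LV1 task is derived from "around 1000 credits per day" over 4 tasks, so it is an approximation.

```python
def net_credit_change(extra_gpu_tasks_per_day: float,
                      credit_per_gpu_task: float,
                      cpu_tasks_per_core_per_day: float,
                      credit_per_cpu_task: float,
                      cores_freed: int = 1) -> float:
    """Daily credit gained on the GPUs minus daily credit lost on the freed CPU cores."""
    gpu_gain = extra_gpu_tasks_per_day * credit_per_gpu_task
    cpu_loss = cores_freed * cpu_tasks_per_core_per_day * credit_per_cpu_task
    return gpu_gain - cpu_loss


# Going from 2 to 3 free cores on the i7-860 host:
# one extra BRP task per day (500 credits) versus 4 S6LV1 tasks per day
# on the freed core (about 1000 credits, so roughly 250 each, derived value).
net = net_credit_change(extra_gpu_tasks_per_day=1,
                        credit_per_gpu_task=500,
                        cpu_tasks_per_core_per_day=4,
                        credit_per_cpu_task=250)
print(net)  # -500
```

A negative result means freeing the extra core costs credit on this particular host; with different hardware the inputs, and so the sign, can change.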

In my other Einstein host, which is an i7-2600 with 2 GT430s, each GPU needs 2.5 hours when I give them 2 free cores, and they can't go faster no matter how many more cores I give them.

Anyway, my point was that the benefit of reserving cores depends on several system-wide factors, so the best number of free cores could be very different even for hosts that have the same model and number of GPUs.

MAGIC Quantum M...

Joined: 18 Jan 05

Posts: 1926

Credit: 1461986144

RAC: 1288212

Well I found out the running

23 Nov 2012 21:29:30 UTC

Message 110214

Well, I found out that running CUDA X4 on my AthlonII X4 630 Processor and GeForce GTX 550 Ti (1GB) was not a good idea.

It runs X2 @ approx. 54mins but when I tried X4 it started taking over 8 hours to finish.

And now it doesn't want to switch back to X2 (.5)

I finally just had to suspend about 150 tasks so it will only run X2 at the regular time of 54mins

Funny thing is I have a GeForce GTX 550 Ti (2GB) with PhenomII X3 720 having no problem running X4 at the regular time of approx. 51mins

So I hope it switches back to .5 before long so I don't have to check it every 54mins to start another pair of tasks.

For some reason my GeForce 660Ti didn't switch over to .25 and is still running cuda X2 at 44mins

Horacio

Joined: 3 Oct 11

Posts: 205

Credit: 80557243

RAC: 0

RE: ... And now it doesn't

23 Nov 2012 22:27:20 UTC

Message 110215 in response to message 110214

Quote:

...
And now it doesn't want to switch back to X2 (.5)
...
For some reason my GeForce 660Ti didn't switch over to .25 and is still running cuda X2 at 44mins

Your first host won't switch back until it receives new BRP units. As the times went from 1 hour to 8 hours, your cache is currently overcommitted and BOINC is not going to ask for more work for a while.
If you want to speed this up, you need to exit BOINC completely and then edit the client_state.xml file. In this file, locate all the occurrences (there should be only one) of 0.25 and change it to 0.50 (that is, find the current factor and change it to the desired one).

Just don't do an automatic replace, because if you get something wrong in this file you can lose all the assigned tasks or even leave BOINC unusable, and be sure to use a plain text editor. But don't be afraid to edit it; just be sure BOINC is not running and be careful not to make a typo.

EDIT: obviously, if you set this file to a factor different from the one on your Einstein settings page, it will be overridden as soon as the host gets new work, so first change the Einstein preferences on the website.
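If it helps to see where the factor appears before touching anything, a read-only scan along these lines could list the candidate lines. It is a sketch, not a BOINC tool: the helper name is made up, it assumes nothing about the tag names in client_state.xml, and it never writes to the file, so the actual change is still done by hand in a text editor with BOINC stopped.

```python
import sys
from pathlib import Path


def find_factor_lines(state_text: str, factor: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line containing `factor`.

    This is a plain substring match, so "0.25" also matches "0.250000"
    and unrelated values such as "10.25"; review each hit by eye.
    """
    return [(number, line.strip())
            for number, line in enumerate(state_text.splitlines(), start=1)
            if factor in line]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_factor.py /path/to/client_state.xml 0.25
    state_path, wanted = Path(sys.argv[1]), sys.argv[2]
    for number, line in find_factor_lines(state_path.read_text(), wanted):
        print(f"{number}: {line}")
```

Listing the hits instead of replacing them is deliberate: it keeps the warning above about automatic replaces intact.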

Claggy

Joined: 29 Dec 06

Posts: 560

Credit: 2747410

RAC: 1637

On my i7-2600K host @4.7GHz,

23 Nov 2012 22:50:57 UTC

Message 110216

On my i7-2600K host @4.7GHz, using a PCI-E 2.0 x8 connection, my factory overclocked GTX460 running 310.33 drivers does v1.32 BRP4 Cuda tasks in around 1,550 seconds, half of what was reported earlier for a GTX460.

Claggy

AgentB

Joined: 17 Mar 12

Posts: 915

Credit: 513211304

RAC: 0

RE: that's why i don't use

23 Nov 2012 23:19:55 UTC

Message 110217 in response to message 110210

Quote:

that's why i don't use cpu crunching at all: gpu time "costs" much much more in "flops"

This post (and a few others getting good results from other Fermis) started me thinking - Would my puny i3/gtx460/gtx460 results improve if I removed the 100% CPU crunching load entirely?

So I tried....

before (PCIe x16, 768MB)
GTX 460 -> 1x3000, 2x4800
after
GTX 460 -> 1x1600, 2x2900

before (PCIe x4, 768MB)
GTX 460 -> 1x4700, 2x8400
after
GTX 460 -> 1x2870, 2x5750

... and the answer is - oh yes.

Now I'm thinking what next...
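For comparing the rows above, a sketch like this converts each figure to effective seconds per task and a before/after speedup. It assumes "1x3000" means one task at a time taking 3000 s and "2x4800" means two concurrent tasks taking 4800 s each; the names are illustrative.

```python
def seconds_per_task(batch_seconds: float, concurrent: int) -> float:
    """Effective time per task when `concurrent` tasks finish together."""
    return batch_seconds / concurrent


# (PCIe link, concurrent tasks) -> (seconds with CPU load, seconds without)
runs = {
    ("x16", 1): (3000, 1600),
    ("x16", 2): (4800, 2900),
    ("x4", 1): (4700, 2870),
    ("x4", 2): (8400, 5750),
}

speedup = {}
for (link, n), (before, after) in runs.items():
    b = seconds_per_task(before, n)
    a = seconds_per_task(after, n)
    speedup[(link, n)] = b / a
    print(f"PCIe {link}, {n}x: {b:.0f}s -> {a:.0f}s per task "
          f"({speedup[(link, n)]:.2f}x faster)")
```

Read this way, every configuration gains from dropping the CPU load, and running 2 tasks at once still gives a lower effective time per task than running 1.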

MAGIC Quantum M...

Joined: 18 Jan 05

Posts: 1926

Credit: 1461986144

RAC: 1288212

RE: RE: ... And now it

24 Nov 2012 1:04:08 UTC

Message 110218 in response to message 110215

Quote:

Quote:
...
And now it doesn't want to switch back to X2 (.5)
...
For some reason my GeForce 660Ti didn't switch over to .25 and is still running cuda X2 at 44mins

Your first host won't switch back until it receives new BRP units. As the times went from 1 hour to 8 hours, your cache is currently overcommitted and BOINC is not going to ask for more work for a while.
If you want to speed this up, you need to exit BOINC completely and then edit the client_state.xml file. In this file, locate all the occurrences (there should be only one) of 0.25 and change it to 0.50 (that is, find the current factor and change it to the desired one).

Just don't do an automatic replace, because if you get something wrong in this file you can lose all the assigned tasks or even leave BOINC unusable, and be sure to use a plain text editor. But don't be afraid to edit it; just be sure BOINC is not running and be careful not to make a typo.

EDIT: obviously, if you set this file to a factor different from the one on your Einstein settings page, it will be overridden as soon as the host gets new work, so first change the Einstein preferences on the website.

Thanks Horacio I will give that a try as soon as I get a few minutes.

Of course, setting the preferences back to 0.5 was the first thing I did last night, and it hasn't fixed the one host yet, but the other 550Ti did switch back to 0.5 when I sent in its finished tasks.
