CUDA and OpenCL Benchmarks

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

Quote:

Thanks Horacio I will give that a try as soon as I get a few minutes.

Of course, setting the preferences back to 0.5 was the first thing I did last night, and it hasn't fixed the one host yet, but the other 550Ti did switch back to 0.5 when I sent in its finished tasks.


Short of taking the client_state.xml shortcut, we can no longer expect the GPU utilization factor to change instantaneously. That same "instantaneous" shortcut used to exist in the app_info.xml file (as the n parameter), but ever since E@H got rid of the need for an app_info.xml file several months ago, it can only be changed through the client_state.xml file or through your web preferences. If you don't use the client_state.xml trick, then any tasks that get downloaded after you changed your GPU utilization factor to 0.25 via your web preferences will run 4 at a time. So even though you changed the GPU utilization factor back to 0.5, any BRP tasks in the queue before you changed it back will continue to run 4 at a time (unless you change the parameter manually in client_state.xml, as Horacio suggested).
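As a rough illustration of the arithmetic behind "0.25 runs 4 at a time", here is a minimal Python sketch. It assumes the simplified view that each task claims the stated fraction of one GPU and the client starts as many as fit; it ignores CPU limits, other projects and the client's real scheduling details, and `concurrent_tasks` is a hypothetical helper, not part of BOINC.

```python
import math


def concurrent_tasks(gpu_utilization_factor: float) -> int:
    """How many tasks fit on one GPU if each claims this fraction of it.

    Simplified model: ignores CPU limits and the client's scheduler.
    """
    if not 0 < gpu_utilization_factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    # The tiny epsilon keeps float rounding (e.g. 2.9999999999999996)
    # from dropping a task that should fit.
    return math.floor(1 / gpu_utilization_factor + 1e-9)


if __name__ == "__main__":
    for factor in (1.0, 0.5, 0.33, 0.25):
        print(f"factor {factor}: {concurrent_tasks(factor)} task(s) per GPU")
```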

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

Quote:
Quote:
That's why I don't use CPU crunching at all: GPU time "costs" much, much more in "flops"

This post (and a few others getting good results from other Fermis) started me thinking: would my puny i3/GTX 460/GTX 460 results improve if I removed the 100% CPU crunching load entirely?

So I tried...

before (PCIe x16, 768MB)
GTX 460 -> 1x3000, 2x4800
after
GTX 460 -> 1x1600, 2x2900

before (PCIe x4, 768MB)
GTX 460 -> 1x4700, 2x8400
after
GTX 460 -> 1x2870, 2x5750

... and the answer is - oh yes.

Now I'm thinking about what's next...


Since you already know what your GPU task run times are when 100% and 0% of the CPU is allocated to CPU crunching, you should run the same test at 75%, 50%, and 25% CPU just to make sure you aren't leaving any compute performance on the table. Going from 100% CPU crunching to 0% CPU crunching doesn't give you the whole picture. Your GPU task run times may be just as good, or only marginally worse, with fewer free CPU cores available, but you'll never know without testing. Who knows, you might be able to run a CPU task or two without sacrificing GPU efficiency.
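The before/after GTX 460 (PCIe x16) figures quoted above are easier to compare as throughput than as per-task run time. This minimal sketch assumes the figures are seconds per task with N tasks running at once, so throughput is N × 3600 / t tasks per hour. It only ranks GPU throughput and does not weigh the CPU work given up; the 75%, 50% and 25% entries are left as placeholders to fill in after testing.

```python
SECONDS_PER_HOUR = 3600

# Seconds per task, keyed by number of concurrent tasks
# (GTX 460, PCIe x16 figures from the post quoted above).
measurements = {
    "100% CPU crunching": {1: 3000, 2: 4800},
    "0% CPU crunching": {1: 1600, 2: 2900},
    # "75% CPU crunching": {...},  # add once measured
}


def tasks_per_hour(concurrency: int, seconds_per_task: float) -> float:
    """GPU throughput when `concurrency` tasks each take `seconds_per_task`."""
    return concurrency * SECONDS_PER_HOUR / seconds_per_task


def best_setup(data):
    """Return (cpu_setting, concurrency, tasks_per_hour) with the highest throughput."""
    candidates = (
        (setting, n, tasks_per_hour(n, secs))
        for setting, runs in data.items()
        for n, secs in runs.items()
    )
    return max(candidates, key=lambda c: c[2])


if __name__ == "__main__":
    for setting, runs in measurements.items():
        for n, secs in runs.items():
            print(f"{setting}, {n}x: {tasks_per_hour(n, secs):.2f} tasks/hour")
    print("best:", best_setup(measurements))
```

With these numbers, freeing the CPU raises the x16 card from 1.5 to about 2.5 tasks per hour at 2x.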

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:

Short of taking the client_state.xml shortcut, we can no longer expect the GPU utilization factor to change instantaneously. That same "instantaneous" shortcut used to exist in the app_info.xml file (as the n parameter), but ever since E@H got rid of the need for an app_info.xml file several months ago, it can only be changed through the client_state.xml file or through your web preferences. If you don't use the client_state.xml trick, then any tasks that get downloaded after you changed your GPU utilization factor to 0.25 via your web preferences will run 4 at a time. So even though you changed the GPU utilization factor back to 0.5, any BRP tasks in the queue before you changed it back will continue to run 4 at a time (unless you change the parameter manually in client_state.xml, as Horacio suggested).

Sorry Sunny, but that's not correct.
See Richard's post:

Richard wrote:

Quote:

The number of WUs to run at once is specified via a setting in the segment of client_state.xml, exactly as it would be with an app_info.xml file - check client_state and sched_reply for confirmation.

Where Bikeman is right is in saying that the new data following a change is only transferred from the server to your host when new work is being allocated. Once received, however, it applies to all tasks - including tasks previously cached - assigned to the same plan_class.


At least this is how it's worked when I've changed that setting.
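Richard's point, that the count is stored in client_state.xml per plan class and applies to every cached task of that class, can be illustrated with a short read-only Python sketch using the standard library's ElementTree. The XML below is a made-up, heavily simplified excerpt: the element layout (a `<coproc><count>` inside each `<app_version>`, and a `<plan_class>` on each cached `<result>`) and all names are assumptions for illustration, so check them against your own client_state.xml. Because each task's GPU share is looked up through its plan class, changing the one value changes it for every cached task of that class.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified excerpt of client_state.xml (assumed layout).
CLIENT_STATE = """
<client_state>
  <app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <plan_class>BRP4cuda32</plan_class>
    <coproc><type>NVIDIA</type><count>0.5</count></coproc>
  </app_version>
  <result><name>task_a</name><plan_class>BRP4cuda32</plan_class></result>
  <result><name>task_b</name><plan_class>BRP4cuda32</plan_class></result>
</client_state>
"""


def gpu_count_by_plan_class(xml_text):
    """Map each plan_class to the GPU fraction ('count') one task claims."""
    root = ET.fromstring(xml_text.strip())
    counts = {}
    for av in root.iter("app_version"):
        plan = av.findtext("plan_class")
        count = av.findtext("coproc/count")
        if plan is not None and count is not None:
            counts[plan] = float(count)
    return counts


def per_task_count(xml_text):
    """Every cached task inherits the count of its plan class."""
    counts = gpu_count_by_plan_class(xml_text)
    root = ET.fromstring(xml_text.strip())
    return {r.findtext("name"): counts.get(r.findtext("plan_class"))
            for r in root.iter("result")}


if __name__ == "__main__":
    print(per_task_count(CLIENT_STATE))  # {'task_a': 0.5, 'task_b': 0.5}
    # Simulate a new setting arriving with new work: one app_version changes.
    updated = CLIENT_STATE.replace("<count>0.5</count>", "<count>0.25</count>")
    print(per_task_count(updated))  # {'task_a': 0.25, 'task_b': 0.25}
```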

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

My apologies for posting inaccurate information.

Things appeared to be working exactly as I described them on my hosts, but I think I now know why. I've been swapping GPUs between all my hosts lately in an effort to find the most efficient combination of hardware, and I've been fiddling with the GPU utilization factors as well. But instead of leaving my work buffer size alone, I would typically reduce it to 0.1 days before switching GPUs and downloading new work. Because I probably had more than 0.1 days of work in the queue each time I reduced the work buffer size to 0.1 days, new work was not getting downloaded, let alone being scheduled and allocated to my hosts.

So if things actually work the way Richard described them, then even if I manually updated the project to transfer the new settings to my host(s), any tasks that were in the queue before I updated the GPU utilization factor would not run at the new factor, and would continue to run at the old factor until the queue dwindled down to less than 0.1 days' worth of work. In other words, had I not reduced my work buffer size each time I swapped GPUs and changed the GPU utilization factor, I would have been allocated new work right away, and would have seen any existing tasks in the queue start crunching at the new factor as soon as that newly allocated work got downloaded to my host(s)...
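The explanation above can be summarised as a small decision model. This is a deliberately simplified sketch: real BOINC work fetch involves more than a single buffer threshold, so treat `requests_new_work` and `factor_in_use` as hypothetical helpers that capture only the logic described in this thread (new settings arrive only with newly allocated work, and then apply to all cached tasks).

```python
def requests_new_work(queued_days: float, buffer_days: float) -> bool:
    """Simplified assumption: the host asks for work only when its queue is
    shorter than the configured work buffer."""
    return queued_days < buffer_days


def factor_in_use(queued_days: float, buffer_days: float,
                  old_factor: float, new_factor: float) -> float:
    """GPU utilization factor the queued tasks run at after a web-preference change.

    The new value only arrives with newly allocated work; once it arrives
    it applies to every cached task of that plan class.
    """
    if requests_new_work(queued_days, buffer_days):
        return new_factor
    return old_factor


if __name__ == "__main__":
    # Buffer cut to 0.1 days with ~0.5 days queued: no new work, old factor stays.
    print(factor_in_use(0.5, 0.1, old_factor=0.25, new_factor=0.5))
    # Buffer left at 1 day: new work is fetched and the new factor applies.
    print(factor_in_use(0.5, 1.0, old_factor=0.25, new_factor=0.5))
```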

dskagcommunity
Joined: 16 Mar 11
Posts: 89
Credit: 1219701683
RAC: 11055

AMD/ATI: (colored are optimized >=1.28 app values, defined by Petrion)
Values are run times per task in seconds; "Nx" means N tasks running concurrently on the card.
HD 7970 ----> 1x~650, 2x~950, 4x~1,800, 5x~2,200
HD 7950 ----> 3x~1,860
HD 7950 ----> 1x 1,145
HD 7870
HD 7850
HD 7770 ----> 1x~1,960, 2x~3,600
HD 7750 ----> 2x~11,000
HD 5870 ----> 2x~3,105
HD 5850 ----> 1x 1,800, 2x 6,085
HD 5830 ----> 1x 2,916
HD 6970
HD 6950 (1536) ----> 2x 6,700
HD 6950 ----> 2x 3,500
HD 6990
HD 6870
HD 5970
HD 6850 ----> 1x~2,300
HD 6850 ----> 1x~2,359
HD 6790
HD 5770 ----> 1x 7,750+
HD 6770
HD 5670 ----> 1x 11,100
HD 5570 ----> 1x~15,000
HD 5450 ----> 1x~36,500!

AMD A8 3870 ----> 1x 6,489

NVIDIA: (colored are optimized >=1.28 app values, defined by Petrion)
GTX 690 ----> 3x 2,800
GTX 590
GTX 680 ----> 1x~750
GTX 680 ----> 3x 3,100 (Win7)
GTX 680 ----> 2x 1,945 (Linux)
GTX 580 ----> 1x 834, 3x~2,500
GTX 580 ----> 3x 3,350 (Windows)
GTX 580 ----> 3x 3,050 (Linux)
GTX 670 ----> 3x~4,300 (Vista)
GTX 660 Ti ----> 1x~1,180, 2x~2,170
GTX 660 Ti ----> 1x~1,700, 2x~2,900, 3x~4,500, 4x~6,030, 5x~8,660, 6x~12,760
GTX 650 ----> 1x 2,630, 2x 4,340
GTX 650 Ti ----> 3x~5,900 (Linux, PCIe 2)
GTX 570
GTX 670
GTX 480 ----> 2x~2,200
GTX 470 ----> 2x~3,000, 3x 3,800
GTX 560 [448] ----> 1x 1,550, 2x 2,500
GTX 560 Ti ----> 2x 2,030
GTX 560 Ti ----> 1x~1,100, 2x 2,654, 6x 6,400
GTX 560 Ti ----> 1x~1,100, 2x 2,000, 4x 4,100, 5x 5,200
GTX 560 ----> 2x 2,300
GTX 560 ----> 1x 3,300, 2x 4,800
GTX 460 ----> 1x 1,600, 2x 2,900
GTX 465
GTX 550 Ti ----> 1x 1,793, 2x 2,961
GT 640 ----> 1x~5,700
GT 440
GTS 450 ----> 1x~2,200, 2x 4,200
GF 610M ----> 1x~7,800
GT 430 ----> 2x 9,100
GT 430 ----> 1x 4,860
GT 520 ----> 1x~9,600 (Linux)

FirePro V4800 ----> 1x 10,620

Older cards (not OpenCL v1.1 capable), but still an interesting comparison:
GT 295 ----> 1x 2,000 (Linux)
GTX 285 ----> 2x 3,000
GTX 260 ----> 1x 2,200
8800GT G92 ----> 1x 2,940 (Linux)
8800GT G92 ----> 1x 3,600 (Linux)
8800GTS G80 ----> 1x 4,020 (Linux)
GTS 250 ----> 2x~5,484
GT 240 ----> 1x 4,035 (OC'd)
GT 240 ----> 1x~4,500
GT 240 ----> 1x~5,400, 2x 10,500
GT 240 ----> 1x~3,460 (Linux)
GT 220 ----> 2x 19,400
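Because the list gives per-task run times at different concurrency levels, comparing cards (or concurrency settings) is easier in tasks per day. A minimal sketch, assuming each value is seconds per task with N tasks running concurrently, so throughput is N × 86,400 / t; the "~" and "+" qualifiers are simply dropped, and `tasks_per_day` is a hypothetical helper.

```python
import re

SECONDS_PER_DAY = 86_400

# One "Nx T" entry: concurrency, then seconds per task. Also accepts the
# "1* 4860", "3x ~ 5900" and "2x4340 sec" variants used in this thread.
ENTRY = re.compile(r"(\d+)\s*[x*]\s*~?\s*([\d,]+)")


def tasks_per_day(line: str) -> dict:
    """Map concurrency -> tasks/day for one benchmark line ("CARD ----> 1x~650, ...")."""
    _, _, values = line.partition(">")
    rates = {}
    for n, secs in ENTRY.findall(values):
        n, secs = int(n), int(secs.replace(",", ""))
        rates[n] = n * SECONDS_PER_DAY / secs
    return rates


if __name__ == "__main__":
    line = "HD 7970 ----> 1x~650, 2x~950, 4x~1,800, 5x~2,200"
    rates = tasks_per_day(line)
    for n, rate in rates.items():
        print(f"{n}x: {rate:.0f} tasks/day")
    print("best concurrency:", max(rates, key=rates.get))
```

For the HD 7970 line this gives about 133, 182, 192 and 196 tasks/day at 1x, 2x, 4x and 5x.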

DSKAG Austria Research Team: http://www.research.dskag.at

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 515493535
RAC: 393427

HD7990 available
4,096 stream processors, 3x 8-pin power connectors!
Theoretical performance: 12 WUs/hr, 144,000 credits/day with 6 free cores.
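The two headline figures are consistent with each other if each work unit is worth 500 credits; that per-WU value is inferred from the post's own numbers, not stated in it. A quick check:

```python
WU_PER_HOUR = 12           # claimed HD7990 throughput
CREDITS_PER_DAY = 144_000  # claimed HD7990 credit rate

wu_per_day = WU_PER_HOUR * 24
implied_credits_per_wu = CREDITS_PER_DAY / wu_per_day

print(f"{wu_per_day} WUs/day -> {implied_credits_per_wu:.0f} credits per WU")
```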

astrocrab
Joined: 28 Jan 08
Posts: 208
Credit: 429202534
RAC: 0

Two 7970s are faster and cheaper.
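The speed half of this claim can be checked against the HD 7970 line from the benchmark list in this thread: at its best listed setting (5x~2,200 s) one card finishes about 8.2 tasks per hour, so two would manage roughly 16, against the 12 per hour claimed for the HD7990. A minimal sketch, assuming the listed values are seconds per task and that two cards scale perfectly; price is not modelled.

```python
SECONDS_PER_HOUR = 3600

# HD 7970 benchmark line: concurrency -> seconds per task (approximate values).
hd7970 = {1: 650, 2: 950, 4: 1800, 5: 2200}
HD7990_CLAIM = 12  # WUs per hour, theoretical figure from the HD7990 post

best_single = max(n * SECONDS_PER_HOUR / secs for n, secs in hd7970.items())
two_cards = 2 * best_single  # assumes perfect scaling across two cards

print(f"one HD 7970: {best_single:.1f}/h, two: {two_cards:.1f}/h, HD7990: {HD7990_CLAIM}/h")
```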

dskagcommunity
Joined: 16 Mar 11
Posts: 89
Credit: 1219701683
RAC: 11055

So, that's done: I've now made an Excel sheet (viewable as a PDF here) based on all the values I posted before.
I deleted all the old entries from before 1.28, so anyone who wants to buy a new (or eBay ;)) card can check this sheet for the close-to-real values they will get :)

http://www.dskag.at/images/Research/EinsteinGPUperformancelist.pdf

Have fun ^^

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2747472
RAC: 1576

Don't forget the Nvidia GPUs are running a CUDA app, not an OpenCL app, so using the 'No Open CL 1.1' description for legacy Nvidia GPUs is pointless.

That also makes talk about Nvidia completion times off-topic for this thread; perhaps the mods can change the thread title to 'Cuda and OpenCL Benchmarks' ;-)

But again, Cruncher's Corner's description is 'Credit, leaderboards, CPU performance', making GPU performance off-topic; perhaps the admins can make the description 'Credit, leaderboards, CPU performance and GPU performance' ;-)

Claggy

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 777989645
RAC: 1202134

Quote:


Which also makes talk about Nvidia completion times Off Topic for this thread, perhaps the Mods can change the thread Title to 'Cuda and OpenCL Benchmarks' ;-)

Done :-). I hope the thread starter doesn't mind, but it's really a more appropriate title.

Cheers
HB
