CUDA and openCL Benchmarks

Bigeagle
Joined: 3 Apr 07
Posts: 12
Credit: 51883053
RAC: 75396

i've searched a while for some information like this
now i can add some
7870 running arecibo 1.32 opencl
1x 18minutes (about 1100s)
2x 29minutes (about 1800s) (14.5m per WU)
3x 46minutes (about 2800s) (15.33m per WU)
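The per-WU figures in brackets are simply the batch time divided by the number of tasks running at once. A minimal sketch of that arithmetic, using the rounded minutes from this post (the function name is only illustrative):

```python
# Batch times from the post, in minutes, keyed by number of concurrent tasks.
batch_minutes = {1: 18, 2: 29, 3: 46}

def minutes_per_wu(concurrent, batch_time):
    """Effective time per work unit when `concurrent` tasks finish together."""
    return batch_time / concurrent

for n, t in batch_minutes.items():
    print(f"{n}x: {minutes_per_wu(n, t):.2f} min per WU")
```

On these numbers 2x is the best of the three settings, and 3x is slightly worse again.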

gpu usage measured via MSI Afterburner: with 3 tasks it is still at 60% with 90% peaks, so there seems to be not much difference from 2 tasks
cpu is a 3570k slightly overclocked to 4.2ghz with one core free for gpu feeding; without this the performance is really bad, about 2.5 hours for one task (compared to 18 minutes, that is really bad out-of-the-box performance i think)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118267276246
RAC: 24813265

Quote:
i've searched a while for some information like this
now i can add some
7870 running arecibo 1.32 opencl
1x 18minutes (about 1100s)
2x 29minutes (about 1800s) (14.5m per WU)
3x 46minutes (about 2800s) (15.33m per WU)


You should have asked sooner :-).

For quite a while, E@H has had a special project preference called 'GPU utilisation factor' which allows you to run multiple GPU tasks simultaneously. You don't need to have a 'bleeding edge' version of BOINC to use it either. With the app config feature, BOINC is catching up with what Bernd had put in place for multiple simultaneous GPU tasks. It does add the ability to tweak the CPU and GPU usage values as well but I don't think that will be of much benefit until the opencl-ati app improves further - at which point the Devs will tweak the default values anyway.

Quote:
gpu usage measured via MSI Afterburner: with 3 tasks it is still at 60% with 90% peaks, so there seems to be not much difference from 2 tasks


With 2 tasks, BOINC would have automatically kept one CPU core free. When you changed to 3x, the total requirement would have changed to 1.5 so that there would have been still only 1 CPU core kept free. Did you try changing prefs to free up a second core? If you do that I would guess that you might see a further improvement in GPU utilisation and a reduction in crunch time. Probably not enough improvement to compensate for the loss of a CPU core to crunch CPU tasks but you don't know until you try :-).
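To make that arithmetic concrete, here is a simplified sketch. It is not BOINC's actual scheduler code: it assumes each GPU task asks for 0.5 of a CPU (which is what 3 tasks adding up to 1.5 implies) and that only whole cores are held back, so the total is rounded down.

```python
import math

def cores_kept_free(gpu_tasks, cpu_per_task=0.5):
    """Whole CPU cores left idle for `gpu_tasks` concurrent GPU tasks.

    Assumption: each GPU task requests `cpu_per_task` of a core and the
    summed request is rounded down to whole cores.
    """
    return math.floor(gpu_tasks * cpu_per_task)

for n in (2, 3, 4):
    total = n * 0.5
    print(f"{n}x -> total {total:.1f} CPUs, {cores_kept_free(n)} core(s) kept free")
```

Under these assumptions 2x and 3x both leave one core free, and a second core is only freed automatically at 4x.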

Quote:
cpu is a 3570k slightly overclocked to 4.2ghz with one core free for gpu feeding; without this the performance is really bad, about 2.5 hours for one task (compared to 18 minutes, that is really bad out-of-the-box performance i think)


Slightly overclocked :-). I know that overclock is very easy to achieve but I would hardly call it "slight" :-). It's a very decent overclock indeed and if they can get a bit cheaper, it would make them quite attractive for future budget builds.

EDIT: I've just noticed that you are using BOINC 7.0.28 which means you can't be using the app configuration feature described by Jord since you need the latest 7.0.42 for that. So are you using the special GPU utilisation pref setting after all? Because your message immediately followed Jord's report on the app config feature, I assumed you were using what he had just described. I should have checked your host before making that assumption :-).

Cheers,
Gary.

Bigeagle
Joined: 3 Apr 07
Posts: 12
Credit: 51883053
RAC: 75396

sorry if my post caused misunderstanding, i wasn't referring to the app_config variant, but the start post/topic in general
i only use the settings in the project preferences itself. since i had many problems getting an app_info.xml working in other projects, i don't use them anymore; they just produce too many errors unless i constantly check them for correctness

mhm, i hope you understand what i want to say, i'm afraid my english is not the best

as for giving 2 cores to gpu feeding, i just forgot about that. i will try it again with 2 cores, maybe even 4 tasks are possible without filling the vram

4.2 ghz i would call slightly overclocked, because the single cores are guaranteed to work at 3.8, so it is only about 10% more clock speed at effectively the same voltage my cpu uses with turbo boost activated. as long as one does not have to give more voltage to the cores i would call the overclocking 'slight', or am i wrong?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118267276246
RAC: 24813265

Quote:
sorry if my post caused misunderstanding, i wasn't referring to the app_config variant, but the start post/topic in general


That's quite OK. My fault - I made a bad assumption which I later corrected when I looked closely at your BOINC version.

Quote:
i only use the settings in the project preferences itself. since i had many problems getting an app_info.xml working in other projects, i don't use them anymore; they just produce too many errors unless i constantly check them for correctness


Yes, I quite agree. Anonymous platform (AP) can be tricky. Especially when you want to back out of it. It also requires a lot of manual intervention every time the app version changes. That's why I'm so interested in app config. No changes to be made when the app version changes.
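For anyone who wants to try it, an app_config.xml is small enough to generate with a few lines of Python. This is only a sketch: the app name `einsteinbinary_BRP4` and the two 0.5 values are assumptions for illustration, so check the app name in your own client_state.xml before using it.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name, gpu_usage, cpu_usage):
    """Return app_config.xml text for one app with GPU/CPU usage overrides."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    versions = ET.SubElement(app, "gpu_versions")
    # gpu_usage 0.5 means two tasks share one GPU.
    ET.SubElement(versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")

# Hypothetical values: app name and usage figures are placeholders.
xml_text = build_app_config("einsteinbinary_BRP4", 0.5, 0.5)
print(xml_text)
```

The resulting text would be saved as app_config.xml in the project's directory. Because it is keyed by app name rather than app version, it does not need editing when a new version arrives.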

Quote:
mhm, i hope you understand what i want to say, i'm afraid my english is not the best


Your English is fine - far better than my non-existent German. I'm sure I understand you perfectly.

Quote:
as for giving 2 cores to gpu feeding, i just forgot about that. i will try it again with 2 cores, maybe even 4 tasks are possible without filling the vram


When you try with 2 free cores, please post your findings. It would be very interesting to know, for a 7870, if you get a significant improvement when running 3x. Your GPU has 2GB RAM which is enough to run 4x. You would automatically have 2 free cores for this. It would be useful to see if adding a third free core made any further improvement. Most people are using NVIDIA GPUs so good data about AMD is not very common in this thread.

Quote:
4.2 ghz i would call slightly overclocked, because the single cores are guaranteed to work at 3.8, so it is only about 10% more clock speed at effectively the same voltage my cpu uses with turbo boost activated. as long as one does not have to give more voltage to the cores i would call the overclocking 'slight', or am i wrong?


You are not really wrong but it would probably be more correct to describe the overclock as 'moderate' or 'medium' rather than 'slight'. 'Slight' means 'quite small' or 'hardly noticeable'. If you increased the stock speed from 3.4GHz to say 3.6 - 3.7GHz, you could call that slight. 4.2GHz is very noticeable :-). After all, even some professional overclockers using voltage and fancy cooling solutions sometimes have trouble getting past 4.6-4.8GHz.

In any case, I was really just making a joke - there was absolutely no criticism intended :-).

Cheers,
Gary.

dskagcommunity
Joined: 16 Mar 11
Posts: 89
Credit: 1219701683
RAC: 65713

Updated list after 2 Weeks without change ;)

http://www.dskag.at/images/Research/EinsteinGPUperformancelist.pdf

Happy new Year :)

DSKAG Austria Research Team: http://www.research.dskag.at

Bigeagle
Joined: 3 Apr 07
Posts: 12
Credit: 51883053
RAC: 75396

the behavior of BRPS is a little bit strange
on 7870, running at 1150MHz chip clock, 1300 MHz RAM clock
3570k running at 4.2GHz

3x needs about 45 minutes per batch
4x needs about an hour, but varying from 3237s (http://einsteinathome.org/workunit/142484182) up to 3700s; besides really good runs and disturbed ones i see a relatively stable 59 minutes, which makes 3540s

if i ignore the 50MHz higher clock since the 1x and 2x tries i get
1x 1080s per WU
2x 880s per WU
3x 900s per WU
4x 885s per WU

since i haven't calculated an accurate average, and i disturb the 'benchmarks' with silly things like using the computer myself, i would say that 2 parallel tasks are enough to make good, maybe optimal, use of the gpu. it is also the easiest setting on my system, since it occupies one core and not two
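to compare the four settings directly, the per-WU times can be turned into throughput. a rough sketch using the rounded figures from this post (so the percentages are only as good as those numbers):

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Effective seconds per WU from the post, keyed by concurrent tasks.
seconds_per_wu = {1: 1080, 2: 880, 3: 900, 4: 885}

def wus_per_day(sec_per_wu):
    """Work units finished per day at a given effective time per WU."""
    return SECONDS_PER_DAY / sec_per_wu

baseline = wus_per_day(seconds_per_wu[1])
for n, s in seconds_per_wu.items():
    rate = wus_per_day(s)
    gain = (rate / baseline - 1) * 100
    print(f"{n}x: {rate:.1f} WU/day ({gain:+.1f}% vs 1x)")

# Setting with the lowest effective time per WU.
best = min(seconds_per_wu, key=seconds_per_wu.get)
print(f"best setting: {best}x")
```

2x, 3x and 4x all land within a few percent of each other, which is inside the noise of these runs.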

interestingly, two cores for the gpu give no advantage compared to one core. i use the tool 'process lasso' to bind gpu tasks to one fixed core and to exclude most other tasks on my pc from that core, because i have found that 'core hopping' often costs a decent amount of performance for absolutely nothing
as far as i can see, every process that doesn't use multiple cores at the same time profits from fixed core binding under win7 (prof)
it seems to be a really awful scheduler that ignores the fact that cpu cache hits are better than cache misses
unfortunately i have no idea how to bind each application to a specific core; i can only get an exclusive core for gpu applications
it would be interesting to know whether there is ANY single-core application that profits from the ability to hop from one core to another
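for illustration, this is roughly what binding a process to fixed cores looks like in code. it is only a sketch and not what process lasso does internally: `os.sched_setaffinity` exists in python's standard library on linux only, so on win7 one would need something else (for example the third-party psutil package). the helper names are made up for this example.

```python
import os

def affinity_mask(cores):
    """Bitmask with one bit set per allowed core (bit 0 = core 0)."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask

def pin_process(pid, cores):
    """Restrict `pid` to `cores`; returns False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. Windows or macOS: not available in the stdlib
    os.sched_setaffinity(pid, set(cores))
    return True

# Core 0 reserved for the GPU task, cores 1-3 for everything else.
print(bin(affinity_mask([0])))
print(bin(affinity_mask([1, 2, 3])))
```

binding each application to its own core would mean calling `pin_process` once per process id with a different single-core list.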

edit:
hey, seems like there is something positive about ati
the 7870 seems to be faster than 660OC ;)

NikolaG
Joined: 16 Jan 09
Posts: 1
Credit: 816041
RAC: 0

HD6790 1GB GDDR5 (with AMD Phenom II 945) gets 1 WU in about 2900 sec (2850-2950).

My AMD gets one BRP4SSE in 124,500sec. One wu for pulsar search#2 is done in 15,200 sec.

oz
Joined: 28 Feb 05
Posts: 7
Credit: 54902288
RAC: 0

Hi,
With the new Linux driver, Catalyst 13.1, runtime decreases by ~ 10%! (AMD-ATI-5770 Juniper)
12.10 runtime ~ 4300
13.1 runtime ~ 3900
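A quick check of the percentage from the two runtimes above (a minimal sketch, the function name is only illustrative):

```python
def percent_decrease(old, new):
    """Relative reduction from `old` to `new`, in percent."""
    return (old - new) / old * 100

old_runtime, new_runtime = 4300, 3900  # Catalyst 12.10 vs 13.1
print(f"{percent_decrease(old_runtime, new_runtime):.1f}% shorter runtime")
```

That works out to a bit over 9%, so "~ 10%" is a fair rounding.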

Nobody316
Joined: 14 Jan 13
Posts: 141
Credit: 2008126
RAC: 0

Running 1 unit at a time takes 1,863.28s to 1,927.96s. Running 2 units at a time takes 3,356.81s to 3,423.72s. System specs in signature. 7 CPU cores working, 1 core free. GPU not overclocked. So 1 unit is around 34 mins and 2 units get done in around 58 mins. 1 and 2 units run at the same GPU Load and Memory Load. "roughly very little difference"

PC setup MSI-970A-G46 AMD FX-8350 8 core OC'd 4.45GHz 16GB ram PC3-10700 Geforce GTX 650Ti Windows 7 x64 Einstein@Home

Nobody316
Joined: 14 Jan 13
Posts: 141
Credit: 2008126
RAC: 0

sorry can't seem to edit my last post now... :( I am testing the 306.97 driver now and so far it seems about a min. faster at times. Will post more after some more testing and times.
