Einstein@Home GPU Application for ATI/AMD Graphics Cards

Submitted on 15 May 2012 12:21:17 UTC

After more than a year of work by Oliver Bock, Bernd Machenschalk, Heinz-Bernd Eggenstein and other developers, we are pleased to announce the release of the first Einstein@Home application for ATI/AMD Graphics Cards.

This OpenCL application, which searches Arecibo data for new radio pulsars, is about a factor of ten faster than the same search running on a typical CPU. The application is currently available for Windows and Linux computers with Radeon HD 5000 or better graphics cards. We hope to have a version for Macintosh (Apple OS X 10.8, Mountain Lion) sometime this summer, but there are still some problems that need to be fixed or worked around.

Volunteers who wish to run this application will need to install version 7.0.27 or later of the BOINC client. Please see this thread for more information, or if you want to ask questions.

Many thanks to the AMD/ATI team for their support in the OpenCL software development effort.

Bruce Allen
Director, Einstein@Home

Comments

5pot

Joined: 8 Apr 12

Posts: 107

Credit: 7577619

RAC: 0

Einstein@Home GPU Application for ATI/AMD Graphics Cards

15 May 2012 13:14:17 UTC

Message 109501

SP (single precision), I presume?

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 800821269

RAC: 1220955

SP indeed. No support is

15 May 2012 13:17:26 UTC

Message 109502 in response to message 109501

SP indeed. No support is offered for HD 4xxx generation cards (without support for OpenCL 1.1), anything more recent should do fine.

Cheers
HB

Vit

Joined: 7 Jan 08

Posts: 23

Credit: 393350695

RAC: 0

Seems like it is time to buy

15 May 2012 20:54:33 UTC

Message 109503

Seems like it is time to buy AMD-card! :) (had no reasons to buy card before and still use Sandy Bridge HD3000)
Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?

5pot

Joined: 8 Apr 12

Posts: 107

Credit: 7577619

RAC: 0

CPU does not even compare.

15 May 2012 21:34:26 UTC

Message 109504

CPU does not even compare. CPUs are also incredibly inefficient compared to GPUs. NVIDIA, I believe they said, is currently 20x faster, and AMD will currently be 10x faster. I would expect this number to increase as changes are made, IMHO.

Michael Becker

Joined: 15 Jul 05

Posts: 3

Credit: 10398405

RAC: 0

thx for the ati/amd app, it

15 May 2012 21:50:31 UTC

Message 109505 in response to message 109504

thx for the ati/amd app, it works on my hd5870
my first result: http://einsteinathome.org/task/288520169
but the gpu-load is only ~60%, one cpu-core is only for gpu-tasks (cpu: i7-2600k)
should i set 'GPU utilization factor' to 0,5?
on my second machine is a 560ti running, there i have the best results with 'GPU utilization factor' 0,33

michel
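A rough sketch of what the 'GPU utilization factor' does, assuming a simplified model in which BOINC keeps starting GPU tasks as long as the fractions they claim on one GPU add up to at most 1 (so 0.5 gives 2 tasks and 0.33 gives 3, matching the settings described above). This is an illustration only, not BOINC's actual scheduler code, and `concurrent_tasks` is a hypothetical helper.

```python
import math


def concurrent_tasks(utilization_factor: float) -> int:
    """Hypothetical helper: how many GPU tasks fit on one GPU if each task
    claims `utilization_factor` of it and the claims may sum to at most 1."""
    if not 0 < utilization_factor <= 1:
        raise ValueError("utilization factor must be in (0, 1]")
    # The tiny epsilon guards against float rounding just below a whole number.
    return math.floor(1.0 / utilization_factor + 1e-9)


for factor in (1.0, 0.5, 0.33):
    print(f"factor {factor}: {concurrent_tasks(factor)} task(s) per GPU")
```

Under this model a factor of 0.33 leaves about 1% of the GPU unclaimed, which is why it still yields 3 tasks rather than 4.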

5pot

Joined: 8 Apr 12

Posts: 107

Credit: 7577619

RAC: 0

Would assume the same

15 May 2012 21:54:03 UTC

Message 109506

I would assume the same applies: my 680 also runs at around 60% with a factor of 1, and around 90% with 0.33 set.

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 800821269

RAC: 1220955

Nice! Yes, I would

15 May 2012 22:55:55 UTC

Message 109507

Nice!

Yes, I would encourage experiments with the utilization factor. You can use different "venues" in BOINC-speak to assign different settings to different hosts.

CU
HB

TRuEQ & TuVaLu

Joined: 11 Sep 06

Posts: 2

Credit: 33588262

RAC: 0

It

16 May 2012 12:45:26 UTC

Message 109508

It works!!!

http://einsteinathome.org/host/5353241/tasks

I run it with the "dangerous" option of 0.5, i.e. 2 tasks at once.
It runs 1 + 1 on the GPU together with Milkyway, SETI, PrimeGrid and POEM, which also use 0.5 in the app_info.xml file.

dskagcommunity

Joined: 16 Mar 11

Posts: 89

Credit: 1219701683

RAC: 1859

RE: SP indeed. No support

16 May 2012 13:33:43 UTC

Message 109509 in response to message 109502

Quote:

SP indeed. No support is offered for HD 4xxx generation cards (without support for OpenCL 1.1), anything more recent should do fine.

Cheers
HB

So I take it the 4xxx will never be supported? Or is that only for now, at the beginning?

DSKAG Austria Research Team: http://www.research.dskag.at

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4345

Credit: 252817542

RAC: 41160

RE: RE: SP indeed. No

16 May 2012 13:39:53 UTC

Message 109510 in response to message 109509

Quote:

Quote:
SP indeed. No support is offered for HD 4xxx generation cards (without support for OpenCL 1.1), anything more recent should do fine.

So i think 4xxx will never be supported? or only @ the beginning now?

I don't think we will make an OpenCL 1.0 app, at least not for BRP4. It would be another code branch to maintain, and it would almost double the memory requirements, so a task would not fit in 512 MB.

Also I doubt that the limited computing power of the 4xxx would gain us much.

dskagcommunity

Joined: 16 Mar 11

Posts: 89

Credit: 1219701683

RAC: 1859

ok i see, where does this

16 May 2012 15:57:35 UTC

Message 109511

OK, I see. Where does this 512 MB limit come from? OpenCL 1.0?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4345

Credit: 252817542

RAC: 41160

This is not a hard limit,

16 May 2012 19:15:59 UTC

Message 109512 in response to message 109511

This is not a hard limit, just a target that we set to get the most out of our population of ATI cards.

dskagcommunity

Joined: 16 Mar 11

Posts: 89

Credit: 1219701683

RAC: 1859

Hm are there so much 5xxx or

17 May 2012 11:03:43 UTC

Message 109513

Hm, are there really that many 5xxx or higher cards with 512 MB?

Oliver Behnke

Moderator

Administrator

Joined: 4 Sep 07

Posts: 987

Credit: 25171438

RAC: 0

Yes, in fact, all of them

17 May 2012 12:15:13 UTC

Message 109514 in response to message 109513

Yes, in fact, all of them :-)

Oliver

Einstein@Home Project

dskagcommunity

Joined: 16 Mar 11

Posts: 89

Credit: 1219701683

RAC: 1859

I ment ONLY 512MB ;) ;) But i

17 May 2012 12:37:30 UTC

Message 109515

I meant ONLY 512 MB ;) ;) But I see half of the cards could possibly have only 512 MB. Hm, sad.

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 800821269

RAC: 1220955

Hi! As Bernd mentioned

17 May 2012 14:14:52 UTC

Message 109516 in response to message 109515

Hi!
As Bernd mentioned already, more than 512 MB would only be needed for a second code branch that would be able to support OpenCL 1.0. It's mostly the 4xxx cards that would benefit from support of OpenCL 1.0, not the HD 5000 series. So the question would be: are there many 4000-series cards with just 512 MB? Those were popular when? 2008? 2009? More than 512 MB of video RAM wasn't the norm back then. So by supporting OpenCL 1.0, we would be able to utilize only a certain fraction of the already shrinking population of older cards, to get not-so-great performance per card ===> it just doesn't make too much sense.

Cheers
HB

TRuEQ & TuVaLu

Joined: 11 Sep 06

Posts: 2

Credit: 33588262

RAC: 0

I'll leave my 4850 to do

17 May 2012 14:43:49 UTC

Message 109517

I'll leave my 4850 to do Milkyway and Collatz for as long as it lives.
I can't put a better card in that computer since the PCI Express slot is only 1.x something.....

noderaser

Joined: 9 Feb 05

Posts: 50

Credit: 123204885

RAC: 670200

My HD 4870 with 1 GB (and a

18 May 2012 2:04:23 UTC

Message 109518

My HD 4870 with 1 GB (and a GeForce 320M with 256 MB) is eagerly awaiting something cooler than boring math projects. No dice as of yet.

dskagcommunity

Joined: 16 Mar 11

Posts: 89

Credit: 1219701683

RAC: 1859

noderaser: use it like im do

18 May 2012 7:28:08 UTC

Message 109519

noderaser: use it like I do with my 4850 with 1 GB on POEM (if you don't like MW). Three WUs at once are possible, 84000 credits/day. Thank God they have supported OpenCL for a short while now. It's medical research.

Dr.Alexx

Joined: 14 Aug 05

Posts: 22

Credit: 5135173

RAC: 1

The BOINC site is unavailable

18 May 2012 12:44:01 UTC

Message 109520

The BOINC site has been unavailable for 2 days! I cannot download the new client! Can someone send the 64-bit Windows client version 7.0.27 to kido00 (a t) ya.ru?

tullio

Joined: 22 Jan 05

Posts: 2118

Credit: 61407735

RAC: 0

RE: The BOINC site is

18 May 2012 12:59:18 UTC

Message 109521 in response to message 109520

Quote:

The BOINC site has been unavailable for 2 days! I cannot download the new client! Can someone send the 64-bit Windows client version 7.0.27 to kido00 (a t) ya.ru?

There has been a power failure at Berkeley due to a shorted underground cable. It has been repaired but the servers are still down.
Tullio

Oliver Behnke

Moderator

Administrator

Joined: 4 Sep 07

Posts: 987

Credit: 25171438

RAC: 0

RE: The BOINC site is

18 May 2012 13:07:14 UTC

Message 109522 in response to message 109520

Quote:

The BOINC site has been unavailable for 2 days! I cannot download the new client! Can someone send the 64-bit Windows client version 7.0.27 to kido00 (a t) ya.ru?

Please see main thread...

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4345

Credit: 252817542

RAC: 41160

Local copies of 7.0.28 are

18 May 2012 13:30:47 UTC

Message 109523

Local copies of 7.0.28 are linked here.

`COMMUNISTIS G....

Joined: 14 Feb 12

Posts: 11

Credit: 73208

RAC: 0

ok george

18 May 2012 18:02:17 UTC

Message 109524 in response to message 109518

ok george kalemakis.

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 520873483

RAC: 257597

RE: Seems like it is time

18 May 2012 19:58:30 UTC

Message 109525 in response to message 109503

Quote:

Seems like it is time to buy AMD-card! :) (had no reasons to buy card before and still use Sandy Bridge HD3000)
Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?

You can find some performance figures here: http://albert.phys.uwm.edu/forum_thread.php?id=8888&nowrap=true#112053
An HD6950 progresses at 1%/min with 2 wu's running concurrently.

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 520873483

RAC: 257597

Yesterday I attached my

19 May 2012 9:30:55 UTC

Message 109526

Yesterday I attached my main system with 2 AMD cards to the project.
Until now 12 wu's were done, 4 validated (2 against AMD, 1 against Cuda and one against SSE).
Looks like the team has done a wonderful job! THX!

(retired account)

Joined: 28 Sep 11

Posts: 16

Credit: 7357648

RAC: 0

Thanks to everyone involved

19 May 2012 15:34:39 UTC

Message 109527

Thanks to everyone involved in the development of the AMD/OpenCL app!

I'm still waiting for the first result to validate, but so far so good on a HD 7950 with Win8 preview:

http://einsteinathome.org/task/289196474

BOINC 7.0.27 (x64)
Catalyst 12.4 (installed in Win7 comp. mode)
Windows 8 Dev. Preview x64

Performance is not overwhelming yet; it appears to be only ~9% faster than my GTX 560 Ti, but since this is the first public release (and in my case not the real Win8 driver), who knows what is still to come ... :-)

Regards

Mark my words and remember me. - 11th Hour, Lamb of God

Vit

Joined: 7 Jan 08

Posts: 23

Credit: 393350695

RAC: 0

RE: RE: Seems like it is

19 May 2012 21:17:27 UTC

Message 109528 in response to message 109525

Quote:

Quote:
Seems like it is time to buy AMD-card! :) (had no reasons to buy card before and still use Sandy Bridge HD3000)
Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?

You can find some performance figures here: http://albert.phys.uwm.edu/forum_thread.php?id=8888&nowrap=true#112053
A HD6950 runs 1% /min @ 2 wu's concurrent.

I just checked: I get 0.6%/min with 3 concurrent Einstein tasks on an i7-2600K at 4.5 GHz. So is the average GPU acceleration (AMD or NV) in my case possibly just about 2 times?..

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 520873483

RAC: 257597

RE: RE: RE: Seems like

19 May 2012 22:04:40 UTC

Message 109529 in response to message 109528

Quote:

Quote:
Quote:
Seems like it is time to buy AMD-card! :) (had no reasons to buy card before and still use Sandy Bridge HD3000)
Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?

You can find some performance figures here: http://albert.phys.uwm.edu/forum_thread.php?id=8888&nowrap=true#112053
A HD6950 runs 1% /min @ 2 wu's concurrent.

I just checked: I get 0.6%/min with 3 concurrent Einstein tasks on an i7-2600K at 4.5 GHz. So is the average GPU acceleration (AMD or NV) in my case possibly just about 2 times?..

You need to compare the same types of wu's. AMD-wu's are BRP(Arecibo) wu's (500 credits).
2 concurrent means that 2 wu's are running simultaneously on one GPU; my PC finishes 2 wu's every 1:45 on the HD6950 and 2 wu's every 2:45 on the HD5850 (no overclocking).
CPU is i7-860 @ 2.8GHz, win7-64
Midrange HD7xxx should perform better.
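The "2 wu's every h:mm" figures can be turned into an effective time per WU and a daily throughput. A minimal sketch, assuming "1:45" and "2:45" mean hours:minutes for a batch of 2 concurrent WUs; the helper names are made up for illustration. As a cross-check, 100% in 105 minutes is about 0.95%/min, close to the "1%/min" quoted for the HD6950.

```python
def effective_minutes_per_wu(batch_minutes: float, concurrent: int) -> float:
    """Wall-clock minutes per WU when `concurrent` WUs finish together
    every `batch_minutes` minutes."""
    return batch_minutes / concurrent


def wus_per_day(batch_minutes: float, concurrent: int) -> float:
    """Work units completed per 24 hours at that rate."""
    return 24 * 60 / effective_minutes_per_wu(batch_minutes, concurrent)


def percent_per_minute(batch_minutes: float) -> float:
    """Progress rate each running WU shows, in % per minute."""
    return 100 / batch_minutes


# Figures from the post above: 2 WUs every 1:45 (HD6950) and every 2:45 (HD5850).
for card, batch in (("HD6950", 105), ("HD5850", 165)):
    print(f"{card}: {effective_minutes_per_wu(batch, 2):.1f} min/WU, "
          f"{wus_per_day(batch, 2):.1f} WUs/day, "
          f"{percent_per_minute(batch):.2f} %/min per WU")
```

The per-WU progress rate shown in BOINC understates the card's output by the number of WUs sharing it, which is the point the reply below makes.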

(retired account)

Joined: 28 Sep 11

Posts: 16

Credit: 7357648

RAC: 0

RE: I'm still waiting for

20 May 2012 10:25:16 UTC

Message 109530 in response to message 109527

Quote:

I'm still waiting for the first result to validate,

Three results have validated OK, and there are noticeable run-time differences between all of them. I guess it is not the amount of computation that varies that much? So maybe the OpenCL application is more affected by other processes than the CUDA app?

Regards

Oliver Behnke

Moderator

Administrator

Joined: 4 Sep 07

Posts: 987

Credit: 25171438

RAC: 0

RE: So the OpenCL

22 May 2012 7:40:22 UTC

Message 109531 in response to message 109530

Quote:

So the OpenCL application is affected maybe more by other processes than the CUDA app?

Yes, that's an observation we also made during our testing phase. OpenCL, well at least AMD's implementation, is much more sensitive to the amount of CPU-power available to serve the GPU than NVIDIA's CUDA. Also, for CUDA we (as developers) can decide and influence how to trade GPU efficiency against CPU-consumption quite a bit - OpenCL doesn't offer such fine-tuning.

Best,
Oliver

Vit

Joined: 7 Jan 08

Posts: 23

Credit: 393350695

RAC: 0

RE: RE: I just checked:

22 May 2012 21:53:45 UTC

Message 109532 in response to message 109529

Quote:

Quote:

I just checked: I get 0.6%/min with 3 concurrent Einstein tasks on an i7-2600K at 4.5 GHz. So is the average GPU acceleration (AMD or NV) in my case possibly just about 2 times?..

You need to compare the same types of wu's. AMD-wu's are BRP(Arecibo) wu's (500 credits).
2 concurrent means that 2 wu's are running simultaneously on one GPU; my PC finishes 2 wu's every 1:45 on the HD6950 and 2 wu's every 2:45 on the HD5850 (no overclocking). CPU is i7-860 @ 2.8GHz, win7-64. Midrange HD7xxx should perform better.

It was not easy to find the same task... well, it is http://einsteinathome.org/task/288218220, and the BRP (Arecibo) 500-credit task used 21,678 sec of my CPU, so 1% takes 217 sec. Your device1 speed is 1% per 97 sec. So again your device1 with OpenCL is just about 2 times faster, and your device0 GPU is about 3 times faster. I don't know how the energy efficiency of the HD7970 has evolved relative to your 160 W HD5850, but you have 75 W per task, and my i7 has 35 W per task (the 4-core i7-2600 overclocked to 4.5 GHz consumes 150 W). We have almost the same energy efficiency! No reason to install a GPU?..
By the way, the same task (see http://einsteinathome.org/workunit/123117782 ) took 10 times longer on a Pentium 4 at 3 GHz. This progress in CPUs is more impressive than the benefits of the current OpenCL version. So sad...

5pot

Joined: 8 Apr 12

Posts: 107

Credit: 7577619

RAC: 0

When crunching, gpus use less

22 May 2012 22:36:59 UTC

Message 109533

When crunching, gpus use less energy than if they were playing video. No video output.

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 520873483

RAC: 257597

RE: It was not easy to

23 May 2012 6:38:42 UTC

Message 109534 in response to message 109532

Quote:

It was not easy to find the same task... well, it is http://einsteinathome.org/task/288218220, and the BRP (Arecibo) 500-credit task used 21,678 sec of my CPU, so 1% takes 217 sec. Your device1 speed is 1% per 97 sec. So again your device1 with OpenCL is just about 2 times faster, and your device0 GPU is about 3 times faster. I don't know how the energy efficiency of the HD7970 has evolved relative to your 160 W HD5850, but you have 75 W per task, and my i7 has 35 W per task (the 4-core i7-2600 overclocked to 4.5 GHz consumes 150 W). We have almost the same energy efficiency! No reason to install a GPU?..
By the way, the same task (see http://einsteinathome.org/workunit/123117782 ) took 10 times longer on a Pentium 4 at 3 GHz. This progress in CPUs is more impressive than the benefits of the current OpenCL version. So sad...

Sorry, you missed one important fact:
in these '1% per 97 sec' TWO wu's make this progress; they run at the same time, not one after the other.
So the GPUs are not 2 / 3 times faster, they are 4 / 6 times faster, and the power consumption is not almost the same but 50% of your CPU per wu.

Another way to calculate it: in 21678 sec (~6 hrs) my faster GPU crunches 6.8 wu's. And don't forget: both my GPUs are outdated; current ones are faster and consume less power.

Another fact: my mobo has one x16 slot and one x8 slot; here at Einstein you can find another thread explaining the differences. A better mobo would give better figures. It does not really reflect the capabilities of the 'slower' GPU.

Anyway, we do not fight a war 'CPU against GPU', we do scientific work. There are different ways to do that. Speaking for myself, I'm happy to participate in science with the capabilities I have.
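Comparing "watts per task" (75 W vs 35 W in the quoted post) leaves out how long each task runs; the energy one task costs is power times time, divided among the tasks sharing the device. A sketch using only figures posted in this thread: the 150 W for the whole CPU and 160 W for the HD5850 are posters' estimates, and the GPU side ignores the host CPU and system power needed to feed the card. `energy_per_task_kj` is a hypothetical helper.

```python
def energy_per_task_kj(device_watts: float, seconds_per_task: float,
                       concurrent: int) -> float:
    """Energy attributed to one task in kJ, when `concurrent` tasks share a
    device drawing `device_watts` and each task takes `seconds_per_task`
    of wall-clock time."""
    return device_watts * seconds_per_task / concurrent / 1000


# i7-2600K: 4 BRP4 tasks at once, 21,678 s each, ~150 W (poster's estimate).
cpu_kj = energy_per_task_kj(150, 21_678, 4)
# HD5850: 2 tasks at once, 1% per 97 s -> 9,700 s each, ~160 W (poster's estimate).
gpu_kj = energy_per_task_kj(160, 97 * 100, 2)

print(f"CPU: {cpu_kj:.0f} kJ per task")
print(f"GPU: {gpu_kj:.0f} kJ per task (card only)")
```

Both the concurrency count and the run time enter the formula, so leaving out either one skews the comparison.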

Vit

Joined: 7 Jan 08

Posts: 23

Credit: 393350695

RAC: 0

RE: Sorry, you missed one

23 May 2012 10:07:38 UTC

Message 109535 in response to message 109534

Quote:

Sorry, you missed one important fact: in these ' 1% for 97sec ' TWO wu's make this progress, they run at the same time, not one after the other.

I did not miss it. That's why I wrote "you have 75 W per task", not "150 W per task" (not sure whether 2 tasks use 100% of your GPU power). And my consumption is 140 W / 4 = 35 W per task, because 4 tasks can run simultaneously at the same speed.

Quote:

My both gpu's are outdated, actual ones are faster and consume less power.

Yes, but nobody here answers about 7970 or 680 speed and efficiency. Thank you for your information, even for an outdated GPU. I wonder whether Bruce Allen's team has any such information to share with us.

Quote:

Anyway, we do not fight a war 'CPU against GPU'

Sure! Peace, dude :) My initial question was "Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?". We still have no charts and are just trying to find out the truth: is it worth buying a 7970 for powerful and energy-efficient calculations? Because if it is worth it, I will buy one.

5pot

Joined: 8 Apr 12

Posts: 107

Credit: 7577619

RAC: 0

why would anyone buy ati for

23 May 2012 12:35:35 UTC

Message 109536

Why anyone would buy ATI for this project is beyond me. CUDA runs faster here. If you're going to buy a card specifically for this project you should buy NVIDIA. 680 on W7 can run 3 tasks at a time and average around 3000 seconds per task with PCIe 3.0, a little more if the CPU is running other projects (3100).

On Linux it's even a little faster

Vit

Joined: 7 Jan 08

Posts: 23

Credit: 393350695

RAC: 0

RE: why would anyone buy

23 May 2012 13:18:01 UTC

Message 109537 in response to message 109536

Quote:

Why anyone would buy ATI for this project is beyond me. CUDA runs faster here.

Why do you think so? Do you have some charts for 680 CUDA vs 7970 OpenCL (both codes written by perfect programmers)? This is what I am looking for! AFAIK, well-written OpenCL code performs on par with CUDA for FFT and almost all other kinds of math.
I prefer AMD at least because OpenCL provides an open, industry-standard framework. No one but NVidia can use CUDA - this is wrong way. And don't believe NVidia advertising; it is very aggressive, half-true, biased and often even deceptive.

Quote:

680 on W7 can run 3 tasks at a time and average around 3000 seconds per task with PCIe 3.0

This is useful information, thank you. So it is ~7 times faster and takes ~2..3 times more power consumption, therefore it is ~2..3 times more energy-efficient. Let us wait for someone's 7970 report.

5pot

Joined: 8 Apr 12

Posts: 107

Credit: 7577619

RAC: 0

Found this on

23 May 2012 14:00:24 UTC

Message 109538

Found this on Albert

http://albert.phys.uwm.edu/results.php?hostid=2209&offset=0&show_names=0&state=3&appid=

Based on the time stamps, I only managed to find one where they were close enough together that would give me the GUESS that they were running 2x at a time. The CPU being used can bring in some speculation. But even with those times, and the increase in TDP of the 7970, CUDA is still a better choice.

Also don't forget what Oliver stated: "Yes, that's an observation we also made during our testing phase. OpenCL, well at least AMD's implementation, is much more sensitive to the amount of CPU-power available to serve the GPU than NVIDIA's CUDA. Also, for CUDA we (as developers) can decide and influence how to trade GPU efficiency against CPU-consumption quite a bit - OpenCL doesn't offer such fine-tuning."

This statement does not seem to be in favor of OpenCL from the devs' perspective. What was clearly noted in the PCIe discussion in Cruncher's Corner is that PCIe 3 is MUCH better than 2 when loading multiple tasks.

EDIT: Since not many people will have a 7970, I would send that person a private message. I've seen them in quite a few forums, so I'm "sure" they would be willing to help.

EDIT 2: Even if this person was running 2, the time would still be higher than my 680 running 3, thereby DRASTICALLY reducing efficiency.

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 800821269

RAC: 1220955

Well, this type of "vendor A

23 May 2012 14:55:07 UTC

Message 109539

Well, this type of "vendor A vs. vendor B" discussion has a tendency to spin out of control sooner or later and I don't want to get too involved in it :-). I'd just like to stress one important point: the BRP4 app versions for CUDA and OpenCL respectively should NOT, IMHO, be used in a "benchmark" type of sense to make general comparisons between AMD vs NVIDIA or CUDA vs. OpenCL.

The two versions use completely different libraries for the FFT, they even use slightly different approaches for the FFT because of limitations of the FFT lib used in the OpenCL case. The OpenCL app is "younger" and in general I would consider it less optimized to its target platform.

Cheers
HBE

Vit

Joined: 7 Jan 08

Posts: 23

Credit: 393350695

RAC: 0

Bikeman, no war, no spinning

23 May 2012 16:25:23 UTC

Message 109540 in response to message 109539

Bikeman, no war, no spinning out - just measurement, some statistics and a few ideas about attractiveness of different approaches for GPGPU. You are deep into OpenCL and CUDA for this project, so can you give us an estimate of the new 7970's energy efficiency for this particular kind of calculation? ( "Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?" ) I believe you have some info and measurement results. Sure, OpenCL is younger, and your version of this GPGPU library is the first and may not be perfect yet. I am not even trying to use it as a benchmark in an "AMD vs NVidia" war.
I remember that in July 2011 we were told at the Hannover meeting that on average GPGPU is approximately 5 times more energy efficient than CPU. But a year has passed: the i7, AMD 7970, NVidia 680 and your OpenCL library have appeared. So please tell me, like a Holy Father to a parishioner: should I buy an AMD GPU or not (games alone are not enough motivation for me :) ).

.clair.

Joined: 20 Nov 06

Posts: 62

Credit: 1051176770

RAC: 0

Here are some tasks from my

23 May 2012 21:44:13 UTC

Message 109541

Here are some tasks from my 7970 running one wu at a time while seti was down cos of their power grid failing.
http://einsteinathome.org/host/5365549/tasks
The motherboard is an xfx780i with PCIe v2 16x slot.
The CPU is a 3.6 GHz P4 with HT on and running other CPU projects, so the GPU is starved of CPU time to run Einstein and times are longer than they should be.
I also have other problems with this PC, having now had to go back one month with System Restore, which removed CCC 12.4 and BM 7.0.28.
When I am sure the other problems are gone/fixed I will upgrade again and try again with Einstein.
I built this system with an ATI GPU so that it can run SETI VLAR workunits;
Einstein work was just for fun and to fill in time.
I was lucky that E@H came up with an OCL app in time :¬)

Oliver Behnke

Moderator

Administrator

Joined: 4 Sep 07

Posts: 987

Credit: 25171438

RAC: 0

RE: No one but NVidia can

24 May 2012 12:22:13 UTC

Message 109542 in response to message 109537

Quote:

No one but NVidia can use CUDA - this is wrong way.

I'd like to correct this one: while CUDA itself isn't an open standard like OpenCL, NVIDIA opened their LLVM-based CUDA compiler. This allows all interested parties to target their GPUs, APUs and CPUs with CUDA. There is already a CUDA compiler targeting multi/many-core CPUs (by PGI). In this sense CUDA has now become a full-fledged competitor for OpenCL. It's now up to the Khronos Group to win this competition - as always, survival of the fittest...

I'm also in favor of open standards, but they also need to deliver and be turned into marketable products. The best standard doesn't help if it's not adopted by a critical mass. If the Khronos Group adopted something like the Java Community Process to develop the OpenCL standard itself, things might work out, but right now they don't perform as they probably should in a competitive environment.

JM2C,
Oliver

PS: Back to topic! :-)

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 520873483

RAC: 257597

RE: just measurement, some

24 May 2012 20:27:39 UTC

Message 109543 in response to message 109540

Quote:

just measurement, some statistics and a few ideas about attractiveness of different approaches for GPGPU.

You can compare HD79XX against my HD6950 here:
http://albert.phys.uwm.edu/workunit.php?wuid=75885
Computer 2209 runs a HD79XX
You are familiar with my configuration

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 800821269

RAC: 1220955

These two jobs are not really

24 May 2012 21:19:05 UTC

Message 109544 in response to message 109543

These two jobs are not really comparable tho: one is a brand-new (1.25) prototype for an OpenCL app specifically modified to cure validation problems of HD6900 series cards. It is slower than the previous version 1.24.

Cheers
HB

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 520873483

RAC: 257597

RE: These two jobs are not

24 May 2012 21:28:49 UTC

Message 109545 in response to message 109544

Quote:

These two jobs are not really comparable tho: one is a brand-new (1.25) prototype for an OpenCL app specifically modified to cure validation problems of HD6900 series cards. It is slower than the previous version 1.24.

Cheers
HB

1:45 @ Einstein (1.24) : 2:09 @ Albert (1.25)

Vit

Joined: 7 Jan 08

Posts: 23

Credit: 393350695

RAC: 0

Bikeman, Oliver Bock, Alex

26 May 2012 12:10:53 UTC

Message 109546 in response to message 109545

Bikeman, Oliver Bock, Alex and others - thank you all! I bought it. Let me share results of my new 7970, for Arecibo 1.24 atiOpenCL:
1 task on GPU (0.5 CPU + 1 ATI GPU): 18-25 min, GPU load by GPU-Z 40-45% (although Catalyst Control Center shows "activity 60%"), CPU load by W7 Task Manager 5% (that is ~40% load of one core)
2 tasks on GPU (0.5 CPU + 0.5 ATI GPU): ~38 min, GPU load by GPU-Z 58-62% (Catalyst CC 80-84%), CPU load 3%
I have no idea why the dispersion (17-25 min) in the case of 1 task is so large if tasks need an equal(?) amount of calculation. The CPU is not heavily loaded by other tasks.

Whatever... if we assume that "1.22 BRP4 SSE" and "1.24 atiOpenCL" need the same amount of calculation, then for one task, even on PCIe 2.0, a 7970 GPU at 1 GHz is ~20 times faster (and ~5-7 times more energy efficient) than my i7-2600K at 4.5 GHz. Good job!
And thank you for inducing me to buy a GPU; I am going to check the progress in the game industry over the last 7-8 years.
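The "~20 times faster" compares one CPU task with the GPU's effective time per task. Because the CPU also runs 4 tasks at once, a whole-device comparison gives a smaller number from the same data. A sketch with figures posted in this thread (21,678 s per task on the i7-2600K with 4 concurrent tasks; ~38 min per task on the 7970 with 2 concurrent tasks); `tasks_per_hour` is a hypothetical helper, and the inputs are single observations, not benchmarks.

```python
def tasks_per_hour(seconds_per_task: float, concurrent: int) -> float:
    """Device throughput when `concurrent` tasks each take `seconds_per_task`."""
    return concurrent * 3600 / seconds_per_task


CPU_SECONDS, CPU_CONCURRENT = 21_678, 4   # i7-2600K @ 4.5 GHz, BRP4 SSE
GPU_SECONDS, GPU_CONCURRENT = 38 * 60, 2  # HD7970, BRP4 1.24 atiOpenCL

# Per-task view: CPU time of one task vs. the GPU's effective time per task.
per_task_speedup = CPU_SECONDS / (GPU_SECONDS / GPU_CONCURRENT)
# Whole-device view: all four CPU tasks vs. both GPU tasks.
device_speedup = (tasks_per_hour(GPU_SECONDS, GPU_CONCURRENT)
                  / tasks_per_hour(CPU_SECONDS, CPU_CONCURRENT))

print(f"per-task speedup: {per_task_speedup:.1f}x")
print(f"whole-device speedup: {device_speedup:.1f}x")
```

Which ratio matters depends on the question: per-task speed for turnaround, whole-device throughput when deciding whether a GPU adds more work per day than the CPU it sits next to.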

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 520873483

RAC: 257597

Congratulations! To get the

26 May 2012 14:20:13 UTC

Message 109547 in response to message 109546

Congratulations!
To get the same results with my both cards I need to switch over to the 36h-day!

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 800821269

RAC: 1220955

You're welcome! We will

26 May 2012 15:32:15 UTC

Message 109548

You're welcome!

We will continue to improve the ATi/AMD app, so that the energy efficiency should even go up some more in the not so distant future.

Cheers
HB

Petrion

Joined: 30 Apr 08

Posts: 53

Credit: 1243186

RAC: 0

RE: Bikeman, Oliver Bock,

26 May 2012 23:29:02 UTC

Message 109549 in response to message 109546

Quote:

Bikeman, Oliver Bock, Alex and others - thank you all! I bought it. Let me share results of my new 7970, for Arecibo 1.24 atiOpenCL:
1 task on GPU (0.5 CPU + 1 ATI GPU): 18-25 min, GPU Load by GPU-Z 40-45% (although Catalist Control Center show "activity 60%"), CPU Load by W7 TaskManager - 5% (it is ~40% load of one core)
2 tasks on GPU (0.5 CPU + 0.5 ATI GPU): ~38 min, GPU Load by GPU-Z 58-62% (Catalist CC - 80-84%), CPU Load - 3%
Have no idea why dispersion (17-25 min) in the case of 1 task so large if tasks need equal(?) amount of calculation. CPU is not heavy loaded by other tasks.

Nice. My rig isn't as buff but I can crunch one on average in 65 min (Catalyst GPU load 80%), but doing 2 tasks splits my GPU work between them taking 125 min (Catalyst GPU load 92%) to do both. Both ways uses 5% CPU load.

Win 7Pro X64, i5-2500K CPU @ 3.30GHz (OC 4.5GHz), AMD HD6850, PCIe 2.0, 8GB 1600 RAM, BOINC 7.0.28

Run time(sec) 3,739.87
CPU time(sec) 530.31
Claimed credit 6.90
Granted credit 500.00
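Whether running 2 tasks at once pays off can be read from the two timings above: 1 task in 65 min versus 2 tasks in 125 min. A sketch of that comparison, plus the share of run time the listed task spent on the CPU, using only the HD6850 figures from this post; `concurrency_gain` is a hypothetical helper.

```python
def concurrency_gain(single_minutes: float, batch_minutes: float,
                     concurrent: int) -> float:
    """Fractional change in tasks per hour when `concurrent` tasks run together
    (taking `batch_minutes`) instead of one at a time (`single_minutes` each)."""
    single_rate = 1 / single_minutes
    batch_rate = concurrent / batch_minutes
    return batch_rate / single_rate - 1


gain = concurrency_gain(65, 125, 2)   # HD6850: 1 task in 65 min vs 2 in 125 min
cpu_share = 530.31 / 3739.87          # CPU time / run time of the listed task

print(f"throughput gain from 2 tasks at once: {gain:.1%}")
print(f"CPU time as share of run time: {cpu_share:.1%}")
```

A gain close to zero means the card was already nearly saturated with one task; a larger gain suggests one task leaves the GPU idle part of the time.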

Petrion

Joined: 30 Apr 08

Posts: 53

Credit: 1243186

RAC: 0

RE: And thank you for

26 May 2012 23:33:10 UTC

Message 109550 in response to message 109546

Quote:

And thank you for inducement me to by GPU, I am going to check the progress in game industry for the last 7-8 years.

I'm a power-gamer and I use my gaming rig to crunch, thus my HD 6850. And the gaming industry has progressed a lot in the last 7-8 years. You should have fun. :)