Einstein@Home GPU Application for ATI/AMD Graphics Cards

After more than a year of work by Oliver Bock, Bernd Machenschalk, Heinz-Bernd Eggenstein and other developers, we are pleased to announce the release of the first Einstein@Home application for ATI/AMD Graphics Cards.

This OpenCL application, which searches Arecibo data for new radio pulsars, is about a factor of ten faster than the same search running on a typical CPU. The application is currently available for Windows and Linux computers with Radeon HD 5000 or better graphics cards. We hope to have a version for Macintosh (Apple OS X 10.8, Mountain Lion) sometime this summer, but there are still some problems that need to be fixed or worked around.

Volunteers who wish to run this application will need to install version 7.0.27 or later of the BOINC client. Please see this thread for more information, or if you want to ask questions.

Many thanks to the AMD/ATI team for their support in the OpenCL software development effort.

Bruce Allen
Director, Einstein@Home


Comments

5pot
Joined: 8 Apr 12
Posts: 107
Credit: 7577619
RAC: 0

SP I would presume?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 759160415
RAC: 1132753

SP indeed. No support is offered for HD 4xxx generation cards (without support for OpenCL 1.1), anything more recent should do fine.

Cheers
HB

Vit
Joined: 7 Jan 08
Posts: 23
Credit: 393350695
RAC: 0

Seems like it is time to buy AMD-card! :) (had no reasons to buy card before and still use Sandy Bridge HD3000)
Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?

5pot
Joined: 8 Apr 12
Posts: 107
Credit: 7577619
RAC: 0

CPU does not even compare. CPUs are also incredibly inefficient compared to GPUs. I believe they said NVIDIA is currently 20x faster, and AMD is currently 10x faster. I would expect this number to increase as changes are made, IMHO.

Michael Becker
Joined: 15 Jul 05
Posts: 3
Credit: 10398405
RAC: 0

thx for the ati/amd app, it works on my hd5870
my first result: http://einsteinathome.org/task/288520169
but the gpu-load is only ~60%, one cpu-core is only for gpu-tasks (cpu: i7-2600k)
should i set 'GPU utilization factor' to 0,5?
on my second machine is a 560ti running, there i have the best results with 'GPU utilization factor' 0,33

michel

5pot
Joined: 8 Apr 12
Posts: 107
Credit: 7577619
RAC: 0

Would assume the same applies; my 680 also runs at around 60% with one task, and around 90% with 0.33 set.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 759160415
RAC: 1132753

Nice!

Yes, I would encourage experiments with the utilization factor. You can use different "venues" in BOINC-speak to assign different settings to different hosts.

CU
HB
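
For anyone experimenting with the utilization factor mentioned above, here is a minimal sketch of how the factor relates to the number of tasks sharing one card. It assumes the factor is the fraction of a GPU that each task reserves, so 0.5 gives two tasks and 0.33 gives three; the function name is made up for illustration and is not part of BOINC.

```python
import math

def concurrent_tasks(utilization_factor: float) -> int:
    """Estimate how many tasks share one GPU for a given utilization factor.

    Assumption: each task reserves `utilization_factor` of one GPU, so the
    client can fit floor(1 / factor) tasks on a card.
    """
    if not 0 < utilization_factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    # The small tolerance guards against floating-point error when
    # 1 / factor should come out as a whole number.
    return math.floor(1 / utilization_factor + 1e-9)

for factor in (1.0, 0.5, 0.33):
    print(factor, "->", concurrent_tasks(factor), "task(s) per GPU")
```

Whether more tasks per card actually helps depends on the card and on how much CPU time is free to feed it, so treat the result as a starting point for experiments.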

TRuEQ & TuVaLu
Joined: 11 Sep 06
Posts: 2
Credit: 33588262
RAC: 0

It works!!!

http://einsteinathome.org/host/5353241/tasks

I run it with the "dangerous" option of 0.5 (2 tasks at once).
And it runs 1 + 1 on the GPU together with Milkyway, SETI, Primegrid and POEM, which are also using 0.5 in the app_info.xml file.

dskagcommunity
Joined: 16 Mar 11
Posts: 89
Credit: 1219701683
RAC: 53907

Quote:

SP indeed. No support is offered for HD 4xxx generation cards (without support for OpenCL 1.1), anything more recent should do fine.

Cheers
HB

So i think 4xxx will never be supported? or only @ the beginning now?

DSKAG Austria Research Team: http://www.research.dskag.at

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4330
Credit: 251578723
RAC: 36805

Quote:
Quote:
SP indeed. No support is offered for HD 4xxx generation cards (without support for OpenCL 1.1), anything more recent should do fine.

So i think 4xxx will never be supported? or only @ the beginning now?

I don't think we will make an OpenCL 1.0 App, at least not for BRP4. It would be another code branch to maintain and it would almost double the memory requirements, thus a task would not fit in 512MB.

Also I doubt that the limited computing power of the 4xxx would gain us much.

BM

dskagcommunity
Joined: 16 Mar 11
Posts: 89
Credit: 1219701683
RAC: 53907

ok i see, where does this 512MB limit come from? OpenCL 1.0?

DSKAG Austria Research Team: http://www.research.dskag.at

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4330
Credit: 251578723
RAC: 36805

This is not a hard limit, just a target that we set to get the most out of our population of ATI cards.

BM

dskagcommunity
Joined: 16 Mar 11
Posts: 89
Credit: 1219701683
RAC: 53907

Hm, are there so many 5xxx or higher with 512MB?

DSKAG Austria Research Team: http://www.research.dskag.at

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 987
Credit: 25171438
RAC: 2

Yes, in fact, all of them :-)

Oliver

Einstein@Home Project

dskagcommunity
Joined: 16 Mar 11
Posts: 89
Credit: 1219701683
RAC: 53907

I meant ONLY 512MB ;) ;) But I see half of the cards could possibly have only 512MB. Hm, sad.

DSKAG Austria Research Team: http://www.research.dskag.at

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 759160415
RAC: 1132753

Hi!
As Bernd mentioned already, more than 512 MB would only be needed for a second code branch that would be able to support OpenCL 1.0. It's mostly the 4xxx cards that would benefit from support of OpenCL 1.0, not the HD 5000 series. So the question would be: are there many 4000-series cards with just 512 MB? Those were popular when? 2008? 2009? More than 512 MB video RAM wasn't the norm back then. So by supporting OpenCL 1.0, we would be able to utilize only a certain fraction of the already shrinking population of older cards, to get a not-so-great performance per card ===> it just doesn't make too much sense.

Cheers
HB

TRuEQ & TuVaLu
Joined: 11 Sep 06
Posts: 2
Credit: 33588262
RAC: 0

I'll leave my 4850 to do Milkyway and Collatz for as long as it lives.
I can't put any better card in that computer since the PCI Express slot is only 1.x something.....

noderaser
Joined: 9 Feb 05
Posts: 50
Credit: 101924204
RAC: 513554

My HD 4870 with 1 GB (and a GeForce 320M with 256 MB) is eagerly awaiting something cooler than boring math projects. No dice as of yet.

dskagcommunity
Joined: 16 Mar 11
Posts: 89
Credit: 1219701683
RAC: 53907

noderaser: use it like I do with my 4850 with 1GB on POEM (if you don't like MW). Possibly three WUs at once, 84000 credits/day. Thank god they have been supporting OpenCL for a short time now. It's medical research.

DSKAG Austria Research Team: http://www.research.dskag.at

Dr.Alexx
Joined: 14 Aug 05
Posts: 22
Credit: 5135173
RAC: 29

The BOINC site has been unavailable for 2 days! Cannot download the new client! Can someone send the 64-bit Windows client ver 27 to kido00 (a t) ya.ru ?

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

Quote:
The BOINC site has been unavailable for 2 days! Cannot download the new client! Can someone send the 64-bit Windows client ver 27 to kido00 (a t) ya.ru ?


There has been a power failure at Berkeley due to a shorted underground cable. It has been repaired but the servers are still down.
Tullio

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 987
Credit: 25171438
RAC: 2

Quote:
The BOINC site has been unavailable for 2 days! Cannot download the new client! Can someone send the 64-bit Windows client ver 27 to kido00 (a t) ya.ru ?

Please see main thread...

Einstein@Home Project


Alex
Joined: 1 Mar 05
Posts: 451
Credit: 509540239
RAC: 116681

Quote:
Seems like it is time to buy AMD-card! :) (had no reasons to buy card before and still use Sandy Bridge HD3000)
Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?

You can find some performance figures here: http://albert.phys.uwm.edu/forum_thread.php?id=8888&nowrap=true#112053
A HD6950 runs 1% /min @ 2 wu's concurrent.

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 509540239
RAC: 116681

Yesterday I attached my mainsys with 2 AMD-cards to the project.
Until now 12 wu's were done, 4 validated (2 against AMD, 1 against Cuda and one against SSE).
Looks like the team has done a wonderful job! THX!

(retired account)
Joined: 28 Sep 11
Posts: 16
Credit: 7357648
RAC: 0

Thanks to everyone involved in the development of the AMD/OpenCL app!

I'm still waiting for the first result to validate, but so far so good on a HD 7950 with Win8 preview:

http://einsteinathome.org/task/289196474

BOINC 7.0.27 (x64)
Catalyst 12.4 (installed in Win7 comp. mode)
Windows 8 Dev. Preview x64

Performance not overwhelming yet, appears to be only ~9% faster than my GTX 560 Ti, but since this is the first public release (and in my case not the real win8 driver), who knows what is still to come ... :-)

Regards

Mark my words and remember me. - 11th Hour, Lamb of God

Vit
Joined: 7 Jan 08
Posts: 23
Credit: 393350695
RAC: 0

Quote:
Quote:
Seems like it is time to buy AMD-card! :) (had no reasons to buy card before and still use Sandy Bridge HD3000)
Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?

You can find some performance figures here: http://albert.phys.uwm.edu/forum_thread.php?id=8888&nowrap=true#112053
A HD6950 runs 1% /min @ 2 wu's concurrent.

I just checked: I have 0.6% /min @ 3 concurrent Einstein tasks on an i7-2600K at 4.5GHz. So is the possible average GPU acceleration (AMD or NV) in my case just about 2 times?..

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 509540239
RAC: 116681

Quote:
Quote:
Quote:
Seems like it is time to buy AMD-card! :) (had no reasons to buy card before and still use Sandy Bridge HD3000)
Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?

You can find some performance figures here: http://albert.phys.uwm.edu/forum_thread.php?id=8888&nowrap=true#112053
A HD6950 runs 1% /min @ 2 wu's concurrent.

I just checked: I have 0.6% /min @ 3 concurrent Einstein tasks on an i7-2600K at 4.5GHz. So is the possible average GPU acceleration (AMD or NV) in my case just about 2 times?..


You need to compare the same types of wu's. AMD-wu's are BRP(Arecibo) wu's (500 credits).
2 concurrent means that 2 wu's are running simultaneously on one GPU; my PC finishes 2 wu's every 1:45 on the HD6950 and 2 wu's every 2:45 on the HD5850 (no overclocking).
CPU is i7-860 @ 2.8GHz, win7-64
Midrange HD7xxx should perform better.

(retired account)
Joined: 28 Sep 11
Posts: 16
Credit: 7357648
RAC: 0

Quote:
I'm still waiting for the first result to validate,

Three results have been validated ok. And there are noticeable run time differences between all results. I guess it is not the amount of calculations which is varying that much? So the OpenCL application is affected maybe more by other processes than the CUDA app?

Regards

Mark my words and remember me. - 11th Hour, Lamb of God

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 987
Credit: 25171438
RAC: 2

Quote:
So the OpenCL application is affected maybe more by other processes than the CUDA app?

Yes, that's an observation we also made during our testing phase. OpenCL, well at least AMD's implementation, is much more sensitive to the amount of CPU-power available to serve the GPU than NVIDIA's CUDA. Also, for CUDA we (as developers) can decide and influence how to trade GPU efficiency against CPU-consumption quite a bit - OpenCL doesn't offer such fine-tuning.

Best,
Oliver

Einstein@Home Project

Vit
Joined: 7 Jan 08
Posts: 23
Credit: 393350695
RAC: 0

Quote:
Quote:

I just checked: I have 0.6% /min @ 3 concurrent Einstein tasks on an i7-2600K at 4.5GHz. So is the possible average GPU acceleration (AMD or NV) in my case just about 2 times?..

You need to compare the same types of wu's. AMD-wu's are BRP(Arecibo) wu's (500 credits).
2 concurrent means that 2 wu's are running simultaneously on one GPU; my PC finishes 2 wu's every 1:45 on the HD6950 and 2 wu's every 2:45 on the HD5850 (no overclocking). CPU is i7-860 @ 2.8GHz, win7-64. Midrange HD7xxx should perform better.


It was not easy to find the same task... well, it is http://einsteinathome.org/task/288218220 , and the BRP (Arecibo) 500-credit task used 21,678 sec of my CPU, so 1% takes 217 sec. Your device1 speed is 1% per 97 sec. So again your device1-OpenCL is just about 2 times faster, and your device0-GPU is about 3 times faster. I don't know about the evolution in energy efficiency of the HD7970 relative to your 160 W HD5850, but you have 75 W per task, and my i7 has 35 W per task (the 4-core overclocked i7-2600 at 4.5 GHz consumes 150 W). We have almost the same energy efficiency! No reason for GPU installation?..
By the way, the same task (see http://einsteinathome.org/workunit/123117782 ) took 10 times longer on a Pentium 4 3GHz. This progress in CPUs is more impressive than the benefits of the current OpenCL version. So sad...
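
The energy comparison in this post can be reproduced with a few lines. This is only a sketch of the arithmetic using the figures quoted above (about 35 W and 21,678 s per CPU task, about 75 W and 97 s per 1% of progress per GPU task); the function name is hypothetical, and the per-task power figures are the poster's estimates, not measurements.

```python
def energy_per_task_wh(power_per_task_w: float, seconds_per_task: float) -> float:
    """Energy in watt-hours attributed to one task.

    Assumption: `power_per_task_w` is already the device power divided by
    the number of tasks running on it at the same time.
    """
    return power_per_task_w * seconds_per_task / 3600.0

# CPU: ~35 W per task, 21,678 s per task (figures from the post).
cpu_wh = energy_per_task_wh(35.0, 21678.0)
# GPU: ~75 W per task, 97 s per 1% of progress, so 9,700 s per task.
gpu_wh = energy_per_task_wh(75.0, 97.0 * 100)

print(f"CPU: {cpu_wh:.0f} Wh per task, GPU: {gpu_wh:.0f} Wh per task")
```

With these inputs the two results land close together, which is where the "almost the same energy efficiency" remark comes from. The conclusion changes if the per-task times or power figures are counted differently.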

5pot
Joined: 8 Apr 12
Posts: 107
Credit: 7577619
RAC: 0

When crunching, gpus use less energy than if they were playing video. No video output.

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 509540239
RAC: 116681

Quote:

It was not easy to find the same task... well, it is http://einsteinathome.org/task/288218220 , and the BRP (Arecibo) 500-credit task used 21,678 sec of my CPU, so 1% takes 217 sec. Your device1 speed is 1% per 97 sec. So again your device1-OpenCL is just about 2 times faster, and your device0-GPU is about 3 times faster. I don't know about the evolution in energy efficiency of the HD7970 relative to your 160 W HD5850, but you have 75 W per task, and my i7 has 35 W per task (the 4-core overclocked i7-2600 at 4.5 GHz consumes 150 W). We have almost the same energy efficiency! No reason for GPU installation?..
By the way, the same task (see http://einsteinathome.org/workunit/123117782 ) took 10 times longer on a Pentium 4 3GHz. This progress in CPUs is more impressive than the benefits of the current OpenCL version. So sad...

Sorry, you missed one important fact:
in these '1% for 97 sec' TWO wu's make this progress; they run at the same time, not one after the other.
So the GPUs are not 2 / 3 times faster, they are 4 / 6 times faster, and the power consumption is not almost the same but 50% of your CPU's per wu.

Another way to calculate it: in 21678 sec (~6 hrs) my faster GPU crunches 6.8 wu's. And don't forget: both of my GPUs are outdated; current ones are faster and consume less power.

Another fact: my mobo has one x16 slot and one x8 slot; here at Einstein you can find another thread explaining the differences. A better mobo would give better figures. It does not really reflect the capabilities of the 'slower' GPU.

Anyway, we do not fight a war 'CPU against GPU', we do scientific work. There are different ways to do that. Speaking for myself, I'm happy to participate in science with the capabilities I have.
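
The "another way to calculate it" figure above can be checked with a short sketch. It assumes each batch of concurrent tasks takes a fixed wall-clock time and uses the batch timings given earlier in the thread (2 tasks every 1:45 on the HD6950, 2 tasks every 2:45 on the HD5850); the function name is made up for illustration.

```python
def tasks_completed(window_s: float, batch_time_s: float, batch_size: int) -> float:
    """Tasks finished in `window_s` seconds when `batch_size` tasks run
    concurrently and each batch takes `batch_time_s` seconds."""
    return window_s / batch_time_s * batch_size

# The CPU task discussed above took 21,678 s; use that as the time window.
cpu_task_s = 21678
hd6950 = tasks_completed(cpu_task_s, (1 * 60 + 45) * 60, 2)  # 1:45 per batch
hd5850 = tasks_completed(cpu_task_s, (2 * 60 + 45) * 60, 2)  # 2:45 per batch

print(f"HD6950: {hd6950:.1f} tasks, HD5850: {hd5850:.1f} tasks")
```

The HD6950 value comes out just under 7, in line with the 6.8 quoted in the post. The CPU also runs several tasks side by side, so a full comparison would count those as well.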

Vit
Joined: 7 Jan 08
Posts: 23
Credit: 393350695
RAC: 0

Quote:
Sorry, you missed one important fact: in these ' 1% for 97sec ' TWO wu's make this progress, they run at the same time, not one after the other.


I did not miss it. That's why I wrote "you have 75 W per task", not "150 W per task" (not sure if 2 tasks use 100% of your GPU power). And my consumption is 140 W / 4 = 35 W per task, because 4 tasks can run simultaneously and the speed is the same.

Quote:
Both of my GPUs are outdated; current ones are faster and consume less power.


Yes, but nobody here has answered about 7970 or 680 speed and efficiency. Thank you for your information, even for outdated GPUs. I wonder whether Bruce Allen's team has any such information to share with us.

Quote:
Anyway, we do not fight a war 'CPU against GPU'


Sure! Peace, dude :) My initial question was "Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?". We still have no charts and are just trying to find out the truth: is it worth buying a 7970 for powerful and energy-efficient calculations? Because if it is worth it, I will buy one.

5pot
Joined: 8 Apr 12
Posts: 107
Credit: 7577619
RAC: 0

Why anyone would buy ATI for this project is beyond me. CUDA runs faster here. If you're going to buy a card specifically for this project you should buy NVIDIA. A 680 on W7 can run 3 tasks at a time and average around 3000 seconds per task with PCIe 3.0, a little more if the CPU is running other projects (3100).

On Linux it's even a little faster

Vit
Joined: 7 Jan 08
Posts: 23
Credit: 393350695
RAC: 0

Quote:
Why anyone would buy ATI for this project is beyond me. CUDA runs faster here.


Why do you think so? Do you have some charts for 680 CUDA vs 7970 OpenCL (with both codes written by perfect programmers)? This is what I am looking for! AFAIK, properly written OpenCL code has performance equal to CUDA for FFT and almost all other kinds of math.
I prefer AMD at least because OpenCL provides an open, industry-standard framework. No one but NVidia can use CUDA - this is the wrong way. And don't believe NVidia advertising; it is very aggressive, half-true, biased and often even deceptive.

Quote:
680 on W7 can run 3 tasks at a time and average around 3000 seconds per task with PCIe 3.0


This is useful information, thank you. So it is ~7 times faster and draws ~2..3 times more power, therefore it is ~2..3 times more energy-efficient. Let us wait for someone's 7970 report.
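
One way to arrive at a figure like the "~7 times" above is sketched below. The GTX 680 numbers are from the quoted post. The CPU side uses the 21,678 s task time from earlier in the thread, and the count of 3 concurrent CPU tasks is an assumption taken from an earlier post, so the result is an estimate at best. The function name is hypothetical.

```python
def effective_seconds_per_task(run_time_s: float, concurrent_tasks: int) -> float:
    """Wall-clock seconds per finished task when several run side by side."""
    return run_time_s / concurrent_tasks

# GTX 680 figures from the quoted post: 3 tasks at a time, ~3000 s each.
gpu_s = effective_seconds_per_task(3000, 3)
# CPU: 21,678 s per task; 3 concurrent tasks is an assumption, not a
# measured configuration for that particular run.
cpu_s = effective_seconds_per_task(21678, 3)
speedup = cpu_s / gpu_s

print(f"GPU: {gpu_s:.0f} s/task, CPU: {cpu_s:.0f} s/task, speedup ~{speedup:.1f}x")
```

If the CPU runs 4 tasks at once instead of 3, the same calculation gives a smaller ratio, so the concurrency assumption matters as much as the raw run times.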

5pot
Joined: 8 Apr 12
Posts: 107
Credit: 7577619
RAC: 0

Found this on Albert

http://albert.phys.uwm.edu/results.php?hostid=2209&offset=0&show_names=0&state=3&appid=

Based on the time stamps, I only managed to find one where they were close enough together that would give me the GUESS that they were running 2x at a time. The CPU being used can bring in some speculation. But even with those times, and the increase in TDP of the 7970, CUDA is still a better choice.

Also don't forget what Oliver stated: "Yes, that's an observation we also made during our testing phase. OpenCL, well at least AMD's implementation, is much more sensitive to the amount of CPU-power available to serve the GPU than NVIDIA's CUDA. Also, for CUDA we (as developers) can decide and influence how to trade GPU efficiency against CPU-consumption quite a bit - OpenCL doesn't offer such fine-tuning."

This statement does not seem to be in favor of OpenCL from the devs' perspective. What is very noticeable from the PCIe discussion in Cruncher's Corner is that PCIe 3 is MUCH better than 2 when loading multiple tasks.

EDIT: Since many people will not have a 7970, I would send them a private message. I've seen that person in quite a few forums, so I'm "sure" they would be willing to help.

EDIT 2x= Even if this person was running 2, the time would still be higher than my 680 running 3. Thereby DRASTICALLY reducing efficiency.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 759160415
RAC: 1132753

Well, these "vendor A vs. vendor B" discussions have a tendency to spin out of control sooner or later and I don't want to get too involved in them :-). I'd just like to stress one important point: the BRP4 app versions for CUDA and OpenCL respectively should NOT, IMHO, be used in a "benchmark" type of sense to make general comparisons between AMD vs NVIDIA or CUDA vs. OpenCL.

The two versions use completely different libraries for the FFT, they even use slightly different approaches for the FFT because of limitations of the FFT lib used in the OpenCL case. The OpenCL app is "younger" and in general I would consider it less optimized to its target platform.

Cheers
HBE

Vit
Joined: 7 Jan 08
Posts: 23
Credit: 393350695
RAC: 0

Bikeman, no war, no spinning out - just measurement, some statistics and a few ideas about the attractiveness of different approaches to GPGPU. You are deep into OpenCL and CUDA for this project, so can you give us an estimate of the new 7970's energy efficiency in this particular kind of calculation? ( "Is it possible to take a look at some comparison charts for compute power of typical CPU and GPU for Einstein@Home-type of calculations?" ) I believe you have some info and measurement results. Sure, OpenCL is younger, and your version of this GPGPU library is the first and may not be perfect yet. I am not even trying to use it as a benchmark in an "AMD vs NVidia" war.
I remember that in Jul 2011 we were told at the Hannover meeting that on average GPGPU is approximately 5 times more energy efficient than CPU. But one year has passed: i7, AMD 7970, NVidia 680 and your OpenCL library have appeared. So please tell me, like a Holy Father to a parishioner: should I buy an AMD GPU or not ( games alone are not enough motivation for me :) ).

.clair.
Joined: 20 Nov 06
Posts: 62
Credit: 1051176770
RAC: 0

Here are some tasks from my 7970 running one wu at a time while SETI was down because of their power grid failing.
http://einsteinathome.org/host/5365549/tasks
The motherboard is an XFX 780i with a PCIe v2 x16 slot.
The CPU is a 3.6 GHz P4 with HT on and running other CPU projects, so the GPU is starved of CPU time to run Einstein, and times are longer than they should be.
I also have other problems with this PC, having now had to go back one month with System Restore, which removed CCC 12.4 and BM 7.0.28.
When I am sure the other problems are gone/fixed I will upgrade again and try again with Einstein.
I built this system with an ATI GPU so that it can run SETI VLAR workunits;
Einstein work was just for fun and to fill in time.
I was lucky that E@H came up with an OCL app in time :¬)

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 987
Credit: 25171438
RAC: 2

Quote:
No one but NVidia can use CUDA - this is the wrong way.

I'd like to correct this one: while CUDA itself isn't an open standard like OpenCL, NVIDIA opened their LLVM-based CUDA compiler. This allows all interested parties to target their GPUs, APUs and CPUs with CUDA. There is already a CUDA compiler targeting multi/many-core CPUs (by PGI). In this sense CUDA has now become a full-fledged competitor for OpenCL. It's now up to the Khronos Group to win this competition - as always, survival of the fittest...

I'm also in favor of open standards but they also need to deliver and be turned into marketable products. The best standard doesn't help if it's not adopted by a critical mass. If the Khronos Group adopted something like the Java Community Process to develop the OpenCL standard itself things might work out, but right now they don't perform as they probably should in a competitive environment.

JM2C,
Oliver

PS: Back to topic! :-)

Einstein@Home Project

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 509540239
RAC: 116681

Quote:
just measurement, some statistics and a few ideas about attractiveness of different approaches for GPGPU.

You can compare HD79XX against my HD6950 here:
http://albert.phys.uwm.edu/workunit.php?wuid=75885
Computer 2209 runs a HD79XX
You are familiar with my configuration

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 759160415
RAC: 1132753

These two jobs are not really comparable tho: one is a brand-new (1.25) prototype for an OpenCL app specifically modified to cure validation problems of HD6900 series cards. It is slower than the previous version 1.24.

Cheers
HB

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 509540239
RAC: 116681

Quote:

These two jobs are not really comparable tho: one is a brand-new (1.25) prototype for an OpenCL app specifically modified to cure validation problems of HD6900 series cards. It is slower than the previous version 1.24.

Cheers
HB

1:45 @ Einstein (1.24) : 2:09 @ Albert (1.25)

Vit
Joined: 7 Jan 08
Posts: 23
Credit: 393350695
RAC: 0

Bikeman, Oliver Bock, Alex and others - thank you all! I bought it. Let me share results of my new 7970, for Arecibo 1.24 atiOpenCL:
1 task on GPU (0.5 CPU + 1 ATI GPU): 18-25 min, GPU load by GPU-Z 40-45% (although Catalyst Control Center shows "activity 60%"), CPU load by W7 Task Manager - 5% (it is ~40% load of one core)
2 tasks on GPU (0.5 CPU + 0.5 ATI GPU): ~38 min, GPU load by GPU-Z 58-62% (Catalyst CC - 80-84%), CPU load - 3%
I have no idea why the dispersion (17-25 min) in the case of 1 task is so large if tasks need an equal(?) amount of calculation. The CPU is not heavily loaded by other tasks.

Whatever... if we assume that "1.22 BRP4 SSE" and "1.24 atiOpenCL" need the same amount of calculation, then for one task, even on PCIe 2.0, the 7970 at 1GHz is ~20 times faster (and ~5-7 times more energy efficient) than my i7-2600k 4.5 GHz CPU. Good job!
And thank you for inducing me to buy a GPU; I am going to check the progress in the game industry over the last 7-8 years.

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 509540239
RAC: 116681

Congratulations!
To get the same results with both of my cards I need to switch over to the 36h-day!

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 759160415
RAC: 1132753

You're welcome!

We will continue to improve the ATi/AMD app, so that the energy efficiency should even go up some more in the not so distant future.

Cheers
HB

Petrion
Joined: 30 Apr 08
Posts: 53
Credit: 1243186
RAC: 0

Quote:

Bikeman, Oliver Bock, Alex and others - thank you all! I bought it. Let me share results of my new 7970, for Arecibo 1.24 atiOpenCL:
1 task on GPU (0.5 CPU + 1 ATI GPU): 18-25 min, GPU load by GPU-Z 40-45% (although Catalyst Control Center shows "activity 60%"), CPU load by W7 Task Manager - 5% (it is ~40% load of one core)
2 tasks on GPU (0.5 CPU + 0.5 ATI GPU): ~38 min, GPU load by GPU-Z 58-62% (Catalyst CC - 80-84%), CPU load - 3%
I have no idea why the dispersion (17-25 min) in the case of 1 task is so large if tasks need an equal(?) amount of calculation. The CPU is not heavily loaded by other tasks.

Nice. My rig isn't as buff but I can crunch one on average in 65 min (Catalyst GPU load 80%), but doing 2 tasks splits my GPU work between them, taking 125 min (Catalyst GPU load 92%) to do both. Both ways use 5% CPU load.

Win 7Pro X64, i5-2500K CPU @ 3.30GHz (OC 4.5GHz), AMD HD6850, PCIe 2.0, 8GB 1600 RAM, BOINC 7.0.28

Run time(sec) 3,739.87
CPU time(sec) 530.31
Claimed credit 6.90
Granted credit 500.00
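
Whether running two tasks at once pays off can be estimated from the timings in this post (one task in about 65 min, two tasks together in about 125 min). This is a rough sketch with a hypothetical function name; it ignores run-to-run variation, which the thread reports can be large.

```python
def concurrency_gain(single_task_min: float, batch_min: float, batch_size: int) -> float:
    """Throughput ratio of running `batch_size` tasks at once versus one at a time.

    A value above 1.0 means the concurrent setup finishes more tasks per hour.
    """
    per_task_min = batch_min / batch_size
    return single_task_min / per_task_min

# Figures from the post: one task in ~65 min, two tasks together in ~125 min.
gain = concurrency_gain(65, 125, 2)
print(f"2 tasks at once: {gain:.2f}x the throughput of 1 at a time")
```

On this card the gain is only a few percent, so the single-task and two-task setups are close to equivalent in throughput.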

Petrion
Joined: 30 Apr 08
Posts: 53
Credit: 1243186
RAC: 0

Quote:

And thank you for inducing me to buy a GPU; I am going to check the progress in the game industry over the last 7-8 years.

I'm a power-gamer and I use my gaming rig to crunch, thus my HD 6850. And the gaming industry has progressed a lot in the last 7-8 years. You should have fun. :)