It is unlikely that we will ever put work into an ATI Stream application.
We're looking into OpenCL on ATI, NVidia and Cell, but it may take some time. Currently the speed of our OpenCL Einstein@home app is rather disappointing. In addition, before Mac OS 10.6.2, OpenCL on ATI was unusable due to a horribly buggy runtime compiler.
If, and when, the OpenCL GPU clients are released, you will be pleased with the ATI GPUs - especially the HD48xx series and the new HD58xx series, which is 4x faster than its predecessor. On MW and Collatz these cards are giving RACs on the order of 250K, so a lot of science is being done quickly and accurately (double precision with MW).
Shih-Tzu are clever, cuddly, playful and rule!! Jack Russell are feisty!
The problem is not the speed of calculations, but the memory access pattern of the current (GW search) algorithm. It's so wild that on NVidia hundreds of threads end up blocking each other. I don't think that ATI cards have a significantly different memory architecture than NVidia's, or that their compiler is advanced enough to unscramble that.
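To illustrate what such a pattern looks like, here is a minimal CUDA sketch of a hypothetical gather kernel - not the actual GW search code - where neighbouring threads read addresses that are far apart, so the hardware cannot coalesce the loads and warps stall on memory:

```cuda
// Hypothetical example only; the real E@H kernels and data layout differ.
__global__ void gather_sum(const float *data, const int *index,
                           float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Threads i and i+1 may read data[] at locations megabytes apart,
    // so one warp's read splits into many memory transactions and
    // hundreds of threads sit waiting on the memory system.
    out[i] = data[index[i]] * 0.5f;
}

// If index[i] were simply i, the same kernel would become a coalesced,
// streaming read and the GPU would stay busy.
```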
BM
Maybe it needs to be rewritten?
You're losing the enormous computing power of modern GPUs, which the world gives to you just for free. I'm (like many others) ready to give you my 2.7 TFLOPS GPU too, but you don't want to use it. So I've switched to other projects - what else can I do?
Not as big as you might think. The "average" user still uses PCs built with onboard graphics, so it's no big loss.
It's the gamers and power users that E@H is missing out on - a fraction of a percentage.
seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.
That's the benefit of BOINC: you have other projects as options. MilkyWay makes good use of ATI GPUs. You can still opt for Einstein on CPU only, and MilkyWay for the GPU, if it makes you feel better.
But a top GPU performs like many hundreds of CPUs.
For example, the computing power of a Radeon 5870 is 2.7 TFLOPS, while one core of a modern AMD CPU is only about 3 GFLOPS, so 2700/3 = 900 times. So even a fraction of a percentage can be significant enough to take them into account. Am I right?
But I want to dedicate my GPU to processing data from real-world observations, which, I'm not sure, is what MilkyWay does. As far as I understand, MW@H performs theoretical calculations.
Well E@H has caution in mind, so the GPU aspect is an evolving programme. The developers are aware of all sorts of possibilities but mainly want to get it right and build upon success.
One key feature is that GPUs aren't simply faster CPUs. They have a massively parallel/pipelined aspect (thousands of threads) that a general-purpose processor doesn't have. The speed advantage comes from that parallelism. So the basic calculations in a task need to be able to be chopped up into that large a number of similar subtasks. As Bernd mentioned, the memory interactions between threads are significant - many threads are awaiting completion from other threads, so the parallel advantage is being lost. GPUs are really designed for problems which are truly parallel - like a thread per pixel or somesuch. The E@H parameter spaces aren't that simple.
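For a feel of that "thread per pixel" case GPUs are built for, here is a minimal CUDA sketch (a made-up image-brightening kernel, nothing to do with E@H's actual work units): every thread owns one independent element and never waits on another thread, which is exactly the structure the GW search lacks.

```cuda
// Hypothetical example: one thread per pixel, no inter-thread dependencies.
__global__ void brighten(const unsigned char *in, unsigned char *out,
                         int width, int height, float gain)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;            // each thread touches its own element
    float v = in[idx] * gain;
    out[idx] = v > 255.0f ? 255 : (unsigned char)v;
}
```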
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
I apologise for butting in, but you are comparing very different things.
2.7 TFLOPS is the peak (theoretically possible, but not reachable in practice) speed. The real achievable speed of this chip is much lower.
And the 3 GFLOPS you mentioned as an example is the real (practical) speed of one CPU core without special optimizations (and in E@H those are used).
So the comparison is incorrect: you can only compare real speed (measured with the actual application) against real speed, or at worst peak against peak.
For example, the peak speed of an Intel Core 2 Quad processor (widespread, though already slightly outdated) running at 3 GHz is: 4 (cores) * 3 (GHz) * 4 (floating-point operations per clock) = 48 GFLOPS.
So the ratio in peak speed is 2700/48 ≈ 56 times.
Real speeds will differ even less, since on a CPU it is fairly easy to reach an efficiency of 60-90% (60-90% of the theoretical/peak speed), while on a GPU, depending on the task, the real speed can be only 10-20% of theoretical (it depends on the algorithm and the kind of data being processed).
And finally it should be noted that the quoted speed of the Radeon 5870 (2700 GFLOPS) holds only for single-precision calculations; with double precision its peak speed falls to 544 GFLOPS.
Usually scientific projects use double precision (single precision is good for the entertainment uses GPUs are primarily aimed at, not for science), so if E@H also uses double precision, the real speed of the Radeon 5870 will be about 5 times lower.
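Putting the numbers above together (the double-precision ratio below is just my own arithmetic from the same figures):

```latex
\begin{align*}
  \text{CPU peak} &= 4\ \text{cores} \times 3\ \text{GHz} \times 4\ \text{FLOP/cycle} = 48\ \text{GFLOPS} \\
  \text{GPU : CPU (single precision, peak)} &= 2700 / 48 \approx 56 \\
  \text{GPU : CPU (double precision, peak)} &= 544 / 48 \approx 11
\end{align*}
```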
P.S.
Despite all of the above, modern GPUs are still much faster than modern CPUs. It's simply that in reality the difference is much smaller than it seems at first sight, or after reading the marketing press releases of the graphics-chip manufacturers. :)