ABP2 CUDA applications

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,221
Credit: 137,332,827
RAC: 24,309

Two successful ABP2's ( cuda23 ) finished without adverse note ( here and here ). Just under 2600 seconds ( 43 minutes ) each on my i7 with an NVIDIA GeForce 9800 GT.

There's another ABP2cuda23 in the pipe right now, with another waiting ( and 2 CPU ABP2 in the queue ).

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

tolafoph
Joined: 14 Sep 07
Posts: 122
Credit: 74,642,612
RAC: 0

RE: My i5 750 paired with a

Message 96264 in response to message 96262

Quote:
My i5 750 paired with a gtx 260 is plowing through the abp2 cuda tasks in 30-35 minutes. That compares to roughly 280 minutes for abp1 tasks. 8.8x +/- speedup. Very nice. :)

Correct me if I'm wrong, but I'm not sure about the speedup. The granted credit for ABP2 tasks is only 50 vs. 250 for the old ABP1 ones. But in credit terms this is still a speedup of 80-90%.
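
As a rough check of that arithmetic, here is a minimal Python sketch that compares credit earned per minute before and after the change. It uses only the figures quoted in this thread (about 280 minutes and 250 credits for ABP1, 30-35 minutes and 50 credits for ABP2); the function name is made up for illustration.

```python
def credit_adjusted_speedup(old_minutes, new_minutes, old_credit, new_credit):
    """Ratio of credit earned per minute: new tasks versus old tasks."""
    old_rate = old_credit / old_minutes
    new_rate = new_credit / new_minutes
    return new_rate / old_rate


# Figures quoted in the thread, not measurements of my own.
fast = credit_adjusted_speedup(280, 30, 250, 50)  # ABP2 finishing in 30 minutes
slow = credit_adjusted_speedup(280, 35, 250, 50)  # ABP2 finishing in 35 minutes

print(f"credit rate gain: {slow:.2f}x to {fast:.2f}x")
```

With these inputs the credit rate comes out between roughly 1.6 and 1.9 times the ABP1 rate, so the raw 8.8x runtime speedup shrinks considerably once the lower credit per task is taken into account.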

Sascha

DanNeely
Joined: 4 Sep 05
Posts: 1,358
Credit: 2,360,136,644
RAC: 3,121,159

RE: Two successful ABP2's (

Message 96265 in response to message 96263

Quote:

Two successful ABP2's ( cuda23 ) finished without adverse note ( here and here ). Just under 2600 seconds ( 43 minutes ) each on my i7 with an NVIDIA GeForce 9800 GT.

There's another ABP2cuda23 in the pipe right now, with another waiting ( and 2 CPU ABP2 in the queue ).

Cheers, Mike.

What is your GPU/CPU load with these WUs? 1600 credits/day is rather underwhelming. My 260 (twice as fast) got ~10k/day on GPUGrid and about 30k/day on Collatz.

DanNeely
Joined: 4 Sep 05
Posts: 1,358
Credit: 2,360,136,644
RAC: 3,121,159

RE: RE: My i5 750 paired

Message 96266 in response to message 96264

Quote:
Quote:
My i5 750 paired with a gtx 260 is plowing through the abp2 cuda tasks in 30-35 minutes. That compares to roughly 280 minutes for abp1 tasks. 8.8x +/- speedup. Very nice. :)

Correct me if I'm wrong, but I'm not sure about the speedup. The granted credit for ABP2 tasks is only 50 vs. 250 for the old ABP1 ones. But in credit terms this is still a speedup of 80-90%.

Sascha

The granted credit is hardcoded and they probably chose to err on the high side initially to avoid any public outcry.

Svenie25
Joined: 21 Mar 05
Posts: 139
Credit: 2,436,862
RAC: 0

RE: RE: Something is very

Message 96267 in response to message 96261

Quote:
Quote:
Something is very strange. The BOINC Manager is requesting work for CPU and GPU but only gets S5R6 tasks. This repeats every minute. No ABP2 work arrives. Because of this behaviour I ran into the maximum daily quota yesterday and today. I am now full of S5R6 tasks, the manager is in high priority mode, and I hope to get the work finished before the deadline.

I discovered the same problem yesterday on one of my systems. I told it not to request more work (until the cache runs down a bit).

Today the problem is back. BOINC is running in high priority mode and is nevertheless requesting work for the CPU. I have set it to NNT (no new tasks) and will do some math to see whether I can finish the work in time.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,221
Credit: 137,332,827
RAC: 24,309

RE: RE: Two successful

Message 96268 in response to message 96265

Quote:
Quote:

Two successful ABP2's ( cuda23 ) finished without adverse note ( here and here ). Just under 2600 seconds ( 43 minutes ) each on my i7 with an NVIDIA GeForce 9800 GT.

There's another ABP2cuda23 in the pipe right now, with another waiting ( and 2 CPU ABP2 in the queue ).

Cheers, Mike.

What is your GPU/CPU load with these WUs? 1600 credits/day is rather underwhelming. My 260 (twice as fast) got ~10k/day on GPUGrid and about 30k/day on Collatz.


Well, the other threads ( 8 in all ) are CPU/GW's. RAC is ~ 3500 at present.

I'll pass on some numbers ( first ~ 2000 valid work units reported back ) from Dr Allen that indicate a range of 20 minutes to 7 hours, with an average of 61 minutes and a standard deviation of about 33 minutes. The ratio of ABP1 to ABP2 run times is about 7.3 to 1. Top performers 'typically have CUDA GeForce 275/285/295 cards with ~ 1GB of memory'.

( Thus the 250 -> 50 credits per WU/task is generous, ~ 34 would be the strict number )
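
The arithmetic behind that remark can be sketched in a few lines of Python. This assumes "strict" means scaling the old ABP1 credit by the measured run time ratio and nothing else; the variable names are illustrative only.

```python
ABP1_CREDIT = 250.0    # credit granted per ABP1 task
ABP2_GRANTED = 50.0    # credit granted per ABP2 task
RUNTIME_RATIO = 7.3    # ABP1 run time divided by ABP2 run time (quoted above)

# Credit an ABP2 task would get if credit scaled exactly with run time.
strict_credit = ABP1_CREDIT / RUNTIME_RATIO

# How much more the granted figure is than the strictly scaled one.
generosity = ABP2_GRANTED / strict_credit

print(f"strict: {strict_credit:.1f} credits, granted is {generosity:.2f}x that")
```

Under that assumption the granted 50 credits is a little under one and a half times the strictly scaled figure.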

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,041
Credit: 746,825,177
RAC: 753,311

RE: ( Thus the 250 -> 50

Message 96269 in response to message 96268

Quote:

( Thus the 250 -> 50 credits per WU/task is generous, ~ 34 would be the strict number )

Cheers, Mike.


Be careful, Mike - you'll get into deep water here. Look before you leap.

How are you defining 'strict' in that sentence?

Originally, credit was defined as a function of [CPU] time * benchmark. Because of the bogus benchmark phenomenon and other anomalies, Einstein switched to fixed credits - but has always striven to keep them comparable.

How does that translate to CUDA? One could argue that a 'full' CUDA app, with all processing transferred to the GPU - not the half-and-half achieved so far - should be scored on [GPU] time * [GPU] benchmark.

Newer BOINC clients report two runtimes back to the server - elapsed_time and CPU_time. Neither is a true measure of GPU usage, but for that hypothetical 'full' CUDA application, elapsed_ will be a reasonable surrogate for GPU_, and CPU_ will be small-to-vanishing. I don't know what is being stored in the Einstein database currently, but Einstein's web page code - which is very old - is only displaying CPU_time for completed results, and that gives a very skewed impression for CUDA tasks. Newer BOINC server revisions store and display both elapsed_ ("Run time") and CPU_time - have a look at one of my GPUGrid hosts to see what a dramatic difference this makes.

And the other half of the function is the benchmark.

GPUs aren't benchmarked by BOINC, but their speed is known - derived from the shader clock speed and the number of shaders on the card. Unfortunately, different versions of BOINC report the speed on different scales.

BOINC v6.10.14 and above reports "GFLOPS peak", which I refer to as 'marketing flops': they bear no relationship to real-world scientific computing power.

BOINC v6.10.13 and below reports "est. GFLOPS". This is a made-up figure derived from observations of the performance of the original SETI CUDA application on David Anderson's Quadro FX 3700 GPU. I call these 'BOINC flops': in spite of the dodgy scientific standing of their derivation, I think they're actually not too bad, and would be a reasonable figure to plug into the credit function to get a first rough guesstimate of CUDA credit. There are 5.6 'marketing' flops in each 'BOINC' flop.

There are no equivalents to 'BOINC flops' for ATI cards: ever since BOINC started to support ATI cards, their speeds have been reported on the 'marketing' scale.
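
To make the "time * benchmark" idea concrete, here is a minimal Python sketch of a first rough guesstimate along those lines. It is not Einstein's or BOINC's actual server code. Two things in it are assumptions rather than statements from this post: the scale of 200 credits per day for a device sustaining 1 GFLOPS (my reading of the classic cobblestone definition), and the example card figures, which are purely hypothetical. The 5.6 conversion factor is the one given above.

```python
SECONDS_PER_DAY = 86_400
CREDITS_PER_GFLOPS_DAY = 200.0   # ASSUMPTION: classic cobblestone scale
MARKETING_PER_BOINC_FLOP = 5.6   # conversion factor quoted in the post


def boinc_gflops(peak_gflops):
    """Convert 'GFLOPS peak' (marketing flops) to 'est. GFLOPS' (BOINC flops)."""
    return peak_gflops / MARKETING_PER_BOINC_FLOP


def credit_estimate(elapsed_seconds, est_gflops):
    """Rough credit guess: run time multiplied by benchmark speed."""
    days = elapsed_seconds / SECONDS_PER_DAY
    return days * est_gflops * CREDITS_PER_GFLOPS_DAY


# Hypothetical card: 560 GFLOPS peak, fully busy for 30 minutes.
est = boinc_gflops(560.0)
print(f"{est:.0f} BOINC GFLOPS -> {credit_estimate(1800, est):.0f} credits")
```

For a 'full' CUDA app the elapsed time would be the sensible input; for the current hybrid app neither elapsed time nor CPU time alone describes what the GPU actually did, which is the difficulty described above.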

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,221
Credit: 137,332,827
RAC: 24,309

RE: RE: ( Thus the 250 ->

Message 96270 in response to message 96269

Quote:
Quote:

( Thus the 250 -> 50 credits per WU/task is generous, ~ 34 would be the strict number )

Cheers, Mike.


Be careful, Mike - you'll get into deep water here. Look before you leap.

How are you defining 'strict' in that sentence?

Originally, credit was defined as a function of [CPU] time * benchmark. Because of the bogus benchmark phenomenon and other anomalies, Einstein switched to fixed credits - but has always striven to keep them comparable.

How does that translate to CUDA? One could argue that a 'full' CUDA app, with all processing transferred to the GPU - not the half-and-half achieved so far - should be scored on [GPU] time * [GPU] benchmark.

Newer BOINC clients report two runtimes back to the server - elapsed_time and CPU_time. Neither is a true measure of GPU usage, but for that hypothetical 'full' CUDA application, elapsed_ will be a reasonable surrogate for GPU_, and CPU_ will be small-to-vanishing. I don't know what is being stored in the Einstein database currently, but Einstein's web page code - which is very old - is only displaying CPU_time for completed results, and that gives a very skewed impression for CUDA tasks. Newer BOINC server revisions store and display both elapsed_ ("Run time") and CPU_time - have a look at one of my GPUGrid hosts to see what a dramatic difference this makes.

And the other half of the function is the benchmark.

GPUs aren't benchmarked by BOINC, but their speed is known - derived from the shader clock speed and the number of shaders on the card. Unfortunately, different versions of BOINC report the speed on different scales.

BOINC v6.10.14 and above reports "GFLOPS peak", which I refer to as 'marketing flops': they bear no relationship to real-world scientific computing power.

BOINC v6.10.13 and below reports "est. GFLOPS". This is a made-up figure derived from observations of the performance of the original SETI CUDA application on David Anderson's Quadro FX 3700 GPU. I call these 'BOINC flops': in spite of the dodgy scientific standing of their derivation, I think they're actually not too bad, and would be a reasonable figure to plug into the credit function to get a first rough guesstimate of CUDA credit. There are 5.6 'marketing' flops in each 'BOINC' flop.

There are no equivalents to 'BOINC flops' for ATI cards: ever since BOINC started to support ATI cards, their speeds have been reported on the 'marketing' scale.

Err, 250 / 7.3 ~ 34 < 50 ???

I'm doing ( I think ) an ABP1 to ABP2 comparison ..... are you ?

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Michael Goetz
Joined: 11 Feb 05
Posts: 19
Credit: 1,957,843
RAC: 0

RE: I'll pass on some

Message 96271 in response to message 96268

Quote:

I'll pass on some numbers ( first ~ 2000 valid work units reported back ) from Dr Allen that indicate a range of 20 minutes to 7 hours, with an average of 61 minutes and a standard deviation of about 33 minutes. The ratio of ABP1 to ABP2 run times is about 7.3 to 1. Top performers 'typically have CUDA GeForce 275/285/295 cards with ~ 1GB of memory'.

( Thus the 250 -> 50 credits per WU/task is generous, ~ 34 would be the strict number )

Cheers, Mike.

I haven't run any of the new ABP2s on my GPU yet, but for argument's sake, let's say my GTX280 comes in around 30 minutes. A rate of 100 credits per hour (or a RAC of 2,400) is significantly lower than what is seen on other projects.

On GPUGRID, a 6 to 9 hour WU yields 6,000 credits, giving a RAC of around 17,000 to 24,000.

Milkyway WUs get crunched in around 15 minutes for about 213 credits, yielding a RAC of about 18,000.

SETI averages about 145 credits per 24 minute WU, yielding a RAC of about 8,700.

That puts ABP2 anywhere from 1/3 to 1/10 of what I see on other projects.
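
For anyone who wants to redo these comparisons with their own numbers, here is a minimal Python sketch. It treats RAC as simply the credit earned per day when the GPU crunches one project non-stop, which is a simplification (real RAC is a decaying average and lags behind). The function name is made up, and the inputs are the per-WU figures quoted above.

```python
MINUTES_PER_DAY = 24 * 60


def steady_state_rac(credits_per_wu, minutes_per_wu):
    """Credits per day if the GPU runs this project's WUs back to back."""
    return credits_per_wu * MINUTES_PER_DAY / minutes_per_wu


# (credits per WU, minutes per WU) as quoted in this post.
projects = {
    "ABP2 (assumed 30 min)": (50, 30),
    "SETI": (145, 24),
    "Milkyway": (213, 15),
    "GPUGRID (6 h)": (6000, 6 * 60),
    "GPUGRID (9 h)": (6000, 9 * 60),
}

for name, (credits, minutes) in projects.items():
    print(f"{name:24s} {steady_state_rac(credits, minutes):8.0f}")
```

The simple formula gives 2,400 for ABP2 and 8,700 for SETI, matching the figures above. For Milkyway and GPUGRID it gives about 20,400 and 16,000 to 24,000, a little off the rounded numbers in the post, which are based on approximate run times.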

Now, the credits aren't really important or of any significance, but if they're consistent across projects a much lower RAC should also indicate that less science is being computed. If the actual work being done on my hardware is also only 1/3 to 1/10 of the work other projects do on my GPU, then that would be bad, and a waste of the substantial computing power available on that class of GPU.

That being said, RAC is far from the most accurate way to compare work between projects. For a GPU, I find that measuring the GPU temperature is a very good metric for how efficiently an application is utilizing the GPU.

ABP1's GPU temperature ran very, very, cold. The GPU was hardly being used at all by ABP1. But what about ABP2?

I enabled GPU tasks on Einstein to test ABP2, but, alas, I wasn't given any. I'll leave GPU enabled and CPU disabled for now hoping I can observe the operating temperature while it's running, which should provide a good indication of how heavily it's using the GPU. I don't care if the RAC is low, but if ABP2 is tying up my GPU for half an hour, it needs to be using that GPU efficiently.

Mike

Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,041
Credit: 746,825,177
RAC: 753,311

RE: I'm doing ( I think )

Message 96272 in response to message 96270

Quote:

I'm doing ( I think ) an ABP1 to ABP2 comparison ..... are you ?

Cheers, Mike.


No, I'm staying well away from suggesting any figure at all!

The trouble is, the current 'hybrid' part-CPU and part-CUDA application is a grossly inefficient use of hardware - there are many messages on other forums advising people with CUDA cards not to treat Einstein as a 'production' CUDA project yet.

For the time being, I suggest you use the CPU application as the comparator for the value of the work done - that's a relatively well known animal, and in the other thread Gary is reporting that the estimate of 0.2x is proving pretty accurate.

Then, consider the CUDA hardware consumption from first principles using the speeds and times reported by Bruce Allen (you'll have to get him to supply a breakdown of elapsed_ and CPU_time), and plug them into the official BOINC credit formula. I think you'll be frightened by the result: use it as a measure of the current (in)efficiency of the CUDA app at this work-in-progress stage.
