Binary Radio Pulsar Search (Parkes PMPS XT) "BRP6"

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7221874931
RAC: 954127

So, if you mix different task types (let alone different projects) on the GPU at one time, and compare completion times for different revisions of one of those task types, you are less comparing productivity of the two revisions than you are comparing their mean time to task turnover. When we say we are running 2X or 3X on a GPU, it is a bit of a lie, though a convenient shorthand, as at any given moment the GPU is really actively running just one task, though lots of useful state for another calculation is often residing inside the GPU, ready to resume at a moment's notice. That notice arises when the currently running task needs an external resource (data or computation obtained from the host system).

So, imagine for a moment, two releases having identical total resource consumption in all other respects, for which one on average gets twice as much internal work done as the other before needing external resource. Running these things unmixed you'll get the right answer as to total productivity. Running them mixed the one which releases the GPU to the next task in line twice as often (assuming that other task, say from SETI, is unchanged between the two test conditions) will get a smaller share of total GPU time, thus take longer to complete, thus be deemed less productive by the flawed measure of simple mixed load assessment.
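
As a rough sketch of the argument above (a toy model, not a measurement): assume the GPU strictly alternates between two tasks, each holding it for one uninterrupted burst before it needs the host, and that host waits are fully hidden behind the other task's burst. The function name and all numbers are made up for illustration.

```python
def elapsed_time(gpu_work, own_burst, other_burst):
    """Wall-clock time for a task that needs `gpu_work` seconds of GPU time
    while strictly alternating bursts with one other task.

    Toy assumption: the task's share of the GPU is
    own_burst / (own_burst + other_burst), so the elapsed time is
    gpu_work divided by that share.
    """
    return gpu_work * (own_burst + other_burst) / own_burst


GPU_WORK = 600   # hypothetical GPU seconds per task, same for both revisions
SETI_BURST = 1   # hypothetical burst length of the unchanged other task

# Unmixed: two copies of the same revision share the GPU evenly.
print(elapsed_time(GPU_WORK, 2, 2))           # long-burst revision, unmixed
print(elapsed_time(GPU_WORK, 1, 1))           # short-burst revision, unmixed

# Mixed with the unchanged other task.
print(elapsed_time(GPU_WORK, 2, SETI_BURST))  # long-burst revision, mixed
print(elapsed_time(GPU_WORK, 1, SETI_BURST))  # short-burst revision, mixed
```

Under these assumptions both revisions take the same elapsed time when run unmixed, but in the mixed case the revision that yields the GPU twice as often finishes later, even though it does exactly the same amount of GPU work.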

Speaking as one who contributed multiple posts to the performance thread, on my own hosts I was quite diligent to avoid more than a few seconds of mixed load time, and intended not to report results from others where I had any indication that mixed loads were in use.

Not to engage too many topics in one post, but a little thought will extend this concern to conclusions on "playing well with others".

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1589509055
RAC: 761863

Thanx, you confirm my theory. I shall not mix tasks on GPUs.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250482473
RAC: 35099

Updated Mac OSX CUDA 5.5 version 1.56 is out for Beta testing.

BM

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578036876
RAC: 201329

Betreger wrote:
Thanx, you confirm my theory. I shall not mix tasks on GPUs.


That should be "I shall not mix tasks on GPUs if I want to compare app differences for one of the projects". Otherwise this mixing will likely work just fine.

@Tom: Einstein depends heavily on GPU memory bandwidth. The GTX660 has 192 bit GDDR5 at 6.0 GHz data rate. The GTX960 has 128 bit GDDR5 at 7.0 GHz data rate. In practice the GTX960 runs its memory at 6.0 GHz under CUDA or OpenCL loads, just like the other Maxwell 2 cards (see there). The new color compression of Maxwell 2 (to save bandwidth) only works on games, so doesn't help here. All in all the good old GTX660 has a 50% bandwidth advantage! You should be able to tie it with your new card, though, if you fix the memory clock. You should also be able to OC it to somewhere between 7.5 and 7.8 GHz (judging by what we've seen on the other cards).
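
The bandwidth comparison can be checked with a few lines of arithmetic. This is only a sketch: it assumes the "GHz data rate" figures above are effective per-pin rates in Gbit/s, so peak bandwidth is bus width in bytes times data rate. The function name is made up for illustration.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Theoretical peak memory bandwidth in GB/s.

    Assumes `data_rate_gbps` is the effective per-pin data rate, so
    bandwidth = (bus width in bytes) * (data rate).
    """
    return bus_width_bits / 8 * data_rate_gbps


gtx660 = bandwidth_gb_s(192, 6.0)          # GTX660 as quoted above
gtx960_compute = bandwidth_gb_s(128, 6.0)  # GTX960 at the 6.0 GHz seen under CUDA/OpenCL
gtx960_spec = bandwidth_gb_s(128, 7.0)     # GTX960 at its rated 7.0 GHz

print(gtx660, gtx960_compute, gtx960_spec)
print(gtx660 / gtx960_compute)             # ratio behind the "50% advantage"
```

With these inputs the GTX660 comes out at 144 GB/s against 96 GB/s for the GTX960 at 6.0 GHz, which is the 50% figure; at 7.0 GHz the GTX960 reaches 112 GB/s on paper.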

Keith wrote:
I know why. The FP32 and FP64 performance is better on the old 660 vs the 900 series. 1/24 FP32 vs 1/32 FP32. Look at this table. There are a lot of good things to say for the older designs with regard to math performance.


This is mostly true, but FP64 performance does not affect BRP in any way. And I wouldn't call 1/24 "good": it's barely enough for development or an occasional instruction, but completely unsuitable for FP64 projects like Milkyway. There Tahiti with 1/4 FP64 is still the king, tied by the more expensive Hawaii (1/8 on gamer cards), and it even beats AMD's new Fiji flagship with 1/16 (like the other GCN chips).

MrS

Scanning for our furry friends since Jan 2002

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18724203352
RAC: 6553772

Quote:

So, if you mix different task types (let alone different projects) on the GPU at one time, and compare completion times for different revisions of one of those task types, you are less comparing productivity of the two revisions than you are comparing their mean time to task turnover. When we say we are running 2X or 3X on a GPU, it is a bit of a lie, though a convenient shorthand, as at any given moment the GPU is really actively running just one task, though lots of useful state for another calculation is often residing inside the GPU, ready to resume at a moment's notice. That notice arises when the currently running task needs an external resource (data or computation obtained from the host system).

So, imagine for a moment, two releases having identical total resource consumption in all other respects, for which one on average gets twice as much internal work done as the other before needing external resource. Running these things unmixed you'll get the right answer as to total productivity. Running them mixed the one which releases the GPU to the next task in line twice as often (assuming that other task, say from SETI, is unchanged between the two test conditions) will get a smaller share of total GPU time, thus take longer to complete, thus be deemed less productive by the flawed measure of simple mixed load assessment.

Speaking as one who contributed multiple posts to the performance thread, on my own hosts I was quite diligent to avoid more than a few seconds of mixed load time, and intended not to report results from others where I had any indication that mixed loads were in use.

Not to engage too many topics in one post, but a little thought will extend this concern to conclusions on "playing well with others".

I guess I still haven't made myself clear. I am comparing apples to apples when evaluating the performance of the 1.57 app vs the 1.52. I never said it was a solo performance issue. I've never run a solo task on a GPU of ANY project. I have always done a mix of tasks: two tasks per card back when I was using twin 670s, and now three tasks per card on twin 970s. I am seeing a pretty significant increase in processing times with the 1.57 app, under the same conditions as with the 1.52 app. That is my apples to apples comparison. In my case of three tasks per card and multiple projects possible at any time on any card, I will be reverting to the 1.52 app once I've cleared my cache of work because it processes faster. I was hoping for an improvement based on the supposed benefit of the CUDA 5.5 runtime libraries, but have not seen any in my environment. They may be working for others but they aren't for me.

 

Filipe
Joined: 10 Mar 05
Posts: 186
Credit: 406198798
RAC: 381157

There is so much BRP6 work to do...

Server status page shows >460 days left with all the project GPU power committed...

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578036876
RAC: 201329

Keith, the runtime that BOINC shows is actually the elapsed time. It's not the actual GPU crunching time (otherwise that time wouldn't increase when running multiple WUs concurrently). So if the new app gave more GPU time slots away to your other tasks, this could easily explain your observation. Did the SETI and/or MW tasks speed up since you switched to the new Einstein app?

MrS

Scanning for our furry friends since Jan 2002

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18724203352
RAC: 6553772

No, both the SETI and MW took a hit in increased runtimes with the 1.57 app. That is part of what I found to be so disappointing: the runtimes for all projects increased. I reverted to 1.52 on Pipsqueek when the cache of 1.57 ran out. Still doing 1.57 on the main system and will go back to 1.52 once it cleans out too. So far nothing has finished yet for the old 1.52 app, but it looks like the progress is what I remembered, and the main thing is that the runtimes for SETI and MW seem to have fallen back to normal. Of course, it could just be the current mix of tasks both computers are doing. I ran for four days on the new 1.57 app. Don't know, maybe that wasn't long enough to get a really good baseline. Of course, it is probably to be expected that the beta 1.57 app is going to make it to main anyway and I will just have to accept the performance loss. Just had such high expectations and they have been pretty well crushed.

 

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Keith,

Quote:
No, both the SETI and MW took a hit in increased runtimes with the 1.57 app. That is part of what I found to be so disappointing: the runtimes for all projects increased. I reverted to 1.52 on Pipsqueek when the cache of 1.57 ran out. Still doing 1.57 on the main system and will go back to 1.52 once it cleans out too. So far nothing has finished yet for the old 1.52 app, but it looks like the progress is what I remembered, and the main thing is that the runtimes for SETI and MW seem to have fallen back to normal. Of course, it could just be the current mix of tasks both computers are doing. I ran for four days on the new 1.57 app. Don't know, maybe that wasn't long enough to get a really good baseline. Of course, it is probably to be expected that the beta 1.57 app is going to make it to main anyway and I will just have to accept the performance loss. Just had such high expectations and they have been pretty well crushed.

I've been experimenting with S@H MB tasks and E@H BRP6 1.57 ones.

I found that if I ran 2x 1.57 WU per GPU (GTX980, dev0), they took 1h 34m 44s to complete ;-( Lousy throughput for a GTX980!

However, if I ran 1x S@H MB WU in tandem with 1x BRP6 1.57 WU, it was quite a bit faster: the 1.57 task completed in 1h 13m 17s. The S@H WU ran slower...
For BRP4G 1.52 running in tandem, completion was 0h 22m 03s.
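
A small sketch of how the elapsed times above translate into throughput. It assumes the two concurrent 1.57 tasks start and finish together, so the effective time per WU is the elapsed time divided by the number of tasks running at once. The helper names are hypothetical.

```python
def to_seconds(hours, minutes, seconds):
    """Convert an h/m/s elapsed time to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def effective_seconds_per_wu(elapsed_seconds, concurrent_tasks):
    """Effective time per work unit when `concurrent_tasks` identical
    tasks share the GPU and all take `elapsed_seconds` to finish."""
    return elapsed_seconds / concurrent_tasks


def fmt(seconds):
    """Format seconds as 'Hh MMm SSs'."""
    total = int(round(seconds))
    return f"{total // 3600}h {total % 3600 // 60:02d}m {total % 60:02d}s"


two_up = to_seconds(1, 34, 44)   # 2x BRP6 1.57 per GPU
mixed = to_seconds(1, 13, 17)    # 1x BRP6 1.57 alongside 1x S@H MB

print(fmt(effective_seconds_per_wu(two_up, 2)))  # effective time per BRP6 WU at 2x
print(fmt(mixed))                                # elapsed time of the BRP6 WU in the mix
```

Running 2x works out to roughly 47 minutes of elapsed time per BRP6 WU. The mixed figure is not directly comparable, because the GPU time the BRP6 task gained has to be weighed against how much the S@H task slowed down.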

The main problem with this method of crunching WUs is that it's done manually :-/

It 'may' be possible to write a program to feed WUs from 2 different projects to a GPU, but if it is, it won't be me doing the writing. I can't get my head around the app_info syntax, let alone owt else :-)

NB:- Even on my slower GTX980 it was a lot faster..

Regards,

Cliff,

Been there, Done that, Still no damm T Shirt.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578036876
RAC: 201329

Keith, the runtimes your PCs are achieving with 1.52 don't look any better than 1.57. There's certainly a large variation in runtime, so it's difficult to judge performance by eye. Did something else change along with the new app?

MrS

Scanning for our furry friends since Jan 2002
