Binary Radio Pulsar Search (Parkes PMPS XT) "BRP6"

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7226788261
RAC: 1073009

So, if you mix different task types (let alone different projects) on the GPU at one time, and compare completion times for different revisions of one of those task types, you are less comparing the productivity of the two revisions than their mean time to task turnover. When we say we are running 2X or 3X on a GPU, it is a bit of a lie, though a convenient shorthand: at any given moment only one task is actively running, though lots of useful state for another calculation often resides inside the GPU, ready to resume at a moment's notice. That notice arises when the currently running task needs an external resource (data or computation obtained from the host system).

So, imagine for a moment two releases with identical total resource consumption in all other respects, where one on average gets twice as much internal work done as the other before needing an external resource. Running these unmixed, you'll get the right answer as to total productivity. Running them mixed, the one which releases the GPU to the next task in line twice as often (assuming that other task, say from SETI, is unchanged between the two test conditions) will get a smaller share of total GPU time, thus take longer to complete, and thus be deemed less productive by the flawed measure of simple mixed-load assessment.
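The bias is easy to demonstrate with a toy round-robin model (all numbers are illustrative, not measured): two app revisions each needing exactly 100 units of GPU work, one yielding the GPU to a competing task twice as often as the other.

```python
def elapsed_time(einstein_work, einstein_burst, seti_burst=1.0):
    """Round-robin between one Einstein task and one (unchanged) SETI task.

    Each turn, a task runs on the GPU until it needs the host, then yields.
    Returns total wall time until the Einstein task finishes.
    """
    t = 0.0
    remaining = einstein_work
    while remaining > 0:
        slice_ = min(einstein_burst, remaining)
        t += slice_          # Einstein's GPU turn
        remaining -= slice_
        if remaining > 0:
            t += seti_burst  # SETI's turn before Einstein resumes
    return t

old = elapsed_time(100, einstein_burst=2.0)  # yields half as often
new = elapsed_time(100, einstein_burst=1.0)  # yields twice as often
print(old, new)  # -> 149.0 199.0
```

Both revisions consume exactly the same 100 units of GPU time, yet the frequently-yielding one looks ~33% slower by elapsed time, which is exactly the mixed-load measurement trap described above.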

Speaking as one who contributed multiple posts to the performance thread: on my own hosts I was quite diligent in avoiding more than a few seconds of mixed-load time, and I made a point of not reporting results from others where I had any indication that mixed loads were in use.

Not to engage too many topics in one post, but a little thought will extend this concern to conclusions on "playing well with others".

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1592472348
RAC: 777638

Thanx, you confirm my theory. I shall not mix tasks on GPUs.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250604155
RAC: 34692

Updated Mac OSX CUDA 5.5 version 1.56 is out for Beta testing.

BM

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578753537
RAC: 199628

Betreger wrote:
Thanx, you confirm my theory. I shall not mix tasks on GPUs.


That should be "I shall not mix tasks on GPUs if I want to compare app differences for one of the projects". Otherwise this mixing will likely work just fine.

@Tom: Einstein depends heavily on GPU memory bandwidth. The GTX660 has 192-bit GDDR5 at a 6.0 GHz data rate. The GTX960 has 128-bit GDDR5 at a 7.0 GHz data rate. In practice the GTX960 runs its memory at only 6.0 GHz under CUDA or OpenCL loads, just like the other Maxwell 2 cards (see the reports there). The new color compression of Maxwell 2 (to save bandwidth) only works for games, so it doesn't help here. All in all, the good old GTX660 has a 50% bandwidth advantage! You should be able to tie it with your new card, though, if you fix the memory clock. You should also be able to OC the memory to somewhere between 7.5 and 7.8 GHz (judging by what we've seen on the other cards).
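The 50% figure follows directly from bus width times effective data rate. A quick sanity check, using only the numbers from the post:

```python
def bandwidth_gb_s(bus_width_bits, data_rate_ghz):
    # GDDR5 bandwidth: (bus width in bytes) * (effective transfers per second)
    return bus_width_bits / 8 * data_rate_ghz

gtx660         = bandwidth_gb_s(192, 6.0)  # 144.0 GB/s
gtx960_spec    = bandwidth_gb_s(128, 7.0)  # 112.0 GB/s on paper
gtx960_compute = bandwidth_gb_s(128, 6.0)  #  96.0 GB/s at the observed compute clock

print(gtx660 / gtx960_compute)  # -> 1.5, i.e. the 50% advantage
```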

Keith wrote:
I know why. The FP32 and FP64 performance is better on the old 660 vs the 900 series: an FP64 rate of 1/24 vs 1/32 of FP32. Look at this table. There are a lot of good things to say for the older designs with regard to math performance.


This is mostly true, but FP64 performance does not affect BRP in any way. And I wouldn't call 1/24 "good"; it's barely enough for development or the occasional instruction, but completely unsuitable for FP64 projects like Milkyway. There, Tahiti with 1/4-rate FP64 is still the king, tied by the more expensive Hawaii (1/8 on gamer cards), and it even beats AMD's new Fiji flagship with its 1/16 rate (like the other GCN chips).
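To see why the ratios dominate, multiply them out. Only the FP64:FP32 ratios come from the post; the shader counts and clocks below are approximate reference-card values I've added for illustration:

```python
# name: (shaders, approx. clock in GHz, FP64:FP32 ratio)
cards = {
    "GTX 660 (Kepler)":    ( 960, 1.03, 1 / 24),
    "GTX 980 (Maxwell 2)": (2048, 1.13, 1 / 32),
    "Tahiti (HD 7970)":    (2048, 0.93, 1 / 4),
    "Hawaii (R9 290X)":    (2816, 1.00, 1 / 8),
    "Fiji (Fury X)":       (4096, 1.05, 1 / 16),
}

def peak_tflops(shaders, clock_ghz, ratio=1.0):
    # 2 flops per shader per cycle (fused multiply-add), scaled by the FP64 ratio
    return shaders * 2 * clock_ghz / 1000 * ratio

for name, (shaders, clock, ratio) in cards.items():
    fp32 = peak_tflops(shaders, clock)
    fp64 = peak_tflops(shaders, clock, ratio)
    print(f"{name:21s} FP32 {fp32:4.1f}  FP64 {fp64:5.2f} TFLOPS")
```

With these ballpark figures Tahiti's ~0.95 FP64 TFLOPS indeed tops Fiji's ~0.54, despite Fiji's far larger shader count.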

MrS

Scanning for our furry friends since Jan 2002

Keith Myers
Joined: 11 Feb 11
Posts: 4965
Credit: 18755887887
RAC: 7164152

Quote:

So, if you mix different task types (let alone different projects) on the GPU at one time, and compare completion times for different revisions of one of those task types, you are less comparing the productivity of the two revisions than their mean time to task turnover. [...]

I guess I still haven't made myself clear. I am comparing apples to apples when evaluating the performance of the 1.57 app vs the 1.52. I never said it was a solo performance issue. I've never run a solo task on a GPU for ANY project; I have always run a mix of tasks: two tasks per card back when I was using twin 670s, and now three tasks per card on twin 970s. I am seeing a pretty significant increase in processing times with the 1.57 app under the same conditions as with the 1.52 app. That is my apples-to-apples comparison. In my case of three tasks per card and multiple projects possible at any time on any card, I will be reverting to the 1.52 app once I've cleared my cache of work, because it processes faster. I was hoping for an improvement based on the supposed benefit of the CUDA 5.5 runtime libraries, but have not seen any in my environment. They may be working for others, but they aren't for me.

 

Filipe
Joined: 10 Mar 05
Posts: 186
Credit: 407297606
RAC: 356395

There is so much BRP6 work to do...

Server status page shows >460 days left with all the project's GPU power committed...

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578753537
RAC: 199628

Keith, the runtime that BOINC shows is actually the elapsed time, not the actual GPU crunching time (otherwise that time wouldn't increase when running multiple WUs concurrently). So if the new app gave more GPU time slots away to your other tasks, this could easily explain your observation. Did the SETI and/or MW tasks speed up since you switched to the new Einstein app?
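The distinction matters because at N-up the per-task elapsed time always grows; what tells you whether a change helped is throughput. A small illustration (the numbers here are invented, not taken from anyone's hosts):

```python
def tasks_per_hour(concurrency, elapsed_minutes):
    # N concurrent tasks each finishing in T minutes -> N * 60 / T tasks/hour.
    # Throughput, not per-task elapsed time, is the productivity measure.
    return concurrency * 60.0 / elapsed_minutes

# Per-task elapsed time rises from 40 to 100 minutes at 3x,
# yet the card completes more work per hour.
print(tasks_per_hour(1, 40))   # -> 1.5 tasks/hour
print(tasks_per_hour(3, 100))  # -> 1.8 tasks/hour
```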

MrS

Scanning for our furry friends since Jan 2002

Keith Myers
Joined: 11 Feb 11
Posts: 4965
Credit: 18755887887
RAC: 7164152

No, both the SETI and MW tasks took a hit in increased runtimes with the 1.57 app. That is part of what I found so disappointing: the runtimes for all projects increased. I reverted back to 1.52 on Pipsqueek when the cache of 1.57 ran out. I'm still running 1.57 on the main system and will go back to 1.52 once it cleans out too. So far nothing has finished yet for the old 1.52 app, but it looks like the progress is what I remembered, and the main thing is that the runtimes for SETI and MW seem to have fallen back to normal. Of course, it could just be the current mix of tasks both computers are doing. I ran for four days on the new 1.57 app; maybe that wasn't long enough to get a really good baseline. It is probably to be expected that the beta 1.57 app will make it to main eventually, and I will just have to accept the performance loss. I just had such high expectations, and they have been pretty well crushed.

 

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Keith,

Quote:
No, both the SETI and MW took a hit in increased runtimes with the 1.57 app. That is part of what I found to be so disappointing, the runtimes for all projects increased. [...]

I've been experimenting with S@H MB tasks and E@H BRP6 1.57 ones.

I found that if I ran 2x 1.57 WUs per GPU (GTX980, dev0), they took 1h 34m 44s to complete ;-( Lousy throughput for a GTX980!

However, if I ran 1x S@H MB WU in tandem with 1x BRP6 1.57 WU, it was quite a bit faster: the 1.57 task completed in 1h 13m 17s. The S@H WU ran slower...
For BRP4G 1.52 running in tandem, completion was 0h 22m 03s.
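Worth noting: per-task time and per-card throughput point in different directions here. Plugging Cliff's reported times into a quick rate calculation (the mixed case also produces S@H output, so the two rates aren't directly comparable):

```python
def per_hour(n_tasks, h, m, s):
    # n_tasks concurrent identical tasks finishing in h:m:s -> tasks per hour
    minutes = h * 60 + m + s / 60
    return n_tasks * 60 / minutes

two_up = per_hour(2, 1, 34, 44)  # 2x BRP6 1.57 on the GTX980
mixed  = per_hour(1, 1, 13, 17)  # 1x BRP6 1.57 alongside 1x S@H MB
print(f"{two_up:.2f} vs {mixed:.2f} BRP6 tasks/hour")  # ~1.27 vs ~0.82
```

So each 1.57 task finishes faster in the mixed setup, but the card still delivers fewer BRP6 results per hour, which is exactly the elapsed-time-vs-productivity distinction discussed earlier in the thread.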

The main problem with this method of crunching WUs is that it's done manually :-/

It 'may' be possible to write a program to feed 2 different projects' WUs to a GPU, but if it is, it won't be me doing the writing; I can't get my head around the app_info syntax, let alone owt else :-)
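For what it's worth, no extra program is needed: BOINC itself will overlap tasks from different projects on one GPU whenever each task claims less than a whole GPU. A minimal anonymous-platform app_info.xml sketch is below; the app and file names are placeholders (copy the real ones from client_state.xml), and the <count> under <coproc> is the knob that makes a task claim only a fraction of the GPU:

```xml
<app_info>
  <app>
    <name>einsteinbinary_BRP6</name>  <!-- placeholder: use the app name from client_state.xml -->
  </app>
  <file_info>
    <name>BRP6-cuda-binary</name>     <!-- placeholder: the downloaded executable's file name -->
    <executable/>
  </file_info>
  <app_version>
    <app_name>einsteinbinary_BRP6</app_name>
    <version_num>152</version_num>
    <avg_ncpus>0.2</avg_ncpus>
    <coproc>
      <type>CUDA</type>
      <count>0.5</count>  <!-- each task claims half a GPU, so two tasks can share one card -->
    </coproc>
    <file_ref>
      <file_name>BRP6-cuda-binary</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```

With a fractional count set on both projects' app versions, the BOINC scheduler may pair, say, one BRP6 and one S@H MB task on the same card on its own, though exactly which tasks land together is up to the scheduler, not the user.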

NB: Even on my slower GTX980 it was a lot faster...

Regards,

Cliff,

Been there, Done that, Still no damn T Shirt.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578753537
RAC: 199628

Keith, the runtimes your PCs are achieving with 1.52 don't look any better than 1.57. There's certainly a large variation in runtime, so it's difficult to judge performance by eye. Did something else change along with the new app?

MrS

Scanning for our furry friends since Jan 2002
