On my host with the 560: ~35 mins (1.28) vs ~46 mins (1.25)
On my host with the 550 Ti: ~110 mins (1.28) vs ~135 mins (1.25)
Doing 2 WUs per GPU with 1 CPU core free for each GPU in both hosts.
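To make the setup described above concrete, here is a minimal sketch of an app_config.xml (placed in the Einstein@Home project directory; only sufficiently recent BOINC clients honour this file) that budgets two tasks per GPU and a full CPU core per GPU task. The app name is a placeholder, so check the task names or client_state.xml on your own host for the real one:

  <app_config>
    <app>
      <name>einsteinbinary_BRP4</name>   <!-- placeholder: use the app name your client actually reports -->
      <gpu_versions>
        <gpu_usage>0.5</gpu_usage>       <!-- 0.5 GPUs per task, i.e. 2 WUs per GPU -->
        <cpu_usage>1.0</cpu_usage>       <!-- budget a full CPU core for each GPU task -->
      </gpu_versions>
    </app>
  </app_config>

On clients that predate app_config.xml, the project's "GPU utilization factor" web preference is the usual way to run more than one task per GPU.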
One task at a time on my 560 Ti:
1.24: about 1720 seconds
1.28: about 1100 seconds
So it only takes 64% of the time now. A fine performance improvement!
Horacio wrote: As was said, the speed up is much more awesome on faster GPUs.
I'd guess that a host with a fast GPU and a CPU with slow single-core performance might show the biggest improvement.
I'd also guess that, for many hosts, the optimal number of simultaneous GPU tasks and the optimal number of CPU cores to hold back from pure CPU processing have both probably changed. As I am running on summer power throttling and preparing for some time away from home, it is not a good time for me to rerun tests, but even without them I can see clear signs of major improvement from the new application on two hosts with GTX 460 GPUs.
I think there was some discussion that it is good to free up one core (i.e. on a quad core use only three) to let it feed the GPU.
Otherwise the GPU spends most of its time waiting for data coming from the CPU.
Greetings, Christoph
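To make that advice concrete: a blunt, host-wide way to keep one core of a quad free is the "use at most N% of the processors" computing preference, which in a local global_prefs_override.xml looks roughly like the sketch below (the client has to re-read its preferences, or be restarted, before the change takes effect):

  <global_preferences>
    <max_ncpus_pct>75.0</max_ncpus_pct>  <!-- 75% of 4 cores = 3 cores left for CPU tasks -->
  </global_preferences>

The cpu_usage entry in an app_config.xml, as in the earlier sketch, is the more targeted alternative, since it should only set the core aside while a GPU task is actually running.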
None of us have much experience of the GTX 660 Ti or driver 304.87 yet.
As reported in the other v1.28 feedback thread, my GTX 670 with driver 304.79 showed a 25% reduction in runtime.
I think it's been reported that the 660 Ti has a narrower, and hence lower-bandwidth, memory bus. We handle a lot of data (large data files) here at Einstein, so that might explain why your card is showing less benefit than mine; or the difference might be the CPUs (my 670 is hosted by an overclocked i7).
MAGIC wrote: So far that doesn't seem to be making any difference as far as time per task, Bikeman.
I wonder how you are looking at it to see it that way.
Looking at the valid tasks list for your host 4109993, I think I see an impressive transition. Without doing a formal summary, it appears that for 1.25 this host was posting elapsed times of about 4200 seconds and CPU times of about 1420 seconds, which seem to have come down to about 2800 and 640 for version 1.28: an impressive improvement by most standards.
Am I missing something?
Yeah archae,
What we were talking about is that the times didn't change after I set the "task checkpoint to disk" interval in the BOINC manager to 60 seconds. For some reason it was at zero, and after I changed it and took a look at the times, they were still the same.
Bikeman was saying he thought it should be even faster if I did that.
But as mentioned, it could be because I run these quads using 2 cores for T4T and an LHC task at the same time, while running 2x CUDA tasks here.
You can take a look at the first batch of 1.28s that I did compared to the ones after I changed the "task checkpoint to disk" interval to 60 seconds.
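For reference, the setting discussed above is the "Tasks checkpoint to disk at most every N seconds" computing preference; outside the Manager dialog it corresponds, if I'm not mistaken, to disk_interval in a local global_prefs_override.xml, roughly as in this sketch (again, the client needs to re-read local prefs before the new value is used):

  <global_preferences>
    <disk_interval>60.0</disk_interval>  <!-- checkpoint to disk at most every 60 seconds -->
  </global_preferences>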
I botched the context, sorry about that.
Hey no problem archae
Holy shit, the new app is absolutely fantastic!!
My main (24/7) cruncher for Einstein is a 9800 GTX running in a slow PCIe x4 slot on a heavily loaded mainboard, which also runs three concurrent GPU POEM tasks on a second 9800 GTX in the other PCIe x16 slot. So the runtime was ~7000 secs for one Einstein task with a reserved CPU core (Q6600), due to the limited PCIe bandwidth. Now it has dropped to 4200(!!) secs. That's the value, if I remember right, that I had with this card in a PCIe x16 slot some time ago :D That's so great for cards that have only 512 MB of memory and can only run one task at a time, and/or sit in slow PCIe slots.
Great!! Thx!! *working from 6.5k back up to a >10k RAC again* :)
DSKAG Austria Research Team: http://www.research.dskag.at