New CUDA BRP4 app versions 1.28

ggesmundo
ggesmundo
Joined: 3 Jun 12
Posts: 31
Credit: 18,699,116
RAC: 0

On my host with the 560: ~35 mins (1.28) vs ~46 mins (1.25)
On my host with the 550 Ti: ~110 mins (1.28) vs ~135 mins (1.25)

Doing 2 WUs per GPU with 1 CPU core free for each GPU in both hosts.
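For anyone wanting to reproduce that setup: in newer BOINC clients, a "2 tasks per GPU, one CPU core per GPU" split can be sketched with an app_config.xml in the project directory. The app name einsteinbinary_BRP4 below is an assumption; check client_state.xml for the real one.

```xml
<!-- app_config.xml (project directory): run two tasks per GPU,
     each budgeting half a CPU core, so one full core feeds each GPU.
     The app name is a guess; verify it in client_state.xml. -->
<app_config>
  <app>
    <name>einsteinbinary_BRP4</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```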

Sparrow
Sparrow
Joined: 4 Jul 11
Posts: 29
Credit: 10,701,417
RAC: 0

One task at a time on my 560 Ti:
1.24: about 1720 seconds
1.28: about 1100 seconds

So it only takes 64% of the time now. A fine performance improvement!
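Sparrow's figures work out like this (a quick sanity check, not from the original post):

```python
# Relative runtime of v1.28 vs v1.24, using the times reported above.
old, new = 1720.0, 1100.0  # seconds per task
ratio = new / old          # fraction of the old runtime
speedup = old / new        # throughput improvement factor
print(f"{ratio:.0%} of the old time, {speedup:.2f}x throughput")
# -> 64% of the old time, 1.56x throughput
```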

archae86
archae86
Joined: 6 Dec 05
Posts: 3,146
Credit: 7,100,794,931
RAC: 1,053,683

Horacio wrote:
As was said, the speed up is much more awesome on faster GPUs.

I'd guess that a host with a fast GPU and a CPU with slow single-core performance might show the biggest improvement.

I'd also guess that the whole relationship of optimal number of simultaneous GPU tasks and optimal number of CPU cores held back from pure CPU processing has probably changed for many hosts. As I am running on summer power-throttling and preparing for some time away from home it is not a good time for me to rerun tests, but even without them I can see clear signs of major improvement caused by the new application on two hosts with GTX460 GPUs.

Christoph
Christoph
Joined: 25 Aug 05
Posts: 41
Credit: 5,954,206
RAC: 0

Quote:

So far that doesn't seem to be making any difference as far as time per task Bikeman,

Any other tips?

Or is it because I run 2-core T4T's and LHC's at the same time as these cuda X2 tasks?

http://einsteinathome.org/host/4109993/tasks&offset=0&show_names=0&state=3


I think there was some discussion that it is good to free up one CPU core (i.e. on a quad core, use only three) to let it feed the GPU.
Otherwise the GPU spends most of its time waiting for data from the CPU.

Greetings, Christoph

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,142
Credit: 2,796,601,363
RAC: 720,719

Quote:

So far that doesn't seem to be making any difference as far as time per task Bikeman,

Any other tips?

Or is it because I run 2-core T4T's and LHC's at the same time as these cuda X2 tasks?

http://einsteinathome.org/host/4109993/tasks&offset=0&show_names=0&state=3


None of us have much experience of the GTX 660 Ti or driver 304.87 yet.

As reported in the other v1.28 feedback thread, my GTX 670 with driver 304.79 showed a 25% reduction in runtime.

I think it's been reported that the 660 Ti has a narrower, and hence slower, GPU internal memory bus. We handle a lot of data (large data files) here at Einstein: that might explain why your card is showing less benefit than mine - or the difference might be the CPUs (my 670 is hosted by an overclocked i7).

archae86
archae86
Joined: 6 Dec 05
Posts: 3,146
Credit: 7,100,794,931
RAC: 1,053,683

MAGIC wrote:
So far that doesn't seem to be making any difference as far as time per task Bikeman


I wonder how you are looking at it to see it that way.

Looking at the valid tasks lists for your host 4109993 I think I see an impressive transition. Without doing a formal summary, it appears that for 1.25 this host was posting elapsed times of about 4200 seconds and CPU times of about 1420 seconds, which seem to have come down to about 2800 and 640 for version 1.28--an impressive improvement by most standards.

Am I missing something?

MAGIC Quantum Mechanic
MAGIC Quantum M...
Joined: 18 Jan 05
Posts: 1,724
Credit: 1,116,072,347
RAC: 1,450,216

Quote:
MAGIC wrote:
So far that doesn't seem to be making any difference as far as time per task Bikeman

I wonder how you are looking at it to see it that way.

Looking at the valid tasks lists for your host 4109993 I think I see an impressive transition. Without doing a formal summary, it appears that for 1.25 this host was posting elapsed times of about 4200 seconds and CPU times of about 1420 seconds, which seem to have come down to about 2800 and 640 for version 1.28--an impressive improvement by most standards.

Am I missing something?

Yeah archae,

What we were talking about is that the time didn't change after I set the BOINC manager's "task checkpoint to disk" interval to 60 seconds. For some reason it was at zero, and even after I changed it the times were still the same.

Bikeman was saying he thought it should be even faster if I did that.

But as mentioned, it could be because I run these quads with 2 cores on T4T and an LHC task at the same time as the 2X CUDA tasks here.

You can take a look at the first batch of 1.28's that I did compared to the ones after I changed the "task checkpoint to disk" interval to 60 seconds.
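For reference, the setting MAGIC is describing corresponds to BOINC's "write to disk at most every N seconds" computing preference. Set locally, it would look roughly like this (a sketch; only disk_interval is relevant here):

```xml
<!-- global_prefs_override.xml (BOINC data directory):
     checkpoint tasks to disk at most once every 60 seconds -->
<global_preferences>
   <disk_interval>60</disk_interval>
</global_preferences>
```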

archae86
archae86
Joined: 6 Dec 05
Posts: 3,146
Credit: 7,100,794,931
RAC: 1,053,683

Quote:
What we were talking about is that the time didn't change after I set the BOINC manager's "task checkpoint to disk" interval to 60 seconds. For some reason it was at zero, and even after I changed it the times were still the same.

I botched the context, sorry about that.

MAGIC Quantum Mechanic
MAGIC Quantum M...
Joined: 18 Jan 05
Posts: 1,724
Credit: 1,116,072,347
RAC: 1,450,216

Hey no problem archae

dskagcommunity
dskagcommunity
Joined: 16 Mar 11
Posts: 89
Credit: 1,171,014,828
RAC: 267,609

Holy shit, the new app is absolutely fantastic!!

My main (24/7) cruncher for Einstein is a 9800GTX running in a slow PCIe x4 slot on a heavily loaded mainboard, with a second 9800GTX in the PCIe x16 slot running three concurrent GPU POEM tasks. Because of the limited PCIe bandwidth, the runtime was ~7000 secs for one Einstein task with a reserved CPU core (Q6600). Now it has dropped to 4200!! secs. If I remember right, that's the value I had with this card in a PCIe x16 slot some time ago :D That's so great for cards that have only 512MB of memory and can only run one task at a time AND/OR sit in slow PCIe slots.

Great!! Thx!! *working from 6.5k up to a >10k RAC again* :)

DSKAG Austria Research Team: http://www.research.dskag.at
