Times (Elapsed / CPU) for BRP5/6/6-Beta on various CPU/GPU combos - DISCUSSION Thread

JBird
Joined: 22 Dec 14
Posts: 1963
Credit: 4046216051
RAC: 0

Well yes, thanks for *that. Actually, I see 2-4 files uploaded when reporting/updating (probably a "started" and a "finished" entry for each of the bundled tasks).

But what I'm actually most curious about is the "Missing Checkpoints File/Directory" message - the file is obviously *called for by the app, yet reported "Not found/missing".

And the *presence of the "checkpoint_debug" diag flag suggests the problem is known, can be diagnosed, and potentially repaired; that's what's on my plate here.

ExtraTerrestrial Apes
ExtraTerrestria...
Joined: 10 Nov 04
Posts: 770
Credit: 536624327
RAC: 185854

Yeah, this question about the "missing checkpoint" pops up a lot. But it's been answered now, right? Do you have an idea for a less confusing way for the app to report this?

MrS

Scanning for our furry friends since Jan 2002

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Quote:

Well yes, thanks for *that. Actually, I see 2-4 files uploaded when reporting/updating (probably a "started" and a "finished" entry for each of the bundled tasks).

But what I'm actually most curious about is the "Missing Checkpoints File/Directory" message - the file is obviously *called for by the app, yet reported "Not found/missing".

And the *presence of the "checkpoint_debug" diag flag suggests the problem is known, can be diagnosed, and potentially repaired; that's what's on my plate here.

All this talk about info messages about checkpoints is a red herring regarding the task run times. They are not errors.

If your card really is slower now you'll have to look for other causes:
1. Is it running at the same clock rate as before?
2. Is the rest of the machine running at the same clock rates as before?
3. Is the machine running the same types of tasks other than BRP6 as before?

To break down the messages and their meaning, this is how I understand it:
1. BOINC starts a new task for the first time; there is no checkpoint file, so the app writes an informational message to stderr saying so. <-- Normal
2. The task runs, and from your previous logs the app checkpoints normally. <-- Also normal
3. The main analysis is completed and the app moves over to sorting out the results; this is completed so fast that no checkpoint is needed, hence the message that no checkpoint was written, while it also presents some other statistics on "dirty SumSpec pages". <-- Also normal
4. The app then proceeds to start the 2nd bundled task and the whole thing repeats itself. <-- As the second bundled task is really a new task, it does not have a checkpoint, and a message saying so is written.
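In code terms, the resume logic described above can be sketched like this (a minimal illustration only - the function and file format are hypothetical, the real app is CUDA/C, but it shows why a missing checkpoint is the normal state whenever a bundled task runs for the first time):

```python
import os

def start_or_resume(checkpoint_path):
    """Resume from a checkpoint if one exists, otherwise start fresh.

    A missing checkpoint file is NOT an error: it is the expected state
    whenever a (bundled) task runs for the first time.
    """
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            template_no = int(f.read().strip())
        print(f"[INFO ] Continuing work at template no. {template_no}")
        return template_no
    # First run of this task: log an informational message and start at 0.
    print("[INFO ] Checkpoint file not found - starting from scratch")
    return 0
```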

In the log from message 140881 you can see these messages:

[13:43:39][4924][INFO ] Output file: '../../projects/einstein.phys.uwm.edu/PM0021_001B1_104_0_0' already exists - skipping pass

[13:43:40][4924][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/PM0021_001B1_105.bin4 at template no. 293663

The first tells you that the 1st bundled task is already done and the app moves on to the second bundled task. The second message tells you that the app is continuing work from a checkpoint.

JBird
Joined: 22 Dec 14
Posts: 1963
Credit: 4046216051
RAC: 0

Thanks Holmis (and MrS)-

All this talk about info messages about checkpoints is a red herring regarding the task run times. They are not errors.
=
I agree with the "red herring" analogy - and thanks for the analysis of the messages
=
If your card really is slower now you'll have to look for other causes:
= actually it's the app that's running slower - not my underrated card--

GTX 960 SC, 2048 MB GDDR5, 8 multiprocessors (CUs), compute capability 5.2 (shaders); it's a Maxwell GM206 chip.
The detect routine in the app is a bit thin, IMHO.
And (not bashing here) the aging CUDA 3.2 is, again IMHO, the primary bottleneck.
= My card is limited by only 2 things: 1) it's running on a PCIe 2.0 x16 bus; and 2) my CPU is not Hyper-Threaded - which, if it were, would activate Maxwell's Unified Memory and improve CPU/memory and GPU communication across the bus/I/O.
=
1. Is it running at the same clock rate as before? <- Yes, stable 1404.8 MHz core clock; 1752.8 MHz memory clock; memory used = 301 MB; load = 82%; 1.2060 V; temp = 64 C; avg TDP = 58%
+ avg CPU usage (Lasso reports avg 3-4% throughout the run)

2. Is the rest of the machine running at the same clock rates as before? <- Yes, i5 2500 - 4 cores/4 threads at 3.3 GHz (SpeedStep off and Turbo Boost on), cores stable at 58 C under 100% load -- and app process priority set to Above Normal, high I/O, normal memory - actually Bitsum Highest in Lasso.

3. Is the machine running the same types of tasks other than BRP6 as before? <- Yes, against/with 4 SETI v7 CPU tasks (AVX) with avg 24% CPU usage
+ all 4 cores active for both projects.
I run Parkes BRP6 with the stock config of 0.2 CPUs + 1 NVIDIA GPU (GPU/CUDA apps suspended at SETI).
=
So again, thanks for the looks and for the analyses and explanations of the app's idiosyncrasies.
All good in furthering my knowledge of "HOW things work" (systemically) and how we interact *with them.
Kudos to the Devs and Admins

ExtraTerrestrial Apes
ExtraTerrestria...
Joined: 10 Nov 04
Posts: 770
Credit: 536624327
RAC: 185854

Quote:
And (not bashing here) the aging CUDA 3.2 is, again IMHO, the primary bottleneck.


No, not really. The devs are looking into using newer build environments, but so far the benefit of CUDA 5.5 has only been in the single-digit percentage range, if I remember correctly.

Quote:
My card is limited by only 2 things: 1) it's running on a PCIe 2.0 x16 bus; and 2) my CPU is not Hyper-Threaded - which, if it were, would activate Maxwell's Unified Memory and improve CPU/memory and GPU communication across the bus/I/O.


The 16x PCIe 2.0 is perfectly fine with the new app. Even slower connections work nicely now. The old app used to be far more talkative and suffered from slower PCIe connections.

And HT would not magically speed up your GPU at Einstein. It would help keep the GPU busy, though, if all CPU cores are crunching something else (as is the case in your system).

Unified memory is something that has to be used explicitly by the app, or at least by the compiler.

Quote:
I run Parkes BRP6 with stock config of 0.2CPUs + 1 NVIDIA GPU


You could increase your Einstein throughput by running 2 WUs (0.2 CPU + 0.5 GPU each) concurrently. This might also help avoid any idle time that may occur because all CPU cores are busy.
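For reference, this is done with a BOINC app_config.xml in the Einstein@Home project directory. A sketch along the lines MrS suggests - note the <name> value here is an assumption; check your client's event log or client_state.xml for the exact app name on your host:

```xml
<app_config>
  <app>
    <!-- App name must match the one BOINC uses for BRP6/Parkes tasks;
         "einsteinbinary_BRP6" is assumed here - verify on your own host. -->
    <name>einsteinbinary_BRP6</name>
    <gpu_versions>
      <!-- Two tasks per GPU, each reserving 0.2 of a CPU core -->
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After saving the file, use "Options -> Read config files" in the BOINC Manager (or restart the client) for it to take effect.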

MrS

Scanning for our furry friends since Jan 2002

DF1DX
DF1DX
Joined: 14 Aug 10
Posts: 102
Credit: 2915564059
RAC: 1072524

Quote:

Quote:
My card is limited only by 2 things: 1)it's running on a PCIe 2.0x16 Bus; [...]

The 16x PCIe 2.0 is perfectly fine with the new app. Even slower connections work nicely now. The old app used to be far more talkative and suffered from slower PCIe connections.

MrS

Indeed.

Run times on my host:
Asus P8 MB, Z77 Express chipset, Windows 7

CPU: Intel i5-3570K CPU @ 3.8 GHz
GPU 0: Intel HD 4000
GPU 1: NVIDIA GTX 750Ti PCIe 3 x 16
GPU 2: NVIDIA GTX 750Ti PCIe 2 x 4 <--

BRP6 (Parkes PMPS XT v1.52)

Concurrency: 1 * 1 GPU:
GPU 0: ~8:45:00

Concurrency: 2 @ 0.5 CPUs + 0.5 GPUs:
GPU 1: ~4:00:00
GPU 2: ~4:05:00 <--

Only 5 minutes more!
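Those two times put the bus penalty in perspective - a quick sketch of the arithmetic (times from the list above; the bandwidth ratio is approximate):

```python
def slowdown_pct(baseline_s, slower_s):
    """Relative slowdown of the slower run versus the baseline, in percent."""
    return 100.0 * (slower_s - baseline_s) / baseline_s

pcie3_x16 = 4 * 3600          # ~4:00:00 on PCIe 3 x16
pcie2_x4 = 4 * 3600 + 5 * 60  # ~4:05:00 on PCIe 2 x4

# Roughly 1/8 the bus bandwidth costs only about 2% in run time.
print(f"{slowdown_pct(pcie3_x16, pcie2_x4):.1f}% slower")
```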

Jürgen.

JBird
Joined: 22 Dec 14
Posts: 1963
Credit: 4046216051
RAC: 0

I'll get back to you on that stuff, MrS and DF1DX - and share my own comparative findings on these points of discussion.
=
I just received (UPS):
ASUS Z97 M Plus mobo

Intel Core i7-4790K Devil's Canyon quad-core 4.0 GHz, LGA 1150, BX80646I74790K, with Intel HD Graphics 4600
Haswell, Hyper-Threading, 8 threads

Intel 730 Series SSDSC2BP240G4R5 2.5" 240GB SATA 6Gb/s MLC
=
Building this out tomorrow; then a fresh Win7 DVD install plus 226 Windows updates, appropriate new drivers and tunings; then data migration from the former SSD to get my data, apps, and BOINC.
I plan to connect the iGPU to my monitor via DVI or HDMI for desktop graphics and crunch only with a fully enabled GTX 960 SC.
After the data transfer I'll remount the older SSD, put my 750 Ti SC into the former host, retune everything, and make it a full-time cruncher.
=
Wet Memorial Day weekend here---I'll just be building this
=
So after a few days of crunching SETI and Einstein to produce comparative samples, I'll share what differences the new config yields and confirm/deny the original hypotheses.

Have a good weekend y'all!

Betreger
Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1421797229
RAC: 784049

I wish to state how happy my GTX 660 is with the beta CUDA 5.5 app. Its times are consistently below 4 hrs running 3 at a time, vs almost 5 hrs with CUDA 3.2. The first 4 failed with a total run time of less than 30 secs. Since then everything has validated and my RAC has jumped by 10k.
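At a fixed concurrency of 3, those run times map directly onto throughput - a quick check (taking "almost 5 hrs" as 5:00 for a rough estimate):

```python
def tasks_per_hour(concurrency, hours_per_task):
    """Steady-state throughput with `concurrency` tasks running in parallel."""
    return concurrency / hours_per_task

cuda32 = tasks_per_hour(3, 5.0)  # 0.60 tasks/hour
cuda55 = tasks_per_hour(3, 4.0)  # 0.75 tasks/hour

# About 25% more throughput, consistent with the RAC jump.
print(f"{100 * (cuda55 / cuda32 - 1):.0f}% more throughput")
```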

exo
exo
Joined: 11 Feb 06
Posts: 11
Credit: 133077998
RAC: 0

Hi,

this thread is already a bit old - do you still need results?

I should have enough data from my crunching machine in 1 or 2 weeks. It's a GTX 650TI running on a Celeron G530, bus is PCIe 2.

First results show that the runtime is about 20% faster compared to "Binary Radio Pulsar Search (Parkes PMPS XT) v1.52 (BRP6-cuda32-nv270)".

If results are still needed, I'll post them properly once I have enough data.

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109403581316
RAC: 35515592

Quote:
this thread is already a bit old - do you still need results?


In an earlier message in this thread, Bikeman indicated that he had enough information to validate the success of the optimizations he designed into the new BRP6 app. This had nothing to do with a change in the version of CUDA, which is a much more recent development and is unrelated to the previous algorithm optimizations.

Quote:

I should have enough data from my crunching machine in 1 or 2 weeks. It's a GTX 650TI running on a Celeron G530, bus is PCIe 2.

First results show that the runtime is about 20% faster compared to "Binary Radio Pulsar Search (Parkes PMPS XT) v1.52 (BRP6-cuda32-nv270)".


There was a separate thread for recording the improvements (or lack thereof) for NVIDIA GPUs (only) as a result of the change from CUDA32 to CUDA55. If you want to comment or post results, you should use it instead of this one. The consensus seems to be that Kepler and later series do benefit whilst Fermi and earlier don't. On this basis, your figure of 20% for a 650Ti seems about right. This message posted in the CUDA55 thread actually provides data for a 650Ti showing a ~19% improvement. There is also a link there to earlier data from the BRP5 -> BRP6 -> BRP6-Beta transitions (all using the old CUDA32).

Quote:
If results are still needed, I would provide it properly once I have enough data.


It's entirely up to you. I get the feeling that the results and comments in the CUDA55 thread support what the Devs were expecting so I assume they aren't really looking for further confirmation. However, don't let that stop you :-). It's always good to see the results that people get :-).

Cheers,
Gary.
