Binary Radio Pulsar Search (Parkes PMPS XT) "BRP6"

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2956626419

RAC: 715222

28 Feb 2015 22:25:31 UTC

Message 129861 in response to message 129860

LOL! Well, I can guess what the Einstein developers will be saying in the general direction of DA on Monday morning, even if they don't say it to his face.

Meanwhile, having got my app_config.xml sorted out, I can confirm that putting --device 0 into the command line for the app_version overcomes the API v7.5.0 issue - one task branded v1.47 Beta picked up cleanly from part-way through, and another has run from start to finish with that combination of settings.

Only a stopgap and a proof-of-concept, of course. We still need to handle automatic assignment of --device 1, --device 2 etc. where those devices exist.
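A minimal sketch of the stopgap described above, written as a small Python script that emits an app_config.xml with a fixed `--device 0` command line for one app_version. The app name `einsteinbinary_BRP6` and plan class `BRP6-Beta-cuda32-nv301` are assumptions read off the executable file name quoted further down this thread, and the `avg_ncpus` value is purely illustrative. Check your own client_state.xml before using any of them.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, device, ngpus=1.0, avg_ncpus=0.2):
    """Return app_config.xml text that pins one app_version to a single GPU."""
    root = ET.Element("app_config")
    ver = ET.SubElement(root, "app_version")
    ET.SubElement(ver, "app_name").text = app_name
    ET.SubElement(ver, "plan_class").text = plan_class
    ET.SubElement(ver, "avg_ncpus").text = str(avg_ncpus)  # illustrative value
    ET.SubElement(ver, "ngpus").text = str(ngpus)
    # The stopgap itself: a hard-coded device number on the command line.
    ET.SubElement(ver, "cmdline").text = f"--device {device}"
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed names, not confirmed by the project.
xml_text = build_app_config("einsteinbinary_BRP6", "BRP6-Beta-cuda32-nv301", device=0)
print(xml_text)
```

Because the command line is fixed for every task of that app_version, this only suits a host with a single usable CUDA device, which is exactly why it is a stopgap.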

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117564356647

RAC: 35299751

2 Mar 2015 9:30:48 UTC

Message 129862

I know the focus will be on sorting out the issues with the CUDA version of the app but I thought I'd post some very encouraging findings with the OpenCL app running on HD7850 GPUs. I have a large number of these GPUs in a variety of different hosts and I've been running BRP5 (4 concurrent tasks) on all of them using app_config.xml to control concurrency. When BRP6 started, I added a BRP6 clause to all the app_config.xml files so that BRP6 would also run 4x to start with. My intention is to experiment with this when things settle down.
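For readers who have not used app_config.xml for concurrency, here is a hedged sketch of the kind of file this paragraph describes, generated from Python. Running N tasks per GPU is expressed as a `gpu_usage` of 1/N in an `<app>` clause. The app names `einsteinbinary_BRP5` and `einsteinbinary_BRP6` and the `cpu_usage` value are assumptions for illustration, not taken from Gary's actual files.

```python
import xml.etree.ElementTree as ET


def concurrency_config(app_names, tasks_per_gpu, cpu_usage=0.5):
    """Build an <app_config> tree with one <app> clause per application."""
    root = ET.Element("app_config")
    for name in app_names:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        gpu = ET.SubElement(app, "gpu_versions")
        # Each task claims 1/N of a GPU, so N tasks run at once.
        ET.SubElement(gpu, "gpu_usage").text = str(1.0 / tasks_per_gpu)
        ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)  # illustrative value
    return root


# Assumed app names; 4 concurrent tasks as described in the post.
config = concurrency_config(["einsteinbinary_BRP5", "einsteinbinary_BRP6"], tasks_per_gpu=4)
print(ET.tostring(config, encoding="unicode"))
```

Adding "a BRP6 clause" then amounts to one more `<app>` element alongside the existing BRP5 one.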

The bulk of my HD7850 GPUs are in quite modern hosts and I'm seeing nice performance improvements (but not spectacular) when going from 1.39 -> 1.41 -> 1.47. I have several GPUs in older hardware and it's the improvement here that is quite spectacular.

The previous BRP5 1.39 app performs very well in systems whose PCIe bus is v2+ but quite poorly in v1.x systems. I have 18 5+ year old hosts with Q8400 quad core CPUs (v1.x PCIe bus) that have run CPU apps only for their entire life. Last year I played around with putting HD7850s in a few of these and achieved about 56% of the performance I could get from the same card in an Ivy Bridge i3 with 4 virtual cores - obviously a quite disappointing result.

When I ran the BRP6 1.41 app on these PCIe-v1.x hosts, I saw commensurate performance - 33% longer crunch times for 33% bigger tasks giving 33% more credit. I had a suspicion that the BRP6 1.47-beta app might improve this so about 6 hours ago I set about deploying the beta app to these hosts and rebranding all the 1.41 tasks in their caches to 1.47. The first four 1.47 tasks done mostly or entirely with the beta app on the first machine are now just finished and I'm quite amazed at the results.

[pre]
Search     App Ver  Elapsed time  CPU time   Notes
=========  =======  ============  =========  =====
BRP5       1.39     28,500 sec    3,750 sec  Long term averages for BRP5
BRP6       1.41     38,500 sec    4,950 sec  Averages for 20+ tasks for BRP6 v1.41
BRP6-beta  1.47     21,980 sec    2,230 sec  1st task - 18% on v1.41 and 82% on v1.47 app
BRP6-beta  1.47     20,567 sec    2,027 sec  2nd task - 100% on v1.47 app
BRP6-beta  1.47     19,053 sec    1,760 sec  3rd task - 100% on v1.47 app
BRP6-beta  1.47     22,509 sec    2,405 sec  4th task - 100% on v1.47 app

BRP6-beta  1.47     19,099 sec    2,072 sec  1st v1.47 task on different host but identical hardware

BRP6-beta  1.47     17,880 sec      664 sec  Pentium dual core G3258 (Haswell refresh) with HD7850 4x[/pre]

The nice thing is that these old clunkers are now able to get performance on the HD7850 that is much closer to what a Haswell refresh based system can do. I just grabbed a recently completed result from such a machine and put it in the above table for comparison.
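To put a number on the improvement, the arithmetic can be sketched as below. It uses only the elapsed times from the table, leaves out the first task because it ran partly on v1.41, and assumes every row was produced with 4 tasks sharing the GPU. With only three full v1.47 tasks this is a rough indication, not a benchmark.

```python
from statistics import mean

# Elapsed seconds per task, copied from the table above.
V141_AVG = 38_500                     # BRP6 v1.41 average on the Q8400 host
V147_FULL = [20_567, 19_053, 22_509]  # tasks run 100% on v1.47, same host
CONCURRENT = 4                        # tasks sharing the GPU (assumed for all rows)


def tasks_per_day(elapsed_sec, concurrent=CONCURRENT):
    """Daily throughput when `concurrent` tasks each take `elapsed_sec`."""
    return concurrent * 86_400 / elapsed_sec


v147_avg = mean(V147_FULL)
speedup = V141_AVG / v147_avg

print(f"v1.47 mean elapsed : {v147_avg:,.0f} sec")
print(f"speedup over v1.41 : {speedup:.2f}x")
print(f"tasks/day v1.41    : {tasks_per_day(V141_AVG):.1f}")
print(f"tasks/day v1.47    : {tasks_per_day(v147_avg):.1f}")
```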

Cheers,
Gary.

Bill592

Joined: 25 Feb 05

Posts: 786

Credit: 70825065

RAC: 0

2 Mar 2015 10:15:02 UTC

Message 129863 in response to message 129862

Thanks Gary!

I also noticed that my ancient Core 2 CPU with a version 1 PCIe bus
seemed to be running a whole lot faster with my Radeon 7970.

Bill

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 725640681

RAC: 1219156

2 Mar 2015 12:15:07 UTC

Message 129864

Hi!

Thx for the feedback. Looks good.

However, the runtime is more data-dependent now, so there are "lucky" and "not-so-lucky" workunits wrt runtime. The overall speedup might be worse.

I have another idea up my sleeve that could improve the situation a bit for the "not-so-lucky" WUs. That will come later, though; first we need to fix this CUDA device assignment problem.

Cheers
HB

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2956626419

RAC: 715222

2 Mar 2015 12:30:20 UTC

Message 129865 in response to message 129864

Keep an eye on (Windows) host 5744895 - all Parkes tasks are being run under Beta v1.47 as discussed, two-at-a-time on a GTX 670.

The variability in runtime is very marked, and correlates with high CPU usage as well. Two 'high CPU' tasks running together are sufficient to slow down the Arecibo BRP4 tasks running on the Intel HD 4000 at the same time.

Variable runtime on this scale is a particular problem, affecting work fetch and caching, while this project is still running DCF. The best of luck in sorting it out with your new idea - more power to your elbow (if that English idiom translates safely into German).
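Why variable runtime hurts under DCF (the per-project duration correction factor the client uses to scale its runtime estimates) can be illustrated with a toy model. This is not BOINC's actual code. It only assumes the general behaviour that the factor jumps up at once when a task overruns its estimate and drifts back down slowly afterwards. The estimate and the runtimes are made-up numbers.

```python
def update_dcf(dcf, estimated_sec, actual_sec, decay=0.1):
    """Toy update: jump up at once on an overrun, drift down slowly otherwise."""
    ratio = actual_sec / estimated_sec
    if ratio > dcf:
        return ratio
    return dcf + decay * (ratio - dcf)


RAW_ESTIMATE = 20_000                        # hypothetical server estimate, seconds
RUNTIMES = [18_000, 38_000, 18_000, 18_000]  # hypothetical: one "unlucky" task

dcf = 1.0
for actual in RUNTIMES:
    dcf = update_dcf(dcf, RAW_ESTIMATE, actual)
    print(f"ran {actual:>6} sec -> DCF {dcf:.2f}, next estimate {RAW_ESTIMATE * dcf:,.0f} sec")
```

In this model a single slow task nearly doubles the estimate for every queued task of the project, and two ordinary tasks later the estimate is still far too high, so the client believes its cache is fuller than it really is.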

disturber

Joined: 26 Oct 14

Posts: 30

Credit: 57155818

RAC: 0

2 Mar 2015 13:20:38 UTC

Message 129866

I am having problems with these. A 970 and a 660 Ti on an i7-3770K.
All fail after 2.5 seconds:

7.4.27

Recursion too deep; the stack overflowed.
(0x3e9) - exit code 1001 (0x3e9)

Activated exception handling...
[22:49:54][11804][INFO ] Starting data processing...
[22:49:54][11804][ERROR] No suitable CUDA device available!
[22:49:54][11804][ERROR] Demodulation failed (error: 1001)!
22:49:54 (11804): called boinc_finish(1001)

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250459996

RAC: 35134

3 Mar 2015 7:57:46 UTC

Message 129867

New (BRP6 Beta) CUDA version 1.49 is out. This is meant to fix the API problem; there is no change to the actual computation code.

Stef

Joined: 8 Mar 05

Posts: 206

Credit: 110568193

RAC: 0

3 Mar 2015 8:58:16 UTC

Message 129868 in response to message 129867

Works for me, thank you.

Gavin

Joined: 21 Sep 10

Posts: 191

Credit: 40644337738

RAC: 1

3 Mar 2015 9:05:16 UTC

Message 129869 in response to message 129867

Hi Bernd

Just got a v1.49 on host 10698787 that failed at 1 second...

Stderr output
7.4.36

(unknown error) - exit code -1073741819 (0xc0000005)

Activated exception handling...
[08:57:44][4948][INFO ] Starting data processing...
-------------------
Error occured on Tuesday, March 3, 2015 at 08:57:44.

C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einsteinbinary_BRP6_1.49_windows_intelx86__BRP6-Beta-cuda32-nv301.exe caused an Access Violation at location 00000000 Reading from location 00000000.

Registers:

eax=00000000 ebx=0028fda0 ecx=76a998da edx=030846a0 esi=00000000 edi=007aae87

eip=00000000 esp=0027c1a0 ebp=00000000 iopl=0 nv up ei pl nz na pe nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202

Call stack:

00000000
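A note on reading the exit code in that output: the negative decimal and the hex value are the same 32-bit number, and 0xC0000005 is the Windows status code for an access violation, which matches the "Access Violation" line in the dump. A small sketch of the conversion (the helper name is made up for illustration):

```python
def as_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as the unsigned value Windows documents."""
    return exit_code & 0xFFFFFFFF


STATUS_ACCESS_VIOLATION = 0xC0000005

code = as_unsigned32(-1073741819)
print(hex(code))                        # same value as shown in the stderr header
print(code == STATUS_ACCESS_VIOLATION)

# The application-level code from the earlier CUDA failure converts the same way.
print(hex(as_unsigned32(1001)))
```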

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2956626419

RAC: 715222

3 Mar 2015 9:14:43 UTC

Message 129870

After an initial download failure (quickly resolved), I have one running under BOINC v7.4.36 on 5744895 - no sign of error.

But I still have the --device command line in place to finish a running v1.47, so it's not a true test yet. More later.
