Binary Radio Pulsar Search (Parkes PMPS XT) "BRP6"

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2956626419
RAC: 715222

LOL! Well, I can guess what the Einstein developers will be saying in the general direction of DA on Monday morning, even if they don't say it to his face.

Meanwhile, having got my app_config.xml sorted out, I can confirm that putting --device 0 into the command line for the app_version overcomes the API v7.5.0 issue - one task branded v1.47 Beta picked up cleanly from part-way through, and another has run from start to finish with that combination of settings.

Only a stopgap and a proof-of-concept, of course. We need to handle automatic assignment for --device 1, --device 2 etc. where they exist.
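For anyone wanting to try the same stopgap, here is a minimal sketch of how such an app_config.xml could be generated. The app name and plan class are assumptions read off the executable file name quoted later in this thread, and the helper name `build_app_config` is purely illustrative. The device number is hard-coded, which is exactly why this only suits single-GPU hosts.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, device):
    """Return app_config.xml text that passes --device N to one app version."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # The workaround itself: force the science app onto one fixed GPU.
    ET.SubElement(version, "cmdline").text = f"--device {device}"
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed names, taken from the v1.49 executable file name; check your own host.
config_text = build_app_config("einsteinbinary_BRP6", "BRP6-Beta-cuda32-nv301", 0)
print(config_text)
```

The printed text would go into app_config.xml in the project directory, followed by a re-read of the config files or a client restart.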

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117564356647
RAC: 35299751

I know the focus will be on sorting out the issues with the CUDA version of the app but I thought I'd post some very encouraging findings with the OpenCL app running on HD7850 GPUs. I have a large number of these GPUs in a variety of different hosts and I've been running BRP5 (4 concurrent tasks) on all of them using app_config.xml to control concurrency. When BRP6 started, I added a BRP6 clause to all the app_config.xml files so that BRP6 would also run 4x to start with. My intention is to experiment with this when things settle down.
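As an illustration of the concurrency setup described above, the sketch below writes one app clause per search with the GPU share set to 1/4, so that four tasks share a card. The app names are assumptions (the BRP6 one matches the executable name quoted later in the thread, the BRP5 one is by analogy), and the `cpu_usage` value of 0.5 is a placeholder, not a figure from this post.

```python
import xml.etree.ElementTree as ET


def concurrency_config(app_names, tasks_per_gpu, cpu_usage):
    """Return app_config.xml text running `tasks_per_gpu` tasks on each GPU."""
    root = ET.Element("app_config")
    for name in app_names:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        gpu = ET.SubElement(app, "gpu_versions")
        # Each task claims 1/N of a GPU, so N tasks fit on one card.
        ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
        ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:g}"
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed app names and CPU reservation; adjust to what your host reports.
config_text = concurrency_config(
    ["einsteinbinary_BRP5", "einsteinbinary_BRP6"], tasks_per_gpu=4, cpu_usage=0.5
)
print(config_text)
```

Changing `tasks_per_gpu` is then the only edit needed when experimenting with 2x or 3x instead of 4x.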

The bulk of my HD7850 GPUs are in quite modern hosts and I'm seeing nice performance improvements (but not spectacular) when going from 1.39 -> 1.41 -> 1.47. I have several GPUs in older hardware and it's the improvement here that is quite spectacular.

The previous BRP5 1.39 app performs very well in systems whose PCIe bus is v2+ but quite poorly in v1.x systems. I have 18 5+ year old hosts with Q8400 quad core CPUs (v1.x PCIe bus) that have run CPU apps only for their entire life. Last year I played around with putting HD7850s in a few of these and achieved about 56% of the performance I could get from the same card in an Ivy Bridge i3 with 4 virtual cores - obviously a quite disappointing result.

When I ran the BRP6 1.41 app on these PCIe-v1.x hosts, I saw commensurate performance - 33% longer crunch times for 33% bigger tasks giving 33% more credit. I had a suspicion that the BRP6 1.47-beta app might improve this so about 6 hours ago I set about deploying the beta app to these hosts and rebranding all the 1.41 tasks in their caches to 1.47. The first four 1.47 tasks done mostly or entirely with the beta app on the first machine are now just finished and I'm quite amazed at the results.

[pre]
Search     App Ver  Elapsed time  CPU time   Notes
======     =======  ============  =========  =====
BRP5       1.39     28,500 sec    3,750 sec  Long term averages for BRP5
BRP6       1.41     38,500 sec    4,950 sec  Averages for 20+ tasks for BRP6 v1.41
BRP6-beta  1.47     21,980 sec    2,230 sec  1st task - 18% on v1.41 and 82% on v1.47 app
BRP6-beta  1.47     20,567 sec    2,027 sec  2nd task - 100% on v1.47 app
BRP6-beta  1.47     19,053 sec    1,760 sec  3rd task - 100% on v1.47 app
BRP6-beta  1.47     22,509 sec    2,405 sec  4th task - 100% on v1.47 app

BRP6-beta  1.47     19,099 sec    2,072 sec  1st v1.47 task on different host but identical hardware

BRP6-beta  1.47     17,880 sec      664 sec  Pentium dual core G3258 (Haswell refresh) with HD7850 4x[/pre]

The nice thing is that these old clunkers are now able to get performance on the HD7850 that is much closer to what a Haswell refresh based system can do. I just grabbed a recently completed result from such a machine and put it in the above table for comparison.
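To put rough numbers on the table, here is a small worked calculation using only the figures quoted in it. It averages the three tasks run entirely on v1.47 and compares them with the v1.41 averages and with the single Haswell result. With so few tasks this is only indicative, and the variable names are mine.

```python
from statistics import mean

# (elapsed sec, CPU sec) copied from the table.
v141_average = (38500, 4950)
v147_full_tasks = [(20567, 2027), (19053, 1760), (22509, 2405)]
haswell_task = (17880, 664)

v147_elapsed = mean(elapsed for elapsed, _ in v147_full_tasks)
v147_cpu = mean(cpu for _, cpu in v147_full_tasks)

elapsed_speedup = v141_average[0] / v147_elapsed
cpu_speedup = v141_average[1] / v147_cpu
gap_to_haswell = v147_elapsed / haswell_task[0]

print(f"v1.47 mean elapsed: {v147_elapsed:.0f} sec, mean CPU: {v147_cpu:.0f} sec")
print(f"elapsed speedup over v1.41: {elapsed_speedup:.2f}x")
print(f"CPU time speedup over v1.41: {cpu_speedup:.2f}x")
print(f"old host elapsed time relative to Haswell host: {gap_to_haswell:.2f}x")
```

On these few samples the elapsed time drops by a factor of about 1.86 and the CPU time by about 2.4, leaving the old hosts roughly 16% slower than the Haswell refresh machine.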

Cheers,
Gary.

Bill592
Joined: 25 Feb 05
Posts: 786
Credit: 70825065
RAC: 0

Thanks Gary!

I also noticed that my ancient Core 2 CPU with version 1 PCIe
seemed to be running a whole lot faster with my Radeon 7970.

Bill

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 725640681
RAC: 1219156

Hi!

Thx for the feedback. Looks good.

However, the runtime is more data-dependent now, so there are "lucky" and "not-so-lucky" workunits wrt runtime. The overall speedup, averaged over many workunits, might be lower than these first results suggest.

I have another idea up my sleeve that could improve the situation a bit for the "not-so-lucky" WUs. That will come later, though; first we need to fix this CUDA device assignment problem.

Cheers
HB

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2956626419
RAC: 715222

Keep an eye on (Windows) host 5744895 - all Parkes tasks are being run under Beta v1.47 as discussed, two-at-a-time on a GTX 670.

The variability in runtime is very marked, and correlates with high CPU usage as well. Two 'high CPU' tasks running together are sufficient to slow down the Arecibo BRP4 tasks running on the Intel HD 4000 at the same time.

Variable runtime on this scale is a particular problem, affecting work fetch and caching, while this project is still running DCF. The best of luck in sorting it out with your new idea - more power to your elbow (if that English idiom translates safely into German).
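To illustrate why variable runtimes and DCF interact badly, here is a toy model. It assumes, as a simplification of my understanding of the client, that the duration correction factor jumps up at once when a task overruns its estimate and drifts back down only gradually. The thresholds, weights and runtimes below are illustrative assumptions, not values taken from this thread.

```python
def update_dcf(dcf, actual_runtime, raw_estimate):
    """Toy DCF update: fast rise on an overrun, slow decay otherwise."""
    raw_ratio = actual_runtime / raw_estimate
    adjusted_ratio = actual_runtime / (raw_estimate * dcf)
    if adjusted_ratio > 1.1:
        # Task ran well over the corrected estimate: adopt the new ratio at once.
        return raw_ratio
    if adjusted_ratio < 0.1:
        # Task was far shorter than expected: move only 1% of the way.
        return 0.99 * dcf + 0.01 * raw_ratio
    # Otherwise move 10% of the way towards the observed ratio.
    return 0.9 * dcf + 0.1 * raw_ratio


# Hypothetical runtimes in seconds: one "not-so-lucky" task among normal ones.
raw_estimate = 10000.0
runtimes = [10000, 10000, 20000, 10000, 10000, 10000]

dcf = 1.0
history = []
for runtime in runtimes:
    dcf = update_dcf(dcf, runtime, raw_estimate)
    history.append(dcf)
    print(f"ran {runtime:>6} sec -> DCF {dcf:.3f}")
```

In this model a single slow task doubles the estimate for every queued task of the project, and three normal tasks later the estimates are still more than 70% too high, so the client believes its cache is fuller than it really is.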

disturber
Joined: 26 Oct 14
Posts: 30
Credit: 57155818
RAC: 0

I am having problems with these. A 970 and a 660 Ti on an i7-3770K.
All fail after 2.5 seconds:

7.4.27

Recursion too deep; the stack overflowed.
(0x3e9) - exit code 1001 (0x3e9)

Activated exception handling...
[22:49:54][11804][INFO ] Starting data processing...
[22:49:54][11804][ERROR] No suitable CUDA device available!
[22:49:54][11804][ERROR] Demodulation failed (error: 1001)!
22:49:54 (11804): called boinc_finish(1001)

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250459996
RAC: 35134

New (BRP6 Beta) CUDA version 1.49 is out. This is meant to fix the API problem; there is no change to the actual computation code.

BM


Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110568193
RAC: 0

Works for me, thank you.

Gavin
Joined: 21 Sep 10
Posts: 191
Credit: 40644337738
RAC: 1

Hi Bernd

Just got a v1.49 on host 10698787 that failed at 1 second...

Stderr output
7.4.36

(unknown error) - exit code -1073741819 (0xc0000005)

Activated exception handling...
[08:57:44][4948][INFO ] Starting data processing...
-------------------
Error occured on Tuesday, March 3, 2015 at 08:57:44.

C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einsteinbinary_BRP6_1.49_windows_intelx86__BRP6-Beta-cuda32-nv301.exe caused an Access Violation at location 00000000 Reading from location 00000000.

Registers:

eax=00000000 ebx=0028fda0 ecx=76a998da edx=030846a0 esi=00000000 edi=007aae87

eip=00000000 esp=0027c1a0 ebp=00000000 iopl=0 nv up ei pl nz na pe nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202

Call stack:

00000000
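A side note on reading the exit codes in these stderr dumps: the negative decimal number and the hex value are the same 32-bit quantity, shown once as signed and once as unsigned. A small sketch of the conversion follows; the helper name is mine.

```python
def as_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as unsigned, as the hex form shows it."""
    return exit_code & 0xFFFFFFFF


# Exit code from the v1.49 failure above.
access_violation = as_unsigned32(-1073741819)
print(hex(access_violation))

# Exit code from the earlier "No suitable CUDA device" failure.
no_cuda_device = as_unsigned32(1001)
print(hex(no_cuda_device))
```

The first value is 0xc0000005, the Windows status code for an access violation, which agrees with the "Access Violation" line in the dump. The second is 0x3e9, the application's own error 1001.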

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2956626419
RAC: 715222

After an initial download failure (quickly resolved), I have one running under BOINC v7.4.36 on 5744895 - no sign of error.

But I still have the --device command line in place to finish a running v1.47, so it's not a true test yet. More later.
