Binary Radio Pulsar Search (Parkes PMPS XT) "BRP6"

mountkidd
Joined: 14 Jun 12
Posts: 176
Credit: 12570852555
RAC: 8013086

Hi HBE, RE: It's

Hi HBE,

Quote:

It's like this: The search code can be thought of as a loop over "templates", where the loop has different stages.

After several years of incremental optimization, almost the complete search code within this main loop runs on the GPU. The only exception now is the management of the list of candidates to send back to the server, the "toplist". This is still done on the CPU, e.g. to periodically write the list of candidates found so far to the disk as "checkpoints", something that code on the GPU cannot do.

Originally, near the end of each loop iteration, we copied the entire result from the GPU processing step back to main RAM, where the candidate-selection code would go sequentially through those results and add to the toplist any candidate that makes it in (i.e. one that is "better" than the current last entry in the toplist).

This is somewhat wasteful. In the new version we look at the toplist *before* starting the GPU part of the iteration to give us a threshold of the minimum "strength" of a candidate for it to make it to the toplist. During the GPU processing, we take note when this threshold is crossed. If we find that the threshold was never crossed during the GPU processing, we can completely skip writing the results back to the main memory in that iteration because there can't be anything in it that will make it to the toplist. This saves PCIe bandwidth (for dedicated GPU cards) and CPU processing time because we don't need to inspect those results for candidates either.

This also explains why some workunits can be "lucky": if many strong signal candidates are found early in the search, this sets higher thresholds for all the remaining templates and cuts down on the number of transfers needed. If a workunit has no clear outliers at all, however, the toplist fills with candidates more evenly over the runtime and the saving is much smaller.

This is a bit simplified and doesn't explain all the details but the gist of it should describe this effect quite well. A further optimization I'll do now is to allow for partial transfers of results from GPU memory to host memory instead of the yes/no decision implemented now.
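The quoted threshold scheme can be sketched roughly like this (a minimal Python sketch with invented names such as `process_templates`; in the real application the per-template stage runs on the GPU and only the toplist management runs on the CPU):

```python
import heapq

def process_templates(template_results, toplist_size=3):
    """Merge per-template candidate strengths into a fixed-size toplist,
    skipping the "GPU -> host" transfer whenever nothing can qualify."""
    toplist = []    # min-heap: toplist[0] is the weakest kept candidate
    transfers = 0

    for results in template_results:
        # Threshold read *before* the GPU stage: the weakest entry of a
        # full toplist, or -inf while the toplist is still filling up.
        threshold = toplist[0] if len(toplist) == toplist_size else float("-inf")

        # The GPU stage only has to report whether the threshold was crossed.
        if max(results) <= threshold:
            continue  # skip the copy-back and the CPU-side inspection entirely

        # Threshold was crossed: transfer the results and merge them.
        transfers += 1
        for strength in results:
            if len(toplist) < toplist_size:
                heapq.heappush(toplist, strength)
            elif strength > toplist[0]:
                heapq.heapreplace(toplist, strength)

    return sorted(toplist, reverse=True), transfers
```

This also illustrates the "lucky workunit" effect: with strong candidates early, e.g. `[[9, 8, 7], [1, 2, 3], [0, 1, 2]]`, only the first iteration needs a transfer, whereas an evenly built-up toplist forces a transfer in every iteration.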


Is the processing methodology described above in the opencl-ati beta app or is it something that can/will be added in the future?

Gord

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 577926877
RAC: 200162

As far as I know the part

As far as I know the part after "In the new version..." is in the current beta, whereas "A further optimization..." is still in development.

MrS

Scanning for our furry friends since Jan 2002

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 725730809
RAC: 1220012

RE: As far as I know the

Quote:

As far as I know the part after "In the new version..." is in the current beta, whereas "A further optimization..." is still in development.

MrS

Exactly!
HB

|MatMan|
Joined: 22 Jan 05
Posts: 24
Credit: 249005261
RAC: 0

An update of the cuda version

An update of the CUDA version (toolkit) from the old 3.2 to a more recent 5.5 or even 6.5 was discussed some time ago. It should be quite easy to do and could yield a few extra percent in processing speed. Is this still on the roadmap or was it dropped?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 725730809
RAC: 1220012

RE: An update of the cuda

Quote:
An update of the cuda version (toolkit) from the old 3.2 to a more recent 5.5 or even 6.5 was discussed some time ago. It should be quite easy to do and could yield a few extra % in processing speed. Is this still on the road map or was it dropped?

Planning is still as described here: http://einsteinathome.org/node/197990&nowrap=true#138717

In a nutshell, once we have this app version stable we are planning to offer both CUDA 3.2 and 5.5 app versions for a transition period, and then we will see a) what we gain by including CUDA 5.5 support but also b) how many hosts we would lose by dropping CUDA 3.2 support and requiring CUDA 5.5+ in the future. We hope to be able to drop CUDA 3.2 support and switch to 5.5. We'll see.

HB

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2956673081
RAC: 715311

I see I'm getting an updated

I see I'm getting an updated v1.52 for cuda32 and - new this time - intel-gpu. Anything in particular you'd like us to watch out for?

Gavin
Joined: 21 Sep 10
Posts: 191
Credit: 40644337738
RAC: 1

Getting them for AMD also...

Getting them for AMD also... promoted a few to run now.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 577926877
RAC: 200162

RE: I'm getting an updated

Quote:
I'm getting an updated v1.52 for ... intel-gpu. Anything in particular you'd like us to watch out for?


My first quick feedback: link

MrS

Scanning for our furry friends since Jan 2002

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7221524931
RAC: 973460

I promoted a full set of

I promoted a full set of 1.52, so have run a total of eleven, on five different GPUs residing on three hosts. Uneventful during run time, so far as I could tell, with execution times and CPU times never far above the base population for 1.47/1.50. Perhaps this means 1.52 implements the tail-curtailing scheme Bikeman has been foreshadowing, and it works nicely, or perhaps it means this first batch I got just happened to be in the base population anyway, and the real change is something else.

Sadly, one of the eleven raised a Validate error (58:00111010). This was on the GPU which had already generated more than one on 1.50, so it may have nothing specific to do with the 1.52 changes.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 725730809
RAC: 1220012

RE: Perhaps this means

Quote:
Perhaps this means 1.52 implements the tail-curtailing scheme Bikeman has been foreshadowing,

Yes, the version 1.52 beta apps hopefully have a more uniform run time, not far from the mean runtime of the previous beta app.

Cheers
HB
