Getting "latest" einstein@home apps source code, and why no CUDA for Gamma Wave Pulsar or S6 searches?

John Jamulla

Joined: 26 Feb 05

Posts: 32

Credit: 1174595451

RAC: 541131

5 Oct 2011 12:30:45 UTC

Topic 195996

I recently downloaded the Einstein@Home software (following the directions at http://einstein.phys.uwm.edu/license.php) and, with a ton of "foolin around", got it to compile under Linux. But it looks to me like this is not the latest software, and it includes neither all of the apps nor any of the CUDA apps. Maybe I am mistaken.

I would like to see if I could contribute to this endeavor by modifying code for the apps, hopefully to increase the use of GPUs using CUDA for the apps that aren't currently using it (such as gamma ray search, S6 search, etc.).

How can I go about getting the latest software, and whom might I be able to talk to about it if I had trouble?
I can at least try to make modifications myself, if I could get the real, latest, and complete set of software.

FYI - Currently I have approx. 7.2M credit and an approx. rank of 325, and I am excited! Looking to increase that. I could easily add 2 more graphics cards, but I don't want to bother since I don't seem to be getting enough CUDA work, as most is still CPU work.

Sincerely,
John J.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250621912

RAC: 34492

Getting "latest" einstein@home apps source code, and why no CUDA

5 Oct 2011 15:26:53 UTC

Message 106834

There is a CUDA version of "HierarchicalSearch", the App that we were using until "S5R6". You might be able to build it with the build script "eah_build.sh" with --cuda. If you want to build it manually, you can configure LAL & LALApps with --with-cuda before building. You will find the kernel, wrapper and everything in lalapps/src/pulsar/FDS_isolated/OptimizedCFS. There is also an OpenCL version of that code.

The reason why the development was dropped is that this CUDA code even on our fastest GPUs runs slower than our SSE2 implementation of the same calculation.

To use this in the current HierarchSearchGCT App, you would need to restructure the main loops to basically use XLALComputeFStatFreqBandVector instead of ComputeFStatFreqBand.

The other way to make a CUDA/OpenCL version of that App is to validate the "resampling Fstat" (which is currently being done internally, but will take a while), then implement the actual resampling (currently based on GSL splines) in CUDA, switch to using cuFFT and possibly also implement the "global correlation transform" to run on the GPU. This is the way we (the LSC CW group at AEI) are currently heading.
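
To make the resample-then-FFT idea concrete, here is a toy sketch in plain Python. It is not taken from LAL or LALApps: linear interpolation stands in for the GSL splines, a naive DFT stands in for an FFT library such as cuFFT, and the function names `resample_linear` and `dft_power` as well as the test signal are made up for illustration.

```python
import cmath
import math


def resample_linear(times, values, t_start, dt, n_out):
    """Interpolate irregularly spaced samples onto a uniform time grid.

    `times` must be ascending and cover the whole output grid.
    Linear interpolation is a stand-in for a spline here.
    """
    out = []
    j = 0
    for k in range(n_out):
        t = t_start + k * dt
        # Advance to the input interval [times[j], times[j + 1]] containing t.
        while j + 2 < len(times) and times[j + 1] < t:
            j += 1
        w = (t - times[j]) / (times[j + 1] - times[j])
        out.append((1.0 - w) * values[j] + w * values[j + 1])
    return out


def dft_power(samples):
    """Power spectrum via a naive O(n^2) DFT (an FFT gives the same result faster)."""
    n = len(samples)
    power = []
    for m in range(n // 2 + 1):
        acc = 0j
        for k, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * m * k / n)
        power.append(abs(acc) ** 2)
    return power


# Hypothetical test signal: a 5 Hz sine sampled at slightly jittered times.
times = [i / 256 + 0.001 * math.sin(i) for i in range(300)]
values = [math.sin(2 * math.pi * 5.0 * t) for t in times]

# Resample onto 64 uniform points covering one second, then take the spectrum.
uniform = resample_linear(times, values, t_start=0.0, dt=1 / 64, n_out=64)
power = dft_power(uniform)
peak_bin = max(range(len(power)), key=power.__getitem__)
print("strongest frequency bin:", peak_bin)
```

With one second of resampled data each bin is 1 Hz wide, so the 5 Hz tone should land in bin 5. The point of the pattern is that a single transform covers all frequency bins at once, and both the interpolation and the transform are regular, data-parallel steps.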

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250621912

RAC: 34492

Addendum: The FGRP source

5 Oct 2011 15:46:08 UTC

Message 106835

Addendum: The FGRP source code is not public. I am in communication with the main authors, but I don't think this will change in the foreseeable future.

Akos Fekete

Joined: 13 Nov 05

Posts: 561

Credit: 4527270

RAC: 0

RE: The reason why the

5 Oct 2011 20:17:15 UTC

Message 106836 in response to message 106834

Quote:

The reason why the development was dropped is that this CUDA code even on our fastest GPUs runs slower than our SSE2 implementation of the same calculation.

An AVX implementation would be able to double the performance of Sandy Bridge CPUs.
AMD Bulldozer doesn't need AVX freshening, because of the clever Flex FP.

John Jamulla

Joined: 26 Feb 05

Posts: 32

Credit: 1174595451

RAC: 541131

AVX sounds interesting, since

5 Oct 2011 23:43:03 UTC

Message 106837 in response to message 106836

AVX sounds interesting, since I have a Sandy Bridge CPU... 2600K.

Also, I'm not sure, but I know there's SSE up to 4.2 now (which might include AVX), but the current apps seem to use SSE2 only.
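
On the SSE-versus-AVX question: AVX is a separate instruction set extension and is not part of SSE4.2, so a CPU has to be checked for it separately. On Linux the kernel lists the supported extensions on the `flags` line of `/proc/cpuinfo`. A minimal sketch for checking them, assuming an x86 Linux machine (the helper names `cpu_flags` and `simd_support` are made up for this example):

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from /proc/cpuinfo-style text as a set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. non-x86 system)


def simd_support(flags):
    """Report which of a few SIMD extensions appear in the flag set."""
    return {name: name in flags for name in ("sse2", "sse4_2", "avx")}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux; every extension will be reported as "no"
    for name, present in simd_support(cpu_flags(text)).items():
        print(name, "yes" if present else "no")
```

Whether an application actually uses an extension is a different matter: the binary has to be built with code paths for it, and the flag only says the hardware and kernel support it.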

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3562358667

RAC: 0

RE: AVX sounds interesting,

6 Oct 2011 10:40:55 UTC

Message 106838 in response to message 106837

Quote:

AVX sounds interesting, since I have a Sandy Bridge CPU... 2600K.

Also, I'm not sure, but I know there's SSE up to 4.2 now (which might include AVX), but the current apps seem to use SSE2 only.

I know SSE3 didn't offer anything that led to faster computation rates.

Donald A. Tevault

Joined: 17 Feb 06

Posts: 439

Credit: 73516529

RAC: 0

RE: RE: The reason why

6 Oct 2011 13:46:06 UTC

Message 106839 in response to message 106836

Quote:

Quote:
The reason why the development was dropped is that this CUDA code even on our fastest GPUs runs slower than our SSE2 implementation of the same calculation.

An AVX implementation would be able to double the performance of Sandy Bridge CPUs.
AMD Bulldozer doesn't need AVX freshening, because of the clever Flex FP.

This sounds interesting. Are you saying that a Bulldozer would outperform a Sandy Bridge on Einstein at Home applications?

Akos Fekete

Joined: 13 Nov 05

Posts: 561

Credit: 4527270

RAC: 0

RE: RE: An AVX

6 Oct 2011 19:24:26 UTC

Message 106840 in response to message 106839

Quote:

Quote:
An AVX implementation would be able to double the performance of Sandy Bridge CPUs.
AMD Bulldozer doesn't need AVX freshening, because of the clever Flex FP.

This sounds interesting. Are you saying that a Bulldozer would outperform a Sandy Bridge on Einstein at Home applications?

I don't think so. Sandy Bridge is very powerful...
