I have recently downloaded the software for Einstein@Home (following directions from http://einstein.phys.uwm.edu/license.php), and with a fair amount of fooling around I have gotten it to compile under Linux. But it looks to me like this is not the latest software, nor all of the apps, nor any of the CUDA apps. Maybe I am mistaken.
I would like to see if I could contribute to this endeavor by modifying code for the apps, hopefully to increase the use of GPUs via CUDA for the apps that aren't currently using it (such as the gamma-ray search, the S6 search, etc.).
How can I go about getting the latest software, and whom might I talk to if I ran into trouble?
I can at least try to make modifications myself, if I can get the real, latest, complete set of software.
FYI: I currently have approx. 7.2M credit and a rank of approx. 325, and I am excited! Looking to increase that. I could easily add 2 more graphics cards, but I don't want to bother, since I don't seem to be getting enough CUDA work; most is still CPU work.
Sincerely,
John J.
Getting "latest" einstein@home apps source code, and why no CUDA
There is a CUDA version of "HierarchicalSearch", the App that we were using until "S5R6". You might be able to build it using the build script "eah_build.sh" with the --cuda option. If you want to build it manually, you can configure LAL & LALApps with --with-cuda before building. You will find the kernel, wrapper and everything in lalapps/src/pulsar/FDS_isolated/OptimizedCFS. There is also an OpenCL version of that code.
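For reference, the two build routes described above might look roughly like this. This is a sketch only: the script and configure options are the ones named in this post, but the package order, directory layout, and everything else here is an assumption that should be checked against the current build instructions.

```shell
# Route 1: the Einstein@Home build script with CUDA enabled
# (option name per this thread; other flags are assumptions).
./eah_build.sh --cuda

# Route 2: manual build -- configure LAL and LALApps with CUDA
# support before building (directory names are assumptions).
cd lal
./configure --with-cuda
make && make install

cd ../lalapps
./configure --with-cuda
make

# The CUDA kernel, wrapper, etc. live under:
#   lalapps/src/pulsar/FDS_isolated/OptimizedCFS
```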
The reason the development was dropped is that even on our fastest GPUs, this CUDA code runs slower than our SSE2 implementation of the same calculation.
To use this in the current HierarchSearchGCT App, you would need to restructure the main loops to basically use XLALComputeFStatFreqBandVector instead of ComputeFStatFreqBand.
The other way to make a CUDA/OpenCL version of that App is to validate the "resampling Fstat" code (which is currently being done internally, but will take a while), then implement the actual resampling (currently based on GSL splines) in CUDA, switch to using cuFFT, and possibly also implement the "global correlation transform" to run on the GPU. This is the way we (the LSC CW group at AEI) are currently heading.
BM
Addendum: The FGRP source
Addendum: The FGRP source code is not public. I am in communication with the main authors, but I don't think this will change in the foreseeable future.
BM
RE: The reason why the
An AVX implementation could roughly double performance on Sandy Bridge CPUs, since AVX widens the vector registers from 128 to 256 bits.
AMD's Bulldozer wouldn't need an AVX refresh as much, because of its clever Flex FP unit.
AVX sounds interesting, since
AVX sounds interesting, since I have a Sandy Bridge CPU... 2600K.
Also, I'm not sure, but I know SSE goes up to 4.2 now (which might include AVX), yet the current apps seem to use SSE2 only.
RE: AVX sounds interesting,
I know SSE3 didn't offer anything that led to faster computation rates.
RE: RE: The reason why
This sounds interesting. Are you saying that a Bulldozer would outperform a Sandy Bridge on Einstein@Home applications?
RE: RE: An AVX
I don't think so. Sandy Bridge is very powerful...