All-Sky Gravitational Wave Search on O3 data (O3ASHF1)

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250361770

RAC: 35671

There's still work being done

26 Oct 2023 14:16:00 UTC

Message 218569

Originally, you all were working on the recalculation step (from CPU to GPU) but said it didn't seem to speed up the work. Is anything in the works related to this?

There's still work being done on the "recalc" step. The problem is that this step requires heavily data-dependent random memory access, which is pretty bad for GPU memory performance. There are some tricks you can play in CUDA to help with that, and we plan to bring out a CUDA version of the app for NVIDIA GPUs. But this is still work in progress, and it will speed up the overall runtime by only 10-20% at most, depending on the card.

The main problem for us is that by losing the computing power from <=4GB GPUs the search is progressing half as fast as we expected and designed it for. Getting more GPUs to help with that is therefore our higher priority.
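To illustrate why data-dependent random access hurts on a GPU, here is a minimal sketch that counts how many memory segments one group of 32 threads touches for a contiguous versus a scattered index pattern. The 128-byte segment size, the 4-byte element size, and all function names are assumptions chosen for illustration; this is not taken from the Einstein@Home app.

```python
import random

WARP_SIZE = 32        # threads that issue a memory request together
SEGMENT_BYTES = 128   # assumed size of one memory transaction
ELEM_BYTES = 4        # assumed element size (one 32-bit float per thread)


def segments_touched(indices):
    """Count the distinct memory segments hit by one warp's element indices."""
    return len({(i * ELEM_BYTES) // SEGMENT_BYTES for i in indices})


def coalesced_indices(base):
    """Neighbouring threads read neighbouring elements."""
    return [base + lane for lane in range(WARP_SIZE)]


def scattered_indices(n_elems, rng):
    """Each thread reads an index that depends on the data (modelled as random)."""
    return [rng.randrange(n_elems) for _ in range(WARP_SIZE)]


rng = random.Random(0)
n_elems = 1 << 24  # hypothetical array size

coalesced = segments_touched(coalesced_indices(0))
scattered = segments_touched(scattered_indices(n_elems, rng))

print("segments for contiguous access:", coalesced)
print("segments for scattered access: ", scattered)
```

With contiguous indices the 32 reads fit into a single segment, while scattered indices can need up to one segment per thread, so far more memory traffic is generated for the same amount of useful data.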

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 238

Credit: 10502785586

RAC: 26990620

Bernd Machenschalk

26 Oct 2023 14:33:25 UTC

Message 218570 in response to message 218569

Bernd Machenschalk wrote:

The main problem for us is that by losing the computing power from <=4GB GPUs the search is progressing half as fast as we expected and designed it for. Getting more GPUs to help with that is therefore our higher priority.

Completely understand! We have seen the impact of the random memory access on our systems. The Threadripper Pros with 8 memory channels have been FAR superior to systems with the same or similar CPU and memory speeds but fewer channels. Our older systems that still have relatively fast CPUs but slower memory and only 2 channels really struggle with the recalc step. It has been a fun problem for us to optimize on our end (or, attempt to optimize!).

DF1DX

Joined: 14 Aug 10

Posts: 105

Credit: 3839116854

RAC: 4884371

Thank you for the

26 Oct 2023 15:07:13 UTC

Message 218572

Thank you for the information.

What is the difference between

GW-opencl-ati / GW-opencl-ati-2 and

GW-opencl-nvidia / GW-opencl-nvidia-2 for version 1.06?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250361770

RAC: 35671

The "-2" plan classes don't

26 Oct 2023 15:12:11 UTC

Message 218573

The "-2" plan classes don't really exist yet. Ultimately these will be used to specify a lower VRAM requirement for the new workunits.

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 238

Credit: 10502785586

RAC: 26990620

Could someone help me out

27 Oct 2023 11:56:23 UTC

Message 218602

Could someone help me out with this error? I had this happen for a group of work units. I restarted the system and have not seen it again, but would like to have more insight into what it means. Thanks!

[14:51:54][4797][ERROR] Error synchronising after CUDA device->host HS data transfer (dirty phase 2) (error: 700)
[14:51:54][4797][ERROR] Error during CUDA host->device HS thresholds data transfer (error: 700)
[14:51:54][4797][ERROR] Demodulation failed (error: 1007)!
14:51:54 (4797): called boinc_finish(1007)
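
For reference, in the CUDA runtime's error enumeration, code 700 is cudaErrorIllegalAddress (a kernel accessed an invalid memory address). The 1007 is passed to boinc_finish and so appears to be the application's own exit code, not a CUDA code. Below is a minimal sketch for pulling the codes out of such a log; the regular expression and the function name are assumptions, and the lookup table is only a small excerpt of the CUDA codes.

```python
import re

# Small excerpt of CUDA runtime error codes (cudaError_t); not a complete list.
CUDA_ERROR_NAMES = {
    2: "cudaErrorMemoryAllocation",
    700: "cudaErrorIllegalAddress",
    999: "cudaErrorUnknown",
}

# Assumed line shape, based only on the log excerpt quoted above.
ERROR_RE = re.compile(r"\[ERROR\] (?P<msg>.*) \(error: (?P<code>\d+)\)")

LOG = """\
[14:51:54][4797][ERROR] Error synchronising after CUDA device->host HS data transfer (dirty phase 2) (error: 700)
[14:51:54][4797][ERROR] Error during CUDA host->device HS thresholds data transfer (error: 700)
[14:51:54][4797][ERROR] Demodulation failed (error: 1007)!
14:51:54 (4797): called boinc_finish(1007)
"""


def summarize(log_text):
    """Return (code, cuda_name_or_None) for every [ERROR] line in the log."""
    found = []
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match is None:
            continue
        code = int(match.group("code"))
        # Only treat the code as a CUDA code if the message mentions CUDA.
        is_cuda = "CUDA" in match.group("msg")
        found.append((code, CUDA_ERROR_NAMES.get(code) if is_cuda else None))
    return found


for code, name in summarize(LOG):
    print(code, name or "application-level code")
```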

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3945

Credit: 46593192642

RAC: 64190191

my guess is some kind of

27 Oct 2023 12:23:59 UTC

Message 218604

my guess is some kind of problem with the driver.

wujj123456

Joined: 16 Sep 08

Posts: 18

Credit: 1978439314

RAC: 2472489

Is it possible to free all

28 Oct 2023 19:07:02 UTC

Message 218648

Is it possible to free all the GPU memory the moment O3AS is done using the GPU? From reading posts here and monitoring with nvtop, I believe that once the GPU calculation phase is done, the GPU is never used afterwards. However, I see that all the memory is still held. It would be nice if it were possible to free it sooner. The benefits are tens of watts of savings if the GPU is not used by anything else, and I have this one crappy laptop that allocates power to the CPU based on how much the GPU is pulling...

I'm curious if the latest source code is available. I couldn't find anything related to EAH following the instructions on the source code page. I don't see lalapps/src/pulsar/EinsteinAtHome/eah_build2.sh or anything related to EAH in the git repository. I doubt this is useful for most people, so it's probably not worth the core developers' time. I was hoping to check whether I could get lucky in case it's a simple change. :-D

In addition, could the "recalc" phase benefit from multiple threads? This could be helpful for systems with a weaker CPU, or without enough VRAM to stagger two tasks. Otherwise, it basically becomes mostly a CPU app, and throwing more cores at it might be useful.

Thanks.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250361770

RAC: 35671

1. Indeed freeing the GPU

1 Nov 2023 16:46:00 UTC

Message 218776

1. Indeed freeing the GPU memory is theoretically possible, although it's not easy technically, i.e. within the current function call structure. Are you sure that just keeping something in memory draws noticeable power for the GPU even if its processing units are not used?

2. We thought about that ourselves. However, the change to the workunits that we are about to deploy may screw up all benefit of it. The current workunits analyze a 2 Hz frequency band, which is something like a sweet spot between efficiency and memory requirement. We plan to make this a workunit with two passes, each analyzing a 1 Hz band. This will add some time, e.g. for overhead because of the two calls, but should roughly cut the required memory in half (+ overhead). Freeing that memory in the first pass, only to allocate it again right afterwards, won't help you much, I'm afraid.

mikey

Joined: 22 Jan 05

Posts: 12676

Credit: 1839075974

RAC: 3984

Bernd Machenschalk wrote: 2.

2 Nov 2023 2:43:49 UTC

Message 218791 in response to message 218776

Bernd Machenschalk wrote:

2. We thought about that ourselves. However, the change to the workunits that we are about to deploy may screw up all benefit of it. The current workunits analyze a 2 Hz frequency band, which is something like a sweet spot between efficiency and memory requirement. We plan to make this a workunit with two passes, each analyzing a 1 Hz band. This will add some time, e.g. for overhead because of the two calls, but should roughly cut the required memory in half (+ overhead).

Is there any reason to keep going up the Hz band, i.e. 3 Hz, 4 Hz, 5 Hz etc.? Or is that beyond the point of whatever you are looking for in this dataset?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250361770

RAC: 35671

I'm not sure if I understand

2 Nov 2023 10:29:14 UTC

Message 218797

I'm not sure if I understand the question. In O3ASHF1 we are analyzing O3 data in a "high" (for GW) frequency range (800-1500 Hz), at 2 Hz per workunit. These 2 Hz of a workunit will be split in half and done in two 1 Hz passes. Does that help?
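
The band arithmetic described here can be sketched as follows. This is only an illustration that assumes contiguous, non-overlapping bands starting exactly at 800 Hz and ignores everything about a workunit other than its frequency band; the function names are hypothetical.

```python
F_MIN_HZ = 800          # lower edge of the search range
F_MAX_HZ = 1500         # upper edge of the search range
WORKUNIT_WIDTH_HZ = 2   # band covered by one workunit
PASS_WIDTH_HZ = 1       # band covered by one pass within a workunit


def workunit_bands(f_min=F_MIN_HZ, f_max=F_MAX_HZ, width=WORKUNIT_WIDTH_HZ):
    """List the (low, high) frequency band of each workunit."""
    return [(f, f + width) for f in range(f_min, f_max, width)]


def passes(band, pass_width=PASS_WIDTH_HZ):
    """Split one workunit band into consecutive passes of pass_width Hz."""
    low, high = band
    return [(f, f + pass_width) for f in range(low, high, pass_width)]


bands = workunit_bands()
print(len(bands))        # 350 distinct 2 Hz bands
print(passes(bands[0]))  # [(800, 801), (801, 802)]
```

So under these assumptions the 700 Hz range yields 350 distinct 2 Hz bands, and each band is then processed as two 1 Hz passes.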
