Gravitational Wave search O2 Multi-Directional ("O2MD1")

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,099
Credit: 227,908,862
RAC: 25,056

Our workunits have varying memory requirements, depending largely (but not only) on the analysis frequency: the higher the frequency, the more data (files) is needed to process them, and the more memory is needed to store it. We developed a model for the memory usage of our apps depending on the input parameters, and coded it into the workunit generator. The workunit generator calculates the memory requirement for each particular workunit and writes it into the workunit record, so that the scheduler doesn't give tasks to clients that have too little (available) memory. For CPU apps, this works pretty well.

However, BOINC's handling of GPU memory restrictions is completely separate from that. It only allows specifying a minimum RAM per GPU, and a single fixed value for how much memory will actually be used (by this app), completely ignorant of the memory requirement assigned to the workunit. Therefore I have now changed the scheduler so that it requires the GPU to have at least as much memory as recorded in the workunit. Tests show that this is a pretty good fit, at least for our applications. This might lead to a lot more "work requests" being rejected (or fulfilled by FGRP work), but it should avoid the memory allocation errors of the GPU apps.
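The rule described above can be sketched roughly as follows. This is only an illustration in Python with hypothetical names (`Workunit`, `Gpu`, `mem_required_bytes`, `global_mem_bytes`) and made-up numbers; it is not the actual BOINC scheduler code.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class Workunit:
    name: str
    mem_required_bytes: int  # hypothetical field: value written by the workunit generator


@dataclass
class Gpu:
    model: str
    global_mem_bytes: int  # hypothetical field: total GPU memory reported by the client


def gpu_has_enough_memory(wu: Workunit, gpu: Gpu) -> bool:
    """Per-workunit check: the GPU must have at least the recorded requirement."""
    return gpu.global_mem_bytes >= wu.mem_required_bytes


def pick_sendable(workunits: list[Workunit], gpu: Gpu) -> list[Workunit]:
    """Keep only the workunits that pass the memory check for this GPU."""
    return [wu for wu in workunits if gpu_has_enough_memory(wu, gpu)]


if __name__ == "__main__":
    # Illustrative values only, not real workunit requirements.
    candidates = [Workunit("low_freq", 1200 * MB), Workunit("high_freq", 3200 * MB)]
    card = Gpu("example 2 GB card", 2048 * MB)
    print([wu.name for wu in pick_sendable(candidates, card)])
```

With a check like this, a request from a small card that matches no workunit would simply get no GW task, which corresponds to the rejected (or FGRP-fulfilled) work requests mentioned above.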

BM

Zalster
Joined: 26 Nov 13
Posts: 3,117
Credit: 4,050,672,230
RAC: 37

Thanks Bernd, 

You should look at this thread, which discusses how BOINC determines the amount of RAM on Nvidia GPUs:

https://einsteinathome.org/goto/comment/176690

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,099
Credit: 227,908,862
RAC: 25,056

Thanks for the note. However, I don't think that any of our workunits currently requires more than 2 GB of RAM, or will in the foreseeable future.

BM

Zalster
Joined: 26 Nov 13
Posts: 3,117
Credit: 4,050,672,230
RAC: 37

No problem. Note that Nvidia only allows 27% of the RAM on a card to be used for OpenCL. This is creating issues with low-RAM Nvidia cards (2-3 GB). AMD and Intel allow somewhere between 52% and 67% of a card's available RAM to be used for scientific computations.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 2,012
Credit: 16,796,626,077
RAC: 40,557,152

Bernd Machenschalk wrote:
Our workunits have varying memory requirements, depending largely (but not only) on the analysis frequency: the higher the frequency, the more data (files) is needed to process them, and the more memory is needed to store it. We developed a model for the memory usage of our apps depending on the input parameters, and coded it into the workunit generator. The workunit generator calculates the memory requirement for each particular workunit and writes it into the workunit record, so that the scheduler doesn't give tasks to clients that have too little (available) memory. For CPU apps, this works pretty well. However, BOINC's handling of GPU memory restrictions is completely separate from that. It only allows specifying a minimum RAM per GPU, and a single fixed value for how much memory will actually be used (by this app), completely ignorant of the memory requirement assigned to the workunit. Therefore I have now changed the scheduler so that it requires the GPU to have at least as much memory as recorded in the workunit. Tests show that this is a pretty good fit, at least for our applications. This might lead to a lot more "work requests" being rejected (or fulfilled by FGRP work), but it should avoid the memory allocation errors of the GPU apps.

Hi Bernd, please see my post here: https://einsteinathome.org/content/discussion-thread-continuous-gw-search-known-o2md1-now-o2mdf-gpus-only?page=28#comment-177563

I think there is a bug in your scheduling method for determining how much GPU memory a particular WU will use. You can see in my screenshot that the scheduler thinks the task only needs 1800 MB, but in practice it tries to allocate more than 3000 MB (about 3200 MB when run on a GPU with enough memory). This is why so many people with 3 GB GPUs keep getting sent GW tasks that error out for lack of memory.

I hope this information helps you track down the cause and fix this error. :)

_________________________________________________________________________

Rob R
Joined: 22 May 14
Posts: 1
Credit: 21,488,495
RAC: 103

Seems as though this is a known issue, but I thought I'd add my GPU for reference.

I have a GTX 1050 (2 GB). The work units run for about 1:30, then the GPU memory usage starts going up. At about 1:50 it hits 1.8 GB used, then the task fails with a computation error.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 2,012
Credit: 16,796,626,077
RAC: 40,557,152

Also, the scheduler is checking global memory; it might be a little better to look at available memory instead. Running the desktop on a GUI-based OS will eat up a couple hundred MB of GPU RAM, which can be the deciding factor for a GPU that's on the line. A 2 GB GPU might be able to run a GW task needing 1800 MB, but not if it's also driving the display. The scheduler isn't taking that into account when it checks global memory.
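To make the difference concrete, here is a small Python sketch comparing the two checks. The function names and the 300 MB display overhead are assumptions chosen for illustration (the only figures from this thread are the 2 GB card and the 1800 MB task); this is not how the scheduler is actually implemented.

```python
def fits_global(task_mb: int, global_mb: int) -> bool:
    """Check against total GPU memory, ignoring anything already in use."""
    return global_mb >= task_mb


def fits_available(task_mb: int, global_mb: int, in_use_mb: int) -> bool:
    """Check against memory left over after the desktop and other users."""
    return (global_mb - in_use_mb) >= task_mb


if __name__ == "__main__":
    task_mb = 1800      # requirement recorded for the task
    global_mb = 2048    # a 2 GB card
    display_mb = 300    # assumed desktop/display usage, illustrative only

    print("global check passes:   ", fits_global(task_mb, global_mb))
    print("available check passes:", fits_available(task_mb, global_mb, display_mb))
```

On a headless card (`in_use_mb` of 0) both checks agree; they only diverge once something else is already occupying GPU memory.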

_________________________________________________________________________

astro-marwil
Joined: 28 May 05
Posts: 457
Credit: 189,127,136
RAC: 43,391

Hello!

It seems to me that O2MD1 is finishing up and running out of tasks. I haven't received any new tasks for several days.

What is the next app in this regard, and when is it coming?

Kind regards and happy crunching

Martin

Keith Myers
Joined: 11 Feb 11
Posts: 2,580
Credit: 7,690,627,672
RAC: 22,958,142

You and I both would like to know. I think the O2MD1 data is finished, since Bernd posted the notice that their analysis of the data was published.

He did hint that they need to follow up, maybe with an O3 run, to look at the outlier candidates that look interesting.

But there have been no messages or posts from the admins saying that the next application and data run is forthcoming.

I haven't been able to get more than 1 or 2 O2MD1 tasks per week for several weeks now.

My CPU threads are mostly running FGRP5 tasks, since there are plenty of those.

But there are plenty of O2MDF GPU tasks to run.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,373
Credit: 195,312,650
RAC: 169,744

It seems that the O3 run, even shortened due to COVID, was a bonanza of detections. That speaks to the sensitivity gain of the instruments over O1 and O2. This augurs well for a continuous wave detection in our follow-up of candidates using O3 data.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
