GWnew takes too much memory - app_config.xml required

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 181,038,326
RAC: 10,431
Topic 224807

I've had a few failures recently because of memory allocation errors from the GWnew app.

It seems 8GB isn't enough for a quad-core CPU with this app, so I need to restrict the number of GWnew instances (while allowing other E@h work on the CPU in the meantime).

AFAIK this can be done with a proper app_config.xml file.

Surely this isn't a brand-new issue, so such a file has probably been in use already.

Could someone share one?

 

mikey
Joined: 22 Jan 05
Posts: 12,555
Credit: 1,838,837,100
RAC: 23,865

Raistmer* wrote:

I've had a few failures recently because of memory allocation errors from the GWnew app.

It seems 8GB isn't enough for a quad-core CPU with this app, so I need to restrict the number of GWnew instances (while allowing other E@h work on the CPU in the meantime).

AFAIK this can be done with a proper app_config.xml file.

Surely this isn't a brand-new issue, so such a file has probably been in use already.

Could someone share one?

If you don't run anything other than GPU tasks, this will work:

<app_config>
<project_max_concurrent>3</project_max_concurrent>
</app_config>

Put it in the usual place with the usual name and it will work for any project; nothing more is needed. It limits the total number of running tasks from a project, not just one kind of task, so if you need to limit only GWnew tasks you need a more extensive file, which I don't have.

Keith Myers
Joined: 11 Feb 11
Posts: 4,902
Credit: 18,453,335,529
RAC: 6,081,554

This will restrict both GW and GR to a maximum of 3 concurrent tasks each.

<app_config>
    <app>
        <name>einstein_O2MDF</name>
        <gpu_versions>
            <gpu_usage>1.0</gpu_usage>
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
        <max_concurrent>3</max_concurrent>
    </app>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>1.0</gpu_usage>
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
        <max_concurrent>3</max_concurrent>
    </app>
</app_config>

 

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 181,038,326
RAC: 10,431

Thanks a lot!

 

These days I run 1 GPU app instance plus as many CPU instances as needed to load all CPU cores.

Some E@h GPU apps reserve a CPU core, some don't.

So it's either 3 CPU + 1 GPU or 4 CPU + 1 GPU.

But because the E@h data set is much bigger than the S@h one was, some app combos can't fit in the available RAM.

So I want to keep computational resources loaded while ensuring everything fits in memory. Fortunately, the binary pulsar search is much less memory-demanding than the gravitational wave one, so I need to restrict only the number of GW app instances, not E@h as a whole.

 

For now I'll try a minimal one like this:

<app_config>

<app>
    <name>einstein_O2MDF</name>
    <max_concurrent>2</max_concurrent>
</app>

</app_config>

in the hope that the GPU part needs no corrections at all and idle cores will be filled with other work.

Let's see.

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 181,038,326
RAC: 10,431

2/13/2021 17:21:27 PM | Einstein@Home | Your app_config.xml file refers to an unknown application 'einstein_O2MDF'.  Known applications: 'einstein_O2MD1', 'hsgamma_FGRP5', 'hsgamma_FGRPB1G'
 

So, the app name has probably changed. But this is a good way to find the actual ones; I'll use it in future.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,914
Credit: 43,875,259,309
RAC: 63,372,341

O2MD1 = CPU app 

O2MDF = GPU app

 

You must be running only gamma-ray tasks on your GPU and GW tasks on your CPU.

_________________________________________________________________________

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 181,038,326
RAC: 10,431

Perhaps this particular host has too old an NV GPU for the GW GPU app.

But no problem, I need to restrict CPU apps, not GPU ones.
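
For reference, a minimal sketch of what the corrected file would presumably look like, assuming the only change needed is swapping in the CPU app name 'einstein_O2MD1' that the event log lists as a known application. The limit of 2 is carried over from the earlier attempt and is not a tested value for 8GB of RAM.

```xml
<app_config>
    <app>
        <!-- GW CPU app name, taken from the "Known applications" list in the event log -->
        <name>einstein_O2MD1</name>
        <!-- run at most 2 GW CPU tasks at once; adjust to what fits in RAM -->
        <max_concurrent>2</max_concurrent>
    </app>
</app_config>
```

The client has to re-read its config files (or be restarted) before the change takes effect. If the name were still wrong, the event log would again report an unknown application together with the list of known ones.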

 

Applying the config file had a side effect...

BOINC started to download additional GW tasks... and exhausted the space on the system partition of the SSD.

It's strange, because E@h has had a 0% resource share since the S@h days, so in my understanding it should not download additional work. And now it does. And the BOINC client has started to crash constantly: more than 5 restarts already.

So I need to enlarge the system partition before continuing.

 

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4,198
Credit: 3,116,386,957
RAC: 1,834,987

There has been some discussion on the BOINC forums that using the max_concurrent tags makes BOINC download excessive amounts of tasks: https://boinc.berkeley.edu/forum_thread.php?id=14146&postid=102745#102745 So BOINC has a bug there.

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 181,038,326
RAC: 10,431

Good to know this bug is known, thanks.

But why hasn't it been fixed? It really can be an issue: my host has currently downloaded more than 10 pages of GW tasks (~20h long each!), and the cache settings of 0.1/0.1 days are clearly being ignored.

The current situation is not good. While BOINC obeys app_config and runs only 2 instances of GW tasks, it leaves the GPU empty. So I have 4 CPU tasks running, no GPU tasks at all, and a hugely overloaded cache.

And maybe the empty GPU is a direct result of the cache overloading...

 

2/15/2021 10:35:04 AM | Einstein@Home | update requested by user
2/15/2021 10:35:08 AM | Einstein@Home | Sending scheduler request: Requested by user.
2/15/2021 10:35:08 AM | Einstein@Home | Reporting 1 completed tasks
2/15/2021 10:35:08 AM | Einstein@Home | Requesting new tasks for CPU and NVIDIA GPU
2/15/2021 10:35:11 AM | Einstein@Home | Scheduler request completed: got 0 new tasks
2/15/2021 10:35:11 AM | Einstein@Home | No work sent
2/15/2021 10:35:11 AM | Einstein@Home | (reached daily quota of 384 tasks)

2/15/2021 10:35:11 AM | Einstein@Home | Project has no jobs available
2/15/2021 10:35:11 AM | Einstein@Home | Project requested delay of 60789 seconds
 

So, with so many CPU tasks my host can't get GPU ones... which leaves the much more powerful computing device (the GPU) idle.

 

Definitely not good!

I see another bug here: an interdependence between the CPU and GPU job caches. An overloaded CPU cache should prevent new CPU tasks from being downloaded, not GPU ones...

 

 

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 181,038,326
RAC: 10,431

So I need to brainstorm again: what could be done instead of app_config?

The aim is the same: to restrict the number of GW task instances (to fit in RAM) while keeping all computational devices busy.

Rejecting GW tasks altogether could be a solution, but searching for gravitational waves is what E@h is mainly about... So it's a suboptimal solution.

mikey
Joined: 22 Jan 05
Posts: 12,555
Credit: 1,838,837,100
RAC: 23,865

Raistmer* wrote:

So I need to brainstorm again: what could be done instead of app_config?

The aim is the same: to restrict the number of GW task instances (to fit in RAM) while keeping all computational devices busy.

Rejecting GW tasks altogether could be a solution, but searching for gravitational waves is what E@h is mainly about... So it's a suboptimal solution.

You could always run a different project on your CPUs; that way the caches won't clash. And YES, that's been a problem for a long time as well. BOINC has a few of those long-standing problems.
