All-Sky Gravitational Wave Search on O3 data (O3ASHF1)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,835
Credit: 108,306,765,837
RAC: 33,074,036

MyrCu wrote:
Is there a way to exclude these "special" WUs without withdrawing from all GPU jobs in general?

There aren't any "special" workunits.  All of them are likely to fail on a GPU with only 4GB VRAM, particularly if some of that VRAM is reserved by the OS to run your display or for other things you are doing.

All you need to do is change your project preferences to exclude the All Sky search from your list of allowed searches.  Just remove the tick mark for that search and make sure to scroll to the bottom and "Save Changes".

If you have set up different "Locations" (previously known as venues) make the change for the location your computer is set to.  If you don't know what "Locations" are and you haven't previously used them, just ignore this last bit.

Cheers,
Gary.

MyrCu
Joined: 18 Feb 22
Posts: 4
Credit: 57,472,869
RAC: 61,206

Thank you very much for that hint. I had not discovered that before.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,240
Credit: 244,349,085
RAC: 23,763

Richard Haselgrove wrote:

Had a run of 6 consecutive tasks this morning, which all failed with "Exit status:196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED".

TASKS FOR COMPUTER 12808716 (filtered)

Machine has a 2 TB SSD dedicated to BOINC, and has been processing tasks successfully before and since - I think it's the tasks, rather than my machine.

Thanks for pointing this out! The disk limit certainly needs an update.

I noticed that the checkpoint alone takes a whopping 150 MB, and it only covers the first GPU processing stage, which should take less than 10 minutes. I wonder whether we should turn that off altogether? Setting the "write to disk at most ..." preference to a time longer than the GPU part takes to run should do that for you, but you may not want to do this for all apps of all projects...
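To put rough numbers on the checkpoint interval idea, here is a minimal sketch. It assumes the client allows at most one checkpoint per "write to disk at most ..." interval and that every checkpoint rewrites the full 150 MB. The function name and the candidate interval values are illustrative only, not anything taken from the app or the client.

```python
CHECKPOINT_MB = 150          # checkpoint size quoted in the post above
GPU_STAGE_SECONDS = 10 * 60  # upper bound: the GPU stage should take under 10 minutes


def max_checkpoints(stage_seconds: float, interval_seconds: float) -> int:
    """Upper bound on checkpoints written if at most one is allowed per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return int(stage_seconds // interval_seconds)


# Candidate preference values in seconds (illustrative, not recommendations).
for interval in (60, 300, 900):
    n = max_checkpoints(GPU_STAGE_SECONDS, interval)
    print(f"interval {interval:>3} s: up to {n} checkpoints, about {n * CHECKPOINT_MB} MB written")
```

Under these assumptions, any interval longer than the GPU stage gives a bound of zero, which is the "turn it off via preferences" effect described above.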

BM

Allnight
Joined: 23 Jan 07
Posts: 1
Credit: 132,353,264
RAC: 97,230

So...I am getting only "computation error" from any work units for the application

All-Sky Gravitational Wave search on O3 1.04 (GW-opencl-nvidia)

Are you aware of this?  I think I get this on my 9th-gen I9, and am now getting it on my 12th-gen I9.   This machine has an Nvidia RTX-A2000 in it.

archae86
Joined: 6 Dec 05
Posts: 3,140
Credit: 6,966,524,931
RAC: 1,812,651

Allnight wrote:

So...I am getting only "computation error" from any work units for the application

All-Sky Gravitational Wave search on O3 1.04 (GW-opencl-nvidia)

Are you aware of this?  I think I get this on my 9th-gen I9, and am now getting it on my 12th-gen I9.   This machine has an Nvidia RTX-A2000 in it.

The first thing to do is to review the stderr output of a failed task.

For yours you'll find the text string:

CL_MEM_OBJECT_ALLOCATION_FAILURE

That is a big clue that trying to run these big tasks on your 4 GB laptop GPU is a mismatch.  You could also look around the forums and notice that this is among the most frequently mentioned recent difficulties.

My suggestion is to disallow this application on that machine.  It will be better for you, and better for the project.
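If you have saved the stderr text of several failed tasks, a small script can do the looking for you. This is only a sketch: the function name is made up for illustration, it checks for the one marker string quoted above and nothing else, and the sample text is invented rather than copied from a real task.

```python
# Marker string quoted in the post above; other OpenCL error names are not checked.
ALLOC_FAILURE = "CL_MEM_OBJECT_ALLOCATION_FAILURE"


def find_alloc_failures(stderr_text: str) -> list[int]:
    """Return the 1-based numbers of the lines that mention the marker."""
    return [
        number
        for number, line in enumerate(stderr_text.splitlines(), start=1)
        if ALLOC_FAILURE in line
    ]


# Invented sample text, only to show the call; real stderr output looks different.
sample = "starting task\nerror: CL_MEM_OBJECT_ALLOCATION_FAILURE\nexiting"
hits = find_alloc_failures(sample)
if hits:
    print(f"GPU memory allocation failed (line(s) {hits}); the GPU likely has too little VRAM")
else:
    print("no allocation failure found; look for other causes")
```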

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 257
Credit: 6,277,458,307
RAC: 16,433,679

... back to basics ...

Aurum
Joined: 12 Jul 17
Posts: 77
Credit: 3,406,488,158
RAC: 1,236,789

Trying to make more efficient

Bernd, Is it possible to release the VRAM when the recalc phase begins? Was trying to run 3 O3AS WUs on a 12 GB GPU but they fail even though one or two of them are in the recalc phase and barely even using the GPU.

Guess they must be staggered for that to work. What's the BOINC command :-?

<stagger>1</stagger>

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,631
Credit: 32,939,264,728
RAC: 6,797,423

Aurum wrote:

Bernd, Is it possible to release the VRAM when the recalc phase begins? Was trying to run 3 O3AS WUs on a 12 GB GPU but they fail even though one or two of them are in the recalc phase and barely even using the GPU.

Guess they must be staggered for that to work. What's the BOINC command :-?

<stagger>1</stagger>

there is no stagger command like that

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,240
Credit: 244,349,085
RAC: 23,763

Actually that's an optimization that we already thought of, though we haven't finished the implementation yet. In principle the BOINC client should start only as many tasks in parallel as fit in the available GPU memory, so by adjusting the memory size (and free cores) one should be able to convince the client to start another GPU task when the memory is freed by the one still running. However, I think the client only performs this check every five minutes, which might not be fine-grained enough. I would also change the app such that a memory allocation failure becomes a "transient" error, so the client would start the same task again after some time. For now, though, these are just thoughts; we never tried that, and there are a few other more urgent problems to solve. But we'll keep that in mind.
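As a toy illustration of the "transient error" idea (not project or BOINC code), the sketch below models a GPU as a simple memory counter and defers a task instead of failing it when the allocation does not fit. The class and function names, the 12000 MB card and the 5000 MB per-task footprint are all hypothetical; the real footprint of a task is not stated in this thread.

```python
from dataclasses import dataclass


class TransientAllocationError(Exception):
    """Raised when a task does not fit right now but may fit later."""


@dataclass
class FakeGpu:
    total_mb: int
    used_mb: int = 0

    def allocate(self, mb: int) -> None:
        if self.used_mb + mb > self.total_mb:
            raise TransientAllocationError(f"need {mb} MB, only {self.total_mb - self.used_mb} MB free")
        self.used_mb += mb

    def release(self, mb: int) -> None:
        self.used_mb = max(0, self.used_mb - mb)


def try_start(gpu: FakeGpu, task_mb: int) -> bool:
    """Start a task if its memory fits; otherwise report 'retry later' instead of failing."""
    try:
        gpu.allocate(task_mb)
        return True
    except TransientAllocationError:
        return False


gpu = FakeGpu(total_mb=12000)      # hypothetical 12 GB card
TASK_MB = 5000                     # hypothetical per-task footprint
started = [try_start(gpu, TASK_MB) for _ in range(3)]
print("first attempt:", started)   # the third task is deferred, not errored out

gpu.release(TASK_MB)               # one task reaches the recalc phase and frees its VRAM
print("retry of deferred task:", try_start(gpu, TASK_MB))
```

The point of the design is only that a deferred task stays eligible to run, so it can be picked up again once another task has released its memory.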

BM

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,631
Credit: 32,939,264,728
RAC: 6,797,423

i don't think BOINC has any mechanism to monitor actual GPU VRAM use. but it does for system memory use. the "available memory" metric seems to only be checked at startup of the client for collecting coproc info. this is subsequently transmitted to the projects under that metric, but it never changes.
