All Sky WU crashing on 1050ti GPU

Ron Kosinski
Joined: 23 Mar 05
Posts: 57
Credit: 1061374868
RAC: 865918
Topic 229794

I just got a bunch of All Sky WUs on two of my boxes. All 19 of them crashed and burned on one box running W10 with a single GTX 1050 Ti GPU. They have not started running on the second box yet, which is also W10 with dual GTX 1050 Ti GPUs. I have not received any on my third box (W10, dual RX 580s). All boxes are set to run the All Sky WUs with 0.9 CPU and 1 GPU.

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18718554224
RAC: 6370508

I think somebody else already posted about being unable to run the GW OAS tasks on a 1050 Ti and suspected that the VRAM is insufficient.

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117565563306
RAC: 35284188

If you go to the website and click on the taskID link for any of the failed tasks, you can see what was returned to the project after the task failed. I picked one, and here is the first failure message:

XLAL Error - XLALOpenCLExecuteKernel (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/lib/GPUUtils/OpenCLUtils.c:652): Enqueue OpenCL kernel failed with OpenCL error: CL_MEM_OBJECT_ALLOCATION_FAILURE

As you can see, the problem is indeed caused by an inability to allocate sufficient GPU memory.

Hopefully, your machine with the 8GB RX 580s will have sufficient memory.

Cheers,
Gary.

Ben Scott
Joined: 30 Mar 20
Posts: 53
Credit: 1596736465
RAC: 4979418

On my machine the All Sky GW units take just a hair less than 4 GB each. If you were running Linux, I would disable the X server so the desktop isn't holding any VRAM, and they should work. I don't know about Windows, though.
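To put numbers on why a 4 GB card struggles, here is a minimal Python sketch of the headroom arithmetic. It assumes the "just under 4 GB" per-task figure above and a hypothetical 300 MiB taken by the desktop and driver; neither is a project-published number, so measure your own card before relying on it.

```python
# Rough VRAM headroom check for one All Sky GW task per GPU.
# ASSUMPTIONS: ~4000 MiB per task is one user's observation, not a
# project limit; RESERVED_MIB is a hypothetical placeholder for the
# memory the desktop/driver already occupies on that card.

def vram_headroom_mib(card_mib: int, reserved_mib: int, task_mib: int) -> int:
    """Return VRAM left after the task is loaded; negative means it won't fit."""
    return card_mib - reserved_mib - task_mib

CARD_MIB = 4096      # a "4 GB" card such as a GTX 1050 Ti
RESERVED_MIB = 300   # hypothetical: display + driver overhead
TASK_MIB = 4000      # hypothetical: "a hair less than 4GB"

headroom = vram_headroom_mib(CARD_MIB, RESERVED_MIB, TASK_MIB)
print(headroom)  # -204: the allocation would fail on this card
```

With the same assumptions an 8 GB RX 580 (8192 MiB) leaves about 3,900 MiB spare, which is why the RX 580 box should be fine, and disabling the X server on Linux helps only because it shrinks the reserved term.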

 

 

Scrooge McDuck
Joined: 2 May 07
Posts: 1053
Credit: 17889309
RAC: 10990

I don't know if my comment is misleading (if so, please ignore it; I'm not running GPU tasks on NVIDIA or AMD).

Each workunit has upper-bound parameters for disk use, runtime and memory, which the BOINC client uses to schedule tasks without exceeding physical resource limits.

Are these bounds used by BOINC only for CPU tasks, or also for GPU tasks (e.g. to decide whether 1 or 2 can run in parallel on one GPU card without exceeding VRAM)? And are the bounds set appropriately (not too low) when the workunits are generated? Some months ago the memory bounds were set far too low for O3MD1 CPU workunits (discussed at the time in the O3MD1 thread).

I mean these parameters:

boinccmd --get_state

[...]

======== Workunits ========
1) -----------
   name: LATeah4021L15_1180.0_0_0.0_14710941
   FP estimate: 5.250000e+14
   FP bound: 1.050000e+16
   memory bound: 429.15 MB
   disk bound: 19.07 MB
2) -----------
   name: LATeah4021L15_1180.0_0_0.0_14711718
   FP estimate: 5.250000e+14
   FP bound: 1.050000e+16
   memory bound: 429.15 MB
   disk bound: 19.07 MB

[...]
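For anyone who wants to pull these bounds out of a long boinccmd --get_state dump, here is a minimal Python sketch. It only relies on the text layout shown above (section headers like "======== Workunits ========" and "memory bound: ... MB" lines); the function name is made up, and reading the bound out does not answer whether the client applies it to GPU memory.

```python
import re

def parse_memory_bounds(state_text: str) -> dict[str, float]:
    """Map workunit name -> 'memory bound' in MB from boinccmd --get_state text.

    Assumes the layout shown in the post: only lines inside the
    '======== Workunits ========' section are considered, so 'name:'
    lines from other sections (projects, results) are ignored.
    """
    bounds: dict[str, float] = {}
    in_workunits = False
    current = None
    for raw in state_text.splitlines():
        line = raw.strip()
        if line.startswith("========"):
            # A new section header; track whether it is the Workunits one.
            in_workunits = "Workunits" in line
            current = None
            continue
        if not in_workunits:
            continue
        if line.startswith("name:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("memory bound:") and current is not None:
            match = re.search(r"([\d.]+)\s*MB", line)
            if match:
                bounds[current] = float(match.group(1))
    return bounds

sample = """======== Workunits ========
1) -----------
   name: LATeah4021L15_1180.0_0_0.0_14710941
   FP estimate: 5.250000e+14
   memory bound: 429.15 MB
   disk bound: 19.07 MB
"""
print(parse_memory_bounds(sample))
# {'LATeah4021L15_1180.0_0_0.0_14710941': 429.15}
```

Piping the real output in (for example via subprocess) would work the same way; the parser just skips everything outside the Workunits section.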

Ron Kosinski
Joined: 23 Mar 05
Posts: 57
Credit: 1061374868
RAC: 865918

The All Sky WUs seem to process without any issues on my second box (dual GTX 1050 Ti GPUs), while more WUs crashed on the first box with the single 1050 Ti. This is an older card, and some of its memory could be failing. I disabled the All Sky WUs on that box. The RX 580 cards have 8 GB of memory, so they should not be an issue.

Is it possible that I just got lucky with the AS WUs on the second box and they didn't need more than 4 GB of memory? I now remember having this same issue several years ago with AS WUs running on a GTX 750 with only 2 GB of memory.

I see the card that is having the problem has a newer driver (531.79) than the cards that seem to be running OK (457.51).  Is there a chance the newer driver needs more overhead memory than the older driver?

Ron Kosinski
Joined: 23 Mar 05
Posts: 57
Credit: 1061374868
RAC: 865918

Update!

Sorry, wingmen, my AS WUs are crashing on my other box with the dual 1050 Ti cards too. I guess I got lucky with the first batch running smoothly. I have disabled AS on both boxes with the 1050 Ti cards.

:-(

 

Nuadormrac
Joined: 9 Feb 05
Posts: 76
Credit: 229259947
RAC: 99

I've noticed a problem recently with these All Sky tasks, even though I have run this project on this laptop without a hitch since I bought it last October. Resetting the project didn't help. The GPU isn't a 1050 Ti, though; it's an RTX 3050 Ti.

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18718554224
RAC: 6370508

From your Computers list, you are still getting hamstrung by the limited 4GB RAM on your dedicated RTX 3050 Ti gpu. Same issue as the 1050 Ti's.  Not enough VRAM.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3952
Credit: 46787412642
RAC: 64189391

Keith Myers wrote:

From your Computers list, you are still getting hamstrung by the limited 4GB RAM on your dedicated RTX 3050 Ti gpu. Same issue as the 1050 Ti's.  Not enough VRAM.

this


Nuadormrac
Joined: 9 Feb 05
Posts: 76
Credit: 229259947
RAC: 99

OK, it seems that unticking All Sky while leaving the other WU types selected still results in me getting All Sky units, even when "allow download of non-selected WUs" is also unticked. That leaves me no way to stop it from sending me WUs that won't work on my GPU. Perhaps a setting that prevents sending these units when the GPU has insufficient VRAM would help alleviate this, for example as part of the error handling in the scheduling software. Otherwise people could end up inadvertently burning through a lot of units they can't crunch before they check, especially when they were computing successfully before the problem appeared.
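The kind of server-side check suggested above could look something like this Python sketch. Everything in it is hypothetical: the class names, the 4500 MiB threshold and the assumption that the scheduler knows each host GPU's VRAM are illustrations, not Einstein@Home's actual scheduler code.

```python
from dataclasses import dataclass

# HYPOTHETICAL sketch of a "don't send if VRAM is too small" rule.
# Not the Einstein@Home or BOINC scheduler implementation.

@dataclass
class HostGpu:
    model: str
    vram_mib: int  # assumed: VRAM the host reports for this GPU

@dataclass
class AppVersion:
    name: str
    min_vram_mib: int  # hypothetical minimum VRAM per task

def usable_gpus(app: AppVersion, gpus: list[HostGpu]) -> list[HostGpu]:
    """Return the GPUs that meet the app's minimum VRAM."""
    return [g for g in gpus if g.vram_mib >= app.min_vram_mib]

def should_send(app: AppVersion, gpus: list[HostGpu]) -> bool:
    """Send work only if at least one GPU on the host can run it."""
    return len(usable_gpus(app, gpus)) > 0

all_sky = AppVersion("All Sky GW", min_vram_mib=4500)  # hypothetical threshold
laptop = [HostGpu("RTX 3050 Ti", 4096)]
rx_box = [HostGpu("RX 580", 8192), HostGpu("RX 580", 8192)]

print(should_send(all_sky, laptop))  # False
print(should_send(all_sky, rx_box))  # True
```

Filtering per GPU rather than per host matters for mixed boxes: a host with one large and one small card would still get work, but only the large card should be assigned it.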
