Some 2.09 G/W GPU W/U crashing out - GPU memory too small?

Ron Kosinski
Joined: 23 Mar 05
Posts: 57
Credit: 893861645
RAC: 630335
Topic 224827

I have started getting a bunch of Computation Errors on some 2.09 G/W GPU work units on one of my rigs. I previously had a similar issue on another rig that had a GTX 760 GPU card with 2 GB of memory. I had to opt out of running the G/W work units on that rig due to the small memory size. Now I seem to be having the same issue on another rig that has a GTX 1050 Ti card with 4 GB of memory. Some work units run fine and some fail with a Computation Error. Am I correct in guessing that 4 GB of memory is now too small for some G/W GPU work units?

I have my app_config file set to run 2 work units concurrently on the GPU card with 1 CPU core per work unit. If I change this to 1 work unit per GPU and 1 CPU core per work unit, would it help?

Thanks in advance for any assistance.

mikey
Joined: 22 Jan 05
Posts: 11948
Credit: 1832823567
RAC: 220712

Ron Kosinski wrote:

I have started getting a bunch of Computation Errors on some 2.09 G/W GPU work units on one of my rigs. I previously had a similar issue on another rig that had a GTX 760 GPU card with 2 GB of memory. I had to opt out of running the G/W work units on that rig due to the small memory size. Now I seem to be having the same issue on another rig that has a GTX 1050 Ti card with 4 GB of memory. Some work units run fine and some fail with a Computation Error. Am I correct in guessing that 4 GB of memory is now too small for some G/W GPU work units?

I have my app_config file set to run 2 work units concurrently on the GPU card with 1 CPU core per work unit. If I change this to 1 work unit per GPU and 1 CPU core per work unit, would it help?

Thanks in advance for any assistance. 

A lot of those tasks require 4 GB of memory, and that's why it's crashing. Your GTX 760 only has 2 GB of onboard memory and you can't extend it with system memory. I suggest you put that PC in a different venue (i.e. default, home, work or school) and let it run different tasks.

Ron Kosinski
Joined: 23 Mar 05
Posts: 57
Credit: 893861645
RAC: 630335

mikey wrote:

A lot of those tasks require 4 GB of memory, and that's why it's crashing. Your GTX 760 only has 2 GB of onboard memory and you can't extend it with system memory. I suggest you put that PC in a different venue (i.e. default, home, work or school) and let it run different tasks.

Hi Mikey,

I already have the G/W work units stopped for the box with the GTX-760 card. 

I am having a problem with the G/W work units on the box with the GTX-1050 Ti card.  I did change my app_config file to run only one G/W work unit per card.  Let's see if that solves the crashing.
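For anyone who wants to make the same change, here is a rough sketch in Python that writes out an app_config.xml for a chosen number of tasks per GPU. The `<gpu_usage>` and `<cpu_usage>` elements are the standard BOINC app_config.xml settings; the function name `make_app_config` and the app name `YOUR_GW_APP_NAME` are placeholders made up for this example, so substitute the real short app name from your own client (it is listed in client_state.xml).

```python
import xml.etree.ElementTree as ET


def make_app_config(app_name: str, tasks_per_gpu: int, cpus_per_task: float = 1.0) -> str:
    """Return app_config.xml text that runs `tasks_per_gpu` tasks on each GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of one GPU that each task claims:
    # 1.00 means one task per GPU, 0.50 means two tasks per GPU.
    ET.SubElement(gpu_versions, "gpu_usage").text = f"{1.0 / tasks_per_gpu:.2f}"
    ET.SubElement(gpu_versions, "cpu_usage").text = f"{cpus_per_task:.2f}"

    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "YOUR_GW_APP_NAME" is a placeholder, not a real application name.
    print(make_app_config("YOUR_GW_APP_NAME", tasks_per_gpu=1))
```

Save the output as app_config.xml in the project's folder under the BOINC data directory, then have the client re-read its config files (or restart it) so the change takes effect.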

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5846
Credit: 109977924228
RAC: 29323968

Ron Kosinski wrote:
.... If I change this to 1 work unit per GPU and 1 CPU core per work unit, would it help?

I'll let you answer that for yourself :-).

All you need to do is pick any of your recently failed tasks on the website and click on the TaskID link.  Scroll down through all the stderr output you find there looking for the word "error".  Here is the first occurrence with irrelevant stuff truncated:-

XLAL Error - XLALComputeECLFFT ... : Processing FFT failed: CL_MEM_OBJECT_ALLOCATION_FAILURE

As the frequency term in the task name gets larger, this is likely to become a more common problem for people with older and more basic GPUs that (unfortunately) don't have enough memory for running multiple concurrent tasks. As the above example shows, if you check the stderr output as soon as you see tasks failing, it's pretty easy to diagnose this particular cause of task failures for yourself.
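If you would rather not scroll through the output by eye, the same check can be sketched in a few lines of Python. This assumes you have copied the stderr text from the task page into a string or file; the function names `find_errors` and `looks_like_gpu_memory_failure` are made up for this example and are not part of any BOINC tool.

```python
import re

# Matches any line that mentions "error", in any letter case.
ERROR_LINE = re.compile(r"error", re.IGNORECASE)
# OpenCL status name reported when a device buffer cannot be allocated.
ALLOC_FAILURE = "CL_MEM_OBJECT_ALLOCATION_FAILURE"


def find_errors(stderr_text: str) -> list[str]:
    """Return every line of the stderr output that mentions an error."""
    return [line.strip() for line in stderr_text.splitlines() if ERROR_LINE.search(line)]


def looks_like_gpu_memory_failure(stderr_text: str) -> bool:
    """True if any error line reports an OpenCL memory allocation failure."""
    return any(ALLOC_FAILURE in line for line in find_errors(stderr_text))


if __name__ == "__main__":
    # The error line is the one quoted above, elision included.
    sample = (
        "XLAL Error - XLALComputeECLFFT ... : Processing FFT failed: "
        "CL_MEM_OBJECT_ALLOCATION_FAILURE\n"
    )
    for line in find_errors(sample):
        print(line)
    print("GPU memory failure:", looks_like_gpu_memory_failure(sample))
```

A match on that status name points at the GPU running out of memory, which is the cue to reduce the number of concurrent tasks.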

Cheers,
Gary.

Ron Kosinski
Joined: 23 Mar 05
Posts: 57
Credit: 893861645
RAC: 630335

Gary Roberts wrote:

All you need to do is pick any of your recently failed tasks on the website and click on the TaskID link.  Scroll down through all the stderr output you find there looking for the word "error".  Here is the first occurrence with irrelevant stuff truncated:-

XLAL Error - XLALComputeECLFFT ... : Processing FFT failed: CL_MEM_OBJECT_ALLOCATION_FAILURE

As the frequency term in the task name gets larger, this is likely to become a more common problem for people with older and more basic GPUs that (unfortunately) don't have enough memory for running multiple concurrent tasks. As the above example shows, if you check the stderr output as soon as you see tasks failing, it's pretty easy to diagnose this particular cause of task failures for yourself.

Gary, thanks for the info on what to look for in the output file. Switching back to running one 2.09 w/u per GPU card has solved the problem. I forgot I had recently changed the app_config file to run two w/u per card. Trying to run two w/u on a 4 GB card is the same as trying to run one w/u on a 2 GB card. It just doesn't work sometimes! :-)
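The arithmetic behind that can be sketched in Python. The per-task memory figures used here are purely hypothetical, since the thread gives no measured numbers and the real requirement varies from task to task; `max_concurrent_tasks` is a made-up helper name.

```python
def max_concurrent_tasks(vram_mb: int, per_task_mb: int, reserved_mb: int = 0) -> int:
    """How many tasks of a given size fit in GPU memory at the same time."""
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    usable_mb = vram_mb - reserved_mb  # memory left after any other GPU users
    return max(usable_mb // per_task_mb, 0)


if __name__ == "__main__":
    # Hypothetical task sizes, for illustration only.
    for per_task_mb in (1500, 2500):
        for vram_mb in (2048, 4096):
            n = max_concurrent_tasks(vram_mb, per_task_mb)
            print(f"{vram_mb} MB card, {per_task_mb} MB per task: {n} task(s)")
```

With the larger hypothetical task size, the 4 GB card fits one task and the 2 GB card fits none, which is the pattern described in this thread.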

Mikey, Gary, again, thank you for the help!
