Computation Error - Output file Absent

Larry Hubble

Joined: 13 Mar 05

Posts: 3

Credit: 293592026

RAC: 158509

27 Apr 2020 13:24:36 UTC

Topic 222195

Lately, I frequently see computation errors reported. Here is a recent example.

4/27/2020 8:15:45 AM | Einstein@Home | Output file h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1_0 for task h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1 absent
4/27/2020 8:15:45 AM | Einstein@Home | Output file h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1_1 for task h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1 absent
4/27/2020 8:15:45 AM | Einstein@Home | Output file h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1_2 for task h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1 absent
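All three "absent" lines name the same task, which suggests the application stopped before writing any of its output files, so the event log only shows the symptom rather than the cause. Here is a minimal Python sketch for grouping such lines by task. It assumes the event-log wording is exactly as pasted above; other BOINC client versions may phrase it differently.

```python
import re
from collections import defaultdict

# Matches the wording of the event-log lines quoted above (an assumption for other versions).
ABSENT_RE = re.compile(r"Output file (?P<file>\S+) for task (?P<task>\S+) absent")

def absent_outputs(log_lines):
    """Group missing output file names by the task they belong to."""
    missing = defaultdict(list)
    for line in log_lines:
        match = ABSENT_RE.search(line)
        if match:
            missing[match.group("task")].append(match.group("file"))
    return dict(missing)

TASK = "h1_1457.10_O2C02Cl4In0__O2MDFV2g_VelaJr1_1457.95Hz_339_1"
log = [
    f"4/27/2020 8:15:45 AM | Einstein@Home | Output file {TASK}_{i} for task {TASK} absent"
    for i in range(3)
]
result = absent_outputs(log)
for task, files in result.items():
    print(task, len(files))  # one task, three missing files
```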

Is there a way to correct this?

Holmis

Joined: 4 Jan 05

Posts: 1118

Credit: 1055935564

RAC: 0

27 Apr 2020 15:07:04 UTC

Message 177157

To see the real error reported by the application, go to your task list and click on the Task ID of one of the failed tasks.

Here's an example from one of the failed tasks:

XLALExecuteKernel_OpenCL failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
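The stderr output on a task's page can be long, but the OpenCL status code is usually the informative part. As a rough sketch, assuming the codes appear as upper-case `CL_...` identifiers as in the line above, the first one can be extracted and flagged if it looks memory related. The `MEMORY_CODES` set is an illustrative choice, not an official classification.

```python
import re

# OpenCL status names are upper-case identifiers beginning with "CL_".
CL_CODE_RE = re.compile(r"\b(CL_[A-Z0-9_]+)\b")

# Illustrative (not exhaustive) set of codes that often indicate running out of memory.
MEMORY_CODES = {
    "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    "CL_OUT_OF_RESOURCES",
    "CL_OUT_OF_HOST_MEMORY",
}

def first_opencl_error(stderr_text):
    """Return (code, looks_memory_related) for the first CL_* token, or (None, False)."""
    match = CL_CODE_RE.search(stderr_text)
    if match is None:
        return None, False
    code = match.group(1)
    return code, code in MEMORY_CODES

stderr = "XLALExecuteKernel_OpenCL failed: CL_MEM_OBJECT_ALLOCATION_FAILURE"
print(first_opencl_error(stderr))
```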

Are you running more than one task at a time on the GPU?
Try reducing the number of tasks you run at a time to see if that fixes the problem.
There have been discussions in other threads that some of the Gravity Wave tasks require quite a lot of GPU memory.
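In the BOINC client, the number of tasks sharing a GPU can be set per application in an `app_config.xml` file in the project's directory, where `gpu_usage` is the fraction of a GPU each task claims (1.0 for one task per GPU, 0.5 for two). The thread does not say how the limit was actually set; this Python sketch only builds such a file. `HYPOTHETICAL_GW_APP` is a placeholder, and the real application name would have to be looked up for the actual project.

```python
import xml.etree.ElementTree as ET

def app_config_xml(app_name, tasks_per_gpu=1, cpus_per_task=1.0):
    """Build app_config.xml text that limits how many tasks share one GPU.

    gpu_usage is the GPU fraction claimed per task, i.e. 1 / tasks_per_gpu.
    cpu_usage is the CPU share reserved for each task's support thread.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpus_per_task:g}"
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

# HYPOTHETICAL_GW_APP is a placeholder, not the real Einstein@Home app name.
print(app_config_xml("HYPOTHETICAL_GW_APP", tasks_per_gpu=1))
```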

Larry Hubble

Joined: 13 Mar 05

Posts: 3

Credit: 293592026

RAC: 158509

27 Apr 2020 16:39:28 UTC

Message 177160

I've reduced GPU tasks to 1 rather than 2, but I'm still getting errors:

XLALExecuteKernel_OpenCL failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
XLAL Error - XLALExecuteKernel_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:565): Internal function call failed
XLAL Error - XLALCLMEMVectorMemsetCOMPLEX8 (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:181): Check failed: XLALExecuteKernel_OpenCL ( &openclObj.kernel.kernel_MemsetCOMPLEX8 , &in->length, 1, last ) ==

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5850

Credit: 110040173967

RAC: 22413328

27 Apr 2020 21:51:42 UTC

Message 177168 in response to message 177160

Larry Hubble wrote:

I've reduced GPU tasks to 1 rather than 2 But still getting errors:

Whilst you would think that a 3GB GPU should have no trouble, apparently NVIDIA chose to 'encourage' the use of professional (and more expensive) GPUs for compute use by severely restricting the available memory on consumer-grade cards being used for that purpose. There have been quite a few other reports of exactly this issue with NVIDIA cards of 3GB or less VRAM.
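One plausible mechanism behind this, which the thread does not confirm, is the per-buffer cap in OpenCL. A driver may limit the size of any single buffer (`CL_DEVICE_MAX_MEM_ALLOC_SIZE`), and the OpenCL specification only guarantees max(1/4 of global memory, 128 MiB). If a driver reports just that minimum, a 3GB card could refuse one buffer far smaller than 3GB. This Python sketch of the check uses illustrative buffer sizes. The real cap should be read from the device with `clGetDeviceInfo` rather than assumed.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def assumed_max_alloc(global_mem_bytes):
    """Per-buffer cap, ASSUMING the driver reports only the OpenCL minimum."""
    return max(global_mem_bytes // 4, 128 * MIB)

def buffer_fits(global_mem_bytes, buffer_bytes):
    """True if a single buffer of buffer_bytes stays within the assumed cap."""
    return buffer_bytes <= assumed_max_alloc(global_mem_bytes)

card = 3 * GIB  # a "3GB" card, as discussed above
print(assumed_max_alloc(card) // MIB, "MiB per buffer")  # 768 MiB per buffer
print(buffer_fits(card, 512 * MIB), buffer_fits(card, 1 * GIB))  # True False
```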

The amount of GPU memory needed varies, so some tasks will succeed. If you look at your stats for the GW search (O2MDF), you currently have nearly 570 failed tasks and around 110 that are pending. None have yet validated. Chances are that the pending tasks will validate, but with a 1 in 6 ratio, until the current VelaJr1 tasks are gone and memory requirements are lower, you should consider changing your preferences to opt out of the O2MDF search and just choose the gamma-ray pulsar search (FGRPB1G).

For that search you have 64 valid and 30 pending, and there certainly shouldn't be a similar problem, as the memory requirements are much lower. You should be able to run 2 concurrent tasks if you were doing that previously. Each concurrent task will need at least a full CPU core for support.
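The "1 in 6" figure follows from the counts quoted above, provided all of the pending tasks eventually validate (which the post calls likely, not certain). This Python sketch also totals the CPU cores to set aside under the rule of thumb of at least one full core per concurrent GPU task.

```python
def pending_share(failed, pending):
    """Fraction of returned O2MDF tasks still pending: the best-case success rate."""
    return pending / (failed + pending)

def cores_to_reserve(concurrent_gpu_tasks, cores_per_task=1):
    """At least one full CPU core per concurrent GPU task, per the advice above."""
    return concurrent_gpu_tasks * cores_per_task

share = pending_share(570, 110)  # approximate counts quoted in the post
print(f"{share:.3f}, about 1 in {round(1 / share)}")  # 0.162, about 1 in 6
print(cores_to_reserve(2))  # 2 cores for 2 concurrent FGRPB1G tasks
```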

If you do decide to opt out of O2MDF, you should also set the pref for "Allow non-preferred apps" to NO. This makes sure the scheduler has no excuse to send you the 'wrong' type of task :-).

Cheers,
Gary.

Larry Hubble

Joined: 13 Mar 05

Posts: 3

Credit: 293592026

RAC: 158509

27 Apr 2020 22:39:59 UTC

Message 177171

Thanks Gary

I will take that approach

Cheers,

Larry

Forums › Problems and Bug Reports
