New GPU errors running Gravitational Wave search O2

Cherokee150
Joined: 13 May 11
Posts: 7
Credit: 48,560,931
RAC: 41,803
Topic 220480

Can anyone help?

My computer, 4125225, has been processing Einstein for years.  It has never had a problem before, but now I am getting an error on all the tasks it is processing.

Can anyone help me, as I am unable to complete any tasks successfully for Einstein until this is resolved!

Here are the details:

Currently I am only getting Gravitational Wave search O2 Multi-Directional GPU v2.07 (GW-opencl-nvidia) windows_x86_64 tasks for my Windows 10 x64 OS with a GeForce GTX 960 running NVIDIA driver version 390.65.

The error message I am getting from the Stderr output is:

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.

Code-version: %% LAL: 6.19.2.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALPulsar: 1.17.1.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALApps: 6.23.0.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)

-------------------------------

For reference, here is the entire Stderr output from one of the failed tasks:

Task 913775386

Name: h1_1171.30_O2C02Cl4In0__O2MDFG2e_G34731_1171.70Hz_287_1
Workunit ID: 434664359
Created: 15 Jan 2020 4:56:35 UTC
Sent: 19 Jan 2020 7:17:24 UTC
Report deadline: 26 Jan 2020 7:17:24 UTC
Received: 19 Jan 2020 9:06:48 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 1024 (0x00000400) Unknown error code
Computer: 4125225
Run time (sec): 175.84
CPU time (sec): 107.53
Peak working set size (MB): 324.3
Peak swap size (MB): 1155.81
Peak disk usage (MB): 0.02
Validation state: Invalid
Granted credit: 0
Application: Gravitational Wave search O2 Multi-Directional GPU v2.07 (GW-opencl-nvidia) windows_x86_64
Stderr output

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 1024 (0x400)</message>
<stderr_txt>
putenv 'LAL_DEBUG_LEVEL=3'
2020-01-19 02:41:01.1811 (5204) [normal]: This program is published under the GNU General Public License, version 2
2020-01-19 02:41:01.1967 (5204) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2020-01-19 02:41:01.1967 (5204) [normal]: This Einstein@home App was built at: Dec 19 2019 12:14:49

2020-01-19 02:41:01.1967 (5204) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2MDF_2.07_windows_x86_64__GW-opencl-nvidia.exe'.
Activated exception handling...
[DEBUG} GPU type: 1
[DEBUG} got GPU info from BOINC
[DEBUG} got VendorID 4318
2020-01-19 02:41:01.2748 (5204) [debug]: BSGL output files
2020-01-19 02:41:01.2905 (5204) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2020-01-19 02:41:01.2905 (5204) [debug]: Set up communication with graphics process.

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.19.2.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALPulsar: 1.17.1.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALApps: 6.23.0.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)

2020-01-19 02:41:02.9003 (5204) [normal]: Reading input data ... 2020-01-19 02:43:12.4209 (5204) [normal]: Search FstatMethod used: 'ResampOpenCL'
2020-01-19 02:43:12.4209 (5204) [normal]: Recalc FstatMethod used: 'DemodSSE'
2020-01-19 02:43:12.4209 (5204) [normal]: OpenCL Device used for Search/Recalc and/or semi coherent step: 'GeForce GTX 960 (Platform: NVIDIA CUDA, global memory: 2048 MiB)'
2020-01-19 02:43:12.4209 (5204) [normal]: OpenCL version is used for the semi-coherent step!
2020-01-19 02:43:51.3667 (5204) [normal]: Number of segments: 6, total number of SFTs in segments: 9873
 done.
% --- GPS reference time = 1177858472.0000 ,  GPS data mid time = 1177858472.0000
2020-01-19 02:43:51.4604 (5204) [normal]: dFreqStack = 1.902483e-007, df1dot = 4.494707e-013, df2dot = 1.589301e-019, df3dot = 0.000000e+000
% --- Setup, N = 6, T = 2592000 s, Tobs = 19750204 s, gammaRefine = 21, gamma2Refine = 11, gamma3Refine = 1

DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
2020-01-19 02:43:51.5073 (5204) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0,  total:168,  sky:1/1,  f1dot:1/168

0.% --- CG:1578018 FG:262815  f1dotmin_fg:-1.995797403367e-008 df1dot_fg:2.140336666667e-014 f2dotmin_fg:-7.224095454545e-020 df2dot_fg:1.444819090909e-020 f3dotmin_fg:0 df3dot_fg:1
..........putenv 'LAL_DEBUG_LEVEL=3'
2020-01-19 03:00:36.3412 (1540) [normal]: This program is published under the GNU General Public License, version 2
2020-01-19 03:00:36.3412 (1540) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2020-01-19 03:00:36.3412 (1540) [normal]: This Einstein@home App was built at: Dec 19 2019 12:14:49

2020-01-19 03:00:36.3412 (1540) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2MDF_2.07_windows_x86_64__GW-opencl-nvidia.exe'.
Activated exception handling...
[DEBUG} GPU type: 1
[DEBUG} got GPU info from BOINC
[DEBUG} got VendorID 4318
2020-01-19 03:00:36.9975 (1540) [debug]: BSGL output files
2020-01-19 03:00:37.9037 (1540) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2020-01-19 03:00:37.9037 (1540) [debug]: Set up communication with graphics process.

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.19.2.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALPulsar: 1.17.1.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALApps: 6.23.0.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)

2020-01-19 03:00:43.4812 (1540) [normal]: Reading input data ... 2020-01-19 03:03:03.9836 (1540) [normal]: Search FstatMethod used: 'ResampOpenCL'
2020-01-19 03:03:03.9836 (1540) [normal]: Recalc FstatMethod used: 'DemodSSE'
2020-01-19 03:03:03.9836 (1540) [normal]: OpenCL Device used for Search/Recalc and/or semi coherent step: 'GeForce GTX 960 (Platform: NVIDIA CUDA, global memory: 2048 MiB)'
2020-01-19 03:03:03.9992 (1540) [normal]: OpenCL version is used for the semi-coherent step!
2020-01-19 03:03:27.1856 (1540) [normal]: Number of segments: 6, total number of SFTs in segments: 9873
 done.
% --- GPS reference time = 1177858472.0000 ,  GPS data mid time = 1177858472.0000
2020-01-19 03:03:27.2481 (1540) [normal]: dFreqStack = 1.902483e-007, df1dot = 4.494707e-013, df2dot = 1.589301e-019, df3dot = 0.000000e+000
% --- Setup, N = 6, T = 2592000 s, Tobs = 19750204 s, gammaRefine = 21, gamma2Refine = 11, gamma3Refine = 1

DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
2020-01-19 03:03:27.7481 (1540) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0,  total:168,  sky:1/1,  f1dot:1/168

0.% --- CG:1578018 FG:262815  f1dotmin_fg:-1.995797403367e-008 df1dot_fg:2.140336666667e-014 f2dotmin_fg:-7.224095454545e-020 df2dot_fg:1.444819090909e-020 f3dotmin_fg:0 df3dot_fg:1
XLAL Error - XLALExecuteKernel_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:565): XLALExecuteKernel_OpenCL failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
XLAL Error - XLALExecuteKernel_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:565): Internal function call failed
XLAL Error - XLALCLMEMVectorMemsetCOMPLEX8 (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:181): Check failed: XLALExecuteKernel_OpenCL ( &openclObj.kernel.kernel_MemsetCOMPLEX8 , &in->length, 1, last ) == XLAL_SUCCESS
XLAL Error - XLALCLMEMVectorMemsetCOMPLEX8 (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/OpenCLutils.c:181): Internal function call failed
XLAL Error - XLALBarycentricResampleMultiCOMPLEX8TimeSeries_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:832): Check failed: XLALCLMEMVectorMemsetCOMPLEX8 ( ws->TStmp1_SRC, 0, (1==0) ) == XLAL_SUCCESS
XLAL Error - XLALBarycentricResampleMultiCOMPLEX8TimeSeries_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:832): Internal function call failed
XLAL Error - XLALBarycentricResampleMultiCOMPLEX8TimeSeries (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp.c:433): Check failed: XLALBarycentricResampleMultiCOMPLEX8TimeSeries_OpenCL ( resamp, common, multiSRCtimes ) == XLAL_SUCCESS
XLAL Error - XLALBarycentricResampleMultiCOMPLEX8TimeSeries (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp.c:433): Internal function call failed
XLAL Error - XLALComputeFstatResamp_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:400): Check failed: XLALBarycentricResampleMultiCOMPLEX8TimeSeries ( resamp, &thisPoint, common ) == XLAL_SUCCESS
XLAL Error - XLALComputeFstatResamp_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:400): Internal function call failed
XLAL Error - XLALComputeFstat (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat.c:875): Check failed: (input->method_funcs.compute_func) ( *Fstats, common, input->method_data ) == XLAL_SUCCESS
XLAL Error - XLALComputeFstat (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat.c:875): Internal function call failed
MAIN: XLALComputeFstat() failed with errno=1024
2020-01-19 03:03:27.8106 (1540) [CRITICAL]: ERROR: MAIN() returned with error '1024'
2020-01-19 03:03:27.8262 (1540) [debug]: resultfile '../../projects/einstein.phys.uwm.edu/h1_1171.30_O2C02Cl4In0__O2MDFG2e_G34731_1171.70Hz_287_1_0' (len 94), current config file: 0
Code-version: %% LAL: 6.19.2.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALPulsar: 1.17.1.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALApps: 6.23.0.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)

FPU status flags:
2020-01-19 03:03:27.8887 (1540) [debug]: worker done. return(1024) to caller
2020-01-19 03:03:27.8887 (1540) [normal]: done. calling boinc_finish(1024).
03:03:27 (1540): called boinc_finish

</stderr_txt>
]]>

----------------------------

Thank you!!

Richie
Joined: 7 Mar 14
Posts: 531
Credit: 1,637,262,032
RAC: 766,079

How many GW GPU tasks at a time has that GTX 960 been running ?

Would you be ready to try out a newer Nvidia driver ?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,045
Credit: 34,790,000,443
RAC: 35,396,434

Cherokee150 wrote:
The error message I am getting from the Stderr output is:

That's not the error message.  It's just a benign warning about deprecated functions that everybody sees.  The real error message is much later and people reading/posting in these forums are unlikely to have any knowledge of what exit code 1024 means.  It's even listed as an "unknown error".  It might mean something to whoever wrote the code.
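Picking that later, real error out by hand is tedious in a log this long. A minimal sketch of automating it, assuming the `XLAL Error - FUNC (file:line): message` layout seen in the stderr output above; the helper `first_xlal_error` is hypothetical and not part of BOINC or LALSuite. It returns only the first XLAL error line, because the lines after it mostly repeat "Internal function call failed" as the failure propagates up the call chain.

```python
import re
from typing import Optional, Tuple

# Matches lines like (layout assumed from the stderr quoted in this thread):
# XLAL Error - FuncName (/path/to/file.c:565): message text
XLAL_LINE = re.compile(r"^XLAL Error - (\w+) \((.+?):(\d+)\): (.*)$")


def first_xlal_error(stderr_text: str) -> Optional[Tuple[str, str]]:
    """Return (function, message) for the first XLAL Error line, or None.

    Later lines in the chain usually just report that an internal call
    failed, so the first one is the line that names the underlying cause.
    """
    for line in stderr_text.splitlines():
        match = XLAL_LINE.match(line.strip())
        if match:
            return match.group(1), match.group(4)
    return None


if __name__ == "__main__":
    import sys
    # Usage: python first_error.py < stderr.txt
    print(first_xlal_error(sys.stdin.read()))
```

Run against the stderr output in this thread, it should return `XLALExecuteKernel_OpenCL` together with `XLALExecuteKernel_OpenCL failed: CL_MEM_OBJECT_ALLOCATION_FAILURE`.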

Your tasks list shows no tasks earlier than a day or two ago, and yet you have a RAC in the thousands.  That implies that at some point not too long ago you were returning successful work - work that has since been dropped from the online database.  It would be quite useful to know exactly what that work was, did it also involve your GPU, and what changes if any you have made to your system since that work was completed and before you started the current GW tasks.

You have a solitary Gamma-ray pulsar task in your list of 'in progress' tasks.  Please suspend all the GW tasks you have left and allow that single task to run.  It would be useful to know if it can complete rather than just ending in a compute error.  If it can complete, it at least gives you a way forward with that type of work rather than the GW stuff.  It would also indicate that your hardware is OK and that perhaps the problem lies elsewhere - perhaps some driver component.

Cheers,
Gary.

Zalster
Joined: 26 Nov 13
Posts: 3,001
Credit: 3,314,748,087
RAC: 188,835

Take a look at this post from BM.  He recommends resetting the project or, failing that, detaching and reattaching to see if that corrects the problem.

https://einsteinathome.org/goto/comment/114787

Holmis
Joined: 4 Jan 05
Posts: 1,041
Credit: 737,603,415
RAC: 178,229

@Zalster: That response from Bernd was posted in 2013, when this GW GPU app did not yet exist, and the error code in the other thread was not the same as the error code in this thread.

As to the problem discussed in this thread, if one digs a bit in the output the following can be uncovered as the first real sign of trouble: XLALExecuteKernel_OpenCL failed: CL_MEM_OBJECT_ALLOCATION_FAILURE

This might indicate one or more of the following problems:
Running out of memory on the GPU - If you're trying to run more than one task at a time then try just one task.
GPU driver problem - Try reinstalling/updating the driver, if recently updated then maybe try an older version.
Insufficient GPU hardware capabilities.
Faulty GPU.
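For the first item on that list, one way to see how much VRAM is actually in use while tasks run is NVIDIA's `nvidia-smi` tool, e.g. `nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv,noheader,nounits`, which prints one line per GPU with memory in MiB. The sketch below parses that output and flags GPUs that are close to full; the function names and the 90% threshold are assumptions for illustration, and the sample line is made up rather than taken from the OP's machine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpuMem:
    name: str
    total_mib: int
    used_mib: int

    @property
    def used_fraction(self) -> float:
        return self.used_mib / self.total_mib


def parse_nvidia_smi_csv(text: str) -> List[GpuMem]:
    """Parse output of:
    nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv,noheader,nounits
    """
    gpus = []
    for line in text.strip().splitlines():
        # rsplit keeps any comma inside the GPU name attached to the name field
        name, total, used = (field.strip() for field in line.rsplit(",", 2))
        gpus.append(GpuMem(name, int(total), int(used)))
    return gpus


def nearly_full(gpus: List[GpuMem], threshold: float = 0.9) -> List[str]:
    """Names of GPUs whose VRAM use is at or above `threshold` (assumed 90%)."""
    return [g.name for g in gpus if g.used_fraction >= threshold]


# Hypothetical sample: a 2 GB GTX 960 with 1950 MiB in use
print(nearly_full(parse_nvidia_smi_csv("GeForce GTX 960, 2048, 1950")))
```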

Ian&Steve C.
Joined: 19 Jan 20
Posts: 23
Credit: 154,660,716
RAC: 1,634,594

If I recall, the gravity wave tasks use a lot of VRAM, more than 1GB, so running 2 or more on the GTX 960, which only has 2GB, is likely to cause you to run out of available VRAM.
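The arithmetic behind that, as a rough sketch. The per-task figure is an assumption: about 1050 MB, the upper bound San-Fernando-Valley reports later in this thread (MB and MiB are treated as the same here, which is close enough for an estimate). The 300 MiB reserve for the display and driver is a guess. Even with no reserve at all, 2 x 1050 = 2100 already exceeds 2048.

```python
def max_concurrent_tasks(vram_mib: int, per_task_mib: int, reserve_mib: int = 300) -> int:
    """How many tasks fit in VRAM after keeping `reserve_mib` for display/driver.

    All inputs are rough estimates, not measured limits.
    """
    usable = vram_mib - reserve_mib
    if usable <= 0 or per_task_mib <= 0:
        return 0
    return usable // per_task_mib


for vram in (2048, 4096):
    print(f"{vram} MiB card -> {max_concurrent_tasks(vram, 1050)} task(s)")
# 2048 MiB card -> 1 task(s)
# 4096 MiB card -> 3 task(s)
```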

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 63
Credit: 1,735,055,197
RAC: 6,580,320

My setup:

Two GTX 960 (VRAM 4GB each) stock, no OC, using NVIDIA driver 441.66 on WIN7 with Intel i5 3570.

Each GPU uses LESS than 1050MB VRAM at a load of MAX 90% and run-time LESS than 25 min.

Only running ONE (x1) GW on each GPU.

Ran 9 GW O2 Multi-Directional GPU v2.07 WUs so far with no errors, but still waiting for validation.

I will run some more WUs and in case of errors, I will update here.

UPDATE:  so far many have validated ...

Ian&Steve C.
Joined: 19 Jan 20
Posts: 23
Credit: 154,660,716
RAC: 1,634,594

Yeah, working fine since you have a 4GB GPU. The OP has only 2GB on his 960.

Richie
Joined: 7 Mar 14
Posts: 531
Credit: 1,637,262,032
RAC: 766,079

He is "Only running ONE (x1) GW on each GPU". That would work on any GTX 960, whether it is a 2GB or 4GB card.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 23
Credit: 154,660,716
RAC: 1,634,594

Richie wrote:
He is "Only running ONE (x1) GW on each GPU". That would work on any GTX 960, whether it is a 2GB or 4GB card.

Yes, San-Fernando states that he is only running 1x on his 4GB card.

But we have no confirmation yet of how Cherokee (the OP, Original Poster) is running his 2GB card. He was asked in the first reply how many WUs he is running on it but has yet to reply with this information.

If Cherokee is running 2x WUs, then that could very well be his problem. He might need to try only 1x, or switch to the Gamma-ray WUs instead.
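One way to force 1x on the client side is an `app_config.xml` in the project's directory, where BOINC reads `gpu_usage` as the fraction of one GPU each task claims (1.0 means one task per GPU, 0.5 means two). A minimal sketch that writes such a file with Python's standard library; the app name `einstein_O2MDF` is a hypothetical guess from the executable name in the log and should be checked against the `<app>` names in the client's `client_state.xml`, and `cpu_usage` of 1.0 is likewise an assumption.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_usage: float = 1.0) -> str:
    """Return app_config.xml text running `tasks_per_gpu` tasks of `app_name` per GPU.

    N tasks per GPU corresponds to gpu_usage = 1/N.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = f"{1.0 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:g}"
    return ET.tostring(root, encoding="unicode")


# "einstein_O2MDF" is a hypothetical app name; verify it in client_state.xml.
print(build_app_config("einstein_O2MDF", tasks_per_gpu=1))
```

After saving the output as `app_config.xml`, the client has to re-read its config files (or be restarted) before the setting applies.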

Harri Liljeroos
Joined: 10 Dec 05
Posts: 75
Credit: 547,801,341
RAC: 578,018

Wasn't there a limitation for Nvidia OpenCL that it can only access about 25% of the GPU memory?
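The 25% figure may be OpenCL's per-allocation limit. The OpenCL 1.2 specification only guarantees `CL_DEVICE_MAX_MEM_ALLOC_SIZE` of at least max(1/4 of `CL_DEVICE_GLOBAL_MEM_SIZE`, 128 MiB) for a non-custom device. That caps the size of a single buffer, not the total a program can allocate across buffers. A small sketch of the arithmetic for the 2048 MiB GTX 960 reported in the task log; the function name is illustrative, and a real device may report a larger value.

```python
MIB = 1024 * 1024


def min_max_alloc_bytes(global_mem_bytes: int) -> int:
    """Smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE an OpenCL 1.2 non-custom device
    may report: max(global_mem / 4, 128 MiB). Devices may report more."""
    return max(global_mem_bytes // 4, 128 * MIB)


gtx960_global = 2048 * MIB  # "global memory: 2048 MiB" from the task log
print(min_max_alloc_bytes(gtx960_global) // MIB, "MiB per buffer (guaranteed minimum)")
```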
