O3 All-Sky fails on OpenCL NVIDIA Linux

Viktor Rak
Joined: 4 Jan 19
Posts: 4
Credit: 2339851
RAC: 26
Topic 226429

I'm getting a lot of instant failures at startup of Gravitational Wave search O3 All-Sky #1 v1.01 (GW-opencl-nvidia) x86_64-pc-linux-gnu tasks, all ending with a computation error. This is probably because the parameter --device is used instead of --GPUDevice, as GPU tasks from other projects seem to work fine.

See https://einsteinathome.org/task/1192511451, for example:

TASK 1192511451

Name: h1_0361.80_O3aC01Cl1In0__O3AS1_362.00Hz_6523_0
Workunit ID: 587629420
Created: 17 Nov 2021 7:29:23 UTC
Sent: 17 Nov 2021 7:29:24 UTC
Report deadline: 24 Nov 2021 7:29:24 UTC
Received: 17 Nov 2021 8:45:10 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 1 (0x00000001) Unknown error code
Computer: 12858660
Run time (sec): 5.53
CPU time (sec): 0.01
Peak working set size (MB): 0
Peak swap size (MB): 0
Peak disk usage (MB): 0.02
Validation state: Invalid
Granted credit: 0
Application: Gravitational Wave search O3 All-Sky #1 v1.01 (GW-opencl-nvidia) x86_64-pc-linux-gnu


Stderr output

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
putenv 'LAL_DEBUG_LEVEL=3'
2021-11-17 09:30:08.4021 (4779) [normal]: This program is published under the GNU General Public License, version 2
2021-11-17 09:30:08.4022 (4779) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2021-11-17 09:30:08.4022 (4779) [normal]: This Einstein@home App was built at: Aug  5 2021 17:20:50

2021-11-17 09:30:08.4022 (4779) [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/einstein_O3AS_1.01_x86_64-pc-linux-gnu__GW-opencl-nvidia'.
[DEBUG} GPU type: 1
[ERROR] Couldn't get OpenCL device from BOINC (-1)!
2021-11-17 09:30:08.4448 (4779) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2021-11-17 09:30:08.4448 (4779) [debug]: glibc version/release: 2.27/stable
2021-11-17 09:30:08.444835 - mytime()
2021-11-17 09:30:08.4450 (4779) [debug]: Set up communication with graphics process.

einstein_O3AS_1.01_x86_64-pc-linux-gnu__GW-opencl-nvidia: unrecognized option `--device'

Usage: einstein_O3AS_1.01_x86_64-pc-linux-gnu__GW-opencl-nvidia [-h|--help] [-v|--version] [@<config-file>] [--log] [--semiCohToplist] [--DataFiles1] [--IFOs] [--skyRegion] [--numSkyPartitions] [--partitionIndex] [--skyGridFile] [--dAlpha] [--dDelta] [-f|--Freq] [--dFreq] [-b|--FreqBand] [--f1dot] [--df1dot] [--f1dotBand] [--f2dot] [--df2dot] [--f2dotBand] [--f3dot] [--df3dot] [--f3dotBand] [--peakThrF] [-m|--mismatch1] [--gridType1] [--metricType1] [-g|--gammaRefine] [-G|--gamma2Refine] [-o|--fnameout] [--fnameChkPoint] [-n|--nCand1] [--printCand1] [--refTime] [--ephemEarth] [--ephemSun] [--minStartTime1] [--maxStartTime1] [--printFstat1] [--assumeSqrtSX] [--nStacksMax] [-T|--tStack] [--segmentList] [--recalcToplistStats] [--loudestSegOutput] [--writeLeanerOutput] [--tlCompartments] [--computeBSGL] [--Fstar0sc] [--oLGX] [--getMaxFperSeg] [--SortToplist] [--FstatMethod] [--FstatMethodRecalc] [--injectionSources] [--injectSqrtSX] [--timestampsFiles] [--Tsft] [--useGPUSemiCoh] [--GPUDevice]

2021-11-17 09:30:08.4462 (4779) [CRITICAL]: ERROR: MAIN() returned with error '1'

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.21.0.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
%% LALPulsar: 1.18.2.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
%% LALApps: 6.25.1.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)

FPU status flags:
2021-11-17 09:30:08.4468 (4779) [debug]: worker done. return(1) to caller
2021-11-17 09:30:08.4468 (4779) [normal]: done. calling boinc_finish(1).
09:30:08 (4779): called boinc_finish

</stderr_txt>
]]>



Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18753192997
RAC: 7133476

Are you using an <exclude_gpu> statement in a cc_config.xml file?

It sounds like you haven't used the correct syntax for the statement. You need to specify the GPU type and use BOINC's enumeration of your device in the statement. You can read how to construct the statement on the BOINC "Client configuration" page.

Viktor Rak
Joined: 4 Jan 19
Posts: 4
Credit: 2339851
RAC: 26

No, I don't have any GPU restrictions configured, neither per app nor per project.

I decided to give up on O3 tasks completely for now, as I have no computer with more than 2GB of video RAM anyway.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3959
Credit: 47050112642
RAC: 65165264

First, it seems you might not have the OpenCL drivers installed, since your stderr output indicates that it cannot find an OpenCL device. With NVIDIA drivers on Ubuntu, if you installed the drivers via some PPA or repository, the OpenCL components are often not included; they are included if you use the NVIDIA .run installer.

Install them with:

sudo apt install ocl-icd-libopencl1

 

Second, 2GB will likely not be enough for GW tasks anyway, so you would fail tasks again for lack of memory. But with the OpenCL drivers properly installed, you will at least be able to run the gamma-ray tasks.
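
As a rough way to check the first point: on Linux the OpenCL ICD loader (the library that ocl-icd-libopencl1 provides) discovers vendor drivers through small .icd files, by default in /etc/OpenCL/vendors. If no NVIDIA entry is registered there, the loader has nothing to hand to the application even when the loader package itself is installed. The following is a minimal Python sketch of that check; the function name list_opencl_icds is an assumption made for this example, not an existing tool.

```python
from pathlib import Path

# Default ICD registry directory used by the OpenCL ICD loader on Linux.
DEFAULT_VENDORS_DIR = Path("/etc/OpenCL/vendors")


def list_opencl_icds(vendors_dir=DEFAULT_VENDORS_DIR):
    """Map each *.icd file name to the driver library named inside it.

    Hypothetical helper for illustration; returns an empty dict when the
    directory does not exist.
    """
    vendors_dir = Path(vendors_dir)
    if not vendors_dir.is_dir():
        return {}
    icds = {}
    for icd_file in sorted(vendors_dir.glob("*.icd")):
        # Each ICD file holds the name or path of a vendor's OpenCL library.
        icds[icd_file.name] = icd_file.read_text().strip()
    return icds


if __name__ == "__main__":
    found = list_opencl_icds()
    if not found:
        print("No OpenCL ICD files found; no vendor driver is registered.")
    for name, library in found.items():
        print(f"{name}: {library}")
```

On a working NVIDIA setup one would expect an NVIDIA entry in the output; the exact file name and library depend on how the driver was packaged.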


Viktor Rak
Joined: 4 Jan 19
Posts: 4
Credit: 2339851
RAC: 26

Thank you for the advice! I already have the OpenCL components installed, though:

ocl-icd-libopencl1 is already the newest version (2.2.11-1ubuntu1).

 
So I guess the failure may really be caused by the wrong option name the task is invoked with:

einstein_O3AS_1.01_x86_64-pc-linux-gnu__GW-opencl-nvidia: unrecognized option `--device'

Usage: einstein_O3AS_1.01_x86_64-pc-linux-gnu__GW-opencl-nvidia [-h|--help] [-v|--version] [@<config-file>] [--log] [--semiCohToplist] [--DataFiles1] [--IFOs] [--skyRegion] [--numSkyPartitions] [--partitionIndex] [--skyGridFile] [--dAlpha] [--dDelta] [-f|--Freq] [--dFreq] [-b|--FreqBand] [--f1dot] [--df1dot] [--f1dotBand] [--f2dot] [--df2dot] [--f2dotBand] [--f3dot] [--df3dot] [--f3dotBand] [--peakThrF] [-m|--mismatch1] [--gridType1] [--metricType1] [-g|--gammaRefine] [-G|--gamma2Refine] [-o|--fnameout] [--fnameChkPoint] [-n|--nCand1] [--printCand1] [--refTime] [--ephemEarth] [--ephemSun] [--minStartTime1] [--maxStartTime1] [--printFstat1] [--assumeSqrtSX] [--nStacksMax] [-T|--tStack] [--segmentList] [--recalcToplistStats] [--loudestSegOutput] [--writeLeanerOutput] [--tlCompartments] [--computeBSGL] [--Fstar0sc] [--oLGX] [--getMaxFperSeg] [--SortToplist] [--FstatMethod] [--FstatMethodRecalc] [--injectionSources] [--injectSqrtSX] [--timestampsFiles] [--Tsft] [--useGPUSemiCoh] [--GPUDevice]
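
One detail in the failed task's stderr is worth noting: the line "[ERROR] Couldn't get OpenCL device from BOINC (-1)!" is printed before the "unrecognized option" message, so the --device complaint may be a follow-on symptom rather than the root cause. The Python sketch below only illustrates that reading order. The function name first_error_line and the marker list are assumptions made for this example and are not part of BOINC or the Einstein@Home application.

```python
# Hypothetical helper: report the earliest error-like line in a task's stderr.
ERROR_MARKERS = ("[ERROR]", "[CRITICAL]", "unrecognized option")  # assumed markers


def first_error_line(stderr_text):
    """Return the first line containing any marker, or None if none is found."""
    for line in stderr_text.splitlines():
        if any(marker in line for marker in ERROR_MARKERS):
            return line.strip()
    return None


# Excerpt copied from the failed task's stderr quoted in this thread.
FAILED_EXCERPT = """\
[DEBUG} GPU type: 1
[ERROR] Couldn't get OpenCL device from BOINC (-1)!
einstein_O3AS_1.01_x86_64-pc-linux-gnu__GW-opencl-nvidia: unrecognized option `--device'
2021-11-17 09:30:08.4462 (4779) [CRITICAL]: ERROR: MAIN() returned with error '1'
"""

if __name__ == "__main__":
    print(first_error_line(FAILED_EXCERPT))
```

Run against the excerpt, this picks out the OpenCL device lookup failure rather than the option error, which suggests looking at the driver and GPU setup first.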

 

solling2
Joined: 20 Nov 14
Posts: 219
Credit: 1577611306
RAC: 19750

The current batch of these tasks has lower RAM requirements, so I'm running a test system with an older 2GB NVIDIA GPU with no problems. The differences I noticed are that my system uses glibc version/release 2.31, which ships with Ubuntu 20.04 but is backwards compatible, and BOINC 7.16 instead of 7.9. :-)

Viktor Rak
Joined: 4 Jan 19
Posts: 4
Credit: 2339851
RAC: 26

It may sound strange, but the problem seems to be gone. It might have been some initial video card misconfiguration that was fixed by a system restart. My hypothesis is that the NVIDIA driver was using the Power Saving Mode PRIME profile by default, so the calls were sent to the integrated Intel graphics card instead of the NVIDIA one.
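
If the PRIME hypothesis is right, it can be checked on Ubuntu systems that have the prime-select tool (from the nvidia-prime package), whose query subcommand prints the active profile. The Python sketch below assumes that tool is present and that the power-saving profile is reported as "intel"; the function names are made up for this example.

```python
import shutil
import subprocess


def query_prime_profile():
    """Return the profile printed by `prime-select query`, or None if unavailable."""
    if shutil.which("prime-select") is None:
        return None  # tool not installed (non-Ubuntu system or no nvidia-prime)
    result = subprocess.run(
        ["prime-select", "query"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


def nvidia_disabled_by_prime(profile):
    """True when the profile is the Intel power-saving one (assumed name: 'intel')."""
    return profile is not None and profile.strip().lower() == "intel"


if __name__ == "__main__":
    profile = query_prime_profile()
    if profile is None:
        print("prime-select not available; cannot determine the PRIME profile.")
    elif nvidia_disabled_by_prime(profile):
        print("PRIME profile is 'intel': the NVIDIA GPU is likely unavailable to OpenCL.")
    else:
        print(f"PRIME profile is '{profile}'.")
```

If the profile turns out to be the power-saving one, switching to the NVIDIA profile (in NVIDIA X Server Settings or with prime-select) and logging in again or rebooting would be the obvious thing to try.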


O3 tasks now run just fine on the same system with a valid outcome; see https://einsteinathome.org/task/1193782597, for example:
 

TASK 1193782597

Name: h1_0361.80_O3aC01Cl1In0__O3AS1_362.00Hz_5001_1
Workunit ID: 588263137
Created: 20 Nov 2021 9:57:32 UTC
Sent: 20 Nov 2021 11:02:37 UTC
Report deadline: 27 Nov 2021 11:02:37 UTC
Received: 20 Nov 2021 17:49:30 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x00000000)
Computer: 12858660
Run time (sec): 3,304.21
CPU time (sec): 3,336.09
Peak working set size (MB): 251.11
Peak swap size (MB): 26472.66
Peak disk usage (MB): 4.6
Validation state: Valid
Granted credit: 1,000
Application: Gravitational Wave search O3 All-Sky #1 v1.01 (GW-opencl-nvidia) x86_64-pc-linux-gnu


Stderr output

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
putenv 'LAL_DEBUG_LEVEL=3'
2021-11-20 17:53:15.2634 (3566) [normal]: This program is published under the GNU General Public License, version 2
2021-11-20 17:53:15.2635 (3566) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2021-11-20 17:53:15.2635 (3566) [normal]: This Einstein@home App was built at: Aug  5 2021 17:20:50
2021-11-20 17:53:15.2635 (3566) [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/einstein_O3AS_1.01_x86_64-pc-linux-gnu__GW-opencl-nvidia'.
[DEBUG} GPU type: 1
[DEBUG} got GPU info from BOINC
[DEBUG} got VendorID 4318
2021-11-20 17:53:15.3199 (3566) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2021-11-20 17:53:15.3199 (3566) [debug]: glibc version/release: 2.27/stable
2021-11-20 17:53:15.319951 - mytime()
2021-11-20 17:53:15.3201 (3566) [debug]: Set up communication with graphics process.
2021-11-20 17:53:15.3300 (3566) [normal]: Parsed user input successfully
DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.21.0.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
%% LALPulsar: 1.18.2.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
%% LALApps: 6.25.1.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
2021-11-20 17:53:15.3300 (3566) [normal]: Initialise compartments with freqWidth = 0.05 and candidates per compartment = 3000.
2021-11-20 17:53:15.9758 (3566) [normal]: Reading input data ... 
2021-11-20 17:53:15.9759 (3566) [normal]: Loading SFTs matching '../../projects/einstein.phys.uwm.edu/h1_0361.80_O3aC01Cl1In0;../../projects/einstein.phys.uwm.edu/l1_0361.80_O3aC01Cl1In0;../../projects/einstein.phys.uwm.edu/h1_0362.00_O3aC01Cl1In0;../../projects/einstein.phys.uwm.edu/l1_0362.00_O3aC01Cl1In0;../../projects/einstein.phys.uwm.edu/h1_0362.20_O3aC01Cl1In0;../../projects/einstein.phys.uwm.edu/l1_0362.20_O3aC01Cl1In0;../../projects/einstein.phys.uwm.edu/h1_0362.40_O3aC01Cl1In0;../../projects/einstein.phys.uwm.edu/l1_0362.40_O3aC01Cl1In0;../../projects/einstein.phys.uwm.edu/h1_0362.60_O3aC01Cl1In0;../../projects/einstein.phys.uwm.edu/l1_0362.60_O3aC01Cl1In0' into catalog ...2021-11-20 17:53:16.6343 (3566) [normal]: done.
2021-11-20 17:53:16.6344 (3566) [normal]: Validating SFTs (detectors: H1, L1, ) ... success.
2021-11-20 17:53:20.5222 (3566) [normal]: Search FstatMethod used: 'ResampGPU'
2021-11-20 17:53:20.5223 (3566) [normal]: Recalc FstatMethod used: 'DemodSSE'
2021-11-20 17:53:20.5229 (3566) [normal]: GPU Device used for Search/Recalc and/or semi coherent step: 'NVIDIA GeForce MX130 ( Platform: NVIDIA CUDA )'
2021-11-20 17:53:20.5229 (3566) [normal]: GPU Backend used for Search/Recalc and/or semi coherent step: 'OpenCL'
2021-11-20 17:53:20.5229 (3566) [normal]: GPU version is used for the semi-coherent step!
2021-11-20 17:53:42.0394 (3566) [normal]: Number of segments: 37, total number of SFTs in segments: 11745
2021-11-20 17:53:42.0466 (3566) [normal]: Finished reading input data.
% --- GPS reference time = 1246070525.0000 ,  GPS data mid time = 1246070525.0000
2021-11-20 17:53:42.0466 (3566) [normal]: dFreqStack = 2.000000e-06, df1dot = 1.500000e-10, df2dot = 0.000000e+00, df3dot = 0.000000e+00
% --- Setup, N = 37, T = 432000 s, Tobs = 15809012 s, gammaRefine = 250, gamma2Refine = 4653, gamma3Refine = 1
DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
2021-11-20 17:53:45.1234 (3566) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0,  total:2000,  sky:1/100,  f1dot:1/20
0.% --- CG:9272015 FG:250000  f1dotmin_fg:-2.717183860529e-09 df1dot_fg:5.97609561753e-13 f2dotmin_fg:0 df2dot_fg:0 f3dotmin_fg:0 df3dot_fg:1
...................
1.c
...................
2....................
3...c
.................
4....................
5.....c
...............
6....................
7........c
............
8....................
9..........c
..........
10....................
11..............c
......
12....................
13.................c
...
14....................
15....................c
16....................
17....................
18...c
.................
19....................
20.......c
.............
21....................
22...........c
.........
23....................
24...............c
.....
25....................
26..................c
..
27....................
28....................
29.c
...................
30....................
31.....c
...............
32....................
33........c
............
34....................
35............c
........
36....................
37...............c
.....
38....................
39...................c
.
40....................
41....................
42..c
..................
43....................
44......c
..............
45....................
46..........c
..........
47....................
48..............c
......
49....................
50..................c
..
51....................
52....................
53.c
...................
54....................
55.....c
...............
56....................
57.........c
...........
58....................
59.............c
.......
60....................
61.................c
...
62....................
63....................
64.c
...................
65....................
66.....c
...............
67....................
68.........c
...........
69....................
70.............c
.......
71....................
72.................c
...
73....................
74....................
75.c
...................
76....................
77.....c
...............
78....................
79.........c
...........
80....................
81.............c
.......
82....................
83.................c
...
84....................
85....................
86.c
...................
87....................
88.....c
...............
89....................
90.........c
...........
91....................
92.............c
.......
93....................
94.................c
...
95....................
96....................
97.c
...................
98....................
99.....c
...............
2021-11-20 18:39:57.9270 (3566) [normal]: Finished main analysis.
2021-11-20 18:39:57.9278 (3566) [normal]: Recalculating statistics for the final toplist...
2021-11-20 18:49:13.6102 (3566) [normal]: Finished recalculating toplist statistics.
2021-11-20 18:49:13.6103 (3566) [normal]: Finished in 3358.28 s with peak RAM usage: 377.4 MB on CPU 'Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz', peak VRAM usage: 867.9 MB on GPU Device: 'NVIDIA GeForce MX130 ( Platform: NVIDIA CUDA )' with backend: 'OpenCL'.
2021-11-20 18:49:13.6439 (3566) [debug]: Writing output ... done.
DEPRECATION WARNING: program has invoked obsolete function FreeDopplerSkyScan(). Please see XLALDestroyDopplerSkyScan() for information about a replacement.
Code-version: %% LAL: 6.21.0.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
%% LALPulsar: 1.18.2.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
%% LALApps: 6.25.1.1 (CLEAN 8d0838c264f9ff9adc8c3cdbfa17b5154eaa2994)
FPU status flags:  COND_1 PRECISION
2021-11-20 18:49:14.4972 (3566) [debug]: worker done. return(0) to caller
2021-11-20 18:49:14.4972 (3566) [normal]: done. calling boinc_finish(0).
18:49:14 (3566): called boinc_finish
</stderr_txt>
]]>

The remaining question is why I still receive O3 tasks even though I turned them off in the project preferences, but I think it's just a matter of time.

Have a good weekend!
