Hello experts
I got this kind of error again:
17.10.2020 14:25:32 | Einstein@Home | Computation for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1585_1 finished
17.10.2020 14:25:32 | Einstein@Home | Output file h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1585_1_0 for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1585_1 absent
17.10.2020 14:25:32 | Einstein@Home | Output file h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1585_1_1 for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1585_1 absent
17.10.2020 14:25:32 | Einstein@Home | Output file h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1585_1_2 for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1585_1 absent
17.10.2020 14:25:45 | Einstein@Home | Starting task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1592_2
17.10.2020 14:25:46 | Einstein@Home | Started upload of h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1585_1_3
17.10.2020 14:25:48 | Einstein@Home | Finished upload of h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1585_1_3
17.10.2020 14:25:55 | | Suspending computation - CPU is busy
17.10.2020 14:26:05 | | Resuming computation
17.10.2020 14:26:46 | Einstein@Home | Project requested delay of 60 seconds
17.10.2020 14:31:25 | Einstein@Home | Computation for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1592_2 finished
17.10.2020 14:31:25 | Einstein@Home | Output file h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1592_2_0 for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1592_2 absent
17.10.2020 14:31:25 | Einstein@Home | Output file h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1592_2_1 for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1592_2 absent
17.10.2020 14:31:25 | Einstein@Home | Output file h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1592_2_2 for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1592_2 absent
17.10.2020 14:31:47 | Einstein@Home | Starting task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1591_2
17.10.2020 14:31:49 | Einstein@Home | Started upload of h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1592_2_3
17.10.2020 14:31:51 | Einstein@Home | Finished upload of h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1592_2_3
17.10.2020 14:33:10 | Einstein@Home | Project requested delay of 60 seconds
17.10.2020 14:35:46 | Einstein@Home | Computation for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1591_2 finished
17.10.2020 14:35:46 | Einstein@Home | Output file h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1591_2_0 for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1591_2 absent
17.10.2020 14:35:46 | Einstein@Home | Output file h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1591_2_1 for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1591_2 absent
17.10.2020 14:35:46 | Einstein@Home | Output file h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1591_2_2 for task h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1591_2 absent
What can I do to prevent this error?
Please help
thanks
Jochen
Copyright © 2024 Einstein@Home. All rights reserved.
That is not the error, that is the result of the error. Because there was an error the result files are absent. You have to look at the stderr output for the actual error. That can be found here on the web site on your account page where your tasks are listed.
Hi!
It is because of this line in the stderr output: "OpenCL Device used for Search/Recalc and/or semi coherent step: 'GeForce GTX 1050 (Platform: NVIDIA CUDA, global memory: 2048 MiB)'". All the tasks that failed had a so-called DF of 0.50, and they required more memory than your 2GB GPU is able to provide. Some GW GPU tasks with a lower DF could run on a 2GB card, but a user isn't able to choose which tasks the project server sends based on their DF.
A fail-safe solution with a 2GB card would be to run only "Gamma-ray pulsar binary search #1 on GPUs".
Hello Harri,
For the last 2 weeks I have also been facing the same problem mentioned in this thread.
Many WUs were cancelled after about 2 minutes of CPU time. Based on your comment I had a look at the stderr output, but it mentions a lot of errors (like an unknown error with exit code 1024) and I have no clue what they mean or how to correct them. An insider / developer would surely understand them, but I don't.
This is the stderr of one of the failing WU's, can you please have a look?
Task 1015155880
windows_x86_64
Stderr output
2020-10-12 17:32:16.1159 (63356) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2MDF_2.07_windows_x86_64__GW-opencl-nvidia.exe'.
Activated exception handling...
[DEBUG} GPU type: 1
[DEBUG} got GPU info from BOINC
[DEBUG} got VendorID 4318
2020-10-12 17:32:16.4763 (63356) [debug]: BSGL output files
2020-10-12 17:32:16.4883 (63356) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2020-10-12 17:32:16.4883 (63356) [debug]: Set up communication with graphics process.
DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.19.2.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALPulsar: 1.17.1.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALApps: 6.23.0.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
2020-10-12 17:32:17.5658 (63356) [normal]: Reading input data ... 2020-10-12 17:33:32.9681 (63356) [normal]: Search FstatMethod used: 'ResampOpenCL'
2020-10-12 17:33:32.9681 (63356) [normal]: Recalc FstatMethod used: 'DemodSSE'
2020-10-12 17:33:32.9701 (63356) [normal]: OpenCL Device used for Search/Recalc and/or semi coherent step: 'GeForce GTX 950M (Platform: NVIDIA CUDA, global memory: 2048 MiB)'
2020-10-12 17:33:32.9701 (63356) [normal]: OpenCL version is used for the semi-coherent step!
2020-10-12 17:33:59.8248 (63356) [normal]: Number of segments: 12, total number of SFTs in segments: 10192
done.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177858472.0000
2020-10-12 17:33:59.8896 (63356) [normal]: dFreqStack = 2.251046e-007, df1dot = 5.685400e-013, df2dot = 4.020648e-019, df3dot = 0.000000e+000
% --- Setup, N = 12, T = 1296000 s, Tobs = 19750204 s, gammaRefine = 9, gamma2Refine = 23, gamma3Refine = 1
DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
2020-10-12 17:33:59.9066 (63356) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0, total:26, sky:1/1, f1dot:1/26
0.% --- CG:2669916 FG:222119 f1dotmin_fg:-3.753135252444e-008 df1dot_fg:6.317111111111e-014 f2dotmin_fg:-1.922918608696e-019 df2dot_fg:1.748107826087e-020 f3dotmin_fg:0 df3dot_fg:1
XLAL Error - XLALComputeECLFFT_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:1248): Processing FFT failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
XLAL Error - XLALComputeECLFFT_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:1248): Internal function call failed
XLAL Error - XLALComputeFaFb_Resamp_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:654): Check failed: (*fftfuncs->computefft_func) ( fftfuncs->fftplan, ws->TS_FFT, ((void *)0) ) == XLAL_SUCCESS
XLAL Error - XLALComputeFaFb_Resamp_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:654): Internal function call failed
XLAL Error - XLALComputeFstatResamp_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:441): Check failed: XLALComputeFaFb_Resamp_OpenCL ( resamp, ws, thisPoint, common->dFreq, numFreqBins, TimeSeriesX_SRC_a, TimeSeriesX_SRC_b ) == XLAL_SUCCESS
XLAL Error - XLALComputeFstatResamp_OpenCL (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat_Resamp_OpenCL.c:441): Internal function call failed
XLAL Error - XLALComputeFstat (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat.c:875): Check failed: (input->method_funcs.compute_func) ( *Fstats, common, input->method_data ) == XLAL_SUCCESS
XLAL Error - XLALComputeFstat (/home/jenkins/workspace/workspace/EaH-GW-OpenCL-Testing/SLAVE/MinGW6.3/TARGET/windows-x64/EinsteinAtHome/source/lalsuite/lalpulsar/src/ComputeFstat.c:875): Internal function call failed
MAIN: XLALComputeFstat() failed with errno=1024
2020-10-12 17:34:00.7417 (63356) [CRITICAL]: ERROR: MAIN() returned with error '1024'
2020-10-12 17:34:00.7438 (63356) [debug]: resultfile '../../projects/einstein.phys.uwm.edu/h1_0397.30_O2C02Cl4In0__O2MDFS2_Spotlight_397.80Hz_1937_0_0' (len 96), current config file: 0
Code-version: %% LAL: 6.19.2.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALPulsar: 1.17.1.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
%% LALApps: 6.23.0.1 (CLEAN 98bbe72a728eb25935e9195dafae691335dabf8c)
FPU status flags: COND_0 PRECISION
2020-10-12 17:34:00.7438 (63356) [debug]: worker done. return(1024) to caller
2020-10-12 17:34:00.7438 (63356) [normal]: done. calling boinc_finish(1024).
17:34:00 (63356): called boinc_finish
</stderr_txt>
]]>
Hi Paul !
Your computer has "NVIDIA GeForce GTX 950M (2048MB) driver: 388.00".
All the GW GPU tasks that failed had a DF of 0.50 (their task names contained xxxx.30 ... xxxx.80 or xxxx.35 ... xxxx.85, and 85 - 35 or 80 - 30 gives a DF of 0.50). GW GPU tasks with such a high DF value require more than 2GB of GPU memory. That's why they crashed right at the beginning.
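If you want to check your own task list the same way, the rule of thumb above can be sketched in a few lines. This is only a minimal sketch, assuming that GW task names always follow the pattern seen in this thread (a start frequency after "h1_" and a "Spotlight" frequency before "Hz"); task_df is a hypothetical helper, not an Einstein@Home tool.

```python
import re

# Assumed naming pattern (taken from the task names quoted in this thread):
#   h1_<start freq>_..._Spotlight_<spotlight freq>Hz_<n>_<m>
TASK_RE = re.compile(r"^h1_(\d+\.\d+)_.*_Spotlight_(\d+\.\d+)Hz_")


def task_df(task_name: str) -> float:
    """Return the DF implied by a GW task name, rounded to 2 decimals."""
    m = TASK_RE.match(task_name)
    if m is None:
        raise ValueError(f"unrecognised task name: {task_name}")
    start, spotlight = (float(g) for g in m.groups())
    # Round to hide floating-point noise such as 0.4999999999999773.
    return round(spotlight - start, 2)


if __name__ == "__main__":
    for name in (
        "h1_0397.30_O2C02Cl4In0__O2MDFS2_Spotlight_397.80Hz_1937_0",
        "h1_0383.45_O2C02Cl4In0__O2MDFS2_Spotlight_383.95Hz_1585_1",
    ):
        print(name, "DF =", task_df(name))
```

Both example names come from this thread and give a DF of 0.5. Whether a given DF actually fits into 2GB is not something the name tells you; that part still comes from experience reported here.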
Some GW GPU tasks with a lower DF could run on a 2GB card, but a user isn't able to choose which GW GPU tasks the project server sends based on their DF.
Currently a fail-safe solution with a 2GB GPU would be to run only "Gamma-ray pulsar binary search #1 on GPUs".
Also, the Nvidia GPU driver 388.00 on your computer is a three-year-old version. That could be a major obstacle to any kind of successful crunching on an Nvidia GPU. I would definitely suggest you visit the Nvidia driver site and upgrade that driver to the latest version, even if you didn't run BOINC.
Hello Richie,
Thanks for your quick feedback !
Looking at my Speccy output, I can see that my Nvidia GeForce GTX 950M driver is currently '23.21.13.8800'. I wonder where and how you saw '388.00' (maybe your 388 is a substring of '23.21.13.8800'?).
Finding a driver update on the Nvidia site is not easy (I can't find the 950M at all), but I'll keep looking anyway.
Second question: how and where can I set a preference to process 'Gamma-ray pulsar binary search #1 on GPUs' WUs, as you suggest?
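The guess is right. A widely used convention (not an official NVIDIA API, so treat it as an assumption) is that the Windows driver version maps to NVIDIA's own version number by joining the last two fields, keeping the final five digits and putting a decimal point before the last two: 13 + 8800 gives 38800, i.e. 388.00. A minimal sketch, with nvidia_version_from_windows as a hypothetical helper name:

```python
def nvidia_version_from_windows(win_version: str) -> str:
    """Convert a Windows-style NVIDIA driver version (e.g. '23.21.13.8800')
    to NVIDIA's release number (e.g. '388.00').

    Assumed convention: concatenate the last two dot-separated fields,
    keep the final five digits, and insert a '.' before the last two.
    """
    parts = win_version.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four numeric fields: {win_version!r}")
    digits = (parts[2] + parts[3])[-5:]
    return f"{int(digits[:-2])}.{digits[-2:]}"


if __name__ == "__main__":
    # The version reported by Speccy in this thread.
    print(nvidia_version_from_windows("23.21.13.8800"))  # 388.00
```

So Speccy's '23.21.13.8800' and the '388.00' shown for the host on the project website describe the same driver.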
DRIVER:
https://www.nvidia.com/en-us/drivers/results/165685/
Gamma Ray:
ACCOUNT
PREFERENCE
PROJECT
tick the checkbox for Gamma Ray
and don't forget to "save" the change at the bottom
None of the people likely to reply here are "insiders" - we're all just volunteers like yourself.
The trick is not to look at error codes but to look (earlier in the output) for keywords. In your case, the key information is contained in the line that says:
Processing FFT failed: CL_MEM_OBJECT_ALLOCATION_FAILURE
The app uses a process referred to as a "Fast Fourier Transform" (FFT) to process a very large number of data points, all of which requires a lot of memory. The problem is that your GPU doesn't have enough memory to store the data - a memory allocation failure.
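The "look for keywords, not error codes" approach can also be done mechanically. The sketch below assumes you have copied the stderr text from the task page into a string; summarize_stderr is a hypothetical helper (not part of BOINC or Einstein@Home) that lists the OpenCL status names (they all start with CL_) and the GPU memory the app reported.

```python
import re

# OpenCL status names such as CL_MEM_OBJECT_ALLOCATION_FAILURE start with "CL_".
# The leading \b keeps names like "XLALComputeECLFFT_OpenCL" from matching.
CL_STATUS_RE = re.compile(r"\bCL_[A-Z_]+\b")
# Matches the device line the app prints, e.g. "global memory: 2048 MiB".
GPU_MEM_RE = re.compile(r"global memory: (\d+) MiB")


def summarize_stderr(text: str) -> dict:
    """Pull out OpenCL status names and the reported GPU memory from stderr."""
    statuses = []
    for name in CL_STATUS_RE.findall(text):
        if name not in statuses:  # keep first-seen order, drop repeats
            statuses.append(name)
    mem = GPU_MEM_RE.search(text)
    return {
        "opencl_statuses": statuses,
        "gpu_memory_mib": int(mem.group(1)) if mem else None,
    }


if __name__ == "__main__":
    # Two lines taken from the stderr output quoted in this thread.
    sample = (
        "OpenCL Device used for Search/Recalc and/or semi coherent step: "
        "'GeForce GTX 950M (Platform: NVIDIA CUDA, global memory: 2048 MiB)'\n"
        "Processing FFT failed: CL_MEM_OBJECT_ALLOCATION_FAILURE\n"
    )
    print(summarize_stderr(sample))
```

For the task above this points straight at CL_MEM_OBJECT_ALLOCATION_FAILURE on a 2048 MiB card, which is far more informative than errno 1024.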
At earlier stages of this particular search, memory requirements were lower, so your card may just have had enough memory earlier on. As the run progresses, memory requirements are likely to increase even further, so even if some tasks are not failing now, the situation is likely to get worse with time. As a general comment for anyone with a 2GB card: you should avoid the GW search and select the gamma-ray pulsar search instead. 2GB will be fine there.
Cheers,
Gary.
Thanks everybody for all your advice and tips. I'll work through them and get back to you soon. Enjoy your weekend!