Hi,
I am trying to understand what could possibly be causing my issue, but I can't get a good grasp on the root cause: my WUs error out almost immediately, every time.
The only interesting info I get is the following error log:
[13:01:23][57640][INFO ] Application startup - thank you for supporting Einstein@Home!
[13:01:23][57640][INFO ] Starting data processing...
[13:01:23][57640][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[13:01:23][57640][INFO ] Using OpenCL device "gfx1031" by: Advanced Micro Devices, Inc.
[13:01:23][57640][ERROR] Couldn't create OpenCL command queue (error: -6)!
[13:01:23][57640][INFO ] OpenCL shutdown complete!
[13:01:23][57640][ERROR] Demodulation failed (error: 2013)!
[13:01:23][57640][WARN ] Sorry, at the moment your system doesn't have enough free CPU/GPU memory to run this task!
I am running the following hw/sw:
ArchLinux with kernel 5.19.11
amdgpu drivers with rocm-opencl-runtime
boinc 7.20.2
AMD Radeon 6700XT
Anybody got a similar issue?
Copyright © 2024 Einstein@Home. All rights reserved.
If at all possible, can you check whether you still get the error with a BOINC version older than 7.18? I seem to get the same error you're reporting when running recent BOINC versions.
https://boinc.berkeley.edu/forum_thread.php?id=14786
Soli Deo Gloria
Replying to Wedge009:
My problem with that PC is that it is actually NOT an AMD GPU, so no AMD drivers were ever loaded on it. It runs MilkyWay tasks just fine, and MilkyWay sees it as an NVIDIA Quadro K600 (1023MB), driver: 340.10, OpenCL: 1.1, which is what it really is. That power supply is maxed out, so it's a GPU without a power connector, and I should not have put it on Einstein to begin with.
I'm sorry, I don't understand your response. Did you intend to reply to this discussion? I thought this was about AMD, not Nvidia.
Soli Deo Gloria
I did some digging: initialize_ocl() seems to be a function in the Einstein code, not in BOINC. For whatever reason, though, newer BOINC versions cause a problem in it. According to the source code for BRP (which may well be out of date), error code 2013 is defined in demod_binary.h as RADPUL_OCL_MEM_ALLOC_DEVICE. It's one of the error codes returned when clCreateCommandQueue(), an OpenCL function, fails. Error code -6 corresponds to CL_OUT_OF_HOST_MEMORY. That code seems to be returned for a variety of reasons, so I suspect the system isn't really out of memory; it's more likely some odd interaction between potentially old Einstein code and new BOINC.
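To make that mapping concrete, here is a small Python sketch that pulls the numeric codes out of log lines like the ones in the first post and labels them. The OpenCL values are the standard ones from the Khronos `cl.h` header; the 2013 entry only reflects the (possibly outdated) BRP `demod_binary.h` mentioned above. The helper name `decode_errors` is hypothetical and not part of Einstein@Home or BOINC.

```python
import re

# A few OpenCL status codes, values as defined in the Khronos cl.h header.
OPENCL_ERRORS = {
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
    -35: "CL_INVALID_QUEUE_PROPERTIES",
}

# Application-level code, per the BRP demod_binary.h source (may be out of date).
EINSTEIN_ERRORS = {
    2013: "RADPUL_OCL_MEM_ALLOC_DEVICE",
}

# Matches e.g. "[13:01:23][57640][ERROR] Couldn't create ... (error: -6)!"
ERROR_LINE = re.compile(r"\[ERROR\]\s*(?P<msg>.*?)\s*\(error:\s*(?P<code>-?\d+)\)")


def decode_errors(log_text):
    """Return (message, code, symbolic name) for each [ERROR] line with a code."""
    results = []
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if not match:
            continue
        code = int(match.group("code"))
        # Assumption of this sketch: negative codes are OpenCL status codes,
        # positive ones come from the Einstein application itself.
        table = OPENCL_ERRORS if code < 0 else EINSTEIN_ERRORS
        results.append((match.group("msg"), code, table.get(code, "unknown")))
    return results


if __name__ == "__main__":
    sample = """\
[13:01:23][57640][ERROR] Couldn't create OpenCL command queue (error: -6)!
[13:01:23][57640][INFO ] OpenCL shutdown complete!
[13:01:23][57640][ERROR] Demodulation failed (error: 2013)!"""
    for msg, code, name in decode_errors(sample):
        print(f"{code:>5}  {name:<30}  {msg}")
```

Note that the symbolic name only tells you which call failed and what it reported; as discussed above, CL_OUT_OF_HOST_MEMORY does not necessarily mean the host is actually short on RAM.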
Soli Deo Gloria
I solved my issue somewhat by chance, by trying to run BOINC from another folder without systemd. This is NOT related to Einstein in any way; it comes from the way ArchLinux packages BOINC. The systemd service file uses the following setting: ProtectSystem=strict
This seems to create some very weird permission issues that I don't fully understand, but basically the boinc user can't fully use OpenCL. Commenting out that line makes everything work!
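For anyone who wants to check whether their own BOINC service has this setting before changing anything, here is a minimal Python sketch that reports the effective `ProtectSystem=` value from a unit file. It assumes the Arch unit is named `boinc-client.service` and that you feed it the output of `systemctl cat`, which prints the main unit followed by any drop-in files; the script name `check_protect_system.py` and the helper `protect_system_setting` are hypothetical.

```python
import sys


def protect_system_setting(unit_text):
    """Return the last ProtectSystem= value found in [Service] sections, or None.

    Works on a single unit file or on concatenated `systemctl cat` output,
    where drop-in files are appended after the main unit.
    """
    section = None
    value = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        # systemd treats lines starting with '#' or ';' as comments
        # (this also skips the file-name headers printed by `systemctl cat`).
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, val = line.partition("=")
            if key.strip() == "ProtectSystem":
                value = val.strip()  # a later assignment overrides an earlier one
    return value


def main(argv):
    if len(argv) < 2:
        print("usage: check_protect_system.py UNIT_FILE  (use '-' for stdin)")
        return
    if argv[1] == "-":
        text = sys.stdin.read()
    else:
        with open(argv[1], encoding="utf-8") as f:
            text = f.read()
    setting = protect_system_setting(text)
    if setting == "strict":
        print("ProtectSystem=strict is active; OpenCL may fail for the boinc user.")
    else:
        print(f"ProtectSystem is {setting!r}")


if __name__ == "__main__":
    main(sys.argv)
```

Usage would be something like `systemctl cat boinc-client.service | python3 check_protect_system.py -`. Rather than commenting out the line in the packaged unit file, which a package upgrade may overwrite, a drop-in created with `systemctl edit boinc-client.service` that sets `ProtectSystem=no` should have the same effect, and the script above would then report the overridden value.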
Well, I'm glad you seem to have your situation resolved, although I'm puzzled why none of your hosts are registered as having been active with Einstein within the last 30 days. I do wonder what systemd might have to do with BOINC and ROCm, however...
Soli Deo Gloria
I am still finishing some pool work for Gridcoin, that's why. Once that's done I'll move back to my own account :)
For the benefit of anyone finding this thread later, I'll point out the workaround you mentioned in https://einsteinathome.org/content/brp7-opencl-ati-wont-run-because-not-enough-mem
Soli Deo Gloria