Dear all,
I just noticed that all my BRP7 tasks are failing with the following error (AMD Radeon RX 7900 XTX on Ubuntu 22.04):
<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 11 (0xb, -245)</message>
<stderr_txt>
[10:56:45][156661][INFO ] Application startup - thank you for supporting Einstein@Home!
[10:56:45][156661][INFO ] Starting data processing...
[10:56:45][156661][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[10:56:45][156661][INFO ] Using OpenCL device "gfx1100" by: Advanced Micro Devices, Inc.
[10:56:50][156661][INFO ] Number of generated templates to be used: 50000
[10:56:50][156661][INFO ] Checkpoint file unavailable: M22_1_dns_cfbf00004_segment_4_dms_100_0.cpt (No such file or directory).
------> Starting from scratch...
[10:56:50][156661][INFO ] Header contents:
------> Original WAPP file: /atlas/data/TRAPUM_GC/M22/epoch1/M22/2021-02-14-06:43:10/1284/30min_segments/dedispersed_files/cfbf00004/M22_1_dns_cfbf00004_segment_4_dms_100_DM85.00
------> Sample time in microseconds: 153.121
------> Observation time in seconds: 2568.9524
------> Time stamp (MJD): 59259.339546529867
------> Number of samples/record: 0
------> Center freq in MHz: 857.5673828
------> Channel band in MHz: 3.34375
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 183622.66
------> DEC (J2000): -235355.700001
------> Galactic l: 0
------> Galactic b: 0
------> Name: M22
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 16777216
------> Trial dispersion measure: 85 cm^-3 pc
------> Scale factor: 0.964776
[10:56:50][156661][INFO ] Seed for random number generator is 1122104711.
[10:56:52][156661][ERROR] Application caught signal 11.
------> Obtained 7 stack frames for this thread.
------> Backtrace:
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b929f)
Source file: erp_boinc_wrapper.cpp (Function: ‰% / Line: 159)
Frame 6:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f5bb2bc0520)
Offset info: +0x42520
BFD: DWARF error: could not find variable specification at offset 2052
BFD: DWARF error: could not find variable specification at offset 20c8
BFD: DWARF error: could not find variable specification at offset 3cf9
BFD: DWARF error: could not find variable specification at offset 3d47
BFD: DWARF error: could not find variable specification at offset 3d52
BFD: DWARF error: could not find variable specification at offset 4319
BFD: DWARF error: could not find variable specification at offset 435b
BFD: DWARF error: could not find variable specification at offset 439d
Frame 5:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4bc0e3)
Offset info: MAIN+0x24f3
Source file: demod_binary.c (Function: MAIN / Line: 1275)
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b8b1e)
Offset info: main+0xe3e
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Frame 3:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f5bb2ba7d90)
Offset info: +0x29d90
Frame 2:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f5bb2ba7e40)
Offset info: __libc_start_main+0x80
Frame 1:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b9161)
Source file: unknown (Function: _start / Line: 0)
------> End of backtrace
10:56:52 (156661): called boinc_finish(11)
</stderr_txt>
]]>
See https://einsteinathome.org/fr/task/1442986876 for more details.
What am I doing wrong?
Best regards,
Samuel
Yes sir, indeed. These 'M22' w/u's are all failing over here on amdgpu RX 570s, 6800 XTs, etc.
------------------------
Stderr output
[05:20:37][21466][ERROR] Application caught signal 11.
------> Obtained 6 stack frames for this thread.
------> Backtrace:
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b929f)
Source file: erp_boinc_wrapper.cpp (Function: / Line: 159)
Frame 5:
Binary file: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f068cb93420)
Offset info: +0x14420
BFD: DWARF error: could not find variable specification at offset 2052
BFD: DWARF error: could not find variable specification at offset 20c8
BFD: DWARF error: could not find variable specification at offset 3cf9
BFD: DWARF error: could not find variable specification at offset 3d47
BFD: DWARF error: could not find variable specification at offset 3d52
BFD: DWARF error: could not find variable specification at offset 4319
BFD: DWARF error: could not find variable specification at offset 435b
BFD: DWARF error: could not find variable specification at offset 439d
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4bc0e3)
Offset info: MAIN+0x24f3
Source file: demod_binary.c (Function: MAIN / Line: 1275)
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 3:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b8b1e)
Offset info: main+0xe3e
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Frame 2:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f068c680083)
Offset info: __libc_start_main+0xf3
Frame 1:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b9161)
Source file: unknown (Function: _start / Line: 0)
------> End of backtrace
05:20:37 (21466): called boinc_finish(11)
</stderr_txt>
]]>
The weather has been warming out here. But not THAT much. lol ....
The FGRPB1G's still run fine, as well as the earlier BRP7 tasks that have the
Te5_4 prefix. It's the M22's (BRP7) that all fail. No exceptions so far. So I'm not looking
for any hardware issue at this point. All appears well on that front.
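The reasoning above (same hardware, same application, only one data set failing) can be checked with a quick tally per data-set prefix. This is only a rough sketch: the `(task_name, succeeded)` pairs are assumed to be collected by hand from the task list, the helper names are made up for illustration, and the `Te5_4_...` names below are placeholders rather than real task names.

```python
from collections import Counter


def dataset_prefix(task_name):
    """Use the text before the first underscore as the data-set label."""
    return task_name.split("_", 1)[0]


def failure_counts(results):
    """Map each data-set prefix to a (failed, total) pair.

    `results` is an iterable of (task_name, succeeded) pairs.
    """
    total = Counter()
    failed = Counter()
    for name, succeeded in results:
        prefix = dataset_prefix(name)
        total[prefix] += 1
        if not succeeded:
            failed[prefix] += 1
    return {prefix: (failed[prefix], total[prefix]) for prefix in total}


# Hand-collected outcomes; the Te5_4 names are hypothetical placeholders.
results = [
    ("M22_1_dns_cfbf00004_segment_4_dms_100_0", False),
    ("Te5_4_placeholder_a", True),
    ("Te5_4_placeholder_b", True),
]

for prefix, (n_failed, n_total) in failure_counts(results).items():
    print(f"{prefix}: {n_failed} of {n_total} failed")
```

If every failure lands under one prefix while the other prefixes run clean on the same cards, that points at the data or the way the application handles it rather than at the hardware.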
I am getting the same with a Quadro RTX A4500 on openSUSE Linux.
Now I have a 24 back-off timer from Einstein@Home, I guess because of too many failed tasks.
EDIT: I am also seeing this on my second openSUSE box, which runs an Nvidia GTX 970.
Same here... all M22 (BRP7-opencl) tasks end with a computation error on a GTX 1060. I see some wingpersons running workunits to completion on Windows cuda55, but a few that I checked were failing to validate. I think the admins are on it: for some workunits the (N+1)th copy of the task shows "unsent", and the server status shows wugcontrol BRP7 as -not running-.
I've got the same problem on an RX 570, an RX 6600 and an RX 5700 XT. It seems to have started yesterday. I hope the Einstein guys are looking into it!
MeerKAT M22 fails. Now down to about half the time, depending on the machine.
Some machines fail most or all of them; a few seem to run OK. None are overclocked beyond
stock defaults. I can adjust the fan on amdgpu but not the clocks, so they run at default.
--------------
Task 1444707069
x86_64-pc-linux-gnu
Stderr output
[06:52:12][31266][ERROR] Application caught signal 11.
------> Obtained 6 stack frames for this thread.
------> Backtrace:
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b929f)
Source file: erp_boinc_wrapper.cpp (Function: / Line: 159)
Frame 5:
Binary file: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f2db049c420)
Offset info: +0x14420
BFD: DWARF error: could not find variable specification at offset 2052
BFD: DWARF error: could not find variable specification at offset 20c8
BFD: DWARF error: could not find variable specification at offset 3cf9
BFD: DWARF error: could not find variable specification at offset 3d47
BFD: DWARF error: could not find variable specification at offset 3d52
BFD: DWARF error: could not find variable specification at offset 4319
BFD: DWARF error: could not find variable specification at offset 435b
BFD: DWARF error: could not find variable specification at offset 439d
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4bc0e3)
Offset info: MAIN+0x24f3
Source file: demod_binary.c (Function: MAIN / Line: 1275)
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 3:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b8b1e)
Offset info: main+0xe3e
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Frame 2:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f2daff89083)
Offset info: __libc_start_main+0xf3
Frame 1:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b9161)
Source file: unknown (Function: _start / Line: 0)
------> End of backtrace
06:52:12 (31266): called boinc_finish(11)
</stderr_txt>
]]>
Here's an error log from an amdgpu 6800 XT machine. The one above was from a machine with 2x RX 570 GPUs.
They look pretty similar to a non-programmer. Some machines run some OK and some fail.
-----------
Stderr output
[04:25:07][24629][ERROR] Application caught signal 11.
------> Obtained 6 stack frames for this thread.
------> Backtrace:
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b929f)
Source file: erp_boinc_wrapper.cpp (Function: / Line: 159)
Frame 5:
Binary file: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fc41d707420)
Offset info: +0x14420
BFD: DWARF error: could not find variable specification at offset 2052
BFD: DWARF error: could not find variable specification at offset 20c8
BFD: DWARF error: could not find variable specification at offset 3cf9
BFD: DWARF error: could not find variable specification at offset 3d47
BFD: DWARF error: could not find variable specification at offset 3d52
BFD: DWARF error: could not find variable specification at offset 4319
BFD: DWARF error: could not find variable specification at offset 435b
BFD: DWARF error: could not find variable specification at offset 439d
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4bc0e3)
Offset info: MAIN+0x24f3
Source file: demod_binary.c (Function: MAIN / Line: 1275)
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 3:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b8b1e)
Offset info: main+0xe3e
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Frame 2:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7fc41d1f4083)
Offset info: __libc_start_main+0xf3
Frame 1:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b9161)
Source file: unknown (Function: _start / Line: 0)
------> End of backtrace
04:25:07 (24629): called boinc_finish(11)
</stderr_txt>
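One way to check whether logs like these really match is to drop the repeated `BFD: DWARF error` lines and the per-machine load addresses, then compare only the source locations. The following is a minimal sketch, assuming the stderr text has been pasted into a string; the function name and regular expressions are illustrative and not part of the Einstein@Home application or BOINC.

```python
import re
import signal

# Debug-info warnings that carry no information about the crash itself.
NOISE = re.compile(r"^BFD: DWARF error:")
CAUGHT = re.compile(r"Application caught signal (\d+)")
# Capture the source file and line number; the function name is skipped
# because it is sometimes garbled in these logs.
SOURCE = re.compile(r"Source file: (\S+) \(Function:.*/ Line: (\d+)\)")


def summarize(stderr_text):
    """Return (signal_name, [(source_file, line_number), ...])."""
    signal_name = None
    frames = []
    for raw in stderr_text.splitlines():
        line = raw.strip()
        if NOISE.match(line):
            continue
        caught = CAUGHT.search(line)
        if caught:
            signal_name = signal.Signals(int(caught.group(1))).name
            continue
        source = SOURCE.search(line)
        if source:
            frames.append((source.group(1), int(source.group(2))))
    return signal_name, frames


sample = """[04:25:07][24629][ERROR] Application caught signal 11.
BFD: DWARF error: could not find variable specification at offset 68c
Frame 6:
Source file: erp_boinc_wrapper.cpp (Function: / Line: 159)
Frame 4:
Offset info: MAIN+0x24f3
Source file: demod_binary.c (Function: MAIN / Line: 1275)
"""

print(summarize(sample))
```

Signal 11 is SIGSEGV on Linux, i.e. a segmentation fault. If two machines give the same list of source locations, the crash is most likely at the same place in the application regardless of the GPU.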
The same on NVIDIA; nearly all BRP7-WUs fail.
Stderr output
[19:52:52][3671087][ERROR] Application caught signal 11.
------> Obtained 7 stack frames for this thread.
------> Backtrace:
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-nvidia (0x4b929f)
Source file: erp_boinc_wrapper.cpp (Function: ä5 / Line: 159)
Frame 6:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7fa374202520)
Offset info: +0x42520
BFD: DWARF error: could not find variable specification at offset 2052
BFD: DWARF error: could not find variable specification at offset 20c8
BFD: DWARF error: could not find variable specification at offset 3cf9
BFD: DWARF error: could not find variable specification at offset 3d47
BFD: DWARF error: could not find variable specification at offset 3d52
BFD: DWARF error: could not find variable specification at offset 4319
BFD: DWARF error: could not find variable specification at offset 435b
BFD: DWARF error: could not find variable specification at offset 439d
Frame 5:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-nvidia (0x4bc0e3)
Offset info: MAIN+0x24f3
Source file: demod_binary.c (Function: MAIN / Line: 1275)
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-nvidia (0x4b8b1e)
Offset info: main+0xe3e
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Frame 3:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7fa3741e9d90)
Offset info: +0x29d90
Frame 2:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7fa3741e9e40)
Offset info: __libc_start_main+0x80
Frame 1:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-nvidia (0x4b9161)
Source file: unknown (Function: _start / Line: 0)
------> End of backtrace
19:52:52 (3671087): called boinc_finish(11)
</stderr_txt>
]]>
Could be a problem with the M22 data files. There have been a lot of reports of errors with that one.