Lots of BRP7 errors

magic_sam
Joined: 30 Dec 21
Posts: 23
Credit: 556699183
RAC: 97552
Topic 229254

Dear all,

I just noticed that all my BRP7 tasks are failing with the following error (AMD Radeon RX 7900 XTX on Ubuntu 22.04):

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 11 (0xb, -245)</message>
<stderr_txt>
[10:56:45][156661][INFO ] Application startup - thank you for supporting Einstein@Home!
[10:56:45][156661][INFO ] Starting data processing...
[10:56:45][156661][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[10:56:45][156661][INFO ] Using OpenCL device "gfx1100" by: Advanced Micro Devices, Inc.
[10:56:50][156661][INFO ] Number of generated templates to be used: 50000
[10:56:50][156661][INFO ] Checkpoint file unavailable: M22_1_dns_cfbf00004_segment_4_dms_100_0.cpt (No such file or directory).
------> Starting from scratch...
[10:56:50][156661][INFO ] Header contents:
------> Original WAPP file: /atlas/data/TRAPUM_GC/M22/epoch1/M22/2021-02-14-06:43:10/1284/30min_segments/dedispersed_files/cfbf00004/M22_1_dns_cfbf00004_segment_4_dms_100_DM85.00
------> Sample time in microseconds: 153.121
------> Observation time in seconds: 2568.9524
------> Time stamp (MJD): 59259.339546529867
------> Number of samples/record: 0
------> Center freq in MHz: 857.5673828
------> Channel band in MHz: 3.34375
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 183622.66
------> DEC (J2000): -235355.700001
------> Galactic l: 0
------> Galactic b: 0
------> Name: M22
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 16777216
------> Trial dispersion measure: 85 cm^-3 pc
------> Scale factor: 0.964776
[10:56:50][156661][INFO ] Seed for random number generator is 1122104711.

[10:56:52][156661][ERROR] Application caught signal 11.

------> Obtained 7 stack frames for this thread.
------> Backtrace:
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b929f)
Source file: erp_boinc_wrapper.cpp (Function: &#137;% / Line: 159)
Frame 6:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f5bb2bc0520)
Offset info: +0x42520
BFD: DWARF error: could not find variable specification at offset 2052
BFD: DWARF error: could not find variable specification at offset 20c8
BFD: DWARF error: could not find variable specification at offset 3cf9
BFD: DWARF error: could not find variable specification at offset 3d47
BFD: DWARF error: could not find variable specification at offset 3d52
BFD: DWARF error: could not find variable specification at offset 4319
BFD: DWARF error: could not find variable specification at offset 435b
BFD: DWARF error: could not find variable specification at offset 439d
Frame 5:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4bc0e3)
Offset info: MAIN+0x24f3
Source file: demod_binary.c (Function: MAIN / Line: 1275)
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b8b1e)
Offset info: main+0xe3e
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Frame 3:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f5bb2ba7d90)
Offset info: +0x29d90
Frame 2:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f5bb2ba7e40)
Offset info: __libc_start_main+0x80
Frame 1:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b9161)
Source file: unknown (Function: _start / Line: 0)
------> End of backtrace

10:56:52 (156661): called boinc_finish(11)

</stderr_txt>
]]>


See https://einsteinathome.org/fr/task/1442986876 for more details.

What am I doing wrong?
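
A side note on reading this log: "caught signal 11" and exit code 11 correspond, on Linux, to SIGSEGV (a segmentation fault), meaning the application itself crashed on an invalid memory access. That alone does not say whether the cause is the app, the driver or the input data. A minimal Python sketch to look up a signal number (the helper name `describe_signal` is made up for illustration, it is not part of BOINC or the Einstein@Home app):

```python
import signal


def describe_signal(signum: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # The number is not a known signal on this platform.
        return f"unknown signal {signum}"


# 11 is the number reported in the stderr output above.
print(describe_signal(11))
```

On a Linux host this should print the name for signal 11; the mapping of numbers to names is platform-dependent, so run it on the same kind of machine that produced the log.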

Best regards,

Samuel

Mike
Joined: 26 Dec 20
Posts: 45
Credit: 5747425139
RAC: 7716370

Yes sir, indeed. These 'M22' work units are all failing over here on amdgpu RX 570s, 6800 XTs, etc.

------------------------

Stderr output

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 11 (0xb, -245)</message>
<stderr_txt>
[05:20:28][21466][INFO ] Application startup - thank you for supporting Einstein@Home!
[05:20:28][21466][INFO ] Starting data processing...
[05:20:28][21466][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[05:20:28][21466][INFO ] Using OpenCL device "Ellesmere" by: Advanced Micro Devices, Inc.
[05:20:34][21466][INFO ] Number of generated templates to be used: 50000
[05:20:34][21466][INFO ] Checkpoint file unavailable: M22_1_dns_cfbf00004_segment_4_dms_100_97.cpt (No such file or directory).
------> Starting from scratch...
[05:20:34][21466][INFO ] Header contents:
------> Original WAPP file: /atlas/data/TRAPUM_GC/M22/epoch1/M22/2021-02-14-06:43:10/1284/30min_segments/dedispersed_files/cfbf00004/M22_1_dns_cfbf00004_segment_4_dms_100_DM94.70
------> Sample time in microseconds: 153.121
------> Observation time in seconds: 2568.9524
------> Time stamp (MJD): 59259.339546370575
------> Number of samples/record: 0
------> Center freq in MHz: 857.5673828
------> Channel band in MHz: 3.34375
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 183622.66
------> DEC (J2000): -235355.700001
------> Galactic l: 0
------> Galactic b: 0
------> Name: M22
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 16777216
------> Trial dispersion measure: 94.7 cm^-3 pc
------> Scale factor: 0.965192
[05:20:34][21466][INFO ] Seed for random number generator is 1124308096.

[05:20:37][21466][ERROR] Application caught signal 11.

------> Obtained 6 stack frames for this thread.
------> Backtrace:
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b929f)
Source file: erp_boinc_wrapper.cpp (Function: / Line: 159)
Frame 5:
Binary file: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f068cb93420)
Offset info: +0x14420
BFD: DWARF error: could not find variable specification at offset 2052
BFD: DWARF error: could not find variable specification at offset 20c8
BFD: DWARF error: could not find variable specification at offset 3cf9
BFD: DWARF error: could not find variable specification at offset 3d47
BFD: DWARF error: could not find variable specification at offset 3d52
BFD: DWARF error: could not find variable specification at offset 4319
BFD: DWARF error: could not find variable specification at offset 435b
BFD: DWARF error: could not find variable specification at offset 439d
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4bc0e3)
Offset info: MAIN+0x24f3
Source file: demod_binary.c (Function: MAIN / Line: 1275)
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 3:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b8b1e)
Offset info: main+0xe3e
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Frame 2:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f068c680083)
Offset info: __libc_start_main+0xf3
Frame 1:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b9161)
Source file: unknown (Function: _start / Line: 0)
------> End of backtrace

05:20:37 (21466): called boinc_finish(11)

</stderr_txt>
]]>



Mike
Joined: 26 Dec 20
Posts: 45
Credit: 5747425139
RAC: 7716370

The weather has been warming out here, but not THAT much, lol.

The FGRPB1G tasks still run fine, as do the earlier BRP7 tasks that have the Te5_4 prefix. It's the M22 tasks (BRP7) that all fail, with no exceptions so far. So I'm not looking for any hardware issue at this point; all appears well on that front.
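
For anyone who wants to check the same pattern on their own host, here is a rough Python sketch that groups task outcomes by the data-set prefix of the task name. It assumes you have already copied (task name, outcome) pairs out of your task list by hand; the function names and the sample task names below are placeholders, not real tasks.

```python
from collections import Counter


def dataset_prefix(task_name: str) -> str:
    """Return the part of the task name before the first underscore."""
    return task_name.split("_", 1)[0]


def tally(results):
    """Count (prefix, outcome) pairs from an iterable of (task_name, outcome)."""
    return Counter((dataset_prefix(name), outcome) for name, outcome in results)


# Placeholder task names and outcomes, for illustration only.
sample = [
    ("M22_placeholder_task_a", "error"),
    ("M22_placeholder_task_b", "error"),
    ("Te5_4_placeholder_task_c", "success"),
]

for (prefix, outcome), count in sorted(tally(sample).items()):
    print(prefix, outcome, count)
```

If every error lands under one prefix while the other prefix only shows successes on the same GPU, that points away from the hardware and toward that batch of work.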

cuphi
Joined: 20 Jan 22
Posts: 7
Credit: 38051759
RAC: 158214

I am getting the same with a Quadro RTX A4500 on openSUSE Linux.

cuphi
Joined: 20 Jan 22
Posts: 7
Credit: 38051759
RAC: 158214

Now I have a 24-hour back-off timer from Einstein@Home, I guess because of too many failed tasks.

EDIT: I am also seeing this on my second openSUSE box, which runs an NVIDIA GTX 970.

Eugene Stemple
Joined: 9 Feb 11
Posts: 67
Credit: 364170567
RAC: 542655

Same here... all M22 (BRP7-opencl) tasks end with a computation error on my GTX 1060. I see some wingpersons running workunits to completion on Windows cuda55, but a few that I checked were failing to validate. I think (?) the admins are on it: for some workunits the (N+1)th copy of the task shows "unsent", and the server status page shows wugcontrol BRP7 as "not running".

alex
Joined: 8 Apr 21
Posts: 6
Credit: 2238146719
RAC: 4908461

I've got the same problem on an RX 570, an RX 6600 and an RX 5700 XT. It seems to have started yesterday. I hope the Einstein guys are looking into it!

Mike
Joined: 26 Dec 20
Posts: 45
Credit: 5747425139
RAC: 7716370

MeerKAT M22 failures are now down to about half the time, depending on the machine. Some machines fail most or all tasks; a few seem to run OK. None are overclocked beyond stock defaults. I can adjust the fan on amdgpu but not the clocks, so they run at default.

--------------

Task 1444707069
Name: M22_1_dns_cfbf00004_segment_6_dms_100_20000_81_800000_1
Workunit ID: 717541260
Created: 22 Mar 2023 5:15:58 UTC
Sent: 22 Mar 2023 11:38:47 UTC
Report deadline: 5 Apr 2023 11:38:47 UTC
Received: 22 Mar 2023 14:01:01 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 11 (0x0000000B) Unknown error code
Computer: 12864011
Run time (sec): 8.22
CPU time (sec): 6.09
Peak working set size (MB): 120.9
Peak swap size (MB): 69099.75
Peak disk usage (MB): 0.02
Validation state: Invalid
Granted credit: 0
Application: Binary Radio Pulsar Search (MeerKAT) v0.13 (BRP7-opencl-ati) x86_64-pc-linux-gnu


Stderr output

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 11 (0xb, -245)</message>
<stderr_txt>
[06:52:06][31266][INFO ] Application startup - thank you for supporting Einstein@Home!
[06:52:06][31266][INFO ] Starting data processing...
[06:52:06][31266][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[06:52:06][31266][INFO ] Using OpenCL device "Ellesmere" by: Advanced Micro Devices, Inc.
[06:52:10][31266][INFO ] Number of generated templates to be used: 50000
[06:52:10][31266][INFO ] Checkpoint file unavailable: M22_1_dns_cfbf00004_segment_6_dms_100_81.cpt (No such file or directory).
------> Starting from scratch...
[06:52:10][31266][INFO ] Header contents:
------> Original WAPP file: /atlas/data/TRAPUM_GC/M22/epoch1/M22/2021-02-14-06:43:10/1284/30min_segments/dedispersed_files/cfbf00004/M22_1_dns_cfbf00004_segment_6_dms_100_DM93.10
------> Sample time in microseconds: 153.121
------> Observation time in seconds: 2568.9524
------> Time stamp (MJD): 59259.380982498042
------> Number of samples/record: 0
------> Center freq in MHz: 857.5673828
------> Channel band in MHz: 3.34375
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 183622.66
------> DEC (J2000): -235355.700001
------> Galactic l: 0
------> Galactic b: 0
------> Name: M22
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 16777216
------> Trial dispersion measure: 93.1 cm^-3 pc
------> Scale factor: 1.39522
[06:52:10][31266][INFO ] Seed for random number generator is 1118475585.

[06:52:12][31266][ERROR] Application caught signal 11.

------> Obtained 6 stack frames for this thread.
------> Backtrace:
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b929f)
Source file: erp_boinc_wrapper.cpp (Function: / Line: 159)
Frame 5:
Binary file: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f2db049c420)
Offset info: +0x14420
BFD: DWARF error: could not find variable specification at offset 2052
BFD: DWARF error: could not find variable specification at offset 20c8
BFD: DWARF error: could not find variable specification at offset 3cf9
BFD: DWARF error: could not find variable specification at offset 3d47
BFD: DWARF error: could not find variable specification at offset 3d52
BFD: DWARF error: could not find variable specification at offset 4319
BFD: DWARF error: could not find variable specification at offset 435b
BFD: DWARF error: could not find variable specification at offset 439d
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4bc0e3)
Offset info: MAIN+0x24f3
Source file: demod_binary.c (Function: MAIN / Line: 1275)
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 3:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b8b1e)
Offset info: main+0xe3e
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Frame 2:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7f2daff89083)
Offset info: __libc_start_main+0xf3
Frame 1:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b9161)
Source file: unknown (Function: _start / Line: 0)
------> End of backtrace

06:52:12 (31266): called boinc_finish(11)

</stderr_txt>
]]>







Mike
Joined: 26 Dec 20
Posts: 45
Credit: 5747425139
RAC: 7716370

Here's an error log from an amdgpu 6800 XT machine. The one above was from a machine with 2x RX 570 GPUs.

They look pretty similar to a non-programmer. Some machines run some tasks OK and fail others.

-----------

Stderr output

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
process exited with code 11 (0xb, -245)</message>
<stderr_txt>
[04:25:00][24629][INFO ] Application startup - thank you for supporting Einstein@Home!
[04:25:00][24629][INFO ] Starting data processing...
[04:25:01][24629][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[04:25:01][24629][INFO ] Using OpenCL device "gfx1030" by: Advanced Micro Devices, Inc.
[04:25:05][24629][INFO ] Number of generated templates to be used: 50000
[04:25:05][24629][INFO ] Checkpoint file unavailable: M22_1_dns_cfbf00004_segment_6_dms_100_41.cpt (No such file or directory).
------> Starting from scratch...
[04:25:05][24629][INFO ] Header contents:
------> Original WAPP file: /atlas/data/TRAPUM_GC/M22/epoch1/M22/2021-02-14-06:43:10/1284/30min_segments/dedispersed_files/cfbf00004/M22_1_dns_cfbf00004_segment_6_dms_100_DM89.10
------> Sample time in microseconds: 153.121
------> Observation time in seconds: 2568.9524
------> Time stamp (MJD): 59259.380982563729
------> Number of samples/record: 0
------> Center freq in MHz: 857.5673828
------> Channel band in MHz: 3.34375
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 183622.66
------> DEC (J2000): -235355.700001
------> Galactic l: 0
------> Galactic b: 0
------> Name: M22
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 16777216
------> Trial dispersion measure: 89.1 cm^-3 pc
------> Scale factor: 1.39522
[04:25:05][24629][INFO ] Seed for random number generator is 1116690650.

[04:25:07][24629][ERROR] Application caught signal 11.

------> Obtained 6 stack frames for this thread.
------> Backtrace:
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b929f)
Source file: erp_boinc_wrapper.cpp (Function: / Line: 159)
Frame 5:
Binary file: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7fc41d707420)
Offset info: +0x14420
BFD: DWARF error: could not find variable specification at offset 2052
BFD: DWARF error: could not find variable specification at offset 20c8
BFD: DWARF error: could not find variable specification at offset 3cf9
BFD: DWARF error: could not find variable specification at offset 3d47
BFD: DWARF error: could not find variable specification at offset 3d52
BFD: DWARF error: could not find variable specification at offset 4319
BFD: DWARF error: could not find variable specification at offset 435b
BFD: DWARF error: could not find variable specification at offset 439d
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4bc0e3)
Offset info: MAIN+0x24f3
Source file: demod_binary.c (Function: MAIN / Line: 1275)
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 3:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b8b1e)
Offset info: main+0xe3e
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Frame 2:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7fc41d1f4083)
Offset info: __libc_start_main+0xf3
Frame 1:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-ati (0x4b9161)
Source file: unknown (Function: _start / Line: 0)
------> End of backtrace

04:25:07 (24629): called boinc_finish(11)

</stderr_txt>
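
One way to check whether two of these logs really are "pretty similar" is to compare only the source-file and line-number pairs from the backtraces and ignore addresses and the noisy BFD lines. The Python below is a rough sketch under that assumption: the regular expression is written against the "Source file: ... (Function: ... / Line: ...)" lines as they appear in this thread, and the function name `frame_signature` is made up for illustration.

```python
import re

# Matches e.g. "Source file: demod_binary.c (Function: MAIN / Line: 1275)".
# The function name may be empty or garbage, so it is matched but not used.
FRAME_RE = re.compile(r"Source file: (\S+) \(Function:\s*.*?\s*/ Line: (\d+)\)")


def frame_signature(stderr_text: str):
    """Return the (source file, line) pairs of a backtrace, in log order."""
    return [(m.group(1), int(m.group(2))) for m in FRAME_RE.finditer(stderr_text)]


# Excerpts copied from two of the logs posted in this thread.
LOG_A = """\
Source file: erp_boinc_wrapper.cpp (Function: / Line: 159)
Source file: demod_binary.c (Function: MAIN / Line: 1275)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Source file: unknown (Function: _start / Line: 0)
"""
LOG_B = """\
Source file: erp_boinc_wrapper.cpp (Function: &#137;% / Line: 159)
Source file: demod_binary.c (Function: MAIN / Line: 1275)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Source file: unknown (Function: _start / Line: 0)
"""

print(frame_signature(LOG_A) == frame_signature(LOG_B))
```

Matching signatures suggest, but do not prove, that the different machines are hitting the same spot in the application (here the frame at demod_binary.c line 1275) rather than failing for unrelated reasons.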


Ereignishorizont
Joined: 17 May 21
Posts: 19
Credit: 2998772861
RAC: 469296

The same on NVIDIA; nearly all BRP7-WUs fail.

 

 

Stderr output

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 11 (0xb, -245)</message>
<stderr_txt>
[19:52:45][3671087][INFO ] Application startup - thank you for supporting Einstein@Home!
[19:52:45][3671087][INFO ] Starting data processing...
[19:52:45][3671087][INFO ] Using OpenCL platform provided by: NVIDIA Corporation
[19:52:45][3671087][INFO ] Using OpenCL device "NVIDIA GeForce RTX 4080" by: NVIDIA Corporation
[19:52:49][3671087][INFO ] Number of generated templates to be used: 50000
[19:52:49][3671087][INFO ] Checkpoint file unavailable: M22_1_dns_cfbf00004_segment_7_dms_100_99.cpt (No such file or directory).
------> Starting from scratch...
[19:52:49][3671087][INFO ] Header contents:
------> Original WAPP file: /atlas/data/TRAPUM_GC/M22/epoch1/M22/2021-02-14-06:43:10/1284/30min_segments/dedispersed_files/cfbf00004/M22_1_dns_cfbf00004_segment_7_dms_100_DM94.90
------> Sample time in microseconds: 153.121
------> Observation time in seconds: 2568.9524
------> Time stamp (MJD): 59259.401700515293
------> Number of samples/record: 0
------> Center freq in MHz: 857.5673828
------> Channel band in MHz: 3.34375
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 183622.66
------> DEC (J2000): -235355.700001
------> Galactic l: 0
------> Galactic b: 0
------> Name: M22
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 16777216
------> Trial dispersion measure: 94.9 cm^-3 pc
------> Scale factor: 2.51855
[19:52:50][3671087][INFO ] Seed for random number generator is 1089562769.

[19:52:52][3671087][ERROR] Application caught signal 11.

------> Obtained 7 stack frames for this thread.
------> Backtrace:
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-nvidia (0x4b929f)
Source file: erp_boinc_wrapper.cpp (Function: &#228;5 / Line: 159)
Frame 6:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7fa374202520)
Offset info: +0x42520
BFD: DWARF error: could not find variable specification at offset 2052
BFD: DWARF error: could not find variable specification at offset 20c8
BFD: DWARF error: could not find variable specification at offset 3cf9
BFD: DWARF error: could not find variable specification at offset 3d47
BFD: DWARF error: could not find variable specification at offset 3d52
BFD: DWARF error: could not find variable specification at offset 4319
BFD: DWARF error: could not find variable specification at offset 435b
BFD: DWARF error: could not find variable specification at offset 439d
Frame 5:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-nvidia (0x4bc0e3)
Offset info: MAIN+0x24f3
Source file: demod_binary.c (Function: MAIN / Line: 1275)
BFD: DWARF error: could not find variable specification at offset 68c
BFD: DWARF error: could not find variable specification at offset 702
BFD: DWARF error: could not find variable specification at offset 2726
BFD: DWARF error: could not find variable specification at offset 2774
BFD: DWARF error: could not find variable specification at offset 277f
BFD: DWARF error: could not find variable specification at offset 2827
BFD: DWARF error: could not find variable specification at offset 2869
BFD: DWARF error: could not find variable specification at offset 28ab
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-nvidia (0x4b8b1e)
Offset info: main+0xe3e
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 518)
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 622)
Frame 3:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7fa3741e9d90)
Offset info: +0x29d90
Frame 2:
Binary file: /lib/x86_64-linux-gnu/libc.so.6 (0x7fa3741e9e40)
Offset info: __libc_start_main+0x80
Frame 1:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP7_0.13_x86_64-pc-linux-gnu__BRP7-opencl-nvidia (0x4b9161)
Source file: unknown (Function: _start / Line: 0)
------> End of backtrace

19:52:52 (3671087): called boinc_finish(11)

</stderr_txt>
]]>



Ian&Steve C.
Joined: 19 Jan 20
Posts: 3926
Credit: 45538662642
RAC: 63285272

Could be a problem with the M22 data files. There have been a lot of reports of errors with that one.
