A new Einstein@Home CUDA App for Linux is available for beta testing on the Beta Test Page.
We stumbled over some bugs in the CUDA part that might have caused some segfaults, so this is mainly a bugfix release. In addition, the CPU part of the App now uses SSE, as in the .09 Beta Apps.
Please test and report, and please include important information (such as the NVIDIA driver and Core Client versions) in your posts.
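For anyone who wants to gather that information in one go, here is a minimal sketch. It assumes the proprietary NVIDIA kernel module exposes its version at `/proc/driver/nvidia/version` (it normally does when the module is loaded); the function name and dictionary keys are just illustrative. It does not query the BOINC Core Client, whose version is usually shown in the client's startup messages.

```python
import platform
from pathlib import Path

# Assumed location: the proprietary NVIDIA kernel module normally exposes
# its version here; the file is absent if the module is not loaded.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def collect_report_info(version_file=NVIDIA_VERSION_FILE):
    """Return a dict of basic system facts worth pasting into a beta-test post."""
    try:
        # The first line of the file carries the driver version string.
        driver = version_file.read_text().strip().splitlines()[0]
    except (OSError, IndexError):
        driver = "unknown (NVIDIA kernel module not loaded?)"
    return {
        "kernel": platform.release(),
        "machine": platform.machine(),  # e.g. x86_64 or i686
        "nvidia_driver": driver,
    }


if __name__ == "__main__":
    for key, value in collect_report_info().items():
        print(f"{key}: {value}")
```

The `machine` entry is useful because the App binary is a 32-bit (i686) build, so it matters whether it runs on a 32-bit or a 64-bit system.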
BM
CUDA App einsteinbinary 1.10 for Linux available for Beta Test
Tried
- Fedora 11 x86_64
- 'rpm -qa kmod-nvidia' returns kmod-nvidia-185.18.14-1.fc11.3.x86_64
Crashes (see below). Shows 100% but does not stop. Had to abort workunit.
Also:
- SETI@home Beta (CUDA) crashes similarly, but the workunit finishes with a failure
- GPUGRID - works OK
[22:53:40][14614][INFO ] Application startup - thank you for supporting Einstein@Home!
[22:53:40][14614][INFO ] Starting data processing...
[22:53:40][14614][INFO ] Using CUDA device #0 "GeForce 9800 GT" (508.03 GFLOPS)
[22:53:40][14614][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[22:53:40][14614][INFO ] Header contents:
------> Original WAPP file: p2030_54161_48913_0050_G54.71-02.47.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54161.566122685188
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 194205.603964
------> DEC (J2000): 180900.63493
------> Galactic l: 54.8068
------> Galactic b: -2.4852
------> Name: G54.71-02.47.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 440.3328
------> ZA at start: 1.3934
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Vilma,Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 25.2 cm^-3 pc
------> Scale factor: 6953.53
[22:53:41][14614][INFO ] Seed for random number generator is 977043268.
[22:53:43][14614][ERROR] Application caught signal 11.
------> Obtained 17 stack frames for this thread.
------> Backtrace:
Frame 17:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af632)
Offset info: pthread_mutex_lock+0x5e6
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 16:
Binary file: /usr/lib/nvidia/libcuda.so (0x10c86ab)
Frame 15:
Binary file: /usr/lib/nvidia/libcuda.so (0x10c86ab)
Frame 14:
Binary file: /usr/lib/nvidia/libcuda.so (0x10cf4d1)
Frame 13:
Binary file: /usr/lib/nvidia/libcuda.so (0x1093ad0)
Frame 12:
Binary file: /usr/lib/nvidia/libcuda.so (0xdd595b)
Frame 11:
Binary file: /usr/lib/nvidia/libcuda.so (0xde8164)
Frame 10:
Binary file: /usr/lib/nvidia/libcuda.so (0xdcde03)
Frame 9:
Binary file: /usr/lib/nvidia/libcuda.so (0xdc7df2)
Offset info: cuCtxCreate+0xa2
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x5128c0)
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x5132c1)
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x4f333b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b2862)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 4:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b03cd)
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 3:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b075f)
Offset info: main+0x17f
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 2:
Binary file: /lib/libc.so.6 (0x2bca66)
Offset info: __libc_start_main+0xe6
------> End of backtrace
called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af531)
Offset info: pthread_key_create+0x35
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
[22:54:18][14706][INFO ] Application startup - thank you for supporting Einstein@Home!
[22:54:18][14706][INFO ] Starting data processing...
[22:54:18][14706][INFO ] Using CUDA device #0 "GeForce 9800 GT" (508.03 GFLOPS)
[22:54:18][14706][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[22:54:18][14706][INFO ] Header contents:
------> Original WAPP file: p2030_54161_48913_0050_G54.71-02.47.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54161.566122685188
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 194205.603964
------> DEC (J2000): 180900.63493
------> Galactic l: 54.8068
------> Galactic b: -2.4852
------> Name: G54.71-02.47.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 440.3328
------> ZA at start: 1.3934
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Vilma,Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 25.2 cm^-3 pc
------> Scale factor: 6953.53
[22:54:19][14706][INFO ] Seed for random number generator is 977043268.
[22:54:20][14706][ERROR] Application caught signal 11.
------> Obtained 17 stack frames for this thread.
------> Backtrace:
Frame 17:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af632)
Offset info: pthread_mutex_lock+0x5e6
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 16:
Binary file: /usr/lib/nvidia/libcuda.so (0x10e16ab)
Frame 15:
Binary file: /usr/lib/nvidia/libcuda.so (0x10e16ab)
Frame 14:
Binary file: /usr/lib/nvidia/libcuda.so (0x10e84d1)
Frame 13:
Binary file: /usr/lib/nvidia/libcuda.so (0x10acad0)
Frame 12:
Binary file: /usr/lib/nvidia/libcuda.so (0xdee95b)
Frame 11:
Binary file: /usr/lib/nvidia/libcuda.so (0xe01164)
Frame 10:
Binary file: /usr/lib/nvidia/libcuda.so (0xde6e03)
Frame 9:
Binary file: /usr/lib/nvidia/libcuda.so (0xde0df2)
Offset info: cuCtxCreate+0xa2
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x25e8c0)
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x25f2c1)
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x23f33b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b2862)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 4:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b03cd)
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 3:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b075f)
Offset info: main+0x17f
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 2:
Binary file: /lib/libc.so.6 (0x2bca66)
Offset info: __libc_start_main+0xe6
------> End of backtrace
called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af531)
Offset info: pthread_key_create+0x35
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
The WU starts running but jumps straight to 100%. According to stderrout.txt, there is a "File format not recognized" message:
[14:53:33][5101][INFO ] Application startup - thank you for supporting Einstein@Home!
[14:53:33][5101][INFO ] Starting data processing...
[14:53:33][5101][INFO ] Using CUDA device #0 "GeForce GTX 275" (1010.88 GFLOPS)
[14:53:33][5101][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[14:53:33][5101][INFO ] Header contents:
------> Original WAPP file: p2030_54162_45910_0042_G41.76+01.37.C_0.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54162.531365740739
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 190240.781394
------> DEC (J2000): 82856.412406
------> Galactic l: 41.758
------> Galactic b: 1.3809
------> Name: G41.76+01.37.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 357.8171
------> ZA at start: 9.8367
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 70.2 cm^-3 pc
------> Scale factor: 6877.66
[14:53:35][5101][INFO ] Seed for random number generator is -1164413432.
[14:53:37][5101][ERROR] Application caught signal 11.
------> Obtained 18 stack frames for this thread.
------> Backtrace:
Frame 18:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af632)
Offset info: pthread_mutex_lock+0x5e6
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 17:
Binary file: /usr/lib32/libcuda.so (0xf780f99b)
Frame 16:
Binary file: /usr/lib32/libcuda.so (0xf780f99b)
Frame 15:
Binary file: /usr/lib32/libcuda.so (0xf7816791)
Frame 14:
Binary file: /usr/lib32/libcuda.so (0xf77e07ae)
Frame 13:
Binary file: /usr/lib32/libcuda.so (0xf77841f3)
Frame 12:
Binary file: /usr/lib32/libcuda.so (0xf7798794)
Frame 11:
Binary file: /usr/lib32/libcuda.so (0xf777a675)
Frame 10:
Binary file: /usr/lib32/libcuda.so (0xf7773992)
Frame 9:
Binary file: /usr/lib32/libcuda.so (0xf77d65bf)
Offset info: cuCtxCreate+0x4f
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7e3c8c0)
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7e3d2c1)
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7e1d33b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b2862)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 4:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b03cd)
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 3:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b075f)
Offset info: main+0x17f
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 2:
Binary file: /lib32/libc.so.6 (0xf7b71775)
Offset info: __libc_start_main+0xe5
------> End of backtrace
called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af531)
Offset info: pthread_key_create+0x35
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
The "File format not recognized" message only appears after the real problem:
[14:53:37][5101][ERROR] Application caught signal 11.
So the application crashes, and the message appears while the stack backtrace is being printed. For comparison: do you have 32-bit or 64-bit Linux? I have 64-bit and I'm getting crashes similar to those in the earlier message in this thread.
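To make that easier to see, here is a rough sketch that pulls the signal number and the named stack frames out of such a log and ignores the "File format not recognized" noise. The function name and regular expressions are my own and are only matched against the log format posted in this thread.

```python
import re

SIGNAL_RE = re.compile(r"\[ERROR\] Application caught signal (\d+)")
FRAME_RE = re.compile(r"^Frame (\d+):")
SYMBOL_RE = re.compile(r"^Offset info: (\S+?)\+0x[0-9a-fA-F]+")


def summarize_crash(log_text):
    """Return (signal_number, [(frame_number, symbol), ...]) for the first crash."""
    signal = None
    frames = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if signal is None:
            # Skip everything until the first "caught signal" line.
            match = SIGNAL_RE.search(line)
            if match:
                signal = int(match.group(1))
            continue
        if "Application startup" in line:
            break  # the next run begins here
        match = FRAME_RE.match(line)
        if match:
            current = int(match.group(1))
            continue
        match = SYMBOL_RE.match(line)
        if match and current is not None:
            frames.append((current, match.group(1)))
    return signal, frames


# A few lines copied from the first log in this thread.
sample = """\
[22:53:43][14614][ERROR] Application caught signal 11.
Frame 9:
Binary file: /usr/lib/nvidia/libcuda.so (0xdc7df2)
Offset info: cuCtxCreate+0xa2
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x4f333b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b2862)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
"""

if __name__ == "__main__":
    print(summarize_crash(sample))
```

In all three logs posted so far, the named frames between `MAIN` and the crash are `cudaMallocHost` and `cuCtxCreate`, which suggests (but does not prove) that the fault happens while the CUDA context is being created.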
Andris
I'm running 64-bit Ubuntu.
It works without problems on my notebook (Intel Core2 Duo P7350 +
GeForce 9650M GT, driver 180.22-2).
It is about 20-30% faster for one job than with the CPU only.
Sometimes it runs three jobs at once, two S5R5 and one ABP1 on CPU+GPU.
Note that there are only two CPUs and one GPU. In such a case, one of
the CPUs seems to run one S5R5 and the ABP1 together with the GPU at the
same time - is this intended?
Unfortunately, I get no work for the new application. BOINC reports:
[pre]Message from Server: (Project has no jobs available)[/pre]
What seems strange is, while I get the message "Found app_info.xml; using anonymous platform" as described in the installation instructions, several lines further down I get "Can't load libcudart".
I am sure that the CUDA libs are in the search path: After re-running ldconfig, everything is visible to the binary:
I have installed the CUDA application as described on the website, but moved the libs manually to /usr/local/lib/cuda/. The system is a current Arch Linux with NVIDIA driver version 185.18.31-1, BOINC version 6.4.5, and the CUDA libs that are delivered with the Einstein Beta download. Hardware: Pentium M 2.1 GHz (i686, not a 64-bit architecture) and a simple GeForce 8400 GS.
Any ideas what's going on here?
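One quick cross-check for a "Can't load libcudart" message is to ask the dynamic loader directly. The sketch below is only an assumption about how one might test this; the helper name `can_load` is made up, and `libcudart.so.2` is the library name that appears in the backtraces earlier in this thread. It only reflects the search path of the shell it is run from, which may differ from the environment the BOINC client runs in, and the Python interpreter must have the same bitness as the library.

```python
import ctypes


def can_load(library_name):
    """Try to dlopen a shared library; return (ok, error_message)."""
    try:
        ctypes.CDLL(library_name)
    except OSError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    # Library names taken from the backtraces posted in this thread.
    for name in ("libcudart.so.2", "libcuda.so"):
        ok, err = can_load(name)
        print(f"{name}: {'loadable' if ok else 'NOT loadable: ' + err}")
```

If the library loads here but the client still reports the error, the difference is probably in the client's environment or working directory rather than in ldconfig.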
Recently I have been getting that at every (successful) download:
So, you have to read the log carefully ;-)
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
Gundolf, I can confirm that there is a workunit downloaded immediately before the "no jobs available" message. However, I think that that is a WU for the "old", non-CUDA application because after the post-install BOINC restart, I now have one WU in progress on the "classical" einstein_S5R5 1.06 application. Are you sure that you run the CUDA-enabled one?
I only wanted to say that you can't trust the "no jobs available" message.
I'm sure that I don't run CUDA, since I don't have such a device :-)
Computers aren't everything in life. (Just kidding)
Yeah ..... I see that frequently and yet the machine(s) is/are certainly not idle. I've been ignoring it since all is otherwise running fine.
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal