CUDA App einsteinbinary 1.10 for Linux available for Beta Test

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251794693
RAC: 36179
Topic 194500

A new Einstein@Home CUDA App for Linux is available for beta testing on the Beta Test page.

We stumbled over some bugs in the CUDA part that might have caused some segfaults, so this is mainly a bugfix release. In addition, the CPU part of the App now uses SSE, as in the .09 Beta Apps.

Please test and report, and please include important information (such as the NVIDIA driver and Core Client versions) in your posts.
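For anyone who wants to gather those two version strings in one go, here is a minimal sketch. It assumes the proprietary NVIDIA kernel module is loaded (it exposes `/proc/driver/nvidia/version`) and that the client executable is called `boinc` and accepts `--version`; the function names are made up for this example, and both assumptions may not hold on every distribution.

```python
from pathlib import Path
import shutil
import subprocess


def nvidia_driver_version(proc_file="/proc/driver/nvidia/version"):
    """Return the first line of the NVIDIA driver's proc file, or None if it is absent."""
    path = Path(proc_file)
    if not path.is_file():
        return None
    lines = path.read_text(errors="replace").splitlines()
    return lines[0].strip() if lines else None


def boinc_client_version(executable="boinc"):
    """Return the output of '<executable> --version', or None if it cannot be run.

    Assumption: the core client binary is on PATH under this name.
    """
    exe = shutil.which(executable)
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    output = (result.stdout or result.stderr).strip()
    return output or None


if __name__ == "__main__":
    print("NVIDIA driver:", nvidia_driver_version() or "not found")
    print("BOINC client: ", boinc_client_version() or "not found")
```

If either value comes back as "not found", the package manager query (for example the `rpm -qa` call used further down in this thread) is a reasonable fallback.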

BM

Andris Pavenis
Joined: 24 Feb 05
Posts: 3
Credit: 38926952
RAC: 0


Tried
- Fedora 11 x86_64
- 'rpm -qa kmod-nvidia' returns kmod-nvidia-185.18.14-1.fc11.3.x86_64

Crashes (see below). Shows 100% but does not stop. Had to abort workunit.

Also:
- SETI@home Beta (CUDA) crashes similarly, but the workunit finishes with a failure
- GPUGRID - works OK

[22:53:40][14614][INFO ] Application startup - thank you for supporting Einstein@Home!
[22:53:40][14614][INFO ] Starting data processing...
[22:53:40][14614][INFO ] Using CUDA device #0 "GeForce 9800 GT" (508.03 GFLOPS)
[22:53:40][14614][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[22:53:40][14614][INFO ] Header contents:
------> Original WAPP file: p2030_54161_48913_0050_G54.71-02.47.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54161.566122685188
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 194205.603964
------> DEC (J2000): 180900.63493
------> Galactic l: 54.8068
------> Galactic b: -2.4852
------> Name: G54.71-02.47.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 440.3328
------> ZA at start: 1.3934
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Vilma,Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 25.2 cm^-3 pc
------> Scale factor: 6953.53
[22:53:41][14614][INFO ] Seed for random number generator is 977043268.

[22:53:43][14614][ERROR] Application caught signal 11.

------> Obtained 17 stack frames for this thread.
------> Backtrace:
Frame 17:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af632)
Offset info: pthread_mutex_lock+0x5e6
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 16:
Binary file: /usr/lib/nvidia/libcuda.so (0x10c86ab)
Frame 15:
Binary file: /usr/lib/nvidia/libcuda.so (0x10c86ab)
Frame 14:
Binary file: /usr/lib/nvidia/libcuda.so (0x10cf4d1)
Frame 13:
Binary file: /usr/lib/nvidia/libcuda.so (0x1093ad0)
Frame 12:
Binary file: /usr/lib/nvidia/libcuda.so (0xdd595b)
Frame 11:
Binary file: /usr/lib/nvidia/libcuda.so (0xde8164)
Frame 10:
Binary file: /usr/lib/nvidia/libcuda.so (0xdcde03)
Frame 9:
Binary file: /usr/lib/nvidia/libcuda.so (0xdc7df2)
Offset info: cuCtxCreate+0xa2
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x5128c0)
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x5132c1)
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x4f333b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b2862)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 4:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b03cd)
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 3:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b075f)
Offset info: main+0x17f
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 2:
Binary file: /lib/libc.so.6 (0x2bca66)
Offset info: __libc_start_main+0xe6
------> End of backtrace

called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af531)
Offset info: pthread_key_create+0x35
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
[22:54:18][14706][INFO ] Application startup - thank you for supporting Einstein@Home!
[22:54:18][14706][INFO ] Starting data processing...
[22:54:18][14706][INFO ] Using CUDA device #0 "GeForce 9800 GT" (508.03 GFLOPS)
[22:54:18][14706][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[22:54:18][14706][INFO ] Header contents:
------> Original WAPP file: p2030_54161_48913_0050_G54.71-02.47.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54161.566122685188
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 194205.603964
------> DEC (J2000): 180900.63493
------> Galactic l: 54.8068
------> Galactic b: -2.4852
------> Name: G54.71-02.47.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 440.3328
------> ZA at start: 1.3934
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Vilma,Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 25.2 cm^-3 pc
------> Scale factor: 6953.53
[22:54:19][14706][INFO ] Seed for random number generator is 977043268.

[22:54:20][14706][ERROR] Application caught signal 11.

------> Obtained 17 stack frames for this thread.
------> Backtrace:
Frame 17:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af632)
Offset info: pthread_mutex_lock+0x5e6
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 16:
Binary file: /usr/lib/nvidia/libcuda.so (0x10e16ab)
Frame 15:
Binary file: /usr/lib/nvidia/libcuda.so (0x10e16ab)
Frame 14:
Binary file: /usr/lib/nvidia/libcuda.so (0x10e84d1)
Frame 13:
Binary file: /usr/lib/nvidia/libcuda.so (0x10acad0)
Frame 12:
Binary file: /usr/lib/nvidia/libcuda.so (0xdee95b)
Frame 11:
Binary file: /usr/lib/nvidia/libcuda.so (0xe01164)
Frame 10:
Binary file: /usr/lib/nvidia/libcuda.so (0xde6e03)
Frame 9:
Binary file: /usr/lib/nvidia/libcuda.so (0xde0df2)
Offset info: cuCtxCreate+0xa2
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x25e8c0)
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x25f2c1)
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0x23f33b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b2862)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 4:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b03cd)
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 3:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b075f)
Offset info: main+0x17f
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 2:
Binary file: /lib/libc.so.6 (0x2bca66)
Offset info: __libc_start_main+0xe6
------> End of backtrace

called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af531)
Offset info: pthread_key_create+0x35
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized

[PST]Howard
Joined: 18 Jan 05
Posts: 2
Credit: 426400
RAC: 0

The WU starts running but jumps straight to 100%. According to stderrout.txt there is a "File format not recognized" error.

[14:53:33][5101][INFO ] Application startup - thank you for supporting Einstein@Home!
[14:53:33][5101][INFO ] Starting data processing...
[14:53:33][5101][INFO ] Using CUDA device #0 "GeForce GTX 275" (1010.88 GFLOPS)
[14:53:33][5101][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[14:53:33][5101][INFO ] Header contents:
------> Original WAPP file: p2030_54162_45910_0042_G41.76+01.37.C_0.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 54162.531365740739
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 190240.781394
------> DEC (J2000): 82856.412406
------> Galactic l: 41.758
------> Galactic b: 1.3809
------> Name: G41.76+01.37.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 357.8171
------> ZA at start: 9.8367
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: Kevin
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 70.2 cm^-3 pc
------> Scale factor: 6877.66
[14:53:35][5101][INFO ] Seed for random number generator is -1164413432.

[14:53:37][5101][ERROR] Application caught signal 11.

------> Obtained 18 stack frames for this thread.
------> Backtrace:
Frame 18:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af632)
Offset info: pthread_mutex_lock+0x5e6
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 17:
Binary file: /usr/lib32/libcuda.so (0xf780f99b)
Frame 16:
Binary file: /usr/lib32/libcuda.so (0xf780f99b)
Frame 15:
Binary file: /usr/lib32/libcuda.so (0xf7816791)
Frame 14:
Binary file: /usr/lib32/libcuda.so (0xf77e07ae)
Frame 13:
Binary file: /usr/lib32/libcuda.so (0xf77841f3)
Frame 12:
Binary file: /usr/lib32/libcuda.so (0xf7798794)
Frame 11:
Binary file: /usr/lib32/libcuda.so (0xf777a675)
Frame 10:
Binary file: /usr/lib32/libcuda.so (0xf7773992)
Frame 9:
Binary file: /usr/lib32/libcuda.so (0xf77d65bf)
Offset info: cuCtxCreate+0x4f
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7e3c8c0)
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7e3d2c1)
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7e1d33b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b2862)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 4:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b03cd)
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 3:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80b075f)
Offset info: main+0x17f
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized
Frame 2:
Binary file: /lib32/libc.so.6 (0xf7b71775)
Offset info: __libc_start_main+0xe5
------> End of backtrace

called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda (0x80af531)
Offset info: pthread_key_create+0x35
einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda: File format not recognized

Andris Pavenis
Joined: 24 Feb 05
Posts: 3
Credit: 38926952
RAC: 0

The "File format not recognized" message only appears after the real problem:

[14:53:37][5101][ERROR] Application caught signal 11.

So the application crashes, and the message appears while the stack backtrace is being output. For comparison: do you have 32-bit or 64-bit Linux? I have 64-bit, and I'm getting similar crashes, as seen in the earlier message in this thread.
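To make that ordering easier to see in a long stderr dump, here is a rough Python sketch that pulls out the first `[ERROR]` line and the frames of the backtrace. The regular expressions are assumptions based only on the log layout pasted in this thread, and `summarize_crash` is a name made up for this example.

```python
import re

# Patterns modelled on the backtrace layout shown in this thread (an assumption).
FRAME_RE = re.compile(r"^Frame (\d+):\s*$")
BINARY_RE = re.compile(r"^Binary file: (.+) \((0x[0-9a-fA-F]+)\)\s*$")
OFFSET_RE = re.compile(r"^Offset info: (\S+)\+0x[0-9a-fA-F]+\s*$")


def summarize_crash(log_text):
    """Return (first_error_line, frames) for one pasted stderr log.

    Each frame is a dict with 'frame', 'binary' and 'symbol' keys;
    'symbol' stays None when no 'Offset info' line follows the frame.
    """
    error_line = None
    frames = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if error_line is None and "[ERROR]" in line:
            error_line = line
            continue
        match = FRAME_RE.match(line)
        if match:
            current = {"frame": int(match.group(1)), "binary": None, "symbol": None}
            frames.append(current)
            continue
        if current is None:
            continue
        match = BINARY_RE.match(line)
        if match:
            current["binary"] = match.group(1)
            continue
        match = OFFSET_RE.match(line)
        if match:
            current["symbol"] = match.group(1)
        # Anything else (e.g. "File format not recognized") is ignored on purpose.
    return error_line, frames
```

Run against the logs above, this reports the signal 11 line first and then frames that pass through `cudaMallocHost` and `cuCtxCreate`. The "File format not recognized" lines only ever sit next to frames in the application binary itself, which fits the reading that they are a side effect of printing the backtrace.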

Andris

[PST]Howard
Joined: 18 Jan 05
Posts: 2
Credit: 426400
RAC: 0

I'm running 64-bit Ubuntu.

Olaf
Joined: 16 Sep 06
Posts: 26
Credit: 190763630
RAC: 0

It works without problems on my notebook (Intel Core2 Duo P7350 + GeForce 9650M GT, driver 180.22-2).
One job is about 20-30% faster than with the CPU only.
Sometimes it runs three jobs at once: two S5R5 and one ABP1 on CPU+GPU.
Note that there are only two CPUs and one GPU. In that case one of the CPUs seems to run one S5R5 job and the ABP1 job together with the GPU at the same time. Is this intended?

jstarek
Joined: 21 Jan 08
Posts: 3
Credit: 2646415
RAC: 0

Unfortunately, I get no work for the new application. BOINC reports:

[pre]Message from Server: (Project has no jobs available)[/pre]

What seems strange is that, while I get the message "Found app_info.xml; using anonymous platform" as described in the installation instructions, several lines further down I get "Can't load libcudart".

I am sure that the CUDA libs are in the search path: after re-running ldconfig, everything is visible to the binary:

bash-4.0# ldd einsteinbinary_ABP1_1.10_i686-pc-linux-gnu_cuda
        linux-gate.so.1 =>  (0xb7fc5000)
        libcufft.so.2 => /usr/local/lib/cuda/libcufft.so.2 (0xb7e89000)
        libcudart.so.2 => /usr/local/lib/cuda/libcudart.so.2 (0xb7e3e000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xb7e25000)
        libm.so.6 => /lib/libm.so.6 (0xb7dff000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7d0d000)
        libc.so.6 => /lib/libc.so.6 (0xb7bc7000)
        /lib/ld-linux.so.2 (0xb7fc6000)
        libdl.so.2 => /lib/libdl.so.2 (0xb7bc2000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0xb7ba4000)
        librt.so.1 => /lib/librt.so.1 (0xb7b9b000)

I have installed the CUDA application as described on the website, but moved the libs manually to /usr/local/lib/cuda/. The system is a current Arch Linux with NVIDIA drivers version 185.18.31-1, BOINC version 6.4.5 and the CUDA libs that ship with the Einstein Beta download. Hardware: Pentium M 2.1 GHz (i686, not a 64-bit architecture) and a simple GeForce 8400 GS.
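One way to narrow this down is to try loading the library from a separate process under a few different names, since `ldd` only shows what the application binary itself links against. A minimal sketch with Python's `ctypes` follows. Which file name the BOINC client actually asks for is an assumption here (both the versioned and the unversioned name are tried), and `try_load` is a name made up for this example. The Python interpreter must have the same bitness as the library for the result to mean anything.

```python
import ctypes


def try_load(names):
    """Try to load each shared library by name via the dynamic loader.

    Returns a dict mapping each name to None on success,
    or to the loader's error message on failure.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    # Candidate names are guesses; adjust them to whatever the client log mentions.
    for name, error in try_load(["libcudart.so", "libcudart.so.2"]).items():
        print(f"{name}: {'loaded' if error is None else 'FAILED: ' + error}")
```

If the versioned name loads and the unversioned one does not, a missing `libcudart.so` symlink in the library directory would be one thing to check.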

Any ideas what's going on here?

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0


Message 94510 in response to message 94509

Quote:

Unfortunately, I get no work for the new application. BOINC reports:

[pre]Message from Server: (Project has no jobs available)[/pre]


Lately I get that on every (successful) download:

27/08/2009 13:09:06|Einstein@Home|Scheduler request succeeded: got 1 new tasks
27/08/2009 13:09:06|Einstein@Home|[sched_ops_debug] Server version 607
27/08/2009 13:09:06|Einstein@Home|Message from server: (Project has no jobs available)


So, you have to read the log carefully ;-)
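Reading the log carefully can also be scripted. Here is a small Python sketch that flags every "no jobs available" message arriving with the same timestamp as a scheduler reply that did deliver tasks. It assumes the `timestamp|project|message` layout of the lines quoted above, and `misleading_no_jobs` is a name made up for this example.

```python
import re

# Matches the scheduler reply line quoted above (layout is an assumption).
GOT_RE = re.compile(r"Scheduler request succeeded: got (\d+) new tasks?")


def misleading_no_jobs(log_text):
    """Return (timestamp, project, task_count) tuples for each
    'no jobs available' message that shares a timestamp with a
    scheduler reply that delivered at least one task."""
    hits = []
    last_reply = {}  # project -> (timestamp, number of new tasks)
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
        timestamp, project, message = parts
        match = GOT_RE.search(message)
        if match:
            last_reply[project] = (timestamp, int(match.group(1)))
        elif "Project has no jobs available" in message:
            previous = last_reply.get(project)
            if previous and previous[0] == timestamp and previous[1] > 0:
                hits.append((timestamp, project, previous[1]))
    return hits
```

Fed the three log lines above, it returns one hit for Einstein@Home with a task count of 1, i.e. the message showed up even though a task had just been sent.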

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

jstarek
Joined: 21 Jan 08
Posts: 3
Credit: 2646415
RAC: 0

Gundolf, I can confirm that a workunit is downloaded immediately before the "no jobs available" message. However, I think that this is a WU for the "old", non-CUDA application, because after the post-install BOINC restart I now have one WU in progress on the "classical" einstein_S5R5 1.06 application. Are you sure that you are running the CUDA-enabled one?

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0


Message 94512 in response to message 94511

Quote:
Gundolf, I can confirm that there is a workunit downloaded immediately before the "no jobs available" message. However, I think that that is a WU for the "old", non-CUDA application because after the post-install BOINC restart, I now have one WU in progress on the "classical" einstein_S5R5 1.06 application. Are you sure that you run the CUDA-enabled one?


I only wanted to say that you can't trust the "no jobs available" message.

I'm sure that I don't run CUDA, since I don't have such a device :-)

Computers aren't everything in life. (Just kidding)

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6591
Credit: 325370405
RAC: 183114


Message 94513 in response to message 94512

Quote:
I only wanted to say that you can't trust the "no jobs available" message.


Yeah ..... I see that frequently and yet the machine(s) is/are certainly not idle. I've been ignoring it since all is otherwise running fine.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
