First CUDA App for Linux available for Beta Test

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,945,057,561
RAC: 695,594

To clarify: originally, when two graphics cards were running in SLI, BOINC could only see them as one card, and could only run one CUDA task. But AFAIK, it had no problems running that single task: it's just that users thought they had paid for two cards, and both should be used....

With the release of the new 190.38 drivers - which seem to work well on some cards under Windows, but cause problems on other cards, and I believe are also problematic for BOINC/Linux - NVidia has made it possible for CUDA, and hence BOINC, to see and use the two cards as separate devices for computation, while still keeping SLI active for gaming and graphics rendering.
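As a rough illustration of what "seeing the cards as separate devices" means, the sketch below asks the CUDA driver how many devices it exposes. This is a minimal Python sketch, not what BOINC itself does. It assumes a Linux system where the driver library can be loaded as libcuda.so.1 or libcuda.so, and the helper name cuda_device_count is made up for this example; cuInit and cuDeviceGetCount are calls from the CUDA driver API.

```python
import ctypes


def cuda_device_count(library_names=("libcuda.so.1", "libcuda.so")):
    """Return the number of CUDA devices the driver reports, or None.

    None means that no driver library could be loaded, or that the
    driver refused to initialise or to report a device count.
    """
    for name in library_names:
        try:
            driver = ctypes.CDLL(name)
        except OSError:
            # Library not present under this name; try the next one.
            continue
        # cuInit(0) must succeed (return 0) before any other driver call.
        if driver.cuInit(0) != 0:
            return None
        count = ctypes.c_int(0)
        if driver.cuDeviceGetCount(ctypes.byref(count)) != 0:
            return None
        return count.value
    return None


if __name__ == "__main__":
    print("CUDA devices visible to the driver:", cuda_device_count())
```

If the driver exposes each GPU separately, the count should match the number of GPUs available for computation; a result of None points at a driver installation problem rather than at the application.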

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 982
Credit: 25,170,813
RAC: 0

Message 94157 in response to message 94155

Hi Gundolf,

Quote:
Quote:


Ok, I suppose you have an additional on-board NVIDIA device, right? We also observed that early CUDA-supporting BOINC clients had problems identifying usable devices. If you want, you may try the latest Linux client (6.6.36), which can be found here. It'd be very interesting to see if it solves the problem.

Cheers,
Oliver


I don't have any CUDA-enabled device in my laptop. I only read about it on the SETI boards.


Sorry if this has been confusing. My reply about the additional on-board device was meant to address the problem of mat, who has two GTX 295s in his box...

Oliver

Einstein@Home Project

Matvey Piskunov
Joined: 12 May 09
Posts: 5
Credit: 3,370,507
RAC: 0

Message 94158 in response to message 94154

Quote:
Quote:

Not any longer, when using the newest drivers (according to some threads on the SETI fora, e.g. Nvidia drivers 190.38 and SLI).

Ok, I suppose you have an additional on-board NVIDIA device, right? We also observed that early CUDA-supporting BOINC clients had problems identifying usable devices.

Hello,

SLI mode is not active. When the same system runs the GPUGRID project, 4 tasks are calculated in parallel.

Could it be a problem that I don't have an additional video card to plug a display into? There have been no problems with GPUGRID so far. But I'm going to check this evening: I will replace one of the video cards with my old 7800GT, so it will not be used for calculations.

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 982
Credit: 25,170,813
RAC: 0

Message 94159 in response to message 94158

Hi mat,

Quote:

Could it be a problem that I don't have an additional video card to plug a display into?


I don't think that this poses a problem. On the contrary: for a number of reasons, it's recommended to run CUDA tasks on devices that don't have a display attached.

Looking at the two results you mentioned: typically the log output contains information about the device being used by the task, but this is not the case in your examples. This, as well as the rest of the log output, indicates that there's a problem with your CUDA driver/toolkit setup (not a single CUDA instruction returns successfully). I think the root cause is that you're running a 32-bit version (application and CUDA libraries) on an x86_64 system.
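For anyone who wants to check their own results for the device information mentioned here, the following is a minimal Python sketch. It assumes the device line looks like the one in the log posted later in this thread (Using CUDA device #0 "GeForce 9800 GT" ...); the function name find_cuda_device and the regular expression are made up for this example and are not part of the application.

```python
import re

# Pattern modelled on the log line seen in this thread, e.g.
#   [23:42:16][3725][INFO ] Using CUDA device #0 "GeForce 9800 GT" (453.60 GFLOPS)
DEVICE_LINE = re.compile(r'Using CUDA device #(\d+) "([^"]+)"')


def find_cuda_device(log_text):
    """Return (device_index, device_name) from the first device line, or None."""
    match = DEVICE_LINE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

A result of None for a task's stderr output would match the situation described above, where the application never got as far as selecting a device.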

Oliver

Einstein@Home Project

bzm
Joined: 22 Jan 05
Posts: 3
Credit: 1,073,672
RAC: 0

Hi,

Just a quick note to say the client works for me.

The first workunits errored out with "einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: error while loading shared libraries: libcufft.so.2: invalid ELF header"
I solved this by removing the provided libcu* files from disk and from app_info.xml, so that it would link against my existing 32-bit CUDA installation (which turned out to be CUDA 2.0).

Driver version 180.22, BOINC version 6.6.36 x86_64
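An "invalid ELF header" message typically means the loader did not find a valid ELF header at the start of the file, for example because the file is damaged or is not a library at all. The minimal Python sketch below checks the first bytes of a file for the ELF magic number and reports whether it is a 32-bit or 64-bit object. The helper name elf_class is made up for this example, and any path passed to it is up to the reader.

```python
ELF_MAGIC = b"\x7fELF"
# Fifth byte of the header (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(path):
    """Return "32-bit" or "64-bit" for an ELF file, "unknown" for an
    unexpected class byte, or None if the file has no ELF header."""
    with open(path, "rb") as handle:
        header = handle.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[4], "unknown")
```

Running this on the downloaded libcu* files and on the application binary would show whether the files are intact ELF objects and whether they are all 32-bit, which is what the i686 application needs.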

koschi
Joined: 17 Mar 05
Posts: 86
Credit: 1,674,717,555
RAC: 525,785

Message 94161 in response to message 94150

Quote:

Wonderful, a long time of waiting comes to an end :-D

I tried the app on my main box, Nvidia 8800GTS 512, Ubuntu 9.04 x64, BOINC 6.6.36, Nvidia driver 190.18 from some Ubuntu PPA. The system was so far processing workunits from GPUGRID and latest SETI CUDA apps, no issues...

The work units start, they show 1 second of time, then the progress bar is set to 100%, but the unit remains active. Process monitors show no CPU load nor does the GPU heat up. Somehow the CUDA process is just sitting there and doing nothing...

I can stop them, start another one, switch back to the first unit, they still show the same behaviour and will not start to compute. Though "leave application in memory while suspended/paused" is disabled, the other work units stay alive on the system, their processes don't terminate.

Very strange...

Tried it today on another machine, Nvidia 9800GT, Ubuntu 9.04 x84, BOINC 6.6.17, Nvidia driver 190.18 - same behaviour... :-(

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23,502,182
RAC: 0

Hi.

Just installed CUDA beta on SUSE11, 9800GT (Green), 185.18.31 driver (latest). Hopefully without errors, only client errors at GPUGRID :(

A first question: is this relevant for Linux?

Quote:
We claim a full CPU core for the application because BOINC won't reset the task to idle priority when less than one core is claimed

I ask because I have 4+1 tasks running on a quad core.

Michael

[edit] PS: Is it a good or a bad sign that no progress is reported?

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23,502,182
RAC: 0

Message 94163 in response to message 94161

Quote:
Quote:

Wonderful, a long time of waiting comes to an end :-D

I tried the app on my main box, Nvidia 8800GTS 512, Ubuntu 9.04 x64, BOINC 6.6.36, Nvidia driver 190.18 from some Ubuntu PPA. The system was so far processing workunits from GPUGRID and latest SETI CUDA apps, no issues...

The work units start, they show 1 second of time, then the progress bar is set to 100%, but the unit remains active. Process monitors show no CPU load nor does the GPU heat up. Somehow the CUDA process is just sitting there and doing nothing...

I can stop them, start another one, switch back to the first unit, they still show the same behaviour and will not start to compute. Though "leave application in memory while suspended/paused" is disabled, the other work units stay alive on the system, their processes don't terminate.

Very strange...

Tried it today on another machine, Nvidia 9800GT, Ubuntu 9.04 x84, BOINC 6.6.17, Nvidia driver 190.18 - same behaviour... :-(

Same here. (I installed the 190.18 drivers because there was no progress; see my last post.) The problem seems to be in libcuda.

[23:42:16][3725][INFO ] Application startup - thank you for supporting Einstein@Home!
[23:42:16][3725][INFO ] Starting data processing...
[23:42:16][3725][INFO ] Using CUDA device #0 "GeForce 9800 GT" (453.60 GFLOPS)
[23:42:16][3725][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[23:42:16][3725][INFO ] Header contents:
------> Original WAPP file: p2030_53676_75581_0047_G49.46-01.13.N_4.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53676.874780092592
------> Number of samples/record: 512
------> Center freq in MHz: 1420
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 192631.897006
------> DEC (J2000): 140550.079086
------> Galactic l: 49.4483
------> Galactic b: -1.1802
------> Name: G49.46-01.13.N
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 342.9956
------> ZA at start: 4.2896
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: PF,JD
------> File size (bytes): 16190702
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 184.8 cm^-3 pc
------> Scale factor: 9094.85
[23:42:18][3725][INFO ] Seed for random number generator is -1138023556.

[23:42:18][3725][ERROR] Application caught signal 11.

------> Obtained 18 stack frames for this thread.
------> Backtrace:
Frame 18:
Binary file: einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda (0x80acde2)
Offset info: pthread_mutex_lock+0x5de
einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: File format not recognized
Frame 17:
Binary file: /usr/lib/libcuda.so (0xf77a399b)
Frame 16:
Binary file: /usr/lib/libcuda.so (0xf77a399b)
Frame 15:
Binary file: /usr/lib/libcuda.so (0xf77aa791)
Frame 14:
Binary file: /usr/lib/libcuda.so (0xf77747ae)
Frame 13:
Binary file: /usr/lib/libcuda.so (0xf77181f3)
Frame 12:
Binary file: /usr/lib/libcuda.so (0xf772c794)
Frame 11:
Binary file: /usr/lib/libcuda.so (0xf770e675)
Frame 10:
Binary file: /usr/lib/libcuda.so (0xf7707992)
Frame 9:
Binary file: /usr/lib/libcuda.so (0xf776a5bf)
Offset info: cuCtxCreate+0x4f
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7dee8c0)
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7def2c1)
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7dcf33b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda (0x80b0012)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: File format not recognized
Frame 4:
Binary file: einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda (0x80adb7d)
einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: File format not recognized
Frame 3:
Binary file: einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda (0x80adf0f)
Offset info: main+0x17f
einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: File format not recognized
Frame 2:
Binary file: /lib/libc.so.6 (0xf7b13705)
Offset info: __libc_start_main+0xe5
------> End of backtrace

called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda (0x80acce1)
Offset info: pthread_key_create+0x3d
einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: File format not recognized

HTH

Michael

(SUSE 11, 190.18 beta drivers, 9800GT, BOINC 6.6.36)
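To see at a glance where a backtrace like the one above spends its frames, the following minimal Python sketch counts the "Binary file:" lines per library. It assumes the line format shown in this post; the function name frames_per_binary is made up for this example.

```python
import os
import re
from collections import Counter

# Matches lines such as:
#   Binary file: /usr/lib/libcuda.so (0xf77a399b)
BINARY_LINE = re.compile(r"Binary file:\s+(\S+)\s+\(0x[0-9a-fA-F]+\)")


def frames_per_binary(backtrace_text):
    """Count stack frames per binary, keyed by the file's base name."""
    counts = Counter()
    for line in backtrace_text.splitlines():
        match = BINARY_LINE.search(line)
        if match:
            counts[os.path.basename(match.group(1))] += 1
    return counts
```

A trace whose innermost frames are mostly in libcuda.so, entered via libcudart.so.2, would support the impression that the crash happens inside the driver library rather than in the application code.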

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,308
Credit: 249,867,713
RAC: 32,269

I observed the same crashes in libcuda until I switched back to the old driver that's recommended for CUDA 2.1 (Driver version 181.20); I'm not sure about the version of the driver that crashed.

Would you mind trying the old driver just for a test?

BM

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 982
Credit: 25,170,813
RAC: 0

Hi guys,

Let me say this again: we provided only an i386 application and the i386 CUDA 2.1 runtime environment. However, both of you are using x64 systems, so that's kind of unsupported and likely to produce errors due to incompatibilities.

Cheers,
Oliver

Einstein@Home Project
