To clarify: originally, when two graphics cards were running in SLI, BOINC could only see them as one card, and could only run one CUDA task. But AFAIK, it had no problems running that single task: it's just that users thought they had paid for two cards, and both should be used....
With the release of the new 190.38 drivers - which seem to work well on some cards under Windows, but cause problems on other cards, and I believe are also problematic for BOINC/Linux - NVidia has made it possible for CUDA, and hence BOINC, to see and use the two cards as separate devices for computation, while still keeping SLI active for gaming and graphics rendering.
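To check how many separate devices CUDA actually exposes on a given box, something along these lines may help. It is only a sketch: it assumes a Linux-style driver library that `ctypes.util.find_library("cuda")` can locate, the helper name `list_cuda_devices` is made up for illustration, and it says nothing about how BOINC itself enumerates devices.

```python
import ctypes
import ctypes.util


def list_cuda_devices():
    """Return the device names reported by the CUDA driver API.

    Returns None when no CUDA driver library can be loaded, and an
    empty list when the driver loads but reports an error.
    """
    path = ctypes.util.find_library("cuda")  # assumption: libcuda is on the loader path
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None

    if lib.cuInit(0) != 0:  # 0 means CUDA_SUCCESS
        return []
    count = ctypes.c_int(0)
    if lib.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return []

    names = []
    for ordinal in range(count.value):
        device = ctypes.c_int(0)
        if lib.cuDeviceGet(ctypes.byref(device), ordinal) != 0:
            continue
        buf = ctypes.create_string_buffer(256)
        if lib.cuDeviceGetName(buf, len(buf), device) != 0:
            continue
        names.append(buf.value.decode("utf-8", errors="replace"))
    return names


if __name__ == "__main__":
    devices = list_cuda_devices()
    if devices is None:
        print("No CUDA driver library found")
    else:
        for index, name in enumerate(devices):
            print(f"device #{index}: {name}")
```

If the drivers expose the SLI cards separately, each GPU should show up as its own entry.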
Ok, I suppose you have an additional on-board NVIDIA device, right? We also observed that early CUDA-supporting BOINC clients had problems identifying usable devices. If you want, you may try the latest Linux client (6.6.36) that can be found here. It'd be very interesting to see if it solves the problem.
Cheers,
Oliver
I don't have any CUDA-enabled device in my laptop. I only read about it on the SETI boards.
Not any longer, when using the newest drivers (according to some threads on the SETI fora, e.g. Nvidia drivers 190.38 and SLI).
Wonderful, a long time of waiting comes to an end :-D
I tried the app on my main box, Nvidia 8800GTS 512, Ubuntu 9.04 x64, BOINC 6.6.36, Nvidia driver 190.18 from some Ubuntu PPA. The system was so far processing workunits from GPUGRID and latest SETI CUDA apps, no issues...
The work units start, they show 1 second of time, then the progress bar is set to 100%, but the unit remains active. Process monitors show no CPU load nor does the GPU heat up. Somehow the CUDA process is just sitting there and doing nothing...
I can stop them, start another one, switch back to the first unit, they still show the same behaviour and will not start to compute. Though "leave application in memory while suspended/paused" is disabled, the other work units stay alive on the system, their processes don't terminate.
Very strange...
Tried it today on another machine, Nvidia 9800GT, Ubuntu 9.04 x84, BOINC 6.6.17, Nvidia driver 190.18 - same behaviour... :-(
Same here. (Installed 190.18 drivers, because no progress, see last post.) Problem seems to be with libcuda.
[23:42:16][3725][INFO ] Application startup - thank you for supporting Einstein@Home!
[23:42:16][3725][INFO ] Starting data processing...
[23:42:16][3725][INFO ] Using CUDA device #0 "GeForce 9800 GT" (453.60 GFLOPS)
[23:42:16][3725][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[23:42:16][3725][INFO ] Header contents:
------> Original WAPP file: p2030_53676_75581_0047_G49.46-01.13.N_4.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53676.874780092592
------> Number of samples/record: 512
------> Center freq in MHz: 1420
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 192631.897006
------> DEC (J2000): 140550.079086
------> Galactic l: 49.4483
------> Galactic b: -1.1802
------> Name: G49.46-01.13.N
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 342.9956
------> ZA at start: 4.2896
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: PF,JD
------> File size (bytes): 16190702
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 184.8 cm^-3 pc
------> Scale factor: 9094.85
[23:42:18][3725][INFO ] Seed for random number generator is -1138023556.
[23:42:18][3725][ERROR] Application caught signal 11.
Hi Gundolf,
Sorry if this has been confusing. My reply about the additional on-board device was meant to address the problem reported by mat, who has two GTX 295s in his box...
Oliver
Einstein@Home Project
Hello,
SLI mode is not active. When the same system works on the GPUGRID project, 4 tasks are calculated in parallel.
Could it be a problem that I don't have an additional video card to plug a display into? There were no problems with GPUGRID so far. But I'm going to check this evening: I will replace one of the video cards with my old 7800GT, so that it will not be used for calculations.
Hi mat,
I don't think that this poses a problem. On the contrary: for a number of reasons, it's recommended to run CUDA tasks on devices that don't have a display attached.
Looking at the two results you mentioned: typically the log output contains information about the device being used by the task, which is not the case in your examples. This, as well as the rest of the log output, indicates that there's a problem with your CUDA driver/toolkit setup (not a single CUDA instruction returns successfully). I think the root cause is that you're running a 32-bit version (application and CUDA libraries) on an x86_64 system.
Oliver
Einstein@Home Project
Hi,
Just a quick note to say the client works for me.
The first workunits errored out with "einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: error while loading shared libraries: libcufft.so.2: invalid ELF header"
I solved this by removing the provided libcu* files from disk and from app_info.xml so it would link to my existing 32-bit cuda installation (which turned out to be cuda 2.0)
Driver version 180.22, BOINC version 6.6.36 x86_64
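When a loader complains about the ELF header of a library, it can help to look at the first few bytes of the file. The sketch below reads the ELF magic and the class byte (offset 4 of the identification bytes: 1 for 32-bit, 2 for 64-bit). The function names are made up for illustration, and it only reports what the file is; it does not claim to explain the specific error quoted above.

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(header: bytes) -> str:
    """Classify a file from its first bytes: '32-bit', '64-bit', 'unknown' or 'not ELF'."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return "not ELF"
    # Byte 4 (EI_CLASS) holds the object class.
    return ELF_CLASSES.get(header[4], "unknown")


def elf_class_of_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as handle:
        return elf_class(handle.read(5))


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(f"{name}: {elf_class_of_file(name)}")
```

Run against the libcu* files in the project directory, a result of "not ELF" would point to a damaged or incomplete download, while "32-bit" or "64-bit" shows which runtime the file belongs to.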
Hi.
Just installed CUDA beta on SUSE11, 9800GT (Green), 185.18.31 driver (latest). Hopefully without errors, only client errors at GPUGRID :(
A first question: is this relevant for Linux?
I ask because I have 4+1 tasks running on a quad core.
Michael
[edit] PS: Is it a good or a bad sign that no progress is reported?
Team Linux Users Everywhere
Same here. (Installed 190.18 drivers, because no progress, see last post.) Problem seems to be with libcuda.
[23:42:18][3725][ERROR] Application caught signal 11.
------> Obtained 18 stack frames for this thread.
------> Backtrace:
Frame 18:
Binary file: einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda (0x80acde2)
Offset info: pthread_mutex_lock+0x5de
einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: File format not recognized
Frame 17:
Binary file: /usr/lib/libcuda.so (0xf77a399b)
Frame 16:
Binary file: /usr/lib/libcuda.so (0xf77a399b)
Frame 15:
Binary file: /usr/lib/libcuda.so (0xf77aa791)
Frame 14:
Binary file: /usr/lib/libcuda.so (0xf77747ae)
Frame 13:
Binary file: /usr/lib/libcuda.so (0xf77181f3)
Frame 12:
Binary file: /usr/lib/libcuda.so (0xf772c794)
Frame 11:
Binary file: /usr/lib/libcuda.so (0xf770e675)
Frame 10:
Binary file: /usr/lib/libcuda.so (0xf7707992)
Frame 9:
Binary file: /usr/lib/libcuda.so (0xf776a5bf)
Offset info: cuCtxCreate+0x4f
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7dee8c0)
Frame 7:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7def2c1)
Frame 6:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7dcf33b)
Offset info: cudaMallocHost+0x2b
Frame 5:
Binary file: einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda (0x80b0012)
Offset info: MAIN+0x1ce2
einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: File format not recognized
Frame 4:
Binary file: einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda (0x80adb7d)
einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: File format not recognized
Frame 3:
Binary file: einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda (0x80adf0f)
Offset info: main+0x17f
einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: File format not recognized
Frame 2:
Binary file: /lib/libc.so.6 (0xf7b13705)
Offset info: __libc_start_main+0xe5
------> End of backtrace
called boinc_finish
Frame 1:
Binary file: einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda (0x80acce1)
Offset info: pthread_key_create+0x3d
einsteinbinary_ABP1_1.07_i686-pc-linux-gnu-cuda: File format not recognized
HTH
Michael
(SUSE 11, 190.18 beta drivers, 9800GT, BOINC 6.6.36)
Team Linux Users Everywhere
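To see at a glance where a backtrace like the one above spends its frames, the "Binary file:" lines can be tallied per library. This is a rough sketch that assumes the line format shown in this log; the function name `frames_per_binary` and the three-frame sample are for illustration only.

```python
import re
from collections import Counter

# Matches lines such as: Binary file: /usr/lib/libcuda.so (0xf776a5bf)
FRAME_RE = re.compile(r"Binary file:\s+(\S+)\s+\(0x[0-9a-fA-F]+\)")

SAMPLE = """Frame 10:
Binary file: /usr/lib/libcuda.so (0xf7707992)
Frame 9:
Binary file: /usr/lib/libcuda.so (0xf776a5bf)
Offset info: cuCtxCreate+0x4f
Frame 8:
Binary file: ../../projects/einstein.phys.uwm.edu/libcudart.so.2 (0xf7dee8c0)
"""


def frames_per_binary(backtrace: str) -> Counter:
    """Count stack frames per binary, keyed by file name without directories."""
    counts = Counter()
    for line in backtrace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            counts[match.group(1).rsplit("/", 1)[-1]] += 1
    return counts


if __name__ == "__main__":
    for binary, frames in frames_per_binary(SAMPLE).most_common():
        print(f"{frames:3d}  {binary}")
```

A tally dominated by libcuda.so frames is consistent with the crash happening inside the driver library rather than in the application code, although it does not prove which side is at fault.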
I observed the same crashes in libcuda until I switched back to the old driver that's recommended for CUDA 2.1 (Driver version 181.20); I'm not sure about the version of the driver that crashed.
Would you mind trying the old driver just for a test?
BM
Hi guys,
Let me say this again: we provided only an i386 application and the i386 CUDA 2.1 runtime environment. However, both of you are using x64 systems, so that's kind of unsupported and likely to produce errors due to incompatibilities.
Cheers,
Oliver
Einstein@Home Project