I installed a GTX 650 Graphics card in my Linux computer. I installed the drivers from the NVIDIA website. However, I get computational errors on my Einstein CUDA workunits as seen below.
The only thing that looks strange to me is the line "0 CUDA cores / 0.00 GFLOPS".
I am merrily crunching away Milkyway CUDA workunits so it is not like I completely don't know what I am doing.
Can anyone shed any light on this?
Thanks
7.0.65
process exited with code 11 (0xb, -245)
[11:02:08][2349][INFO ] Application startup - thank you for supporting Einstein@Home!
[11:02:08][2349][INFO ] Starting data processing...
[11:02:08][2349][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 97 MB (1952 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[11:02:08][2349][INFO ] Using CUDA device #0 "GeForce GTX 650 Ti BOOST" (0 CUDA cores / 0.00 GFLOPS)
[11:02:08][2349][INFO ] Version of installed CUDA driver: 5050
[11:02:08][2349][INFO ] Version of CUDA driver API used: 3020
[11:02:08][2349][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[11:02:08][2349][INFO ] Header contents:
------> Original WAPP file: ./p2030.20130522.G37.93+00.18.C.b4s0g0.00000_DM155.20
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 56434.311847897603
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 185958.169399
------> DEC (J2000): 42537.728199
------> Galactic l: 0
------> Galactic b: 0
------> Name: G37.93+00.18.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 155.2 cm^-3 pc
------> Scale factor: 0.00656217
[11:02:10][2349][INFO ] Seed for random number generator is 1150837623.
[11:02:11][2349][ERROR] Application caught signal 11.
------> Obtained 21 stack frames for this thread.
------> Backtrace:
Frame 21:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.39_i686-pc-linux-gnu__BRP4G-cuda32-nv270 (0x80b8fa2)
Source file: erp_boinc_wrapper.cpp (Function: sighandler / Line: 167)
Frame 20:
Binary file: /usr/lib/libcuda.so.1 (0x2b1ea8e)
Offset info: +0x1afa8e
Frame 19:
Binary file: /usr/lib/libcuda.so.1 (0x2b1ea8e)
Offset info: +0x1afa8e
Frame 18:
Binary file: /usr/lib/libcuda.so.1 (0x2aef68c)
Offset info: +0x18068c
Frame 17:
Binary file: /usr/lib/libcuda.so.1 (0x2aef80b)
Offset info: +0x18080b
Frame 16:
Binary file: /usr/lib/libcuda.so.1 (0x2aefdf7)
Offset info: +0x180df7
Frame 15:
Binary file: /usr/lib/libcuda.so.1 (0x29ec8c0)
Offset info: +0x7d8c0
Frame 14:
Binary file: /usr/lib/libcuda.so.1 (0x29cb07f)
Offset info: cuModuleLoadFatBinary+0x5f
Frame 13:
Binary file: ./libcudart.so.3 (0x50904a)
Offset info: +0x2304a
Frame 12:
Binary file: ./libcudart.so.3 (0x4fe33d)
Offset info: +0x1833d
Frame 11:
Binary file: ./libcudart.so.3 (0x50280f)
Offset info: +0x1c80f
Frame 10:
Binary file: ./libcudart.so.3 (0x5040ef)
Offset info: +0x1e0ef
Frame 9:
Binary file: ./libcudart.so.3 (0x4fbf2c)
Offset info: +0x15f2c
Frame 8:
Binary file: ./libcudart.so.3 (0x52266f)
Offset info: cudaFree+0x4f
Frame 7:
Binary file: ./libcufft.so.3 (0xea6354)
Offset info: +0x36354
Frame 6:
Binary file: ./libcufft.so.3 (0xea6804)
Offset info: cufftPlan1d+0x44
Frame 5:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.39_i686-pc-linux-gnu__BRP4G-cuda32-nv270 (0x80c05b3)
Offset info: set_up_fft+0xa3
Source file: demod_binary_cuda.cu (Function: set_up_fft / Line: 838)
Frame 4:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.39_i686-pc-linux-gnu__BRP4G-cuda32-nv270 (0x80bbc48)
Offset info: MAIN+0x26a8
Source file: demod_binary.c (Function: MAIN / Line: 1066)
Frame 3:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.39_i686-pc-linux-gnu__BRP4G-cuda32-nv270 (0x80b8643)
Source file: erp_boinc_wrapper.cpp (Function: worker / Line: 453)
Frame 2:
Binary file: ../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4G_1.39_i686-pc-linux-gnu__BRP4G-cuda32-nv270 (0x80b8bdf)
Offset info: main+0x17f
Source file: erp_boinc_wrapper.cpp (Function: main / Line: 554)
Frame 1:
Binary file: /lib/i386-linux-gnu/libc.so.6 (0x20e4d3)
Offset info: __libc_start_main+0xf3
------> End of backtrace
11:02:11 (2349): called boinc_finish
Copyright © 2024 Einstein@Home. All rights reserved.
Signal 11 Error
Not sure I can be of much help here but...
I am running a GTX 650 Ti on Ubuntu 12.04 with NVIDIA drivers 304.88.
My WU log for a job shows:
Using CUDA device #0 "GeForce GTX 650 Ti" (0 CUDA cores / 0.00 GFLOPS)
so I don't believe that is a concern.
My log is pretty much the same as yours down to:
[07:56:43][29023][INFO ] Seed for random number generator is XXXXXXXXXXX.
[07:56:43][29023][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-08
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
This is where yours falls apart. Nothing in my log gives me an indication of where your problem is, but I am thinking it could have something to do with missing libraries.
I did the following:
locate libcuda.so.1
and came up with the following:
/usr/lib/nvidia-304/libcuda.so.1
/usr/lib32/nvidia-304/libcuda.so.1
[edit]
ls -l /usr/lib/nvidia-304/libcuda.so.1 -> libcuda.so.304.88
ls -l /usr/lib32/nvidia-304/libcuda.so.1 -> libcuda.so.304.88
Do you have this library and others your log complains about? Also note that I have an NVIDIA library for the 304 drivers AND a lib32 reference for the same library. If you have these libraries then maybe it could be an issue with your NVIDIA driver release. Some NVIDIA drivers are known to cause problems on Linux. The 319 drivers caused me some serious headaches.
Hope this helps.
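The library check described above can also be done without relying on the locate database, which may be stale or empty. What follows is only a minimal sketch: the function name `list_libcuda_links` is made up for illustration, and the two search directories are assumptions taken from the paths quoted in this thread, so adjust them for your distribution.

```shell
#!/bin/sh
# List every libcuda.so.1 under the given directories and show what it links to.
# Uses find, so it does not depend on the locate database being up to date.
# NOTE: list_libcuda_links is a hypothetical helper name, not a standard tool.
list_libcuda_links() {
    for dir in "$@"; do
        # Skip directories that do not exist (e.g. /usr/lib32 on some systems).
        [ -d "$dir" ] || continue
        find "$dir" -name 'libcuda.so.1' 2>/dev/null | sort | while IFS= read -r lib; do
            if [ -L "$lib" ]; then
                # Symlink: print the link and the file name it points to.
                printf '%s -> %s\n' "$lib" "$(readlink "$lib")"
            else
                printf '%s (regular file, not a symlink)\n' "$lib"
            fi
        done
    done
}

# Assumed search locations, based on the paths mentioned in this thread.
list_libcuda_links /usr/lib /usr/lib32
```

Each output line has the form `path -> target`, so a link pointing at an unexpected driver release stands out at a glance.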
Please take a look at this page for prerequisites on 64 bit Linux (scroll down):
http://boinc.berkeley.edu/wiki/Installing_on_Linux
robi -
I think you are on the right track.
The locate didn't show anything and the folders didn't contain the libraries. After some research, I installed libcudart4 from Synaptic.
Some libraries now show up in some folders.
The locate still didn't show anything, but after some more research I ran updatedb and now locate shows something, though not what you have. I have
/usr/lib/libcuda.so.1
/usr/lib/nvidia-304/libcuda.so.1
The ls command you gave me gives a file not found error. (Not sure I was supposed to enter the whole thing). But,
ls -l /usr/lib/libcuda.so.1 gives me
/usr/lib/libcuda.so.1 -> libcuda.so.319.60
and
ls -l /usr/lib/nvidia-304/libcuda.so.1 gives me
/usr/lib/nvidia-304/libcuda.so.1 -> libcuda.so.304.88
I have driver 319.60 installed. That is in the name of the file I used. Isn't that 304.88 an error?
If the above shows something is messed up, I am not sure how to correct it.
Now, when I try to run an Einstein WU the screen and keyboard lock up and I have to push the reset button to reboot the computer.
I do have a file libcuda.so.319.60 in /usr/lib/. Do you think I need to have the entry pointing to 304.88 (above) point to the 319.60 file?
The above "ls" commands seem to indicate that you have two libraries installed for two different driver releases - 319 and 304. This would definitely be an issue.
I am really in the dark here because I do not know the Linux distro you are using. What possibly happened is you installed 319 over 304 without removing 304 first. This would explain the 304 softlink above.
On the NVIDIA site, did you choose the 319 drivers or are they the default latest? What you need to consider is that if you used the driver install procedure from the NVIDIA site (if this is what you did), then every time you update the kernel you will have to reinstall the drivers. This is a pain. The link I provide below will make driver management a lot less painful.
What you need to do at this point is remove all NVIDIA drivers and install again. Do not use the 319 drivers since they are problematic on Linux. I had installed them at one time and BRP5 jobs were taking 4 days to complete instead of the ~6 hours normally required.
I will refer you to this link on how I installed the drivers on Ubuntu. This would not work if you are using some other distro because the commands would be different. When you read these instructions you will note that I am "adding" another repository for acquiring NVIDIA drivers.
Do not start "creating" soft links. This will only create more problems. Driver installation procedures will create the necessary links.
If you are using Ubuntu then follow the instructions at the link I provided. If you are using some other distribution then post it and I will try to provide some specific help.
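To make the "two driver releases" diagnosis above concrete, here is a small sketch that reads `path -> libcuda.so.VERSION` lines and reports whether more than one driver version is present. The function names are hypothetical, and the sample input is simply the two links reported earlier in this thread.

```shell
#!/bin/sh
# Read "path -> libcuda.so.VERSION" lines on stdin and print each distinct
# driver version once, sorted. (Hypothetical helper, for illustration only.)
distinct_driver_versions() {
    sed -n 's/.*-> libcuda\.so\.\([0-9][0-9.]*\)$/\1/p' | sort -u
}

# Print MIXED if more than one driver release is found, otherwise OK.
check_single_driver() {
    versions=$(distinct_driver_versions)
    # Count the non-empty lines, i.e. the number of distinct versions.
    count=$(printf '%s\n' "$versions" | awk 'NF {n++} END {print n+0}')
    if [ "$count" -gt 1 ]; then
        # Unquoted on purpose: joins the versions with spaces before tr.
        echo "MIXED: $(echo $versions | tr ' ' ',')"
    else
        echo "OK: ${versions:-none found}"
    fi
}

# The two links reported in this thread:
check_single_driver <<'EOF'
/usr/lib/libcuda.so.1 -> libcuda.so.319.60
/usr/lib/nvidia-304/libcuda.so.1 -> libcuda.so.304.88
EOF
```

A MIXED result only says that libraries from more than one release are on disk. It does not say which one the running kernel module matches, so the advice above still applies: remove everything and reinstall one release rather than repointing links by hand.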
robi -
I originally installed the drivers from a .run file I downloaded from the NVIDIA website. Somehow, without my input, I must have installed some drivers from the repository.
In the meantime I completely hosed my system trying to set the GPU fan speed on login. Not a trivial task and I did something VERY wrong.
So, I spent yesterday rebuilding my Ubuntu 12.04 system and installed the drivers using the method you pointed me to in the last note.
The commands you suggested to try in your very first note now show what you have EXCEPT for the lib32 entries.
However, I am back to where I was. Milkyway runs fine. But Einstein either gets a computational error or locks up the computer after about 60 seconds.
One interesting observation is that if I quickly suspend the Einstein project after I see a computational error, then the Milkyway WUs all get a CL_OUT_OF_RESOURCES error, which indicates to me a probable driver problem. A reboot fixes this.
I can make this computer a Milkyway-only computer and another computer an Einstein-only computer. So, it is not like the GTX card is sitting idle.
In our correspondence I referenced this page:
edit
prerequisites for 64 bit os
This is for 64 bit Linux on Ubuntu and other Linux distros. Did you make sure these were present on your system?
Also, after the rebuild of Ubuntu 12.04, did you follow the steps provided at this link?
Ubuntu Install
If you met these requirements and are still having problems then I would suggest the following.
1. launch the Ubuntu Software Center GUI
2. in the search bar type in nvidia-
3. at the bottom click on "show xx technical items" - IF it says "hide xx technical items" do nothing.
4. scroll down. you will see nvidia entries and those you have installed will have a white check mark in a green circle.
look through these you will see a lot of "stuff". You might be seeing drivers for 304, 310, 319 etc.
My "checked" entries are for:
nvidia-common
nvidia-settings-304
optimized hardware acceleration of OpenGL with nvidia graphic cards
nvidia-304
These are the driver related packages that I have installed. Look through your list. Make sure that there is nothing checked for 319 related drivers or other drivers. I had once looked at this list and noted "checked" entries for 319 driver components. I then selected each of the 319 entries one at a time and removed them. Rebooted and all was good.
If you do not see these options then you might need to enter:
sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
If you have done all of this and your package list looks good then I am really at a loss as to what is causing you the issue you are experiencing.
One other thing and I am really grasping at straws. In your BOINC manager under tools->processor usage
you might want to change the "multiprocessor use at most" to 50%. and see if that makes a difference.
If anyone else has a suggestion feel free to jump in.
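The Software Center check in steps 1 to 4 above can also be done from a terminal. The sketch below filters `dpkg -l` output for installed packages whose name starts with `nvidia` and ends in a driver series other than the one you expect. The function name is hypothetical, and the assumption that driver packages end in a numeric series (optionally followed by `-updates`) is based only on the package names quoted in this thread.

```shell
#!/bin/sh
# Read `dpkg -l` output on stdin and print installed (state "ii") packages whose
# name starts with "nvidia" and ends in a driver series other than the expected
# one. Packages without a numeric suffix (e.g. nvidia-common) are ignored.
# NOTE: unexpected_nvidia_packages is a hypothetical helper name.
unexpected_nvidia_packages() {
    expected="$1"
    awk -v want="$expected" '
        $1 == "ii" && $2 ~ /^nvidia/ {
            name = $2
            sub(/:.*$/, "", name)   # drop an architecture suffix such as ":amd64"
            if (match(name, /-[0-9]+(-updates)?$/)) {
                series = substr(name, RSTART + 1)
                sub(/-updates$/, "", series)
                if (series != want) print name
            }
        }'
}

# Only query the real package database when dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l 'nvidia*' 2>/dev/null | unexpected_nvidia_packages 304 || true
fi
```

No output means nothing other than the 304 series is installed. Anything printed is a candidate for removal before rebooting, as described above.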
robi -
The "prerequisites for 64 bit os" page includes (among a lot of other things) the basic instructions for installing BOINC from the BOINC website. I have done that and have had BOINC running on this computer for several years (without a graphics card).
The "Ubuntu Install" link you sent has the command I used (sudo apt-get install nvidia-current). I didn't do the first items because I was doing a "clean install".
My Software Center checked items are -
Optimized Hardware Acceleration..........
nvidia-common
nvidia-settings-304
nvidia-304
jockey-common
xserver-xorg-video-nouveau
configure third party proprietary drivers (jockey-gtk)
I think the last three were installed when I installed the "NVIDIA Settings" application which I use to adjust the fan speed.
Thanks for taking the time to reply to my issue. As I mentioned, I am merrily cranking away Milkyway workunits.
ok. Sorry it did not work out.
Maybe someone else will be able to help resolve the issue you are having.