CUDA App crashes

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

I don't know if this will help figure out what's wrong, but these are the device properties returned by cudaGetDeviceProperties:

Found 1 Device
Properties:
Device properties:
    maxThreadsPerBlock=512
    maxThreadsDim=[512, 512, 64]
    maxGridSize=[65535, 65535, 1]
    sharedMemPerBlock=16384
    totalConstantMemory=65536
    regsPerBlock=16384
    SIMDWidth=32
    memPitch=2147483647
    regsPerBlock=16384
    clockRate=1340000
    textureAlign=256
M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

I'm sure you ran ldd and started BOINC after the CUDA drivers were loaded, either through X11 or this way?

joe areeda

Quote:
I'm sure you ran ldd and started BOINC after the CUDA drivers were loaded, either through X11 or this way?


I think you're giving me too much credit.

ldd on boinc shows:

ldd /home2/boinc/BOINC/boinc
	linux-vdso.so.1 =>  (0x00007fff415ff000)
	libdl.so.2 => /lib/libdl.so.2 (0x00007f5fec557000)
	libnsl.so.1 => /lib/libnsl.so.1 (0x00007f5fec33d000)
	libz.so.1 => /lib/libz.so.1 (0x00007f5fec124000)
	libpthread.so.0 => /lib/libpthread.so.0 (0x00007f5febf07000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f5febc01000)
	libm.so.6 => /lib/libm.so.6 (0x00007f5feb97d000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f5feb767000)
	libc.so.6 => /lib/libc.so.6 (0x00007f5feb3e4000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f5fec76d000)


but ldd on projects/einstein.phys.uwm.edu/einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270 says:
not a dynamic executable

Could it be a 32/64 bit issue?
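
A quick way to test that guess is to read the ELF header directly instead of relying on ldd. The sketch below is only an illustration: the helper name `elf_info` is made up for this example, and it assumes an ordinary ELF binary. It reports whether the image is 32-bit or 64-bit and which dynamic loader (the PT_INTERP entry) it asks for. If a 32-bit app names a loader such as /lib/ld-linux.so.2 and that file is missing or broken on a 64-bit host, ldd can report "not a dynamic executable" even though the binary itself is fine.

```python
import os
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def elf_info(data):
    """Return (bits, interpreter) for an ELF image held in `data`.

    `interpreter` is None when the image has no PT_INTERP entry
    (for example a statically linked binary).
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unrecognised ELF class or byte order")
    bits = 32 if ei_class == 1 else 64
    end = "<" if ei_data == 1 else ">"  # 1 = little endian, 2 = big endian

    # Field offsets differ between the 32-bit and 64-bit header layouts.
    if bits == 32:
        (phoff,) = struct.unpack_from(end + "I", data, 28)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 42)
    else:
        (phoff,) = struct.unpack_from(end + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 54)

    for i in range(phnum):
        base = phoff + i * phentsize
        (p_type,) = struct.unpack_from(end + "I", data, base)
        if p_type != PT_INTERP:
            continue
        if bits == 32:
            (p_offset,) = struct.unpack_from(end + "I", data, base + 4)
            (p_filesz,) = struct.unpack_from(end + "I", data, base + 16)
        else:
            (p_offset,) = struct.unpack_from(end + "Q", data, base + 8)
            (p_filesz,) = struct.unpack_from(end + "Q", data, base + 32)
        raw = data[p_offset:p_offset + p_filesz]
        return bits, raw.rstrip(b"\0").decode()
    return bits, None


def describe(path):
    """Summarise one binary on disk and say whether its loader exists."""
    with open(path, "rb") as fh:
        bits, interp = elf_info(fh.read())
    loader = "none" if interp is None else interp
    present = interp is not None and os.path.exists(interp)
    return "%s: %d-bit, loader %s, loader present: %s" % (path, bits, loader, present)
```

Calling `describe()` on the BRP3 binary in the project directory would show whether the app is 32-bit and whether the loader it requests is actually installed.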

As for your link to the nVidia forum, I don't really understand the issue. X11 is running when I try from the console or a VNC window, and I am able to make calls to the GPU using CUDA libraries. I'm just dabbling in my code to understand capabilities, so I may very well be missing something. The nVidia driver application on the console shows the GPU is active.

Both /dev/nvidia0 and /dev/nvidiactl exist.
lspci shows a VGA device but not a 3D device.

I was going to download and reinstall the boinc software but boinc.berkeley.edu seems to be down tonight.

I reset the project again. If that doesn't work I'll try reinstalling tomorrow.

Thanks for the help.
Joe

M. Schmitt

[pre]micha@luemmel:~/Downloads/BOINC/projects/einstein.phys.uwm.edu> ldd einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270
linux-gate.so.1 => (0xffffe000)
libcufft.so.3 => /usr/local/cuda/lib/libcufft.so.3 (0xf5c7b000)
libcudart.so.3 => /usr/local/cuda/lib/libcudart.so.3 (0xf5c27000)
libcuda.so.1 => /usr/lib/libcuda.so.1 (0xf521e000)
libpthread.so.0 => /lib/libpthread.so.0 (0xf5203000)
libm.so.6 => /lib/libm.so.6 (0xf51d8000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xf50e8000)
libc.so.6 => /lib/libc.so.6 (0xf4f7d000)
/lib/ld-linux.so.2 (0xf77aa000)
libdl.so.2 => /lib/libdl.so.2 (0xf4f78000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf4f5a000)
librt.so.1 => /lib/librt.so.1 (0xf4f50000)
libz.so.1 => /lib/libz.so.1 (0xf4f3b000)[/pre]
Do you have the 32-bit libs installed? BOINC can be a 64-bit app, but the apps around here are 32-bit (except one). I guess you do have them, because you got credits with that host.

I don't think there is a reason to install anything new, just let us solve the problem. I run 64-bit Linux too. I had to struggle a lot, because I started BOINC with an init script that did not work anymore. The simple reason: drivers not loaded.
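
To illustrate the "drivers not loaded" problem, here is a rough sketch of a check that a startup script could run before launching the client. It assumes that the device nodes mentioned earlier in this thread (/dev/nvidiactl and /dev/nvidia0) appear once the driver is up. That is an assumption about this particular setup, and the function names are made up for the example.

```python
import os
import time

# Device nodes named earlier in this thread; other setups may differ.
NVIDIA_NODES = ("/dev/nvidiactl", "/dev/nvidia0")


def missing_nodes(paths=NVIDIA_NODES):
    """Return the entries of `paths` that do not exist yet."""
    return [p for p in paths if not os.path.exists(p)]


def wait_for_nodes(paths=NVIDIA_NODES, timeout=30.0, interval=1.0):
    """Poll until every path exists; return False if `timeout` expires first."""
    deadline = time.monotonic() + timeout
    while missing_nodes(paths):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A startup script could call `wait_for_nodes()` and only start the BOINC client when it returns True. The presence of the nodes does not prove that CUDA works, so treat it as a first sanity check only.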

You might try to move the whole BOINC directory into your home folder and set all rights to yourself. Then you can be sure the driver is loaded. Make a link on your desktop to start the BOINC-Manager and then the core client will be started too.

I didn't try the stuff on the Nvidia page yet, but I will soon, because I want to run one host without X11. Then I will include the script example from there in my init script.

[edit] What does 'nvidia-smi -q' tell you?
Did you check the 'NVIDIA X Server Settings' program?

cu,
Michael

joe areeda

Quote:
I don't think there is a reason to install anything new, just let us solve the problem. I run 64-bit Linux too. I had to struggle a lot, because I started BOINC with an init script that did not work anymore. The simple reason: drivers not loaded.

I appreciate the hand holding. I am more interested in figuring out what went wrong so I better understand what a GPU app needs.

I'm busy all day today working with my mechanic on a top overhaul of our airplane's engine, but I will try your suggestions this evening.

nvidia-smi -q from an ssh login gives a lot of n/a's (output below). I'll try it from the console when I get back.

The nVidia X Server Settings program is the one I access from System/Preferences/Monitor (Ubuntu GNOME), right? It looks similar to what I see on my laptop, which has been running the CUDA apps for a couple of months. (D'oh.) I will compare everything I can think of between the working system (Intel i5, 8 GB, Ubuntu 10.04.2, NVS 3100M, driver 270.18) and the non-working one (AMD Phenom II, 16 GB, Ubuntu 10.10, GeForce GT 240, driver 270.29). The difference in nvidia-smi output looks significant, but it may be because one was run in a terminal window and the other over an ssh login.

Joe
I wish we could do attachments, but here are links to the nvidia-smi output on the working and non-working systems.

nvidia-smi -q output from non-working AMD server
nvidia-smi -q output from working Intel laptop

joe areeda

One more thing.
On the working system:
[pre]ldd einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32fullCPU*
linux-gate.so.1 => (0xf7703000)
libcufft.so.3 => /usr/local/cuda/lib/libcufft.so.3 (0xf5c01000)
libcudart.so.3 => /usr/local/cuda/lib/libcudart.so.3 (0xf5bac000)
libcuda.so.1 => /usr/lib32/nvidia-current/libcuda.so.1 (0xf517d000)
libpthread.so.0 => /lib32/libpthread.so.0 (0xf5164000)
libm.so.6 => /lib32/libm.so.6 (0xf513e000)
libstdc++.so.6 => /usr/lib32/libstdc++.so.6 (0xf5048000)
libc.so.6 => /lib32/libc.so.6 (0xf4eed000)
/lib/ld-linux.so.2 (0xf7704000)
libdl.so.2 => /lib32/libdl.so.2 (0xf4ee9000)
libgcc_s.so.1 => /usr/lib32/libgcc_s.so.1 (0xf4eca000)
librt.so.1 => /lib32/librt.so.1 (0xf4ec1000)
libz.so.1 => /usr/lib32/libz.so.1 (0xf4eac000)
[/pre]
On the new non-working system:
[pre]ldd einsteinbinary_BRP3_1.00_graphics_i686-pc-linux-gnu
not a dynamic executable
[/pre]

Different programs and completely different results.
joe

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250252665
RAC: 35686

Quote:
One more thing.
On the new non-working system:
[pre]ldd einsteinbinary_BRP3_1.00_graphics_i686-pc-linux-gnu
not a dynamic executable
[/pre]


This is the screensaver graphics application. It should be on both systems, and ldd should tell you the same thing.

If the actual application binary "einsteinbinary_BRP3_1.0?_i686-pc-linux-gnu*" is missing on one system this would be the file execv complains about, though I don't know why the Client wouldn't simply download it (again).

BM

M. Schmitt

areeda wrote in this post:

Quote:
but ldd on projects/einstein.phys.uwm.edu/einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270 says:
not a dynamic executable


So I think he's got the app.
Don't have access to my host atm, so I will try to reply in detail tomorrow.

joe areeda

I just got back.

The system that works has these executable files in $HOME/BOINC/projects/einstein.phys.uwm.edu/

./einsteinbinary_BRP3_1.00_graphics_i686-pc-linux-gnu
./einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32fullCPU
./einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270
./einstein_S5GC1HF_1.07_i686-pc-linux-gnu__SSE2
./einstein_S5R6_1.01_graphics_i686-pc-linux-gnu
./libcudart32_32_16.so
./libcufft32_32_16.so

The system that doesn't work has these:
./einsteinbinary_BRP3_1.00_graphics_i686-pc-linux-gnu
./einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270
./libcudart32_32_16.so
./libcufft32_32_16.so
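
For what it's worth, the two listings above can be compared mechanically. This is just a small sketch using the file names as posted. The variable names are arbitrary.

```python
# File names copied from the two listings in this post.
working = {
    "einsteinbinary_BRP3_1.00_graphics_i686-pc-linux-gnu",
    "einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32fullCPU",
    "einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270",
    "einstein_S5GC1HF_1.07_i686-pc-linux-gnu__SSE2",
    "einstein_S5R6_1.01_graphics_i686-pc-linux-gnu",
    "libcudart32_32_16.so",
    "libcufft32_32_16.so",
}
non_working = {
    "einsteinbinary_BRP3_1.00_graphics_i686-pc-linux-gnu",
    "einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270",
    "libcudart32_32_16.so",
    "libcufft32_32_16.so",
}

# Files present only on the working host, and only on the non-working host.
only_on_working = sorted(working - non_working)
only_on_non_working = sorted(non_working - working)

for name in only_on_working:
    print("missing on non-working host:", name)
for name in only_on_non_working:
    print("extra on non-working host:", name)
```

The non-working host lacks the fullCPU variant and the two S5 (non-BRP3) binaries, while the nv270 CUDA app and both CUDA libraries are present on both.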

I have reset the project (a couple of times).

What do you think about copying BRP3cuda32fullCPU from one to the other?

Joe

joe areeda

Well, I think I fixed it.

The ldd failure to identify the einstein apps correctly was the lead.
I made too many changes to identify exactly what fixed it, but it's been running for 30 minutes without an error using the GPU. I hope that when the scheduler forgives me for all the computation errors, it starts to use the available cores too.

I did a bunch of googling and found similar problems with Google Earth, so I downloaded and installed that. That turned up some package problems, and the last thing I did that seemed to be relevant was:
apt-get install -f
apt-get upgrade

Some very basic things got updated, like libc (WTF), and afterwards ldd behaved reasonably:

ldd einsteinbinary_BRP3_1.00_graphics_i686-pc-linux-gnu
	linux-gate.so.1 =>  (0xf771c000)
	libpthread.so.0 => /lib32/libpthread.so.0 (0xf76e6000)
	libm.so.6 => /lib32/libm.so.6 (0xf76bf000)
	libdl.so.2 => /lib32/libdl.so.2 (0xf76bb000)
	libX11.so.6 => /usr/lib32/libX11.so.6 (0xf759e000)
	libXext.so.6 => /usr/lib32/libXext.so.6 (0xf758e000)
	libGL.so.1 => /usr/lib32/nvidia-current/libGL.so.1 (0xf74c0000)
	libGLU.so.1 => /usr/lib32/libGLU.so.1 (0xf7450000)
	libc.so.6 => /lib32/libc.so.6 (0xf72f4000)
	/lib/ld-linux.so.2 (0xf771d000)
	libxcb.so.1 => /usr/lib32/libxcb.so.1 (0xf72da000)
	libnvidia-tls.so.270.29 => /usr/lib32/nvidia-current/tls/libnvidia-tls.so.270.29 (0xf72d8000)
	libnvidia-glcore.so.270.29 => /usr/lib32/nvidia-current/libnvidia-glcore.so.270.29 (0xf5bb9000)
	libstdc++.so.6 => /usr/lib32/libstdc++.so.6 (0xf5ace000)
	libgcc_s.so.1 => /usr/lib32/libgcc_s.so.1 (0xf5ab1000)
	libXau.so.6 => /usr/lib32/libXau.so.6 (0xf5aad000)
	libXdmcp.so.6 => /usr/lib32/libXdmcp.so.6 (0xf5aa7000)
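
For anyone who wants to check output like this automatically, here is a small sketch that turns ldd text into a mapping of library names to resolved paths. The function names are made up for the example, and it only handles the line shapes seen in this thread plus the "not found" lines that ldd prints for unresolved libraries.

```python
def parse_ldd(output):
    """Map each library name in ldd output to its resolved path.

    Returns None when ldd reported "not a dynamic executable".
    A value of None means ldd printed no path (as for linux-gate.so.1),
    and the string "not found" marks an unresolved library.
    """
    if "not a dynamic executable" in output:
        return None
    deps = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        left, arrow, right = line.partition("=>")
        if arrow:
            name = left.split()[0]
            target = right.split("(")[0].strip()
            deps[name] = target or None
        elif "(" in line:
            # The loader itself is listed by absolute path, without "=>".
            name = line.split()[0]
            deps[name] = name
    return deps


def unresolved(deps):
    """Return the library names that ldd could not resolve."""
    return [name for name, target in deps.items() if target == "not found"]
```

Feeding it the captured text of `ldd <binary>` gives None for the earlier "not a dynamic executable" case and a dictionary for output like the listing above.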

I wish I had a better handle on what went wrong and how it got fixed. I'm more than happy to examine logs or anything else you can think of.

Joe
