[SOLVED] Upgrade to Cuda 5.0 NOT causing invalid results - Problem due to poor cooling.

Derek
Joined: 9 Feb 05
Posts: 9
Credit: 2059412
RAC: 0
Topic 196686

I recently upgraded to CUDA 5.0 and have since noticed a large number of invalid results. Previously I was getting maybe one a month; now it has exceeded 6 per day, if not more.

Previous config:
CUDA 4.0 with the 290.10 nvidia driver
Ubuntu 10.04 LTS Server

Now:
CUDA 5.0 with the 304.54 nvidia driver
Ubuntu 10.04 LTS Server

Graphics card:
Nvidia GTX 550 Ti

Derek
Joined: 9 Feb 05
Posts: 9
Credit: 2059412
RAC: 0

In case anyone is having similar problems: I have upgraded to the 310.19 driver, and things seem to be better now.

Derek
Joined: 9 Feb 05
Posts: 9
Credit: 2059412
RAC: 0

I think I spoke too soon... Most of my tasks are in a validation inconclusive state. Is there any way that I can find out why the validator does not like the tasks? It might help me pin down the problem on my end.

Here is an example work unit that is invalid: 140678872
Here is one that is in the inconclusive state: 140693298

Thanks for any help

Derek
Joined: 9 Feb 05
Posts: 9
Credit: 2059412
RAC: 0

So, I found this thread, which seems to describe a similar problem: http://einsteinathome.org/node/196578

I ran lsof and BOINC is indeed using the 64-bit libraries. It seems to be running fine, though, except for the validation errors. I tried the trick at the bottom of that thread (making a link in the project directory), but to no avail. I have installed the 32-bit libraries, but no amount of ldconfig seems to get BOINC to use them.

lsof | grep cuda
bash      1987     cwd       DIR              251,0     4096 12322691 /usr/local/cuda-5.0
boinc     2247     mem       REG              251,0 10321518 12066387 /usr/lib64/libcuda.so.310.19
einsteinb 2645     txt       REG              251,0  7572196  7472166 /home//BOINC/projects/einstein.phys.uwm.edu/einsteinbinary_BRP4_1.31_x86_64-pc-linux-gnu__BRP4cuda32nv270
einsteinb 2645     mem       REG              251,0 10321518 12066387 /usr/lib64/libcuda.so.310.19
einsteinb 2645     mem       REG              251,0   313872  7602456 /home//BOINC/slots/4/libcudart.so.3
Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

Have you tried 'export LD_LIBRARY_PATH=$PATH_TO_32BIT_LIBS' before running boinc? And perhaps 'ldd $INSTALL_PREFIX/boinc' to see which libs would be used.
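
A minimal sketch of that kind of check, assuming a POSIX shell with awk, lsof and ldd available. The function name cuda_libs is hypothetical (it is not part of BOINC or CUDA), and the paths in the usage comments are placeholders to adjust for the local install.

```shell
# Hypothetical helper: read `lsof` output on stdin and print each distinct
# CUDA-related file path once. The path is the last column of each line.
cuda_libs() {
    awk '$NF ~ /libcuda/ { print $NF }' | sort -u
}

# Usage (placeholder paths, not taken from any particular install):
#   lsof | cuda_libs                         # CUDA libraries currently mapped
#   ldd /path/to/boinc | grep -i cuda        # may print nothing if the
#                                            # library is loaded at run time
#   file -L /usr/lib64/libcuda.so.310.19     # reports a 32-bit or 64-bit ELF
```

Running ldd with LD_LIBRARY_PATH set in the same shell shows whether the override would change which file the loader picks, without having to restart the client first.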

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691076197
RAC: 262899

Hi!

The version of the app you are using on this host is:

einsteinbinary_BRP4_1.31_x86_64-pc-linux-gnu__BRP4cuda32nv270

so it is 64-bit and therefore needs 64-bit libs. That is OK.

Some of the hosts that did produce valid results for the tasks that failed on your PC also had CUDA 5 drivers under Linux (there is a line in the stderr.txt file that gets uploaded and is visible in the result view).

http://einsteinathome.org/task/326520791

I doubt there is something fundamentally wrong with the app wrt. CUDA 5 drivers.

Cheers
HB

Derek
Joined: 9 Feb 05
Posts: 9
Credit: 2059412
RAC: 0

Neil, thanks for the advice. I had tried all kinds of things trying to figure out the library thing. But I think Bikeman's comment indicates that I should be using the 64-bit drivers/libs, so that's good.

Bikeman, are you saying that I should expect invalid results from CUDA 5 until NVIDIA works things out, or is there something that I can do on my end? I would prefer not to submit invalid results to the project on a daily basis.

I'm going to assume that rolling back to CUDA 4 might be my only option, and that is going to be a royal pain. I think I still have the install files...

Thanks for the responses.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

I'm running Ubuntu x64 10.04 LTS with 310.14, which has been quite stable, generally speaking.

Here is something I prepared earlier...
http://einsteinathome.org/task/327511338

I don't recall installing any specific CUDA version, so I'm not sure if that helps.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691076197
RAC: 262899

Quote:

Bikeman, are you saying that I should expect invalid results from CUDA 5 until NVidia works things out or is there something that I can do on my end.

To me, the fact that others are completing the same workunits with CUDA 5 on Linux seems to indicate the problem is not with NVIDIA. It's always good to reboot, check the cooling, check the power supply, and check the memory in this kind of situation. It's hard to diagnose these things remotely, of course.

CU
HB

microchip
Joined: 10 Jun 06
Posts: 50
Credit: 113149484
RAC: 57107

I'm on openSUSE with the latest stable NV driver (310.19) which offers CUDA 5 and I have no issues with it. Yes, I'm on 64-bit Linux too. I agree with Bikeman, check your system...

Derek
Joined: 9 Feb 05
Posts: 9
Credit: 2059412
RAC: 0

Checking the system isn't a bad idea. I just hadn't considered it, since the upgrade to CUDA 5 is what caused invalid results to happen en masse. At least that is what appeared to be the case. I am going to let the majority of my completed work validate and then start crunching again after a system evaluation.

Here is a listing of the typical temps in my system when crunching:

Adapter: ISA adapter
Core 0: +72.0°C (high = +82.0°C, crit = +100.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 3: +70.0°C (high = +82.0°C, crit = +100.0°C)

coretemp-isa-0002
Adapter: ISA adapter
Core 1: +73.0°C (high = +82.0°C, crit = +100.0°C)

coretemp-isa-0003
Adapter: ISA adapter
Core 2: +70.0°C (high = +82.0°C, crit = +100.0°C)

Gpu : N/A
Gpu : 69 C
Fan Speed : 40 %
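
As a rough sketch, the headroom between each core reading and its "high" threshold in a listing like the one above can be computed with a small filter. This assumes a POSIX shell with awk and input shaped like the "Core N:" lines shown here; the function name core_headroom is hypothetical.

```shell
# Hypothetical helper: read `sensors`-style output on stdin and, for every
# "Core N:" line, print how many degrees the reading is below "high".
core_headroom() {
    awk '/^Core [0-9]+:/ {
        core = $2; sub(/:/, "", core)      # "0:" becomes "0"
        temp = $3; high = $6               # e.g. "+72.0°C" and "+82.0°C,"
        gsub(/[^0-9.]/, "", temp)          # keep only digits and the dot
        gsub(/[^0-9.]/, "", high)
        printf "core %s: %.1f below high\n", core, high - temp
    }'
}

# Usage (assumes lm-sensors is installed):
#   sensors | core_headroom
```

The filter only covers the CPU core lines; the GPU temperature and fan speed lines use a different layout and would need their own pattern.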
