All GPU WUs error out, CPU is OK.

Arvid Almstrom
Joined: 4 Mar 05
Posts: 19
Credit: 143575119
RAC: 0
Topic 196547

I have just started crunching, and all my GPU tasks error out with "exit code -1073741515 (0xc0000135)".

I cannot find any reference to this error anywhere, and as I am a newbie here at Einstein@Home, I need some help.
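For anyone searching for this code: the negative number is a signed 32-bit value, and it is easier to look up once it is read as unsigned hex. A minimal Python sketch; the one-entry lookup table is illustrative only. 0xC0000135 is the Windows status STATUS_DLL_NOT_FOUND, which suggests the GPU app could not load a DLL it depends on, but the code cannot tell which DLL.

```python
# Decode a Windows process exit code as reported by BOINC.
# Only one status name is included here; it is illustrative, not a full table.
KNOWN_STATUS = {
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def to_unsigned_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex string."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def describe(exit_code: int) -> str:
    """Return 'signed -> hex (name)' for a 32-bit exit code."""
    name = KNOWN_STATUS.get(exit_code & 0xFFFFFFFF, "unknown")
    return f"{exit_code} -> {to_unsigned_hex(exit_code)} ({name})"


if __name__ == "__main__":
    # Exit code from the failed GPU tasks in this thread.
    print(describe(-1073741515))  # -1073741515 -> 0xC0000135 (STATUS_DLL_NOT_FOUND)
```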

Here is the host with the errors: http://einsteinathome.org/host/5881329/tasks

I run Windows 7 Pro 64-bit, 16GB RAM, i7-3770K CPU, Intel 520 240GB SSD, 2 x GTX 670 GPU.

I have no problem crunching SETI work, when I can get any, but no luck with E@H.

Many thanks in advance

Arvid

Arvid Almstrom

Arvid Almstrom
Joined: 4 Mar 05
Posts: 19
Credit: 143575119
RAC: 0

Is there anyone out there who has any idea why I cannot process GPU tasks?

I have tried uninstalling the Intel HD Graphics driver, as was suggested in one thread; that did not work. I am now downloading the latest version of this driver and will try that.

I have also tried adding the CUDA environment variable CUDA_GRID_SIZE_COMPAT=1, but this did nothing for me either.

From what I have read, it looks like there is a problem with BOINC client 7.0.28 and CUDA tasks. Can someone confirm whether this is true, or whether you can run 7.0.28 with 2 x GTX 670 and process E@H tasks?

I have looked at the change logs for all the beta versions up to 7.0.36, but I cannot see anything indicating that this is a client issue that has been resolved.

This host happily processes Seti@Home tasks.

Thanks

Arvid

Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

There is a thread in the crunchers forum about that error.

It seems that for some hosts the combination of BOINC 7.xx, Win7 64-bit, and Kepler GPUs produces that error in the Einstein BRP apps.
I don't know whether there is a new workaround, but downgrading to BOINC 6.10 or 6.12 was the only solution.

Arvid Almstrom
Joined: 4 Mar 05
Posts: 19
Credit: 143575119
RAC: 0

Quote:

There is a thread in the crunchers forum about that error.

It seems that for some hosts the combination of BOINC 7.xx, Win7 64-bit, and Kepler GPUs produces that error in the Einstein BRP apps.
I don't know whether there is a new workaround, but downgrading to BOINC 6.10 or 6.12 was the only solution.

This was the thread that I had been reading and followed to try to resolve my issue. I just find it hard to believe that there is nothing else on this error.

The combination of Win7 64-bit, BOINC 7.0.28, and a Kepler GPU is what I would expect most people who got a new computer in the last six months to have. If this were the problem, I would have thought that a developer would investigate, say what the problem actually is, and then, hopefully, start working on a solution.

The previous thread was started over four months ago, and there does not seem to be any follow-up on the issue. Are there any developers out there who have looked at this issue and can confirm whether it is a bug in the GPU client or in fact something else, as this combination works for many other projects?

Some clarification would be wonderful.

Thanks

Arvid

Reed Young
Joined: 31 Jan 08
Posts: 9
Credit: 108843
RAC: 0

It's not specific to any version of Windows, nor to Kepler GPUs. I see the same error on Debian AMD64 stable, with a GT 640 Fermi card and software versions nvidia-opencl-common 302.17-3, libcuda1 v 304.48-1, boinc-client v 7.0.27, and boinc-nvidia-cuda v 7.0.27.

All E@H Binary Pulsar Arecibo work units give "Computation Error" at exactly 2 seconds. At least it's not wasting much time! And I can easily check "don't use nVidia GPU" in BAM, but I would like to help find Binary Pulsars.

If I try to downgrade boinc-client to 6.10.58, Debian will also remove boinc-nvidia-cuda v 7.0.27 (it depends on boinc-client 7.0.27), so downgrading is not a very good option for me unless I could use the nvidia-cuda-dev or nvidia-cuda-toolkit packages instead, both of which have version 4.2.9-1 as installation candidates. Can anybody tell me whether that would work, so that I don't have to set "no new tasks" on all jobs, let the current ones finish, and then try this new combination myself?

In case it helps the Einstein developers: GPUgrid does complete work units consistently on the same computer, but if I recall correctly, PrimeGrid and Moo both fail like Einstein.

Thu 04 Oct 2012 02:54:42 AM MDT	Einstein@Home	Starting task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 using einsteinbinary_BRP4 version 128 (BRP4cuda32nv270) in slot 2
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Computation for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 finished
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_0 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_1 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_2 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_3 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_4 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_5 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_6 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_7 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Starting task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 using einsteinbinary_BRP4 version 128 (BRP4cuda32nv270) in slot 2
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Computation for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 finished
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_0 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_1 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_2 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_3 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_4 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_5 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_6 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_7 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Starting task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 using einsteinbinary_BRP4 version 128 (BRP4cuda32nv270) in slot 2
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Computation for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 finished
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_0 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_1 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_2 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_3 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_4 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_5 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_6 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_7 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
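The start and finish times of each task can be pulled out of an event log like the one above. A minimal Python sketch, assuming the line layout shown here (weekday, date, 12-hour time, timezone, project, message); it ignores the timezone and skips the "Output file ... absent" lines.

```python
import re
from datetime import datetime

# Matches "Starting task" and "Computation ... finished" lines in a BOINC event
# log laid out as above: "Thu 04 Oct 2012 02:54:42 AM MDT<sep>Einstein@Home<sep>message".
# The weekday and timezone are matched but not used.
LINE_RE = re.compile(
    r"^\w{3} (?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} [AP]M) \S+\s+.+?\s+"
    r"(?:Starting task (?P<start>\S+) using|Computation for task (?P<done>\S+) finished)"
)


def task_durations(log_text: str) -> dict:
    """Map each task name to the whole seconds between its start and finish lines."""
    started = {}
    durations = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # e.g. the "Output file ... absent" lines
        ts = datetime.strptime(m.group("ts"), "%d %b %Y %I:%M:%S %p")
        if m.group("start"):
            started[m.group("start")] = ts
        elif m.group("done") in started:
            name = m.group("done")
            durations[name] = int((ts - started[name]).total_seconds())
    return durations
```

For the three tasks in the excerpt above, this gives 4, 3 and 3 seconds between the two log lines.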
Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

Quote:
It's not specific to any version of Windows, nor to Kepler GPUs. I see the same error on Debian AMD64 stable, with a GT 640 Fermi card and software versions nvidia-opencl-common 302.17-3, libcuda1 v 304.48-1, boinc-client v 7.0.27, and boinc-nvidia-cuda v 7.0.27.


Your issue is different. From one of your errored results:

Quote:

7.0.27

process exited with code 252 (0xfc, -4)

[02:54:43][22650][INFO ] Application startup - thank you for supporting Einstein@Home!
[02:54:43][22650][INFO ] Starting data processing...
Error: API mismatch: the NVIDIA kernel module has version 304.48,
but this NVIDIA driver component has version 195.36.31. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.

[02:54:43][22650][ERROR] Couldn't initialize CUDA driver API (error: 100)!
[02:54:43][22650][ERROR] Demodulation failed (error: 1020)!
02:54:43 (22650): called boinc_finish

I know very little about Linux, so I'm not sure how to solve this...

Reed Young
Joined: 31 Jan 08
Posts: 9
Credit: 108843
RAC: 0

Maybe that error message doesn't make sense to you, but it helps me -- I think.

Error: API mismatch: the NVIDIA kernel module has version 304.48,
but this NVIDIA driver component has version 195.36.31. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.

It would have been more convenient if the error message had said exactly which package is meant by "this NVIDIA driver component", but I know that 195.36.31 is the version of the NVIDIA drivers that installs by default on the Linux distro and release I'm using, and I had to mess with the package manager to get the 304.xx drivers. Apparently, I missed one: libcuda1-ia32. That message at least told me where to start looking. Does Einstein send 32-bit work for the GPU? If so, the wrong version of that package would matter. I won't know for sure that it's solved until I get more Einstein work units, but I bet that was the problem. Thanks, Horacio.
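The two versions in that message can be pulled out mechanically, which helps when comparing them against installed package versions. A minimal Python sketch, assuming the message wording stays exactly as quoted; it only reports the two numbers and does not identify which package supplies the older component.

```python
import re

# A version number such as 304.48 or 195.36.31 (a trailing sentence period is not captured).
VERSION = r"\d+(?:\.\d+)*"

# Assumes the wording of the NVIDIA message quoted above; \s+ allows for line wraps.
MISMATCH_RE = re.compile(
    r"kernel module has version\s+(?P<kernel>" + VERSION + r"),\s+"
    r"but this NVIDIA driver component\s+has version\s+(?P<component>" + VERSION + r")"
)


def find_version_mismatch(stderr_text):
    """Return (kernel_module_version, component_version), or None if no mismatch message."""
    m = MISMATCH_RE.search(stderr_text)
    if m is None:
        return None
    return m.group("kernel"), m.group("component")
```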

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

Reed,

Yes, the CUDA apps at least are 32-bit, and finding the correct Linux package is always a challenge for me.
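One way to confirm whether an app binary is 32-bit or 64-bit is to read its ELF header: byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. A minimal Python sketch; the path to pass it (for example the einsteinbinary_BRP4_1.28_i686-pc-linux-gnu__BRP4cuda32nv270 binary under projects/einstein.phys.uwm.edu) depends on where your BOINC data directory lives.

```python
def elf_class(path):
    """Return '32-bit' or '64-bit' for an ELF binary, based on its EI_CLASS byte."""
    with open(path, "rb") as f:
        ident = f.read(5)  # 4-byte magic number + EI_CLASS
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    # EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
    return {1: "32-bit", 2: "64-bit"}.get(ident[4], "unknown")


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(p, elf_class(p))
```

The standard file command reports the same information ("ELF 32-bit" or "ELF 64-bit").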

Tracking down error messages in a program that runs on this many platforms is always a challenge, not just for E@H. Most of the ones I've seen are cryptic and end up translating into something like "crap, something is wrong" and not much more.

Joe

Reed Young
Joined: 31 Jan 08
Posts: 9
Credit: 108843
RAC: 0

Ha!

Quote:

Reed,

Yes, the CUDA apps at least are 32-bit, and finding the correct Linux package is always a challenge for me.

Tracking down error messages in a program that runs on this many platforms is always a challenge, not just for E@H. Most of the ones I've seen are cryptic and end up translating into something like "crap, something is wrong" and not much more.

Joe


Installing a higher version of libcuda1-ia32 seems to be part of the solution, but still not the whole solution.

Stderr output

../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4_1.28_i686-pc-linux-gnu__BRP4cuda32nv270: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory

I think that means I need to create a symlink to a file named libcuda.so.1, whose location I now know from grep, but where does the symlink need to go so that Einstein can find and use it? Thanks.
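To see where copies of libcuda.so.1 (or symlinks to it) already exist, the candidate directories can be checked directly. A minimal Python sketch; the directory list is a hypothetical example for a Debian-style amd64 system, where 32-bit libraries usually live in the i386 multiarch or lib32 directories, and the real search path depends on the distribution and /etc/ld.so.conf. Finding a file is not proof that it is usable: a 64-bit libcuda.so.1 cannot be loaded by a 32-bit app.

```python
import os

# Hypothetical search list for a Debian-style amd64 system; 32-bit libraries are
# usually in the i386 multiarch or lib32 directories, 64-bit ones elsewhere.
CANDIDATE_DIRS = [
    "/usr/lib/i386-linux-gnu",
    "/usr/lib32",
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib",
]


def find_library(name, dirs=CANDIDATE_DIRS):
    """Return (path, resolved_path) for each directory that has an entry called `name`."""
    hits = []
    for d in dirs:
        path = os.path.join(d, name)
        if os.path.lexists(path):  # lexists also catches dangling symlinks
            hits.append((path, os.path.realpath(path)))
    return hits


if __name__ == "__main__":
    for path, target in find_library("libcuda.so.1"):
        print(path, "->", target)
```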

Reed Young
Joined: 31 Jan 08
Posts: 9
Credit: 108843
RAC: 0

Have you tried this?

Quote:
I also had this and other trouble with computation errors on BRP4 tasks. It's some malfunction of the BRP4 tasks in conjunction with BOINC clients >= 7.0.25. Since I rolled back to 6.12.34, everything has been running very well, but I can't crunch A@H any more. As you have NVIDIA cards, you don't need the newest BOINC clients.


It seems that astro-marwil has been satisfied with the results of reverting to an older version of BOINC software. It's what I'll probably try next, too, if nobody tells me soon where the symlink to libcuda.so.1 should be.

Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

Quote:
It's what I'll probably try next, too, if nobody tells me soon where the symlink to libcuda.so.1 should be.

Don't... Your current issue is with the NVIDIA drivers and libraries, and it will remain no matter which version of BOINC you use.
Sadly, I can't give you another clue, but I've read somewhere that there is a command-line utility that tells you which library dependencies an app needs. Also, there is something tricky about 32-bit and 64-bit libraries that sometimes have the same name but need to be put in different folders... I know I'm not giving you anything really useful...
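The utility mentioned above is most likely ldd, which lists the shared libraries an ELF binary needs and prints "=> not found" for any it cannot resolve. A minimal Python sketch that runs ldd and collects the missing names; it assumes ldd is installed and, for a 32-bit app on a 64-bit system, that the 32-bit loader is present (otherwise ldd may just report that the file is not a dynamic executable). Note that ldd can end up executing the program, so only point it at binaries you trust.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as '=> not found'."""
    missing = []
    for line in ldd_output.splitlines():
        line = line.strip()
        if line.endswith("=> not found"):
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def run_ldd(binary_path):
    """Run ldd on a binary and return its standard output."""
    result = subprocess.run(["ldd", binary_path], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        print(path, missing_libraries(run_ldd(path)))
```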

Meanwhile, the best thing you can do is set the Einstein project preferences to not use the NVIDIA GPU (this will not actually stop the use of the GPU; it will just stop new GPU tasks from being sent), then go to BOINC Manager and, in the Advanced view, on the Tasks tab, sort the tasks by application, select all the BRP tasks, and suspend them. This way you can keep the host crunching on the CPU while you wait for someone to give you something more useful to solve the GPU issue...
This way you will also be able to resume just one task at a time after making changes, to see if it's working...
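The same suspend/resume routine can be scripted with boinccmd, the command-line tool that ships with BOINC, using its --task <project URL> <task name> suspend|resume form. A minimal Python sketch: the project URL is an assumption inferred from the projects/einstein.phys.uwm.edu directory name in the stderr output, task names have to be copied from BOINC Manager or the event log, and by default it only builds the commands without running them. boinccmd may also need --host or --passwd if it is not run from the BOINC data directory.

```python
import subprocess

# Assumed project URL, inferred from the projects/einstein.phys.uwm.edu directory
# name; use the URL BOINC Manager shows for the project if it differs.
PROJECT_URL = "http://einstein.phys.uwm.edu/"


def task_command(task_name, op, project_url=PROJECT_URL):
    """Build a boinccmd argument list that suspends or resumes one task."""
    if op not in ("suspend", "resume"):
        raise ValueError("op must be 'suspend' or 'resume'")
    return ["boinccmd", "--task", project_url, task_name, op]


def suspend_all(task_names, dry_run=True):
    """Build (and, if dry_run is False, run) a suspend command for every task."""
    commands = [task_command(name, "suspend") for name in task_names]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

To test a change, resume a single task with subprocess.run(task_command(name, "resume"), check=True) and watch whether it errors out.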
