All GPU WU's error out, CPU is ok.

Arvid Almstrom

Joined: 4 Mar 05

Posts: 19

Credit: 143575119

RAC: 0

1 Oct 2012 20:59:51 UTC

Topic 196547


I have just started crunching, and all my GPU tasks error out with exit code -1073741515 (0xc0000135).

I cannot find any reference to this error anywhere, and as I am a newbie here at Einstein@Home, I need some help.
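For reference, the negative number and the hex value in that message are the same 32-bit code written two ways. On Windows, 0xC0000135 is the NTSTATUS value STATUS_DLL_NOT_FOUND, meaning a DLL the program needs could not be loaded at startup; which DLL is missing on this host is not stated in the thread. A minimal Python sketch of the conversion follows. The function names and the small lookup table are illustrative, not part of BOINC or Windows.

```python
# Small lookup of well-known NTSTATUS values. Only the codes listed here
# are known to this sketch; anything else is reported as "unknown".
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
}


def to_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code):
    """Return the hex form of the exit code and its name, if known."""
    value = to_unsigned32(exit_code)
    name = KNOWN_NTSTATUS.get(value, "unknown")
    return "0x{:08x} {}".format(value, name)


print(describe_exit_code(-1073741515))
```

If the code really is a missing DLL, the next step would be to find out which library the GPU app cannot locate when it is started by this BOINC version.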

Here is the host with the errors http://einsteinathome.org/host/5881329/tasks

I run Windows 7 Pro 64-bit, 16GB RAM, i7-3770K CPU, Intel 520 240GB SSD, 2 x GTX 670 GPU.

I have no problem crunching SETI work, when I can get any, but no luck with E@H.

Many thanks in advance

Arvid

Arvid Almstrom

Joined: 4 Mar 05

Posts: 19

Credit: 143575119

RAC: 0

All GPU WU's error out, CPU is ok.

2 Oct 2012 7:15:56 UTC

Message 111565


Is there anyone out there that has any ideas as to why I cannot process GPU tasks?

I have tried uninstalling Intel HD Graphics, as was suggested in one thread, but that did not work. I am now downloading the latest version of this driver and will try that.

I have also tried to add the CUDA environment variable: CUDA_GRID_SIZE_COMPAT=1, but this also did nothing for me.

It looks, from what I have read, as though there is a problem with BOINC client 7.0.28 and CUDA tasks. Can someone confirm whether this is true, or whether you can run 7.0.28 with 2 x GTX 670 and process E@H tasks?

I have looked at the change logs for all the beta versions up to 7.0.36, but I cannot see anything that would indicate that this is an issue with the client that has been resolved.

This host happily processes Seti@Home tasks.

Thanks

Arvid

Arvid Almstrom

Horacio

Joined: 3 Oct 11

Posts: 205

Credit: 80557243

RAC: 0

There is a thread in the

2 Oct 2012 21:24:16 UTC

Message 111566


There is a thread in the crunchers forum about that error.

It seems that for some hosts the combination of BOINC 7.xx, Win7 64-bit, and Kepler GPUs produces that error in the Einstein BRP apps.
I don't know if there is a new workaround, but downgrading to BOINC 6.10 or 6.12 was the only solution.

Arvid Almstrom

Joined: 4 Mar 05

Posts: 19

Credit: 143575119

RAC: 0

RE: There is a thread in

2 Oct 2012 23:09:19 UTC

Message 111567 in response to message 111566


Quote:

There is a thread in the crunchers forum about that error.

It seems that for some hosts the combination of BOINC 7.xx, Win7 64-bit, and Kepler GPUs produces that error in the Einstein BRP apps.
I don't know if there is a new workaround, but downgrading to BOINC 6.10 or 6.12 was the only solution.

This was the thread that I had been reading and following to try to resolve my issue. I just find it hard to believe that there is nothing else on this error.

The combination of Win7 64-bit, BOINC 7.0.28, and a Kepler GPU is what I would expect most people who bought a new computer in the last 6 months to have. If this were the problem, I would have thought that a developer would investigate, say what the problem actually is, and then, hopefully, start to work on a solution.

The previous thread was started over 4 months ago, and there does not seem to be any follow-up on the issue. Are there any developers out there who have looked at this issue and can confirm whether this is a bug in the GPU client or is in fact something else, as this combination works for many other projects?

Some clarification would be wonderful.

Thanks

Arvid

Arvid Almstrom

Reed Young

Joined: 31 Jan 08

Posts: 9

Credit: 108843

RAC: 0

It's not particular to any

4 Oct 2012 16:37:13 UTC

Message 111568 in response to message 111566


It's not particular to any version of Windows, nor Kepler GPUs. I see the same error in Debian AMD64 stable, with a GT 640 Fermi card, and software versions nvidia-opencl-common 302.17-3, libcuda1 v 304.48-1, boinc-client v 7.0.27, boinc-nvidia-cuda v 7.0.27.

All E@H Binary Pulsar Arecibo work units give "Computation Error" at exactly 2 seconds. At least it's not wasting much time! And I can easily check "don't use nVidia GPU" in BAM, but I would like to help find Binary Pulsars.

If I try to downgrade boinc-client to 6.10.58, Debian will also remove boinc-nvidia-cuda v 7.0.27 (dependency on boinc-client 7.0.27), so downgrading is not a very good option for me unless I could use the nvidia-cuda-dev or nvidia-cuda-toolkit packages, which both have version 4.2.9-1 installation candidates. Can anybody tell me if that would work, so that I don't have to set "no new tasks" on all jobs, let current ones finish, then try this new combination myself?

In case it's helpful to Einstein developers to know, GPUgrid does complete work units consistently on the same computer, but if I recall correctly PrimeGrid and Moo both fail like Einstein.

Thu 04 Oct 2012 02:54:42 AM MDT	Einstein@Home	Starting task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 using einsteinbinary_BRP4 version 128 (BRP4cuda32nv270) in slot 2
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Computation for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 finished
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_0 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_1 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_2 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_3 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_4 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_5 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_6 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0_7 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_3080_0 absent
Thu 04 Oct 2012 02:54:46 AM MDT	Einstein@Home	Starting task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 using einsteinbinary_BRP4 version 128 (BRP4cuda32nv270) in slot 2
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Computation for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 finished
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_0 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_1 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_2 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_3 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_4 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_5 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_6 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0_7 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2952_0 absent
Thu 04 Oct 2012 02:54:49 AM MDT	Einstein@Home	Starting task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 using einsteinbinary_BRP4 version 128 (BRP4cuda32nv270) in slot 2
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Computation for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 finished
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_0 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_1 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_2 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_3 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_4 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_5 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_6 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
Thu 04 Oct 2012 02:54:52 AM MDT	Einstein@Home	Output file p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0_7 for task p2030.20110226.G177.64-02.68.S.b0s0g0.00000_2960_0 absent
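
To see how quickly each task fails, the start and finish lines in a log like the one above can be paired up. This is a rough Python sketch, assuming the three-column, tab-separated layout shown here (timestamp, project, message) and English month names; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Layout assumed from the log lines quoted above. The weekday and the
# time zone tokens are dropped before the timestamp is parsed.
STAMP_FORMAT = "%d %b %Y %I:%M:%S %p"
START = re.compile(r"Starting task (\S+) ")
FINISH = re.compile(r"Computation for task (\S+) finished")


def parse_stamp(stamp):
    """Parse a stamp such as 'Thu 04 Oct 2012 02:54:42 AM MDT'."""
    parts = stamp.split()
    return datetime.strptime(" ".join(parts[1:-1]), STAMP_FORMAT)


def task_runtimes(log_text):
    """Map each task name to the seconds between its start and finish lines."""
    started = {}
    runtimes = {}
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # not a timestamp/project/message line
        stamp, _project, message = fields
        match = START.search(message)
        if match:
            started[match.group(1)] = parse_stamp(stamp)
            continue
        match = FINISH.search(message)
        if match and match.group(1) in started:
            delta = parse_stamp(stamp) - started[match.group(1)]
            runtimes[match.group(1)] = delta.total_seconds()
    return runtimes
```

Going by the timestamps quoted above, the three tasks ran for 4, 3, and 3 seconds of wall-clock time before finishing with no output files.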

August 2012 marks the 36th consecutive August and 330th consecutive month with a global temperature above the 20th century average.

Horacio

Joined: 3 Oct 11

Posts: 205

Credit: 80557243

RAC: 0

RE: It's not particular to

4 Oct 2012 22:20:13 UTC

Message 111569 in response to message 111568


Quote:

It's not particular to any version of Windows, nor Kepler GPUs. I see the same error in Debian AMD64 stable, with a GT 640 Fermi card, and software versions nvidia-opencl-common 302.17-3, libcuda1 v 304.48-1, boinc-client v 7.0.27, boinc-nvidia-cuda v 7.0.27.

Your issue is different. From one of your errored results:

Quote:

7.0.27

process exited with code 252 (0xfc, -4)

[02:54:43][22650][INFO ] Application startup - thank you for supporting Einstein@Home!
[02:54:43][22650][INFO ] Starting data processing...
Error: API mismatch: the NVIDIA kernel module has version 304.48,
but this NVIDIA driver component has version 195.36.31. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
[02:54:43][22650][ERROR] Couldn't initialize CUDA driver API (error: 100)!
[02:54:43][22650][ERROR] Demodulation failed (error: 1020)!
02:54:43 (22650): called boinc_finish


I know very little about Linux, so I'm not sure how to solve this...

Reed Young

Joined: 31 Jan 08

Posts: 9

Credit: 108843

RAC: 0

Maybe that error message

5 Oct 2012 0:01:16 UTC

Message 111570 in response to message 111569


Maybe that error message doesn't make sense to you, but it helps me -- I think.

Error: API mismatch: the NVIDIA kernel module has version 304.48,
but this NVIDIA driver component has version 195.36.31. Please make
sure that the kernel module and all NVIDIA driver components
have the same version.

It would have been more convenient if the error message said clearly exactly which package is meant by "this NVIDIA driver component" but I know that 195.36.31 is the version of nvidia drivers that installs by default on the Linux distro & release I'm using, and I had to mess with the package manager to get the 304.##.## drivers. Apparently, I missed one: libcuda1-ia32. That message at least told me where to start looking. Does Einstein send 32-bit work for GPU? If so, the wrong version of that package should matter. I won't know for sure that it's solved until I get more Einstein work units, but I bet that was the problem. Thanks, Horacio.
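
For anyone checking many results for the same problem, the two version numbers can be pulled out of the stderr text automatically. A minimal Python sketch, assuming the message wording stays exactly as quoted above; the function name is hypothetical.

```python
import re

# Patterns based on the wording of the "API mismatch" message quoted above.
KERNEL_RE = re.compile(r"kernel module has version\s+(\d+(?:\.\d+)*)")
COMPONENT_RE = re.compile(r"driver component has version\s+(\d+(?:\.\d+)*)")


def driver_versions(stderr_text):
    """Return (kernel_module_version, component_version), or None if absent."""
    kernel = KERNEL_RE.search(stderr_text)
    component = COMPONENT_RE.search(stderr_text)
    if not kernel or not component:
        return None
    return kernel.group(1), component.group(1)


message = (
    "Error: API mismatch: the NVIDIA kernel module has version 304.48,\n"
    "but this NVIDIA driver component has version 195.36.31. Please make\n"
    "sure that the kernel module and all NVIDIA driver components\n"
    "have the same version.\n"
)
versions = driver_versions(message)
if versions and versions[0] != versions[1]:
    print("kernel module %s, user-space component %s" % versions)
```

The message itself does not name the mismatched package, so this only confirms that two different driver versions are installed side by side.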


joe areeda

Joined: 13 Dec 10

Posts: 285

Credit: 320378898

RAC: 0

Reed, Yes the CUDA apps at

5 Oct 2012 3:48:02 UTC

Message 111571


Reed,

Yes, the CUDA apps at least are 32-bit, and finding the correct Linux package is always a challenge for me.

Tracking down error messages in a program that runs on this many platforms is always a challenge, not just for E@H. Most of the ones that I've seen are cryptic and end up translating into something like "crap, something is wrong" and not much more.

Joe

Reed Young

Joined: 31 Jan 08

Posts: 9

Credit: 108843

RAC: 0

Ha! RE: Reed, Yes the

6 Oct 2012 16:52:09 UTC

Message 111572 in response to message 111571


Ha!

Quote:

Reed,

Yes, the CUDA apps at least are 32-bit, and finding the correct Linux package is always a challenge for me.

Tracking down error messages in a program that runs on this many platforms is always a challenge, not just for E@H. Most of the ones that I've seen are cryptic and end up translating into something like "crap, something is wrong" and not much more.

Joe

Installing a higher version # of libcuda1-ia32 seems to be part of the solution, but still not the whole solution.

Stderr output

../../projects/einstein.phys.uwm.edu/einsteinbinary_BRP4_1.28_i686-pc-linux-gnu__BRP4cuda32nv270: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory

I think that means I need to create a symlink to a file named libcuda.so.1 whose location I now know from grep, but where does the symlink to it need to go so that Einstein can find & use it? Thanks.
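
Since the app is a 32-bit (i686) binary, it may help to check which copies of libcuda.so.1 exist and whether each one is a 32-bit or 64-bit file before deciding on a symlink. The Python sketch below reads the ELF header byte that records this. The directory list is only an assumption about common locations and is not confirmed for this Debian install; it does not answer where the link has to go.

```python
import os

# Hypothetical search list: common library directories on Linux systems.
# Adjust it to wherever grep actually found the file.
CANDIDATE_DIRS = (
    "/usr/lib32",
    "/usr/lib/i386-linux-gnu",
    "/usr/lib",
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib64",
)


def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if it is not a valid ELF file."""
    with open(path, "rb") as handle:
        header = handle.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 of the ELF header (EI_CLASS): 1 means 32-bit, 2 means 64-bit.
    return {1: 32, 2: 64}.get(header[4])


def find_library(name, directories=CANDIDATE_DIRS):
    """List (path, elf_class) for every directory that contains the library."""
    found = []
    for directory in directories:
        path = os.path.join(directory, name)
        if os.path.exists(path):
            found.append((path, elf_class(path)))
    return found
```

Calling find_library("libcuda.so.1") returns an empty list if none of the assumed directories has the file. A 32-bit app can only load a copy reported as 32.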


Reed Young

Joined: 31 Jan 08

Posts: 9

Credit: 108843

RAC: 0

Have you tried

6 Oct 2012 17:04:49 UTC

Message 111573 in response to message 111567


Have you tried this?

Quote:

I also had this and other trouble with crunching errors on BRP4 tasks. That's some malfunctioning of the BRP4 tasks in conjunction with BOINC clients >= 7.0.25. Since I rolled back to 6.12.34 all things are running very well, but I can't crunch A@H any more. As you have NVIDIA cards, you don't need the newest BOINC clients.

It seems that astro-marwil has been satisfied with the results of reverting to an older version of BOINC software. It's what I'll probably try next, too, if nobody tells me soon where the symlink to libcuda.so.1 should be.


Horacio

Joined: 3 Oct 11

Posts: 205

Credit: 80557243

RAC: 0

RE: It's what I'll probably

6 Oct 2012 20:14:13 UTC

Message 111574 in response to message 111573


Quote:

It's what I'll probably try next, too, if nobody tells me soon where the symlink to libcuda.so.1 should be.

Don't... Your current issue is with the NVIDIA drivers and libraries, and the issue will remain no matter which version of BOINC you use.
Sadly, I can't give you another clue, but I've read somewhere that there is a command-line utility that tells you which library dependencies an app needs. There is also a tricky thing about 32-bit and 64-bit libraries that sometimes have the same name but need to be put in different folders... I know I'm not giving you something really useful...
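
On Linux the utility for listing a program's shared library dependencies is ldd; any library it cannot resolve is printed with "=> not found". A rough Python sketch of wrapping it follows. The function names are made up, and it assumes ldd is installed and can read the 32-bit binary.

```python
import subprocess


def parse_missing(ldd_output):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libraries(binary_path):
    """Run ldd on a binary and return its unresolved libraries."""
    result = subprocess.run(
        ["ldd", binary_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_missing(result.stdout)
```

Calling missing_libraries() with the path to the BRP4 app in the project directory should list libcuda.so.1 for as long as the 32-bit loader cannot find it, and return an empty list once the library is in place.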

Meanwhile, the best thing you can do is to set the Einstein project preferences to not use the NVIDIA GPU (in fact this will not stop the usage of the GPU, it will just stop new GPU tasks from being sent). Then go to the BOINC Manager and, in advanced view, in the Tasks tab, sort the tasks by application, select all the BRP tasks, and suspend them all. This way you can keep the host crunching with the CPU while waiting for somebody to give you something more useful to solve the GPU issue...
And this way you will also be able to resume just one task at a time after making changes, to see if it's working...
