BOINC GPU Tasks - Computation Errors

Alnitak
Joined: 27 Apr 21
Posts: 2
Credit: 297887
RAC: 0
Topic 225368

Hi, I've been getting a few computation errors lately in GPU tasks and I would like to know what caused them. I overclocked my GPU today and I think that could be the cause.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117557170025
RAC: 35393991

Remove the overclock and if the errors stop then you will know!

 

Cheers,
Gary.

Alnitak
Joined: 27 Apr 21
Posts: 2
Credit: 297887
RAC: 0

Thanks, I have overclocked it a bit better and everything seems normal again.

Masse
Joined: 18 Mar 05
Posts: 5
Credit: 570591
RAC: 20

All my GPU tasks are getting computation errors. Default clocks are used on both GPUs (nVidia GTX 1650 + GT 710, both Gigabyte cards).

A task is either dropped (its progress jumps to 100%) or restarted at some point after reaching 50% progress. The highest value I've seen before a failure is 99%.

One interesting thing is that both of my GPUs perform at exactly the same speed, which they shouldn't.

Using BOINC 7.16.11.

[ 4x E5-2680v4 + i9-10900K + Q9550S + A57 | 334GB | 4x RTX A2000 + M6000 + M40 + 3x GTX 1650 + UHD Graphics 630 + 2x Tegra X1 | Lubuntu 21.04 ]

Tom M
Joined: 2 Feb 06
Posts: 6439
Credit: 9568977126
RAC: 8552061

Hi, I am intermittently getting a large number of computation errors on Gamma-Ray GPU tasks.

Here is an example: https://einsteinathome.org/task/1124476209

It is on this machine.

65 (0x00000041) Unknown error code

Any ideas?

Stderr output

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
Network access is denied.
 (0x41) - exit code 65 (0x41)</message>
<stderr_txt>
22:05:37 (4180): [normal]: This Einstein@home App was built at: May  8 2019 13:29:27

22:05:37 (4180): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl1K-ati.exe'.
22:05:37 (4180): [debug]: 1e+016 fp, 1e+009 fp/s, 10500000 s, 2916h40m00s00
22:05:37 (4180): [normal]: % CPU usage: 0.333000, GPU usage: 0.333000
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl1K-ati.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah3011L00.dat --alpha 2.59819959601 --delta -0.694603692878 --skyRadius 1.890770e-06 --ldiBins 15 --f0start 668.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.69860773e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah3011L00_0676_4306860.dat --debug 0 --device 0 -o LATeah3011L00_676.0_0_0.0_4306860_1_0.out
output files: 'LATeah3011L00_676.0_0_0.0_4306860_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah3011L00_676.0_0_0.0_4306860_1_0' 'LATeah3011L00_676.0_0_0.0_4306860_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah3011L00_676.0_0_0.0_4306860_1_1'
22:05:37 (4180): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
22:05:37 (4180): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [000000000389e550 , 00007ffb72559490]
Using OpenCL platform provided by: Advanced Micro Devices, Inc.
Using OpenCL device "gfx1010" by: Advanced Micro Devices, Inc.
Max allocation limit: 2764046336
Global mem size: 4278190080
OpenCL device has FP64 support
read_checkpoint(): Couldn't open file 'LATeah3011L00_676.0_0_0.0_4306860_1_0.out.cpt': No such file or directory (2)
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
INFO: Major Windows version: 6
% C 0 46
% C 0 106
Error in computing index of fft input array, i:-1042237252 pair:1211772
ERROR: prepare_ts_2_phase_diff_sorted() returned with error -1
22:08:23 (4180): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags: COND_1 PRECISION
22:08:35 (4180): [normal]: done. calling boinc_finish(65).
22:08:35 (4180): called boinc_finish

</stderr_txt>
]]>
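
For anyone comparing several failed tasks, here is a minimal sketch of pulling the error lines and the exit code out of a stderr dump like the one above. The function name `summarize_stderr` and both patterns are assumptions based only on this one log, not an official BOINC format.

```python
import re

# Assumed patterns, derived from the single log above.
ERROR_LINE = re.compile(r"error", re.IGNORECASE)
FINISH_CODE = re.compile(r"boinc_finish\((\d+)\)")


def summarize_stderr(text):
    """Return (error_lines, exit_code) extracted from a task's stderr text."""
    error_lines = [
        line.strip() for line in text.splitlines() if ERROR_LINE.search(line)
    ]
    match = FINISH_CODE.search(text)
    exit_code = int(match.group(1)) if match else None
    return error_lines, exit_code


# Lines copied from the log above.
sample = """\
% C 0 106
Error in computing index of fft input array, i:-1042237252 pair:1211772
ERROR: prepare_ts_2_phase_diff_sorted() returned with error -1
22:08:23 (4180): [CRITICAL]: ERROR: MAIN() returned with error '1'
22:08:35 (4180): [normal]: done. calling boinc_finish(65).
"""

lines, code = summarize_stderr(sample)
for line in lines:
    print(line)
if code is not None:
    # Show the code in decimal and hex, since the task page reports both.
    print(f"exit code {code} ({code:#x})")
```

The same function can be run over the stderr text of each failed task to see whether they all stop at the same step.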


A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

Masse
Joined: 18 Mar 05
Posts: 5
Credit: 570591
RAC: 20

Masse wrote:

All my GPU tasks are getting computation errors. Default clocks are used on both GPUs (nVidia GTX 1650 + GT 710, both Gigabyte cards).

...

One interesting thing is that both of my GPUs perform at exactly the same speed, which they shouldn't.

GPU tasks work fine in other projects (such as MilkyWay@Home), and performance there differs significantly between the two GPUs.

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 401
Credit: 10156033455
RAC: 25750984

Masse wrote:

Masse wrote:

All my GPU tasks are getting computation errors. Default clocks are used on both GPUs (nVidia GTX 1650 + GT 710, both Gigabyte cards).

...

One interesting thing is that both of my GPUs perform at exactly the same speed, which they shouldn't.

GPU tasks work fine in other projects (such as MilkyWay@Home), and performance there differs significantly between the two GPUs.

 

Just some thoughts of mine:

Please consider that the NVIDIA GT710 has "only" 2GB VRAM -- which will not be sufficient for GW-GPU-WUs.

Also the driver does not "officially" support both GPUs.

And be aware, that MW and EH handle "computations" differently.

To check things, I would pull the GT710, re-install the GTX1650 driver (don't forget to do a clean driver install each time), and see if the GW-WUs run OK.

Then do the same the other way around -- pull the GTX1650 and try the GT710 on its own with the installed driver; if that is not OK, try re-installing the correct GT710 driver ...
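
The 2GB point above can be checked before pulling any cards. This is a minimal sketch, assuming an NVIDIA driver with `nvidia-smi` on the PATH. The helper names, the 2048 MiB limit (taken from the "2GB is not sufficient" remark, not from any project documentation), and the sample rows are illustrative, not output from the machine discussed here.

```python
import subprocess

# Assumed limit: the thread only says a 2 GB card is too small for GW GPU work units.
LIMIT_MIB = 2048


def parse_gpu_memory(csv_text):
    """Parse 'name, memory.total' rows (no header, no units) into (name, MiB) pairs."""
    gpus = []
    for row in csv_text.strip().splitlines():
        name, mem = row.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_nvidia_smi():
    """Ask nvidia-smi for each GPU's name and total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


def at_or_below_limit(gpus, limit_mib=LIMIT_MIB):
    """Return the names of GPUs whose total memory does not exceed the limit."""
    return [name for name, mib in gpus if mib <= limit_mib]


# Illustrative rows in the shape nvidia-smi prints; real values may differ slightly.
sample = "GeForce GTX 1650, 4096\nGeForce GT 710, 2048\n"
gpus = parse_gpu_memory(sample)
print(at_or_below_limit(gpus))
```

On a real host, replace `parse_gpu_memory(sample)` with `query_nvidia_smi()`. Any card it flags is a candidate for being excluded from the GW work units.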

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 401
Credit: 10156033455
RAC: 25750984

Tom M wrote:

Hi, I am intermittently getting a large number of computation errors on Gamma-Ray GPU tasks.

Here is an example: https://einsteinathome.org/task/1124476209

It is on this machine.

65 (0x00000041) Unknown error code

Any ideas?

 

Just a vague idea:

I hope you are not using a driver version higher than 21.3.2?

There are posts about this cruising around ...

Masse
Joined: 18 Mar 05
Posts: 5
Credit: 570591
RAC: 20

nVidia are using the very same driver for both cards. It works with other projects.

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 401
Credit: 10156033455
RAC: 25750984

Ok, fine - then it is "un-officially" working/supported.

 

But you missed the part about the 2GB VRAM issue for GW WUs.

Your tasks (WUs) that use the 1650 are validating OK.

The trouble you have is with the "mini" 710.

 

"Other" projects don't need more than 2GB VRAM GPUs.

Have a nice day ...

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46768922642
RAC: 64073259

San-Fernando-Valley wrote:

Ok, fine - then it is "un-officially" working/supported.

It's officially supported for both. Check the supported GPUs chart in the release notes for his driver (446.14):

 

http://us.download.nvidia.com/Windows/446.14/446.14-win10-win8-win7-release-notes.pdf

p.28-29
