Are you sure you didn't change the defaults at the beginning and forget you did?
I have never seen a new BOINC installation have those defaults, never.
Unless something has been changed in the code that I am not aware of.
I regularly read the commits and merges in the BOINC GitHub repository, and I don't remember reading anything about changing the client to default to 0.01 days of work and 0.01 days of additional work.
I'll have to visit the site again and do a search for this I guess.
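In case it helps anyone checking their own client: those two numbers are the client-side work-buffer preferences, not anything project-side. Below is a minimal sketch of how to look at them (and pin them locally) on a Linux install; it assumes the common /var/lib/boinc-client data directory, a client reachable by boinccmd, and the 0.1 / 0.5 figures are placeholder values, not recommendations.
# Show the work-buffer values the client is currently using (path varies by install):
grep -E 'work_buf_(min|additional)_days' /var/lib/boinc-client/global_prefs.xml
# Pin them locally with an override file, then tell the client to re-read it:
cat > /var/lib/boinc-client/global_prefs_override.xml <<'EOF'
<global_preferences>
   <work_buf_min_days>0.1</work_buf_min_days>
   <work_buf_additional_days>0.5</work_buf_additional_days>
</global_preferences>
EOF
boinccmd --read_global_prefs_override
The same values can of course be changed from BOINC Manager's computing preferences; the override file just makes the local setting explicit.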
... didn't want to "create" work for you ...
Maybe I'm getting too old for this kind of fiddling around.
I tend to believe what you are saying; sometimes I have trouble remembering the basics ...
What do I conclude from this?
Well, NO MORE POSTS from me, I guess !
Cheers
Hi,
my normal runtime for a task is 124-179 seconds depending on which GPU it is run on.
Tonight I got a task that took 1200+ seconds to finish.
I have never seen a line like any of these:
% C 0 154
% C 0 309
% C 0 463
% C 0 615
% C 0 768
% C 0 921
Are they "candidates" to be verified or processing errors? Why the extra time?
--
petri33
p.s.
Here is the full output of the task:
Task 1191519630
x86_64-pc-linux-gnu
Stderr output
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
12:12:26 (26533): [normal]: This Einstein@home App was built at: Aug 17 2021 16:19:40
12:12:26 (26533): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.28_x86_64-pc-linux-gnu__FGRPopencl2Pup-nvidia'.
12:12:26 (26533): [debug]: 1e+16 fp, 6.1e+09 fp/s, 1710426 s, 475h07m05s57
12:12:26 (26533): [normal]: % CPU usage: 1.000000, GPU usage: 0.130000
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.28_x86_64-pc-linux-gnu__FGRPopencl2Pup-nvidia --inputfile ../../projects/einstein.phys.uwm.edu/LATeah4013L01.dat --alpha 0.943218186562 --delta 1.30995332125 --skyRadius 2.617990e-08 --ldiBins 30 --f0start 1116.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.713401242e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah4013L01_1124_9033365.dat --debug 0 -o LATeah4013L01_1124.0_0_0.0_9033365_1_0.out
output files: 'LATeah4013L01_1124.0_0_0.0_9033365_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah4013L01_1124.0_0_0.0_9033365_1_0' 'LATeah4013L01_1124.0_0_0.0_9033365_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah4013L01_1124.0_0_0.0_9033365_1_1'
12:12:26 (26533): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
12:12:26 (26533): [debug]: glibc version/release: 2.31/stable
12:12:26 (26533): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x2c72ea0 , 0x2c72ca0]
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "NVIDIA GeForce RTX 2080 Ti" by: NVIDIA Corporation
Max allocation limit: 2888679424
Global mem size: 11554717696
read_checkpoint(): Couldn't open file 'LATeah4013L01_1124.0_0_0.0_9033365_1_0.out.cpt': No such file or directory (2)
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
% C 0 154
% C 0 309
% C 0 463
% C 0 615
% C 0 768
% C 0 921
FPU status flags:
12:33:30 (26533): [normal]: done. calling boinc_finish(0).
12:33:30 (26533): called boinc_finish(0)
</stderr_txt>
]]>
The % C datapoint is a checkpoint written at various stages of the computation. The faster devices normally go straight through to the end and print a single checkpoint line.
But on the slower devices you get multiple checkpoints. I believe each checkpoint is written when BOINC stops or switches away from crunching the task.
For example on my 3080 I got one checkpoint (Task 1193178428):
Using OpenCL device "NVIDIA GeForce RTX 3080" by: NVIDIA Corporation
Max allocation limit: 2626174976
Global mem size: 10504699904
read_checkpoint(): Couldn't open file 'LATeah4013L02_940.0_0_0.0_5767805_1_0.out.cpt': No such file or directory (2)
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
% C 0 939
FPU status flags:
17:06:47 (3969000): [normal]: done. calling boinc_finish(0).
17:06:47 (3969000): called boinc_finish(0)
But on my Raspberry Pi 4 I got dozens of checkpoints.
Task 1198454623
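For anyone who wants to compare hosts, the checkpoint lines are easy to pull straight out of a task's stderr. A small sketch, assuming the task-page output has been saved to a local file; stderr.txt here is just an example filename.
# Count the "% C" checkpoint lines in a saved copy of a task's stderr:
grep -c '^% C ' stderr.txt
# Or print them, to compare the progress counters between a fast and a slow host:
grep '^% C ' stderr.txt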
RTX 3080 - memory temps
CPUID HW Monitor says the memory chips on my 3080 are hitting 108 degrees C, which is not good for the life of the card, of course.
There is much discussion on various sites about replacing the thermal transfer pads between the memory modules and the heat sink - is this something I should have done a while ago? Or can I tweak Einstein@home somehow to reduce the load on the card?
Thanks again
No, there is not much you can tweak on the card other than to downclock it so it doesn't work so hard on Einstein tasks.
If you were worried about the memory temps you should have chosen a different card, or done as you stated and removed the heat sink and replaced the thermal pads with better-quality ones than the OEM pads.
Or gone with a water-cooled card, either an AIO hybrid or a custom-loop model.
I'm surprised at the memory temps, as AFAIK the 3080 does not have any memory on the backside of the PCB like the 3090, which has also had these high memory temps on its backside modules. I hadn't seen many reports of high temps on the front-side modules.
It might warrant taking the air cooler off the card and checking the fit of the cooler to the die and the RAM modules. You should see obvious indents in the pads. There are better-quality thermal pads available with better heat-transfer characteristics; I am a fan of FujiPoly pads myself.
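On the downclocking side, a rough sketch of what the stock NVIDIA tooling offers on Linux; it assumes the proprietary driver's nvidia-smi, -lgc needs a reasonably recent driver, and the wattage/clock numbers are placeholders rather than tuned recommendations.
sudo nvidia-smi -pm 1          # enable persistence mode so the settings stick
sudo nvidia-smi -q -d POWER    # show the default and allowed power-limit range
sudo nvidia-smi -pl 240        # lower the board power limit (watts) to cut heat
sudo nvidia-smi -lgc 210,1700  # optionally lock the core clock range (MHz)
sudo nvidia-smi -rgc           # reset the clock lock later if wanted
Dropping the power limit is the simplest knob, since the card then downclocks itself to stay under the cap.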
Which model of 3080 do you have? Nvidia Founders Edition? Or some AIB (3rd party) model?
The FE Nvidia cards are known for memory temp issues, even on the 3080.
I have two EVGA 3070 Ti cards with GDDR6X memory, and I don't seem to be having any issues. But I can't really check memory temps under Linux. My watercooled 3080 Ti showed about 60 C memory temps when booted into Windows under memory-intensive loads.
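For what it's worth, here is about all you can read under Linux with the standard tools; the GDDR6X memory-junction sensor is not exposed through nvidia-smi, so this only shows the core temperature, power and clocks.
nvidia-smi --query-gpu=name,temperature.gpu,power.draw,clocks.sm,clocks.mem --format=csv
watch -n 5 nvidia-smi   # refresh the full summary every 5 seconds while crunching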
The machine in question is a Dell Alienware R11 with liquid cooling, and I use it only for crunching, no mining or gaming at all. The CPU is also running too hot (crunching WCG), so I will change out the thermal paste on that to see if things improve.
This video discusses changing the thermal pads on the GPU, which the author claims is essentially a Founder's Edition: https://www.youtube.com/watch?v=bpmYlk4dnys
Edit: This video is better, it includes the addition of a Noctua case fan: https://www.youtube.com/watch?v=YklybEdoKIM
Wish me luck, the good kind, plz
Thank you
All the reviews I have read of that case and system say that it runs hot and loud.
It's a crappy case that restricts airflow.
If you are tearing the GPU apart to repaste it anyway, you should move the components to a better case that lets them shed their heat outside the case.
That YT video was for the R12 version that came out a year later. Hopefully your R11 build is identical so your modifications carry over.