Are you sure you didn't change the defaults at the beginning and forget you did?
I have never seen a new BOINC installation have those defaults, never.
Unless something has been changed in the code that I am not aware of.
I regularly read the commits and merges in the BOINC GitHub repository, and I don't remember reading anything about changing the client to default to 0.01 days of work and 0.01 days of additional work.
I'll have to visit the site again and do a search for this I guess.
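In case it helps anyone checking their own client: those two numbers are the client-side work-buffer preferences, not anything project-side. Below is a minimal sketch of how to look at them (and pin them locally) on a Linux install; it assumes the common /var/lib/boinc-client data directory, a client reachable by boinccmd, and the 0.1 / 0.5 figures are placeholder values, not recommendations.
# Show the work-buffer values the client is currently using (path varies by install):
grep -E 'work_buf_(min|additional)_days' /var/lib/boinc-client/global_prefs.xml
# Pin them locally with an override file, then tell the client to re-read it:
cat > /var/lib/boinc-client/global_prefs_override.xml <<'EOF'
<global_preferences>
   <work_buf_min_days>0.1</work_buf_min_days>
   <work_buf_additional_days>0.5</work_buf_additional_days>
</global_preferences>
EOF
boinccmd --read_global_prefs_override
The same values can of course be changed from BOINC Manager's computing preferences; the override file just makes the local setting explicit.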
... didn't want to "create" work for you ...
Maybe I'm getting too old for this kind of fiddling around.
I tend to believe what you are saying; sometimes I have trouble remembering the basics ...
What do I conclude from this?
Well, NO MORE POSTS from me, I guess !
Cheers
Hi,
my normal runtime for a task is 124-179 seconds depending on which GPU it is run on.
Tonight I got a task that took 1200+ seconds to finish.
I have never seen a line like any of these:
% C 0 154
% C 0 309
% C 0 463
% C 0 615
% C 0 768
% C 0 921
Are they "candidates" to be verified or processing errors? Why the extra time?
--
petri33
p.s.
Here is the full output of the task:
Task 1191519630
x86_64-pc-linux-gnu
Stderr output
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
12:12:26 (26533): [normal]: This Einstein@home App was built at: Aug 17 2021 16:19:40
12:12:26 (26533): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.28_x86_64-pc-linux-gnu__FGRPopencl2Pup-nvidia'.
12:12:26 (26533): [debug]: 1e+16 fp, 6.1e+09 fp/s, 1710426 s, 475h07m05s57
12:12:26 (26533): [normal]: % CPU usage: 1.000000, GPU usage: 0.130000
command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.28_x86_64-pc-linux-gnu__FGRPopencl2Pup-nvidia --inputfile ../../projects/einstein.phys.uwm.edu/LATeah4013L01.dat --alpha 0.943218186562 --delta 1.30995332125 --skyRadius 2.617990e-08 --ldiBins 30 --f0start 1116.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.713401242e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah4013L01_1124_9033365.dat --debug 0 -o LATeah4013L01_1124.0_0_0.0_9033365_1_0.out
output files: 'LATeah4013L01_1124.0_0_0.0_9033365_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah4013L01_1124.0_0_0.0_9033365_1_0' 'LATeah4013L01_1124.0_0_0.0_9033365_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah4013L01_1124.0_0_0.0_9033365_1_1'
12:12:26 (26533): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
12:12:26 (26533): [debug]: glibc version/release: 2.31/stable
12:12:26 (26533): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0x2c72ea0 , 0x2c72ca0]
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "NVIDIA GeForce RTX 2080 Ti" by: NVIDIA Corporation
Max allocation limit: 2888679424
Global mem size: 11554717696
read_checkpoint(): Couldn't open file 'LATeah4013L01_1124.0_0_0.0_9033365_1_0.out.cpt': No such file or directory (2)
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
% C 0 154
% C 0 309
% C 0 463
% C 0 615
% C 0 768
% C 0 921
FPU status flags:
12:33:30 (26533): [normal]: done. calling boinc_finish(0).
12:33:30 (26533): called boinc_finish(0)
</stderr_txt>
]]>
The % C datapoint is a checkpoint written at various stages of the computation. The faster devices normally go straight through to the end and print a single checkpoint line.
But on the slower devices you get multiple checkpoints. I believe each checkpoint is written when BOINC stops or switches away from crunching the task.
For example on my 3080 I got one checkpoint (Task 1193178428):
Using OpenCL device "NVIDIA GeForce RTX 3080" by: NVIDIA Corporation
Max allocation limit: 2626174976
Global mem size: 10504699904
read_checkpoint(): Couldn't open file 'LATeah4013L02_940.0_0_0.0_5767805_1_0.out.cpt': No such file or directory (2)
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
% C 0 939
FPU status flags:
17:06:47 (3969000): [normal]: done. calling boinc_finish(0).
17:06:47 (3969000): called boinc_finish(0)
But on my Raspberry Pi 4 I got dozens of checkpoints.
Task 1198454623
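For anyone who wants to compare hosts, the checkpoint lines are easy to pull straight out of a task's stderr. A small sketch, assuming the task-page output has been saved to a local file; stderr.txt here is just an example filename.
# Count the "% C" checkpoint lines in a saved copy of a task's stderr:
grep -c '^% C ' stderr.txt
# Or print them, to compare the progress counters between a fast and a slow host:
grep '^% C ' stderr.txt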
RTX 3080 - memory temps
CPUID HW Monitor says the memory chips on my 3080 are hitting 108 degrees C, which is not good for the life of the card, of course.
There is much discussion on various sites about replacing the thermal transfer pads between the memory modules and the heat sink - is this something I should have done a while ago? Or can I tweak Einstein@home somehow to reduce the load on the card?
Thanks again
No, there is not much you can tweak on the card other than to downclock it so it doesn't work so hard on Einstein tasks.
If you were worried about the memory temps you should have chosen a different card, or done as you stated and removed the heat sink and replaced the thermal pads with better-quality ones than the OEM pads.
Or gone with a water-cooled card, either an AIO hybrid or a custom-loop model.
I'm surprised at the memory temps, as AFAIK the 3080 does not have any memory on the backside of the PCB like the 3090, which has also had these high memory temps on its backside modules. I hadn't seen many reports of high temps on the front-side modules.
It might warrant taking the air cooler off the card and checking the fit of the cooler to the die and the RAM modules. You should see obvious indents in the pads. There are better-quality thermal pads available with better heat-transfer characteristics; I am a fan of FujiPoly pads myself.
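On the downclocking side, a rough sketch of what the stock NVIDIA tooling offers on Linux; it assumes the proprietary driver's nvidia-smi, -lgc needs a reasonably recent driver, and the wattage/clock numbers are placeholders rather than tuned recommendations.
sudo nvidia-smi -pm 1          # enable persistence mode so the settings stick
sudo nvidia-smi -q -d POWER    # show the default and allowed power-limit range
sudo nvidia-smi -pl 240        # lower the board power limit (watts) to cut heat
sudo nvidia-smi -lgc 210,1700  # optionally lock the core clock range (MHz)
sudo nvidia-smi -rgc           # reset the clock lock later if wanted
Dropping the power limit is the simplest knob, since the card then downclocks itself to stay under the cap.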
Which model of 3080 do you have? Nvidia Founders Edition? Or some AIB (3rd party) model?
The FE Nvidia cards are known for memory temp issues, even on the 3080.
I have two EVGA 3070 Ti cards with GDDR6X memory, and I don't seem to be having any issues. But I can't really check memory temps under Linux. My watercooled 3080 Ti showed about 60 C memory temps when booted into Windows under memory-intensive loads.
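For what it's worth, here is about all you can read under Linux with the standard tools; the GDDR6X memory-junction sensor is not exposed through nvidia-smi, so this only shows the core temperature, power and clocks.
nvidia-smi --query-gpu=name,temperature.gpu,power.draw,clocks.sm,clocks.mem --format=csv
watch -n 5 nvidia-smi   # refresh the full summary every 5 seconds while crunching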
The machine in question is a Dell Alienware R11 with liquid cooling, and I use it only for crunching, no mining or gaming at all. The CPU is also running too hot (crunching WCG), so I will change out the thermal paste on that to see if things improve.
This video discusses changing the thermal pads on the GPU, which the author claims is essentially a Founder's Edition: https://www.youtube.com/watch?v=bpmYlk4dnys
Edit: This video is better, it includes the addition of a Noctua case fan: https://www.youtube.com/watch?v=YklybEdoKIM
Wish me luck, the good kind, plz
Thank you
All the reviews I have read of that case and system say that it runs hot and loud.
It's a crappy case that restricts airflow.
If you are tearing the GPU apart to repaste it anyway, you should move the components to a better case that lets them shed their heat outside the case.
That YT video was for the R12 version that came out a year later. Hopefully your R11 build is identical so your modifications carry over.