Crunching time becomes longer and longer

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,229
Credit: 44,580,430,756
RAC: 39,647,132

Rolf wrote:
...To solve the problem, you can abort the tasks which means ...

I would be quite interested to know how you "solve" something by throwing all the evidence away.  What would you then suggest if after throwing everything away and getting a whole bunch of new stuff, the same thing happens again?

The CPU version of the O2AS20-500 GW app has been around for a long time and is regarded by the Devs as 'behaving properly and giving the correct answers'.  Lots of people have been running it for quite a long time.  There is little evidence of any problem with either the app or the data that is being processed.  The logical course of action is to rule out any local issues with the host in question first before assuming that a project reset would 'fix' the problem.  It's probably quite confusing for the person having the problem to be given totally different courses of action to follow.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,229
Credit: 44,580,430,756
RAC: 39,647,132

vdquang wrote:
... copied the stderr.txt of all the 4 tasks (at the points when they ran approx. 10 min.). I will try to send them to you later.

Please do that by copying and pasting from those logs into your next message.  Just enclose the snips between code tags - as shown in the BBCode help if you're not sure how to do it.  Here is an example of what you should see if checkpoints are being created.  Click the "Paste as plain text" button (3rd from far right in menu bar) before pasting into your message.  You only need to copy just enough context to see the end messages during the startup and then the rows of dots (......) signifying calculation loops and the 'c' characters, some of which would be potential places where checkpoints would be saved.

 

......
2019-09-20 01:58:55.7478 (41184) [normal]: Reading input data ... 2019-09-20 02:01:00.1628 (41184) [normal]: Search FstatMethod used: 'ResampGeneric'
2019-09-20 02:01:00.1648 (41184) [normal]: Recalc FstatMethod used: 'DemodSSE'
2019-09-20 02:01:20.5606 (41184) [normal]: Number of segments: 64, total number of SFTs in segments: 10190
done.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177858472.0000
2019-09-20 02:01:20.5856 (41184) [normal]: dFreqStack = 3.340013e-006, df1dot = 1.637397e-010, df2dot = 0.000000e+000, df3dot = 0.000000e+000
% --- Setup, N = 64, T = 216000 s, Tobs = 19750204 s, gammaRefine = 500, gamma2Refine = 28226, gamma3Refine = 1

DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
2019-09-20 02:01:33.8965 (41184) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0, total:13110, sky:1/690, f1dot:1/19

0.% --- CG:989248 FG:14971 f1dotmin_fg:-2.724189077486e-009 df1dot_fg:3.268256487026e-013 f2dotmin_fg:0 df2dot_fg:0 f3dotmin_fg:0 df3dot_fg:1
INFO: Major Windows version: 6
c
.......c
........c
...
1......c
.......c
......
2.c
.........c
.........
3.c
...........c
.......
4.c
...........c
.......
5...c

Cheers,
Gary.

vdquang
Joined: 2 Mar 06
Posts: 10
Credit: 1,624,935
RAC: 673

Thank you, Gary.

I have now copied the contents of the stderr.txt files and pasted them into this message. I have also included the tasks' properties (retyped from pictures of the tasks' properties).

(Sorry, I could not paste the text as plain text, because the 'Paste as plain text' button did not work.)


Task 1: properties

Properties of task h1_0584.10_O2C02Cl1In0_O2AS20-500_584.25Hz_1953_0
Application Continuous Gravitational Wave search O2 All-sky 1.01
Name h1_0584.10_O2C02Cl1In0_O2AS20-500_584.25Hz_1953
State Waiting to run
Received 19-Sep-19 21:53:46
Report deadline 03-Oct-19 21:53:46
Estimated computational size 144,000 GFLOPs
CPU time ---
CPU time since checkpoint ---
Elapsed time 00:10:16
Estimated time remaining 11:10:48
Fraction done 1.686%
Virtual memory size 20.18 MB
Working set size 18.87 MB
Directory slots/0
Process ID 1536
Progress rate 9.720% per hour
Executable einstein_O2AS20-500_1.01_windows_x86_64.exe

--------------------

Task 1: stderr.txt

putenv 'LAL_DEBUG_LEVEL=3'
2019-09-22 12:00:05.0948 (5636) [normal]: This program is published under the GNU General Public License, version 2
2019-09-22 12:00:05.0958 (5636) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2019-09-22 12:00:05.0968 (5636) [normal]: This Einstein@home App was built at: Apr 5 2018 14:15:53

2019-09-22 12:00:05.0968 (5636) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2AS20-500_1.01_windows_x86_64.exe'.
Activated exception handling...
2019-09-22 12:00:05.0998 (5636) [debug]: BSGL output files
2019-09-22 12:00:05.1018 (5636) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2019-09-22 12:00:05.1038 (5636) [debug]: Set up communication with graphics process.

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.18.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALPulsar: 1.16.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALApps: 6.21.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)

2019-09-22 12:00:06.8329 (5636) [normal]: Reading input data ... 2019-09-22 12:00:27.8821 (5636) [normal]: Search FstatMethod used: 'ResampGeneric'
2019-09-22 12:00:27.8821 (5636) [normal]: Recalc FstatMethod used: 'DemodSSE'
2019-09-22 12:00:32.1714 (5636) [normal]: Number of segments: 64, total number of SFTs in segments: 10190
done.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177858472.0000
2019-09-22 12:00:32.1844 (5636) [normal]: dFreqStack = 3.340013e-006, df1dot = 1.637397e-010, df2dot = 0.000000e+000, df3dot = 0.000000e+000
% --- Setup, N = 64, T = 216000 s, Tobs = 19750204 s, gammaRefine = 500, gamma2Refine = 28226, gamma3Refine = 1

DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
putenv 'LAL_DEBUG_LEVEL=3'
2019-09-24 15:20:53.2721 (1536) [normal]: This program is published under the GNU General Public License, version 2
2019-09-24 15:20:53.2877 (1536) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2019-09-24 15:20:53.2877 (1536) [normal]: This Einstein@home App was built at: Apr 5 2018 14:15:53

2019-09-24 15:20:53.2877 (1536) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2AS20-500_1.01_windows_x86_64.exe'.
Activated exception handling...
2019-09-24 15:20:53.2877 (1536) [debug]: BSGL output files
2019-09-24 15:20:53.2877 (1536) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2019-09-24 15:20:53.2877 (1536) [debug]: Set up communication with graphics process.

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.18.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALPulsar: 1.16.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALApps: 6.21.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)


Task 2: properties

Properties of task h1_0584.65_O2C02Cl1In0_O2AS20-500_584.80Hz_1267_0
Application Continuous Gravitational Wave search O2 All-sky 1.01
Name h1_0584.65_O2C02Cl1In0_O2AS20-500_584.80Hz_1267
State Waiting to run
Received 24-Sep-19 15:11:44
Report deadline 08-Oct-19 15:11:43
Estimated computational size 144,000 GFLOPs
CPU time 00:10:01
CPU time since checkpoint 00:02:24
Elapsed time 00:10:06
Estimated time remaining 11:11:23
Fraction done 0.670%
Virtual memory size 315.57 MB
Working set size 317.60 MB
Directory slots/6
Process ID 1936
Progress rate 10.080% per hour
Executable einstein_O2AS20-500_1.01_windows_x86_64.exe

--------------------

Task 2: stderr.txt

putenv 'LAL_DEBUG_LEVEL=3'
2019-09-24 15:30:00.4933 (1936) [normal]: This program is published under the GNU General Public License, version 2
2019-09-24 15:30:00.5089 (1936) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2019-09-24 15:30:00.5089 (1936) [normal]: This Einstein@home App was built at: Apr 5 2018 14:15:53

2019-09-24 15:30:00.5089 (1936) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2AS20-500_1.01_windows_x86_64.exe'.
Activated exception handling...
2019-09-24 15:30:00.5245 (1936) [debug]: BSGL output files
2019-09-24 15:30:00.5245 (1936) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2019-09-24 15:30:00.5245 (1936) [debug]: Set up communication with graphics process.

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.18.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALPulsar: 1.16.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALApps: 6.21.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)

2019-09-24 15:30:02.2561 (1936) [normal]: Reading input data ... 2019-09-24 15:30:19.5098 (1936) [normal]: Search FstatMethod used: 'ResampGeneric'
2019-09-24 15:30:19.5098 (1936) [normal]: Recalc FstatMethod used: 'DemodSSE'
2019-09-24 15:30:23.4878 (1936) [normal]: Number of segments: 64, total number of SFTs in segments: 10190
done.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177858472.0000
2019-09-24 15:30:23.5034 (1936) [normal]: dFreqStack = 3.340013e-006, df1dot = 1.637397e-010, df2dot = 0.000000e+000, df3dot = 0.000000e+000
% --- Setup, N = 64, T = 216000 s, Tobs = 19750204 s, gammaRefine = 500, gamma2Refine = 28226, gamma3Refine = 1

DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
2019-09-24 15:30:27.5594 (1936) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0, total:13110, sky:1/690, f1dot:1/19

0.% --- CG:989248 FG:14971 f1dotmin_fg:-2.724189077486e-009 df1dot_fg:3.268256487026e-013 f2dotmin_fg:0 df2dot_fg:0 f3dotmin_fg:0 df3dot_fg:1
..................
1...................
2...................
3.........INFO: Major Windows version: 6
c
..........
4...................
5...................
6...................
7.....c
..............
8.............c
......
9...................
10...................
11............


Task 3: properties

Properties of task h1_0584.65_O2C02Cl1In0_O2AS20-500_584.80Hz_1268_0
Application Continuous Gravitational Wave search O2 All-sky 1.01
Name h1_0584.65_O2C02Cl1In0_O2AS20-500_584.80Hz_1268
State Waiting to run
Received 24-Sep-19 15:11:44
Report deadline 08-Oct-19 15:11:43
Estimated computational size 144,000 GFLOPs
CPU time 00:10:09
CPU time since checkpoint 00:02:45
Elapsed time 00:10:18
Estimated time remaining 11:11:17
Fraction done 0.686%
Virtual memory size 320.90 MB
Working set size 317.16 MB
Directory slots/2
Process ID 4348
Progress rate 10.440% per hour
Executable einstein_O2AS20-500_1.01_windows_x86_64.exe

--------------------

Task 3: stderr.txt

putenv 'LAL_DEBUG_LEVEL=3'
2019-09-24 15:20:53.8181 (4348) [normal]: This program is published under the GNU General Public License, version 2
2019-09-24 15:20:53.8337 (4348) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2019-09-24 15:20:53.8337 (4348) [normal]: This Einstein@home App was built at: Apr 5 2018 14:15:53

2019-09-24 15:20:53.8337 (4348) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2AS20-500_1.01_windows_x86_64.exe'.
Activated exception handling...
2019-09-24 15:20:53.8337 (4348) [debug]: BSGL output files
2019-09-24 15:20:53.8337 (4348) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2019-09-24 15:20:53.8337 (4348) [debug]: Set up communication with graphics process.

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.18.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALPulsar: 1.16.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALApps: 6.21.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)

2019-09-24 15:20:55.5497 (4348) [normal]: Reading input data ... 2019-09-24 15:21:12.5849 (4348) [normal]: Search FstatMethod used: 'ResampGeneric'
2019-09-24 15:21:12.5849 (4348) [normal]: Recalc FstatMethod used: 'DemodSSE'
2019-09-24 15:21:16.5785 (4348) [normal]: Number of segments: 64, total number of SFTs in segments: 10190
done.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177858472.0000
2019-09-24 15:21:16.5941 (4348) [normal]: dFreqStack = 3.340013e-006, df1dot = 1.637397e-010, df2dot = 0.000000e+000, df3dot = 0.000000e+000
% --- Setup, N = 64, T = 216000 s, Tobs = 19750204 s, gammaRefine = 500, gamma2Refine = 28226, gamma3Refine = 1

DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
2019-09-24 15:21:20.6657 (4348) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0, total:13110, sky:1/690, f1dot:1/19

0.% --- CG:989248 FG:14971 f1dotmin_fg:-2.724189077486e-009 df1dot_fg:3.268256487026e-013 f2dotmin_fg:0 df2dot_fg:0 f3dotmin_fg:0 df3dot_fg:1
..................
1...................
2...................
3.........INFO: Major Windows version: 6
c
..........
4...................
5...................
6...................
7.....c
.......c
.......
8......c
.............
9...................
10...................
11.............


Task 4: properties

Properties of task h1_0584.65_O2C02Cl1In0_O2AS20-500_584.80Hz_1269_0
Application Continuous Gravitational Wave search O2 All-sky 1.01
Name h1_0584.65_O2C02Cl1In0_O2AS20-500_584.80Hz_1269
State Waiting to run
Received 24-Sep-19 15:11:44
Report deadline 08-Oct-19 15:11:43
Estimated computational size 144,000 GFLOPs
CPU time ---
CPU time since checkpoint ---
Elapsed time 00:10:10
Estimated time remaining 11:10:51
Fraction done 1.670%
Virtual memory size 20.18 MB
Working set size 16.81 MB
Directory slots/5
Process ID 4960
Progress rate 9.720% per hour
Executable einstein_O2AS20-500_1.01_windows_x86_64.exe

--------------------

Task 4: stderr.txt

putenv 'LAL_DEBUG_LEVEL=3'
2019-09-24 15:30:00.4465 (4960) [normal]: This program is published under the GNU General Public License, version 2
2019-09-24 15:30:00.4621 (4960) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2019-09-24 15:30:00.4621 (4960) [normal]: This Einstein@home App was built at: Apr 5 2018 14:15:53

2019-09-24 15:30:00.4621 (4960) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2AS20-500_1.01_windows_x86_64.exe'.
Activated exception handling...
2019-09-24 15:30:00.5245 (4960) [debug]: BSGL output files
2019-09-24 15:30:00.5245 (4960) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2019-09-24 15:30:00.5245 (4960) [debug]: Set up communication with graphics process.

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.18.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALPulsar: 1.16.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALApps: 6.21.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)


 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,229
Credit: 44,580,430,756
RAC: 39,647,132

vdquang wrote:
I have now copied the contents of the stderr.txt files and pasted them into this message. I have also included the tasks' properties (retyped from pictures of the tasks' properties).

Thank you for doing that.

Firstly, from the 4 sets of properties, you can see that two (tasks 1 & 4) have the entry, "CPU time since checkpoint ---" which tells you immediately that no checkpoint has been written.  As soon as a checkpoint does get written, the --- entry is replaced with an actual time for how long ago the checkpoint was written.  Tasks 2 & 3 show such time entries.
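As a rough illustration of that check, here is a minimal Python sketch that reads a block of retyped properties text and reports whether a checkpoint has been written yet. The function name checkpoint_status and the two sample strings are made up for this example; only the "CPU time since checkpoint" label and the --- placeholder come from the properties posted above.

```python
def checkpoint_status(properties_text):
    """Return the 'CPU time since checkpoint' value, or None if none written yet."""
    label = "CPU time since checkpoint"
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith(label):
            value = line[len(label):].strip()
            # '---' is what the properties show before any checkpoint exists.
            return None if value in ("", "---") else value
    raise ValueError("no 'CPU time since checkpoint' line found")


# Hypothetical samples, cut down from the properties of tasks 1 and 2.
task1 = """CPU time ---
CPU time since checkpoint ---
Elapsed time 00:10:16"""

task2 = """CPU time 00:10:01
CPU time since checkpoint 00:02:24
Elapsed time 00:10:06"""

print(checkpoint_status(task1))  # None
print(checkpoint_status(task2))  # 00:02:24
```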

Secondly, the stderr.txt logs show the startup of all 4 tasks, with quite a few lines of "initialization" type messages which can be ignored.  To know that crunching has actually started, you need to see some lines like

2019-09-24 15:30:27.5594 (1936) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0, total:13110, sky:1/690, f1dot:1/19

0.% --- CG:989248 FG:14971 f1dotmin_fg:-2.724189077486e-009 df1dot_fg:3.268256487026e-013 f2dotmin_fg:0 df2dot_fg:0 f3dotmin_fg:0 df3dot_fg:1

Before a task actually starts to crunch, there has to be a check to see if a previous checkpoint happens to exist - just in case there is one.  After all, if you are restarting the computer after turning it off yesterday, you do want to use the previous progress that has been saved.  In the above example there was none.  You would see different wording if a previous checkpoint had been found.

So, from the logs, you get the same picture as from the properties entries, but with the extra information that the full initialization hadn't completed for tasks 1 & 4.  For those two, there should have been more lines prior to where the test for a checkpoint file would have appeared.  Also, if you look carefully at the log for task 1, there was a 'double' initialization, with a gap of over 2 days between the two attempts.  The very first line of a startup is, "putenv 'LAL_DEBUG_LEVEL=3'" and you see this entry twice for task 1 with timestamps showing the time difference.  Task 1 is obviously the task you were talking about when you first started this thread and the complete stop and restart 2 days later made no real difference to the problem, apart from the fact that somewhat different points in the initialization stage were reached in the two different startup attempts.
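For anyone who wants to run the same checks on their own stderr.txt, here is a small Python sketch under some loud assumptions: a startup is counted by the "putenv 'LAL_DEBUG_LEVEL=3'" line, crunching is taken to have started once a "% --- Cpt:" line shows up, and a progress row ending in 'c' is counted as a potential checkpoint. The function name summarize_stderr and the sample strings are invented for the illustration, and this is just a reading of the logs posted in this thread, not a description of how the app itself works.

```python
import re

STARTUP = "putenv 'LAL_DEBUG_LEVEL=3'"
# A progress row is an optional leading counter, some dots, and an optional 'c'.
PROGRESS_ROW = re.compile(r"^\d*\.*c?$")


def summarize_stderr(text):
    """Count startups and 'c' markers, and note whether crunching began."""
    lines = [ln.strip() for ln in text.splitlines()]
    startups = sum(1 for ln in lines if ln == STARTUP)
    crunching = any("% --- Cpt:" in ln for ln in lines)
    markers = sum(
        1 for ln in lines if ln and PROGRESS_ROW.match(ln) and ln.endswith("c")
    )
    return {"startups": startups, "crunching_started": crunching, "c_markers": markers}


# Hypothetical excerpts modelled on the logs for task 2 (good) and task 1 (stuck).
sample_good = """putenv 'LAL_DEBUG_LEVEL=3'
2019-09-24 15:30:27.5594 (1936) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0, total:13110, sky:1/690, f1dot:1/19
3.........INFO: Major Windows version: 6
c
..........
7.....c
..............
8.............c
"""

sample_stuck = """putenv 'LAL_DEBUG_LEVEL=3'
2019-09-22 12:00:05.0998 (5636) [debug]: BSGL output files
putenv 'LAL_DEBUG_LEVEL=3'
2019-09-24 15:20:53.2877 (1536) [debug]: BSGL output files
"""

print(summarize_stderr(sample_good))
print(summarize_stderr(sample_stuck))
```

More than one startup together with crunching_started being False is the pattern seen for task 1: the task was restarted but never got past initialization.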

I have no idea what is causing the problem, but now that there are two different examples for dual core hosts which seem to show 1 core working OK whilst the other makes no progress even after a complete restart, I'll send a message to the Devs and ask them to look at both reports about this.  They are always busy so it might take a while for any information to come.

vdquang wrote:
Sorry, I could not paste the text as plain text, because the 'Paste as plain text' button did not work

Probably it "did not work" because you weren't using code tags so there was nothing for that button to do.  What you pasted just appeared as if you had been typing it at the keyboard rather than pasting.  I like using code tags because it uses a fixed width font (that can easily be resized) which makes it very obvious as to what was typed at the keyboard and what was pasted from some sort of log file.  The fixed width font is very helpful if you are trying to make data columns appear nicely formatted and easy for the reader to interpret.  In this case, the log file was essentially just quite readable text so there was no real advantage to be had from placing the text between code tags so I perhaps should have just allowed you to do whatever you wanted :-).

Cheers,
Gary.

vdquang
Joined: 2 Mar 06
Posts: 10
Credit: 1,624,935
RAC: 673

Gary Roberts wrote:
I have no idea what is causing the problem, but now that there are two different examples for dual core hosts which seem to show 1 core working OK whilst the other makes no progress even after a complete restart...

I decided to check whether the problem lay with my computer's dual-core CPU or with the work units of tasks 1 & 4.

I shut down the computer, let it rest for a few hours and then restarted it. I suspended the other BOINC applications and ran E@H tasks 2 & 3 (the 'good' ones). After they had run for a while, I checked the properties and the stderr.txt files of tasks 2 & 3 and found no problem with them.  I then suspended tasks 2 & 3 and let tasks 1 & 4 (the 'bad' ones) run. Surprisingly, both 'bad' tasks ran smoothly! In the properties tab, the 'CPU time' and the 'Elapsed time' were increasing continuously, with the 'CPU time since checkpoint' period being about 3 min. 12-13 sec. As for the stderr.txt files, the numbers 1, 2, 3... with dots appeared in the bottom lines. At later checks, the numbers had increased to 5, 6... and then 11, 12, 13... and so on. It works perfectly!

I think the recent problems with running E@H tasks were related to my computer having run uninterrupted for a few days, so that it became hot and 'tired'. The situation did not improve even after restarting the computer because it had not cooled down enough.

From this topic I have learnt a good lesson: while running BOINC tasks, I should take time to check the 'properties' tab, especially its 'CPU time since checkpoint' entry.

Matt White
Joined: 9 Jul 19
Posts: 114
Credit: 120,661,850
RAC: 3,015

vdquang wrote:

I think the recent problems with running E@H tasks were related to my computer having run uninterrupted for a few days, so that it became hot and 'tired'. The situation did not improve even after restarting the computer because it had not cooled down enough.

From this topic I have learnt a good lesson: while running BOINC tasks, I should take time to check the 'properties' tab, especially its 'CPU time since checkpoint' entry.

All things being equal, you should be able to run your box 24/7/365 crunching without issues, provided there is sufficient cooling for the CPU. A cooling problem may not be readily apparent even if you are monitoring the CPU temperatures. A few people here have noted large discrepancies between indicated and actual temp readings. With this in mind, there are a couple of things I would check:

1. Is the CPU heatsink clean and free of dust?

2. Are the case and CPU fans clear of dust and are they in good order?

3. Are there any fan shrouds missing in the case?

If all of these items have been checked and are okay, I would consider inspecting the thermal compound between the CPU heatsink and the CPU itself. Before doing this, you should pick up a tube of thermal compound. I have a few HP XW4600 boxes with Intel Core 2 Duo processors. These boxes are about 10 years old. Inspecting the processor heatsinks, I found that the thermal compound had dried out. Replacing the compound is easy, and does not require one to remove the CPU itself. (There should be a separate locking bar holding the CPU heatsink in place.) NOTE: be careful with the thermal compound, it is toxic.

Another note on cooling, your box should be placed where it has free airflow, front and rear. If the BIOS has a fan speed idle setting, you may wish to bump it up a notch or two from the default setting.

This is all basic stuff, but I didn't see it mentioned earlier, so I thought I'd bring it up. 

Clear skies,
Matt

cecht
Joined: 7 Mar 18
Posts: 717
Credit: 791,653,706
RAC: 578,740

Matt White wrote:
...If all of these items have been checked and are okay, I would consider inspecting the thermal compound between the CPU heatsink and the CPU itself. Before doing this, you should pick up a tube of thermal compound. I have a few HP XW4600 boxes with Intel Core 2 Duo processors. These boxes are about 10 years old. Inspecting the processor heatsinks, I found that the thermal compound had dried out. Replacing the compound is easy, and does not require one to remove the CPU itself. (There should be a separate locking bar holding the CPU heatsink in place.) NOTE: be careful with the thermal compound, it is toxic.

I second this. I did this very thing just yesterday on my 9-year-old box and saw a 10+ C drop in CPU temps and lower CPU fan speeds.

My case also has foam air filters which had become clogged with dust. Cleaning those improved air flow and allowed me to lower GPU fan speeds.

 

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,229
Credit: 44,580,430,756
RAC: 39,647,132

vdquang wrote:
I decided to check whether the problem lay with my computer's dual-core CPU or with the work units of tasks 1 & 4.

Thank you very much for being resourceful and solving your problem with some smart thinking.  As Matt and cecht have commented, internal cleanliness is very important when you are putting these heavy crunching loads on your computer.  You have just proved to yourself that you need to check for internal blockages or fan/filter problems that are allowing the excess heat to be created.  CPUs don't become 'tired' but bad things do happen if the internals are not properly cooled.  You need to check the whole cooling system.  Follow the advice that Matt and cecht have given.

Cheers,
Gary.

vdquang
Joined: 2 Mar 06
Posts: 10
Credit: 1,624,935
RAC: 673

Thanks to you all. I really have left my computer uncleaned for a couple of years. I'll take care of this.

robl
Joined: 2 Jan 13
Posts: 1,639
Credit: 1,119,006,983
RAC: 696,050

Continuing this thread along the lines of keeping your PCs clean: I have been forced to shut down one large tower because of the enormous amount of heat that it and another tower generate.  The first tower had been cleaned about 2 months ago, but with the current ambient air temps the AC was being taxed.  I decided to shut down the 2nd tower and look at its condition.  It was not too bad, but I did do some vacuuming/blowing etc. to get better air flow.  I also noticed that the CPU fan mount, while still attached, had lost adhesion, so it is time to get some of "that stuff" to ensure better heat transfer.

Now for the shocker.  I thought I would clean the 3rd tower, as it had been quite a while.  What became immediately noticeable was that one of the large fans was "frozen", i.e. unable to move.  Fortunately I had a spare and replaced it.  I also did some additional cleaning.

The point: for now I am limited to one tower because of the heat it generates along with the house ambient air temps.  Eventually, when winter comes (I live in Florida), I will be able to power up the other two towers, but not for now.  One tower will have to do.  Second point: check your fans and keep the equipment clean.
