putenv 'LAL_DEBUG_LEVEL=3'
2019-09-21 07:17:06.0394 (41792) [normal]: This program is published under the GNU General Public License, version 2
2019-09-21 07:17:06.0554 (41792) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2019-09-21 07:17:06.2294 (41792) [normal]: This Einstein@home App was built at: Apr 5 2018 14:15:53
2019-09-21 07:17:06.2464 (41792) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2AS20-500_1.01_windows_x86_64.exe'.
Activated exception handling...
2019-09-21 07:17:06.2784 (41792) [debug]: BSGL output files
2019-09-21 07:17:06.2954 (41792) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2019-09-21 07:17:06.3184 (41792) [debug]: Set up communication with graphics process.
DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.18.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALPulsar: 1.16.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALApps: 6.21.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
2019-09-21 07:17:09.4986 (41792) [normal]: Reading input data ... 2019-09-21 07:18:10.6433 (41792) [normal]: Search FstatMethod used: 'ResampGeneric'
2019-09-21 07:18:10.7323 (41792) [normal]: Recalc FstatMethod used: 'DemodSSE'
2019-09-21 07:18:28.7082 (41792) [normal]: Number of segments: 64, total number of SFTs in segments: 10190
done.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177858472.0000
2019-09-21 07:18:28.8072 (41792) [normal]: dFreqStack = 3.340013e-006, df1dot = 1.637397e-010, df2dot = 0.000000e+000, df3dot = 0.000000e+000
% --- Setup, N = 64, T = 216000 s, Tobs = 19750204 s, gammaRefine = 500, gamma2Refine = 28226, gamma3Refine = 1
DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
2019-09-21 07:18:38.0341 (41792) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0, total:13110, sky:1/690, f1dot:1/19
My bad. I found the right file and am posting it below:
putenv 'LAL_DEBUG_LEVEL=3'
2019-09-15 23:15:51.7006 (45952) [normal]: This program is published under the GNU General Public License, version 2
2019-09-15 23:15:51.7066 (45952) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2019-09-15 23:15:51.7086 (45952) [normal]: This Einstein@home App was built at: Apr 5 2018 14:15:53
2019-09-15 23:15:51.7126 (45952) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2AS20-500_1.01_windows_x86_64.exe'.
Activated exception handling...
2019-09-15 23:15:51.7216 (45952) [debug]: BSGL output files
2019-09-15 23:15:51.7366 (45952) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2019-09-15 23:15:51.7496 (45952) [debug]: Set up communication with graphics process.
DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.18.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALPulsar: 1.16.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALApps: 6.21.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
2019-09-15 23:15:55.9306 (45952) [normal]: Reading input data ... 2019-09-15 23:18:13.3769 (45952) [normal]: Search FstatMethod used: 'ResampGeneric'
2019-09-15 23:18:13.3779 (45952) [normal]: Recalc FstatMethod used: 'DemodSSE'
2019-09-15 23:18:32.3088 (45952) [normal]: Number of segments: 64, total number of SFTs in segments: 10190
done.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177858472.0000
2019-09-15 23:18:32.3988 (45952) [normal]: dFreqStack = 3.340013e-006, df1dot = 1.637397e-010, df2dot = 0.000000e+000, df3dot = 0.000000e+000
% --- Setup, N = 64, T = 216000 s, Tobs = 19750204 s, gammaRefine = 500, gamma2Refine = 28226, gamma3Refine = 1
DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
putenv 'LAL_DEBUG_LEVEL=3'
2019-09-16 07:13:57.0566 (10348) [normal]: This program is published under the GNU General Public License, version 2
2019-09-16 07:13:57.0566 (10348) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2019-09-16 07:13:57.0722 (10348) [normal]: This Einstein@home App was built at: Apr 5 2018 14:15:53
2019-09-16 07:13:57.0722 (10348) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2AS20-500_1.01_windows_x86_64.exe'.
Activated exception handling...
2019-09-16 07:13:57.0878 (10348) [debug]: BSGL output files
2019-09-16 07:13:59.5567 (10348) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2019-09-16 07:13:59.5567 (10348) [debug]: Set up communication with graphics process.
DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.18.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALPulsar: 1.16.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALApps: 6.21.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
2019-09-16 07:14:01.8543 (10348) [normal]: Reading input data ... 2019-09-16 07:15:53.1511 (10348) [normal]: Search FstatMethod used: 'ResampGeneric'
2019-09-16 07:15:53.1511 (10348) [normal]: Recalc FstatMethod used: 'DemodSSE'
2019-09-16 07:16:07.0586 (10348) [normal]: Number of segments: 64, total number of SFTs in segments: 10190
done.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177858472.0000
2019-09-16 07:16:07.0742 (10348) [normal]: dFreqStack = 3.340013e-006, df1dot = 1.637397e-010, df2dot = 0.000000e+000, df3dot = 0.000000e+000
% --- Setup, N = 64, T = 216000 s, Tobs = 19750204 s, gammaRefine = 500, gamma2Refine = 28226, gamma3Refine = 1
DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
That's actually very useful. The first text you posted showed the application reaching beyond the startup phase, and embarking on
INFO: Major Windows version: 6
c
.........c
.........
1..c
.............c
....
2........c
which I think is where checkpointing is likely to start.
But the second file you posted never reached that stage. The question for the devs is: why?
Edit - looking at the Gravitational wave results for your computer, roughly half have completed and half have stalled. Someone should look at those too, but it's beyond my knowledge.
Many thanks to Richard for showing you how to look at the task properties and for getting you to post the log information for the science app while it is running.
Bobby Conger wrote:
So how do I write the checkpoint that should have been written, or should I just restart the computer?
Checkpoints are created automatically by the running science application when it reaches particular stages of the computation. It's up to the app to decide when the suitable time has arrived. The user cannot do this manually.
It's quite fortuitous that you posted the two lots of output. The first lot is for a task running normally (it would seem) with checkpoints being created. The rows of dots (.....) show calculation loops being completed and the 'c' usually indicates a point where a checkpoint is created. I'm not sure that every 'c' actually represents a written checkpoint but at least some of them must. The second log you posted has no checkpoints.
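For anyone who would rather count the markers than eyeball them, here is a minimal Python sketch. It assumes the stderr.txt content has been read into a string, and it treats any line made up only of digits, dots and the letter 'c' as progress output. That rule is my assumption based on the logs posted in this thread, not a documented format, and `summarize_progress` is just a name I made up.

```python
def summarize_progress(stderr_text):
    """Count '.' and 'c' markers in the progress part of a stderr log.

    Assumption: progress lines contain only digits, dots and the letter 'c',
    as in the logs posted in this thread.
    """
    dots = 0
    c_marks = 0
    for line in stderr_text.splitlines():
        stripped = line.strip()
        # Skip ordinary log lines such as "done." or timestamped messages.
        if stripped and all(ch.isdigit() or ch in ".c" for ch in stripped):
            dots += stripped.count(".")
            c_marks += stripped.count("c")
    return dots, c_marks
```

A log that never got past the startup phase should come back with zero for both counts, which matches what the second log shows.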
If you open each log in separate windows of a text editor and put them side by side on the screen, it makes it very easy to see where the first difference occurs. The two logs are very similar up to a certain point. You should be able to see it very easily. The difference occurs where the 'good' log shows
2019-09-21 07:18:38.0341 (41792) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
and the 'bad' log shows
putenv 'LAL_DEBUG_LEVEL=3'
This line in the 'bad' log is the very same output text that both logs start with. It seems like the app is re-launching itself. The next line contains a time stamp, and that shows a time approximately 8 hours after the original launch time. In other words, it seems like things got stuck for around 8 hours and then something caused the app to start right back at the beginning. After relaunch, the app produced a repeat set of log output and then got stuck again at the very same point as last time. At no stage was a first checkpoint ever written, so for all the time this particular task was supposedly running, the 'progress' you were seeing wasn't 'real' - it was just BOINC's 'simulated' progress whilst waiting for the very first checkpoint.
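The relaunch check can also be done mechanically. The following is a rough Python sketch, assuming each launch begins with a line starting with `putenv ` and is followed by a timestamped line of the form seen in these logs (date, time, then the process ID in brackets). The function names are hypothetical and this is only an illustration of the idea, not a tool the project provides.

```python
import re
from datetime import datetime

# Matches e.g. "2019-09-15 23:15:51.7006 (45952) [normal]: ..."
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ \((\d+)\)")


def find_launches(stderr_text):
    """Return (start_time, pid) for each launch found in the log."""
    launches = []
    expecting_stamp = False
    for line in stderr_text.splitlines():
        if line.startswith("putenv "):
            # Assumption: this banner marks the start of a (re)launch.
            expecting_stamp = True
            continue
        match = STAMP.match(line)
        if expecting_stamp and match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            launches.append((when, int(match.group(2))))
            expecting_stamp = False
    return launches


def gaps_between_launches(launches):
    """Return the time elapsed between consecutive launches."""
    return [later[0] - earlier[0]
            for earlier, later in zip(launches, launches[1:])]
```

More than one entry from `find_launches` for a single task means the app started over at least once, and the gaps show how long each attempt sat there before the restart.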
There are no other people that I've noticed complaining about this type of behaviour for this particular app. It seems rather likely to be specific to your computer. I suspect it will need someone with physical access to your machine to work out what might be causing it.
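If the side-by-side comparison in a text editor is awkward, a small Python sketch like the one below can find the first differing line. It assumes both logs are available as strings and strips the timestamp and process ID prefix before comparing, since those differ on every run. The prefix pattern is inferred from the logs in this thread and `first_difference` is a made-up name.

```python
import re
from itertools import zip_longest

# Timestamp plus process ID, e.g. "2019-09-21 07:18:38.0341 (41792) "
PREFIX = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \(\d+\) ")


def normalize(line):
    """Remove run-specific timestamp/PID prefixes and trailing whitespace."""
    return PREFIX.sub("", line).rstrip()


def first_difference(good_text, bad_text):
    """Return (line_number, good_line, bad_line) for the first mismatch.

    Returns None if the normalized logs are identical. A missing line in
    the shorter log is reported as None.
    """
    good = [normalize(line) for line in good_text.splitlines()]
    bad = [normalize(line) for line in bad_text.splitlines()]
    for number, (g, b) in enumerate(zip_longest(good, bad), start=1):
        if g != b:
            return number, g, b
    return None
```

Fed the two logs from this thread, I would expect it to stop at the 'No checkpoint ... found' line in the good log versus the repeated putenv line in the bad one, but I have not run it against the complete files.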
It took me a while to research the logs and compose an answer so I hadn't noticed that Richard had also replied in the meantime.
He raises a point that I was also wondering about - some tasks do finish and some don't, maybe in a 50/50 ratio. Your machine is a rather old dual core (Core 2 Duo). I have no idea if this is possible but could there be a problem with one of the cores? Maybe you need to look into testing the processor cores somehow.
It is possible to force the machine to run on a single core, either in the BIOS or in the Windows environmental settings (CPU =1), if I remember correctly. You can choose which core runs a task using the affinity setting in Task Manager.
One of my boxes is a Core 2 Duo, which has run the CPU GW task with no issues, so there is something going on. The methods listed above might help you determine if one of the cores is causing the problem.
While Windows Process Explorer allows you to assign a running task to run on only a specified CPU (that is the term the interface uses, not core) the setting does not apply to the next task. So if you want to try running on just one for a day, then the other for a day, you may wish to download and install Process Lasso, which several of us use to set CPU affinities (and also priorities) for various tasks to tune Einstein operation efficiency versus system impact.
I decided to remove and re-install the software. However, I cannot get a project to work on. The website shows me as logged in, but I cannot get a project to start over again or get back to the same project. Please help.
Here is the stderr.txt information:
2019-09-21 07:17:06.0394 (41792) [normal]: This program is published under the GNU General Public License, version 2
2019-09-21 07:17:06.0554 (41792) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2019-09-21 07:17:06.2294 (41792) [normal]: This Einstein@home App was built at: Apr 5 2018 14:15:53
Activated exception handling...
2019-09-21 07:17:06.2784 (41792) [debug]: BSGL output files
2019-09-21 07:17:06.2954 (41792) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2019-09-21 07:17:06.3184 (41792) [debug]: Set up communication with graphics process.
Code-version: %% LAL: 6.18.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALPulsar: 1.16.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALApps: 6.21.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
2019-09-21 07:18:10.7323 (41792) [normal]: Recalc FstatMethod used: 'DemodSSE'
2019-09-21 07:18:28.7082 (41792) [normal]: Number of segments: 64, total number of SFTs in segments: 10190
done.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177858472.0000
2019-09-21 07:18:28.8072 (41792) [normal]: dFreqStack = 3.340013e-006, df1dot = 1.637397e-010, df2dot = 0.000000e+000, df3dot = 0.000000e+000
% --- Setup, N = 64, T = 216000 s, Tobs = 19750204 s, gammaRefine = 500, gamma2Refine = 28226, gamma3Refine = 1
2019-09-21 07:18:38.0341 (41792) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0, total:13110, sky:1/690, f1dot:1/19
INFO: Major Windows version: 6
c
.........c
.........
1..c
.............c
....
2........c
...........
3.c
..............c
....
4.........c
..........
5...c
..............c
..
6..........c
.........
7...c
..........c
......
8....c
............c
...
9........c
...........
10.c
............c
......
11.....c
...........c
...
12.........c
..........
13.c
.............c
.....
14.....c
............c
..
15...........c
........
16.....c
..............c
......
18...c
..........c
......
19....c
............c
...
20........c
...........
21.c
.............c
.....
22......c
.............c
........
24...c
............c
....
25........c
..........c
.
26..........c
.........
27.c
............c
......
28.....c
...........c
...
29.........c
..........
30.c
............c
......
31.....c
............c
..
32.........c
..........
33.c
............c
......
34....c
............c
...
35.......c
............c
......
37......c
.............
38.c
............c
......
39......c
.............c
.......
41......c
.............c
......
43.....c
..............c
.......
45......c
.............
46.c
.............c
.....
47........c
...........
48.c
..............c
....
49.........c
..........
50...c
..............c
..
51.......c
........c
....
52...c
........c
........c
.........c
...
54......c
.........c
....
55....c
........c
.......
56.c
........c
........c
..
57.....c
........c
......
58.c
.........c
.........
59...c
............c
....
60.........c
..........
61...c
..............c
..
62..........c
.........
63....c
.........c
......
64.c
........c
........c
..
65.....c
........c
......
66.c
........c
.........c
.
67......c
........c
.....
68..c
........c
........c
.
69......c
........c
.....
70..c
........c
........c
.
71......c
........c
.....
72..c
..........c
.......
73...c
...........c
.....
74.....c
........c
......
75..c
.........c
........c
........c
....
77....c
.........c
......
78...c
..........c
......
79...c
..........c
......
80..c
..........c
.......
81.c
..........c
........
82.c
.........c
........c
.
83......c
........c
.....
84..c
........c
........c
.
85......c
........c
.....
86..c
........c
........c
.
87......c
........c
.....
88...c
...........c
.....
89.....c
...........c
...
90......c
........c
.....
91..c
........c
........c
.
92......c
........c
.....
93..c
.........c
........c
........c
....
95......c
............c
.
96.........c
..........
97.c
........c
........c
..
98.....c
........c
......
99.c
........c
........c
..
100.....c
........c
......
101.c
........c
.........c
.
102......c
........c
.....
103..c
........c
........c
.
104......c
.......c
......
105.c
.........c
........c
.
106......c
........c
.....
107..c
........c
........c
.
108......c
........c
.....
109...c
........c
........c
.......c
.....
111..c
........c
.........
112.c
..........c
........
113.c
..........c
........
114.c
.........c
.........
115.c
............c
......
116......c
.............c
........
118.....c
..............c
......
120......c
.............c
.......
122.....c
.............c
.
123...
My bad. I found the right file and am posting it below:
2019-09-15 23:15:51.7006 (45952) [normal]: This program is published under the GNU General Public License, version 2
2019-09-15 23:15:51.7066 (45952) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2019-09-15 23:15:51.7086 (45952) [normal]: This Einstein@home App was built at: Apr 5 2018 14:15:53
Activated exception handling...
2019-09-15 23:15:51.7216 (45952) [debug]: BSGL output files
2019-09-15 23:15:51.7366 (45952) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2019-09-15 23:15:51.7496 (45952) [debug]: Set up communication with graphics process.
Code-version: %% LAL: 6.18.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALPulsar: 1.16.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALApps: 6.21.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
2019-09-15 23:18:13.3779 (45952) [normal]: Recalc FstatMethod used: 'DemodSSE'
2019-09-15 23:18:32.3088 (45952) [normal]: Number of segments: 64, total number of SFTs in segments: 10190
done.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177858472.0000
2019-09-15 23:18:32.3988 (45952) [normal]: dFreqStack = 3.340013e-006, df1dot = 1.637397e-010, df2dot = 0.000000e+000, df3dot = 0.000000e+000
% --- Setup, N = 64, T = 216000 s, Tobs = 19750204 s, gammaRefine = 500, gamma2Refine = 28226, gamma3Refine = 1
putenv 'LAL_DEBUG_LEVEL=3'
2019-09-16 07:13:57.0566 (10348) [normal]: This program is published under the GNU General Public License, version 2
2019-09-16 07:13:57.0566 (10348) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2019-09-16 07:13:57.0722 (10348) [normal]: This Einstein@home App was built at: Apr 5 2018 14:15:53
Activated exception handling...
2019-09-16 07:13:57.0878 (10348) [debug]: BSGL output files
2019-09-16 07:13:59.5567 (10348) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2019-09-16 07:13:59.5567 (10348) [debug]: Set up communication with graphics process.
Code-version: %% LAL: 6.18.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALPulsar: 1.16.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
%% LALApps: 6.21.0.1 (CLEAN f9f1c94b0a4ae84fd5e8c6992235dd36200ffd1b)
2019-09-16 07:15:53.1511 (10348) [normal]: Recalc FstatMethod used: 'DemodSSE'
2019-09-16 07:16:07.0586 (10348) [normal]: Number of segments: 64, total number of SFTs in segments: 10190
done.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177858472.0000
2019-09-16 07:16:07.0742 (10348) [normal]: dFreqStack = 3.340013e-006, df1dot = 1.637397e-010, df2dot = 0.000000e+000, df3dot = 0.000000e+000
% --- Setup, N = 64, T = 216000 s, Tobs = 19750204 s, gammaRefine = 500, gamma2Refine = 28226, gamma3Refine = 1
That's actually very useful. The first text you posted showed the application reaching beyond the startup phase, and embarking on
which I think is where checkpointing is likely to start.
But the second file you posted never reached that stage. The question for the devs is: why?
Edit - looking at the Gravitational wave results for your computer, roughly half have completed and half have stalled. Someone should look at those too, but it's beyond my knowledge.
Many thanks to Richard for showing you how to look at the task properties and for getting you to post the log information for the science app while it is running.
Checkpoints are created automatically by the running science application when it reaches particular stages of the computation. It's up to the app to decide when the suitable time has arrived. The user cannot do this manually.
It's quite fortuitous that you posted the two lots of output. The first lot is for a task running normally (it would seem) with checkpoints being created. The rows of dots (.....) show calculation loops being completed and the 'c' usually indicates a point where a checkpoint is created. I'm not sure that every 'c' actually represents a written checkpoint but at least some of them must. The second log you posted has no checkpoints.
If you open each log in separate windows of a text editor and put them side by side on the screen, it makes it very easy to see where the first difference occurs. The two logs are very similar up to a certain point. You should be able to see it very easily. The difference occurs where the 'good' log shows
2019-09-21 07:18:38.0341 (41792) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
and the 'bad' log shows
putenv 'LAL_DEBUG_LEVEL=3'
This line in the 'bad' log is the very same output text that both logs start with. It seems like the app is re-launching itself. The next line contains a time stamp, and that shows a time approximately 8 hours after the original launch time. In other words, it seems like things got stuck for around 8 hours and then something caused the app to start right back at the beginning. After relaunch, the app produced a repeat set of log output and then got stuck again at the very same point as last time. At no stage was a first checkpoint ever written, so for all the time this particular task was supposedly running, the 'progress' you were seeing wasn't 'real' - it was just BOINC's 'simulated' progress whilst waiting for the very first checkpoint.
There are no other people that I've noticed complaining about this type of behaviour for this particular app. It seems rather likely to be specific to your computer. I suspect it will need someone with physical access to your machine to work out what might be causing it.
Cheers,
Gary.
It took me a while to research the logs and compose an answer so I hadn't noticed that Richard had also replied in the meantime.
He raises a point that I was also wondering about - some tasks do finish and some don't, maybe in a 50/50 ratio. Your machine is a rather old dual core (Core 2 Duo). I have no idea if this is possible but could there be a problem with one of the cores? Maybe you need to look into testing the processor cores somehow.
Cheers,
Gary.
It is possible to force the machine to run on a single core, either in the BIOS or in the Windows environmental settings (CPU =1), if I remember correctly. You can choose which core runs a task using the affinity setting in Task Manager.
One of my boxes is a Core 2 Duo, which has run the CPU GW task with no issues, so there is something going on. The methods listed above might help you determine if one of the cores is causing the problem.
While Windows Process Explorer allows you to assign a running task to run on only a specified CPU (that is the term the interface uses, not core) the setting does not apply to the next task. So if you want to try running on just one for a day, then the other for a day, you may wish to download and install Process Lasso, which several of us use to set CPU affinities (and also priorities) for various tasks to tune Einstein operation efficiency versus system impact.
Didn't know about that one, good tip!
Thanks for the tips. I'll check the cores and do some more plugging around. Thanks again.
I decided to remove and re-install the software. However, I cannot get a project to work on. The website shows me as logged in, but I cannot get a project to start over again or get back to the same project. Please help.