Einstein jobs frequently restart without progress

mopel
Joined: 2 Sep 05
Posts: 6
Credit: 21991221
RAC: 0
Topic 193520

Einstein/BOINC had been running for some years without problems, until my mainboard started producing errors and finally crashed Windows XP completely. OK, the mainboard has been replaced and the dual-core CPU upgraded, and then I finally managed to bring XP back (somehow). Maybe XP still has some hidden issues, but I'm not aware of any yet.

BOINC continued fine at first, fetching new work and completing previously started tasks, but then it seemed to restart Einstein tasks frequently, for no apparent reason and without any visible progress.

I have installed a newer BOINC release (5.10.30), but no luck. Any idea what's going wrong here?

Many thx

18-Feb-2008 12:53:45 [---] Starting BOINC client version 5.10.30 for windows_intelx86
18-Feb-2008 12:53:45 [---] log flags: task, file_xfer, sched_ops
18-Feb-2008 12:53:45 [---] Libraries: libcurl/7.17.1 OpenSSL/0.9.8e zlib/1.2.3
18-Feb-2008 12:53:45 [---] Data directory: C:\Programme\BOINC
18-Feb-2008 12:53:46 [---] Processor: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ [x86 Family 15 Model 43 Stepping 1]
18-Feb-2008 12:53:46 [---] Processor features: fpu tsc pae nx sse sse2 3dnow mmx
18-Feb-2008 12:53:46 [---] OS: Microsoft Windows XP: Home Edition, Service Pack 2, (05.01.2600.00)
18-Feb-2008 12:53:46 [---] Memory: 2.00 GB physical, 2.35 GB virtual
18-Feb-2008 12:53:46 [---] Disk: 17.58 GB total, 6.64 GB free
18-Feb-2008 12:53:46 [---] Local time is UTC +1 hours
18-Feb-2008 12:53:46 [Einstein@Home] URL: http://einstein.phys.uwm.edu/; Computer ID: 403837; location: home; project prefs: default
18-Feb-2008 12:53:46 [---] General prefs: from Einstein@Home (last modified 18-Jan-2007 18:15:51)
18-Feb-2008 12:53:46 [---] Host location: home
18-Feb-2008 12:53:46 [---] General prefs: using separate prefs for home
18-Feb-2008 12:53:46 [---] Reading preferences override file
18-Feb-2008 12:53:46 [---] Preferences limit memory usage when active to 614.24MB
18-Feb-2008 12:53:46 [---] Preferences limit memory usage when idle to 1842.73MB
18-Feb-2008 12:53:46 [---] Preferences limit disk usage to 0.93GB
18-Feb-2008 12:53:47 [Einstein@Home] Restarting task h1_0812.40_S5R3__517_S5R3b_0 using einstein_S5R3 version 426
18-Feb-2008 12:53:48 [Einstein@Home] Restarting task h1_0812.40_S5R3__516_S5R3b_0 using einstein_S5R3 version 426
18-Feb-2008 12:56:56 [Einstein@Home] Restarting task h1_0812.40_S5R3__516_S5R3b_0 using einstein_S5R3 version 426
18-Feb-2008 13:34:44 [Einstein@Home] [error] einstein_S5R3 not responding to screensaver, requesting exit
18-Feb-2008 13:34:48 [Einstein@Home] [error] einstein_S5R3 not responding to screensaver, killing it
18-Feb-2008 13:37:53 [Einstein@Home] Restarting task h1_0812.40_S5R3__517_S5R3b_0 using einstein_S5R3 version 426
18-Feb-2008 13:40:57 [Einstein@Home] Restarting task h1_0812.40_S5R3__516_S5R3b_0 using einstein_S5R3 version 426
18-Feb-2008 13:58:27 [Einstein@Home] Restarting task h1_0812.40_S5R3__517_S5R3b_0 using einstein_S5R3 version 426
18-Feb-2008 13:58:27 [Einstein@Home] Restarting task h1_0812.40_S5R3__517_S5R3b_0 using einstein_S5R3 version 426
18-Feb-2008 14:18:46 [Einstein@Home] Restarting task h1_0812.40_S5R3__517_S5R3b_0 using einstein_S5R3 version 426
18-Feb-2008 14:22:39 [Einstein@Home] [error] einstein_S5R3 not responding to screensaver, requesting exit
18-Feb-2008 14:22:40 [Einstein@Home] Task h1_0812.40_S5R3__517_S5R3b_0 exited with a DLL initialization error.
18-Feb-2008 14:22:40 [Einstein@Home] If this happens repeatedly you may need to reboot your computer.
18-Feb-2008 14:23:19 [Einstein@Home] Restarting task h1_0812.40_S5R3__516_S5R3b_0 using einstein_S5R3 version 426
18-Feb-2008 14:36:38 [Einstein@Home] Restarting task h1_0812.40_S5R3__516_S5R3b_0 using einstein_S5R3 version 426
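One quick way to put a number on "frequently restarting" in a message log like the one above is to count the "Restarting task" lines per task name. The sketch below is a hypothetical helper (count_restarts is not part of BOINC). It assumes only that each restart line contains "Restarting task <name>", which holds for both the bracketed log format above and the pipe-separated BOINC Manager format posted later in this thread.

```python
import re
from collections import Counter

# Matches "Restarting task <name>" in either BOINC message format, e.g.
#   18-Feb-2008 12:53:47 [Einstein@Home] Restarting task h1_... using einstein_S5R3 version 426
#   19/02/2008 3:04:52 PM|Einstein@Home|Restarting task h1_... using einstein_S5R3 version 426
RESTART_RE = re.compile(r"Restarting task (\S+)")


def count_restarts(messages: str) -> Counter:
    """Count how many times each task was restarted in a block of BOINC messages."""
    counts = Counter()
    for line in messages.splitlines():
        match = RESTART_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = """\
18-Feb-2008 12:53:47 [Einstein@Home] Restarting task h1_0812.40_S5R3__517_S5R3b_0 using einstein_S5R3 version 426
18-Feb-2008 12:53:48 [Einstein@Home] Restarting task h1_0812.40_S5R3__516_S5R3b_0 using einstein_S5R3 version 426
18-Feb-2008 12:56:56 [Einstein@Home] Restarting task h1_0812.40_S5R3__516_S5R3b_0 using einstein_S5R3 version 426
"""
    # Most-restarted tasks first.
    for task, n in count_restarts(sample).most_common():
        print(f"{n:3d}  {task}")
```

Run over the complete log above, this reports five restarts each for the _516 and _517 tasks in under two hours.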

ded1o1
Joined: 29 Sep 07
Posts: 3
Credit: 10183773
RAC: 0

Einstein jobs frequently restart without progress

I have also noticed Einstein WUs restarting frequently of late. When this happens, BOINC seems unable to switch over and process SETI (or any other project's) WUs. After about 20 Einstein restarts, processor usage drops to 0%, and the only fix I have found is to shut down BOINC and start it up again.

I'm thinking that this probably has something to do with the version 426 upgrade, as it never happened under the previous versions.

Reading through the other threads here, I see that other users have been experiencing similar issues; it seems that most of the affected users are those who process multiple projects and have their processor usage set to less than 100%. My processor usage was set to 80% as it is summer here. I set it to 100% to see if that helps, but it seems to make no difference.

Hopefully someone will come up with a fix soon, so we had all better keep a close eye on this board.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 6

Can either of you, or both, please navigate to your BOINC directory, then to the slots directory and find the slot that Einstein is running from and post the contents of the stderr.txt file that's in there?

ded1o1
Joined: 29 Sep 07
Posts: 3
Credit: 10183773
RAC: 0

RE: Can either of you, or

Message 79305 in response to message 79304

Quote:
Can either of you, or both, please navigate to your BOINC directory, then to the slots directory and find the slot that Einstein is running from and post the contents of the stderr.txt file that's in there?

While the full range of symptoms has yet to appear today (both CPUs stuck on Einstein tasks, no CPU usage and no progress), it does seem that Einstein tasks are restarting frequently. I have pasted a dump of BOINC messages as well as the stderr.txt file you requested, in the hope that it may shed some light.

I will keep an eye on things, and the next time progress halts completely I'll post stderr.txt again.

BOINC Messages
19/02/2008 3:04:52 PM|Einstein@Home|Restarting task h1_0800.50_S5R3__361_S5R3b_0 using einstein_S5R3 version 426
19/02/2008 3:08:11 PM|Einstein@Home|Restarting task h1_0800.50_S5R3__361_S5R3b_0 using einstein_S5R3 version 426
19/02/2008 3:20:12 PM|Einstein@Home|Restarting task h1_0800.50_S5R3__361_S5R3b_0 using einstein_S5R3 version 426
19/02/2008 3:37:30 PM|Einstein@Home|Restarting task h1_0800.50_S5R3__360_S5R3b_0 using einstein_S5R3 version 426
19/02/2008 3:40:51 PM|Einstein@Home|Restarting task h1_0800.50_S5R3__360_S5R3b_0 using einstein_S5R3 version 426
19/02/2008 4:01:45 PM|Einstein@Home|Restarting task h1_0800.50_S5R3__361_S5R3b_0 using einstein_S5R3 version 426

stderr.txt
2008-02-19 08:23:34.7968 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 08:23:34.7968 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 08:23:34.8906 [debug]: Set up communication with graphics process.
2008-02-19 08:23:35.2031 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 08:23:53.0156 [normal]: INFO: Couldn't open checkpoint h1_0800.50_S5R3__361_S5R3b_0_0.cpt
2008-02-19 08:23:53.0156 [debug]: Total skypoints = 1201. Progress: 0,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
1, c
2, 3, 4, c
5, 6, 7, c
8, 9, 2008-02-19 08:31:28.7343 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 08:31:28.7343 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 08:31:28.7343 [debug]: Set up communication with graphics process.
2008-02-19 08:31:29.0468 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 08:31:46.6093 [debug]: Successfully read checkpoint
2008-02-19 08:31:46.6093 [debug]: Total skypoints = 1201. Progress: 8,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
9, c
10, 11, 12, c
13, 14, 15, c
16, 17, 18, c
19, 20, 21, c
22, 23, 24, c
25, 26, 27, c
28, 29, 30, c
31, 32, 33, c
34, 35, 36, c
37, 38, 39, c
40, 41, 42, c
43, 44, 45, c
46, 47, c
48, 49, c
50, 51, c
52, 53, c
54, 55, c
56, 57, c
58, 59, c
60, 61, c
62, 63, c
64, 65, c
66, 67, c
68, 69, c
70, 71, c
72, 73, c
74, 75, c
76, 77, c
78, 79, c
80, 81, c
82, 83, c
84, 85, c
86, 2008-02-19 09:13:44.5625 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 09:13:44.5625 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 09:13:44.5625 [debug]: Set up communication with graphics process.
2008-02-19 09:13:44.8750 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 09:14:02.5625 [debug]: Successfully read checkpoint
2008-02-19 09:14:02.5625 [debug]: Total skypoints = 1201. Progress: 86,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
87, c
88, 89, c
90, 91, c
92, 93, c
94, 95, c
96, 97, c
98, 99, c
100, 101, c
102, 103, c
104, 105, c
106, 107, c
108, 109, c
110, 111, c
112, 113, c
114, 115, c
116, 117, c
118, 119, c
120, 121, c
122, 123, 2008-02-19 09:37:29.6718 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 09:37:29.6718 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 09:37:29.6875 [debug]: Set up communication with graphics process.
2008-02-19 09:37:29.9843 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 09:37:47.5625 [debug]: Successfully read checkpoint
2008-02-19 09:37:47.5625 [debug]: Total skypoints = 1201. Progress: 122,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
123, c
124, 125, c
126, 127, c
128, 129, c
130, 131, c
132, 133, c
134, 135, c
136, 137, c
138, 139, c
140, 141, c
142, 143, c
144, 145, c
146, 147, c
148, 149, 2008-02-19 09:56:21.7343 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 09:56:21.7343 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 09:56:21.7343 [debug]: Set up communication with graphics process.
2008-02-19 09:56:22.0468 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 09:56:39.4687 [debug]: Successfully read checkpoint
2008-02-19 09:56:39.4687 [debug]: Total skypoints = 1201. Progress: 148,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
149, c
150, 151, c
152, 153, c
154, 155, c
156, 157, 2008-02-19 10:05:22.7031 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 10:05:22.7031 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 10:05:22.7187 [debug]: Set up communication with graphics process.
2008-02-19 10:05:23.0312 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 10:05:40.2031 [debug]: Successfully read checkpoint
2008-02-19 10:05:40.2031 [debug]: Total skypoints = 1201. Progress: 156,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
157, c
158, 159, c
160, 161, c
162, 163, c
164, 165, c
166, 167, c
168, 169, c
170, 171, c
172, 173, c
174, 175, c
176, 177, c
178, 179, c
180, 181, c
182, 183, c
184, 185, c
186, 187, c
188, 189, c
190, 191, c
192, 193, c
194, 195, c
196, 197, c
198, 199, c
200, 201, c
202, 203, c
204, 205, c
206, 207, c
208, 209, c
210, 211, c
212, 213, c
214, 215, c
216, 217, c
218, 219, c
220, 221, c
222, 223, c
224, 225, c
226, 227, c
228, 229, c
230, 231, c
232, 233, c
234, 235, c
236, 237, c
238, 239, c
240, 241, c
242, 243, c
244, 245, c
246, 247, c
248, 249, c
250, 251, c
252, 2008-02-19 11:06:39.9218 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 11:06:39.9218 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 11:06:39.9218 [debug]: Set up communication with graphics process.
2008-02-19 11:06:40.2343 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 11:06:57.7343 [debug]: Successfully read checkpoint
2008-02-19 11:06:57.7343 [debug]: Total skypoints = 1201. Progress: 252,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
253, c
254, 255, c
256, 257, c
258, 259, c
260, 261, c
262, 263, c
264, 265, c
266, 2008-02-19 11:18:56.6562 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 11:18:56.6562 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 11:18:56.6718 [debug]: Set up communication with graphics process.
2008-02-19 11:18:56.9687 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 11:19:14.4531 [debug]: Successfully read checkpoint
2008-02-19 11:19:14.4531 [debug]: Total skypoints = 1201. Progress: 266,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
267, c
268, 269, c
270, 271, c
272, 273, c
274, 275, c
276, 277, c
278, 279, c
280, 281, c
282, 283, c
284, 285, c
286, 287, c
288, 289, c
290, 2008-02-19 11:37:32.7031 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 11:37:32.7031 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 11:37:32.7031 [debug]: Set up communication with graphics process.
2008-02-19 11:37:33.0156 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 11:37:50.4843 [debug]: Successfully read checkpoint
2008-02-19 11:37:50.4843 [debug]: Total skypoints = 1201. Progress: 290,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
291, c
292, 293, c
294, 295, c
296, 297, c
298, 299, c
300, 301, c
302, 303, c
304, 305, c
306, 307, c
308, 309, c
310, 311, c
312, 313, c
314, 315, c
316, 317, c
318, 319, c
320, c
321, c
322, c
323, c
324, c
325, c
326, c
327, c
328, c
329, 2008-02-19 12:10:33.6562 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 12:10:33.6562 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 12:10:33.6718 [debug]: Set up communication with graphics process.
2008-02-19 12:10:33.9687 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 12:11:07.0781 [debug]: Successfully read checkpoint
2008-02-19 12:11:07.0781 [debug]: Total skypoints = 1201. Progress: 329,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
330, 2008-02-19 12:15:59.8906 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 12:15:59.8906 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 12:15:59.8906 [debug]: Set up communication with graphics process.
2008-02-19 12:16:00.2031 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 12:16:29.9218 [debug]: Successfully read checkpoint
2008-02-19 12:16:29.9218 [debug]: Total skypoints = 1201. Progress: 330,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
331, c
332, c
333, c
334, c
335, c
336, c
337, c
338, 339, c
340, c
341, c
342, 2008-02-19 12:32:03.8906 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 12:32:03.8906 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 12:32:03.8906 [debug]: Set up communication with graphics process.
2008-02-19 12:32:04.2031 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 12:32:32.7343 [debug]: Successfully read checkpoint
2008-02-19 12:32:32.7343 [debug]: Total skypoints = 1201. Progress: 342,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
343, c
344, c
345, c
346, c
347, c
348, 2008-02-19 12:42:26.5781 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 12:42:26.5781 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 12:42:26.5937 [debug]: Set up communication with graphics process.
2008-02-19 12:42:26.8906 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 12:42:56.1875 [debug]: Successfully read checkpoint
2008-02-19 12:42:56.1875 [debug]: Total skypoints = 1201. Progress: 348,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
349, c
350, c
351, c
352, c
353, c
354, 2008-02-19 12:52:40.2500 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 12:52:40.2500 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 12:52:40.2500 [debug]: Set up communication with graphics process.
2008-02-19 12:52:40.5625 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 12:53:10.1250 [debug]: Successfully read checkpoint
2008-02-19 12:53:10.1250 [debug]: Total skypoints = 1201. Progress: 354,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
355, c
356, c
357, c
358, c
359, c
360, c
361, c
362, c
363, c
364, c
365, c
366, c
367, c
368, c
369, c
370, 371, c
372, c
373, c
374, 2008-02-19 13:17:43.4375 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 13:17:43.4375 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 13:17:43.4375 [debug]: Set up communication with graphics process.
2008-02-19 13:17:43.7343 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 13:18:13.3281 [debug]: Successfully read checkpoint
2008-02-19 13:18:13.3281 [debug]: Total skypoints = 1201. Progress: 374,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
375, c
376, c
377, c
378, c
379, c
380, 381, c
382, c
383, c
384, c
385, c
386, c
387, c
388, c
389, c
390, c
391, c
392, 2008-02-19 13:40:50.7500 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 13:40:50.7500 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 13:40:50.7500 [debug]: Set up communication with graphics process.
2008-02-19 13:40:51.0625 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 13:41:20.6875 [debug]: Successfully read checkpoint
2008-02-19 13:41:20.6875 [debug]: Total skypoints = 1201. Progress: 392,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
393, c
394, c
395, c
396, c
397, 2008-02-19 13:49:43.7656 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 13:49:43.7656 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 13:49:43.7656 [debug]: Set up communication with graphics process.
2008-02-19 13:49:44.0781 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 13:50:13.5156 [debug]: Successfully read checkpoint
2008-02-19 13:50:13.5156 [debug]: Total skypoints = 1201. Progress: 397,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
398, c
399, c
400, c
401, c
402, c
403, c
404, c
405, c
406, c
407, c
408, c
409, c
410, c
411, c
412, 2008-02-19 14:09:26.9687 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 14:09:26.9687 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 14:09:26.9687 [debug]: Set up communication with graphics process.
2008-02-19 14:09:27.2812 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 14:09:55.7968 [debug]: Successfully read checkpoint
2008-02-19 14:09:55.7968 [debug]: Total skypoints = 1201. Progress: 412,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
413, c
414, c
415, c
416, c
417, c
418, c
419, c
420, 2008-02-19 14:21:40.9062 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 14:21:40.9062 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 14:21:40.9062 [debug]: Set up communication with graphics process.
2008-02-19 14:21:41.2187 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 14:22:10.8437 [debug]: Successfully read checkpoint
2008-02-19 14:22:10.8437 [debug]: Total skypoints = 1201. Progress: 420,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
421, c
422, c
423, c
424, c
425, c
426, c
427, c
428, c
429, c
430, 2008-02-19 14:36:55.6093 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 14:36:55.6093 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 14:36:55.6093 [debug]: Set up communication with graphics process.
2008-02-19 14:36:55.9218 [debug]: Reading SFTs and setting up stacks ... 2008-02-19 14:40:15.9531 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 14:40:15.9531 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 14:40:15.9531 [debug]: Set up communication with graphics process.
2008-02-19 14:40:16.2656 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 14:40:47.6406 [debug]: Successfully read checkpoint
2008-02-19 14:40:47.6406 [debug]: Total skypoints = 1201. Progress: 430,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
431, c
432, c
433, c
434, c
435, c
436, c
437, c
438, c
439, c
440, c
441, c
442, c
443, c
444, c
445, c
446, c
447, c
448, c
449, 2008-02-19 15:04:52.0468 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 15:04:52.0468 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 15:04:52.0625 [debug]: Set up communication with graphics process.
2008-02-19 15:04:52.3750 [debug]: Reading SFTs and setting up stacks ... 2008-02-19 15:08:11.4218 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 15:08:11.4218 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 15:08:11.4218 [debug]: Set up communication with graphics process.
2008-02-19 15:08:11.7500 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 15:08:42.0000 [debug]: Successfully read checkpoint
2008-02-19 15:08:42.0000 [debug]: Total skypoints = 1201. Progress: 449,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
450, c
451, c
452, 453, c
454, 455, c
456, c
457, 2008-02-19 15:20:12.1875 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 15:20:12.1875 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 15:20:12.1875 [debug]: Set up communication with graphics process.
2008-02-19 15:20:12.5000 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 15:20:43.3437 [debug]: Successfully read checkpoint
2008-02-19 15:20:43.3437 [debug]: Total skypoints = 1201. Progress: 457,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
458, c
459, 460, c
461, c
462, 463, c
464, c
465, 466, c
467, 468, c
469, c
470, 471, c
472, 473, c
474, 475, c
476, c
477, c
478, c
479, c
480, c
481, c
482, 2008-02-19 15:50:40.1250 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 15:50:40.1250 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 15:50:40.1250 [debug]: Set up communication with graphics process.
2008-02-19 15:50:40.4843 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 15:51:18.0468 [debug]: Successfully read checkpoint
2008-02-19 15:51:18.0468 [debug]: Total skypoints = 1201. Progress: 482,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
483, c
484, c
485, c
486, c
487, 2008-02-19 16:01:45.6718 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 16:01:45.6718 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 16:01:45.6718 [debug]: Set up communication with graphics process.
2008-02-19 16:01:45.9687 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 16:02:17.8281 [debug]: Successfully read checkpoint
2008-02-19 16:02:17.8281 [debug]: Total skypoints = 1201. Progress: 487,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
488, c
489, c
490, c
491, c
492, c
493, 494, c
495, c
496, c
497, c
498, 499, c
500, 2008-02-19 16:18:52.6406 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-19 16:18:52.6406 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-19 16:18:52.6406 [debug]: Set up communication with graphics process.
2008-02-19 16:18:53.0000 [debug]: Reading SFTs and setting up stacks ... done
2008-02-19 16:19:24.4531 [debug]: Successfully read checkpoint
2008-02-19 16:19:24.4531 [debug]: Total skypoints = 1201. Progress: 500,
$Revision: 1.82 $ OPT:0 SCV:9, SCTRIM:8
c
501, 502, c
503, 504, c
505, 506, c
507, 508, c
509, 510, c
511, 512, c
513, 514, c
515,

mopel
Joined: 2 Sep 05
Posts: 6
Credit: 21991221
RAC: 0

RE: Can either of you, or

Message 79306 in response to message 79304

Quote:
Can either of you, or both, please navigate to your BOINC directory, then to the slots directory and find the slot that Einstein is running from and post the contents of the stderr.txt file that's in there?

Hi, there is nothing there that makes sense to me: no errors, just jobs (re)starting frequently. But when a job has reached ~12 to 16 CPU seconds, it seems to hang around for a while until it is restarted again, so there is no overall progress on the job. Weird... Is it possible to set a debug environment flag to make it produce more verbose messages?

The messages are the same for both slots:
[slots\0\stderr.txt:]
2008-02-18 12:53:52.8281 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-18 12:53:52.8593 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-18 12:53:54.4687 [debug]: Set up communication with graphics process.
2008-02-18 12:53:56.7031 [debug]: Reading SFTs and setting up stacks ... 2008-02-18 12:57:37.2500 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-18 12:57:37.2500 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-18 12:57:37.2656 [debug]: Set up communication with graphics process.
2008-02-18 12:57:37.6718 [debug]: Reading SFTs and setting up stacks ... 2008-02-18 13:01:23.1250 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-18 13:01:23.1250 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-18 13:01:23.1250 [debug]: Set up communication with graphics process.
2008-02-18 13:01:23.5625 [debug]: Reading SFTs and setting up stacks ... 2008-02-18 13:05:09.0468 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-18 13:05:09.0468 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-18 13:05:09.0625 [debug]: Set up communication with graphics process.
2008-02-18 13:05:09.4687 [debug]: Reading SFTs and setting up stacks ... 2008-02-18 13:08:52.9375 [normal]: Built at: Jan 21 2008 15:44:15

2008-02-18 13:08:52.9375 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R3_4.26_windows_intelx86.exe'.
2008-02-18 13:08:52.9375 [debug]: Set up communication with graphics process.
2008-02-18 13:08:53.3593 [debug]: Reading SFTs and setting up stacks ... 2008-02-18 13:11:59.7968 [normal]: Built at: Jan 21 2008 15:44:15
...(snip)

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 757630240
RAC: 1151917

Very weird indeed.

The E@H app has an initialization phase in which it has to read in some data, which can take maybe 20 seconds.

The log indicates that the app is frequently terminated before it can finish this phase, let alone carry on from its last checkpoint. I don't think the app crashes by itself; BOINC would notice that and complain loudly. So it seems BOINC itself is telling E@H to stop very shortly after it has started.
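The pattern described above (starts that never get past the initialization phase) can be checked mechanically in a stderr.txt like the ones posted in this thread. The sketch below is a hypothetical helper, not part of BOINC or the Einstein@Home app. It assumes every record begins with a "YYYY-MM-DD HH:MM:SS.ffff" timestamp, as in the logs above, and treats a start as having finished initialization once "Successfully read checkpoint" or "Couldn't open checkpoint" appears. Because a killed run leaves its last line unterminated, records are split on timestamps rather than on newlines.

```python
import re
from datetime import datetime

# Each stderr.txt record starts with a timestamp such as "2008-02-19 08:23:34.7968".
# A killed run can leave its last record unterminated, so the next timestamp may
# appear mid-line; splitting on the timestamp pattern handles both cases.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) ")
START_MARK = "Start of BOINC application"
READY_MARKS = ("Successfully read checkpoint", "Couldn't open checkpoint")


def init_phases(log_text: str):
    """Return (start_time, seconds_until_ready) for each application start.

    seconds_until_ready is None when the run never finished initialization,
    i.e. it was stopped while still reading SFTs and setting up stacks.
    """
    parts = TS_RE.split(log_text)
    runs = []
    # parts looks like [preamble, ts1, body1, ts2, body2, ...]
    for stamp, body in zip(parts[1::2], parts[2::2]):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        if START_MARK in body:
            runs.append([when, None])
        elif runs and runs[-1][1] is None and any(m in body for m in READY_MARKS):
            runs[-1][1] = (when - runs[-1][0]).total_seconds()
    return [tuple(run) for run in runs]
```

Applied to mopel's slots\0 excerpt, every start comes back as None (stopped during initialization). In ded1o1's log, the starts that did get going took roughly 18 to 38 seconds to finish initialization.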

What is your setting for keeping the app in memory when it is suspended? Do you allow this? It might be worth trying.

Puzzled,
Bikeman

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 6

RE: The log indicates that

Message 79308 in response to message 79307

Quote:
The log indicates that frequently the app is terminated before it can finish this phase and even begin to carry on from its last checkpoint. I don't think the app crashes by itself, BOINC would notice that and would complain badly.


BOINC will completely stop the application when it's stopped for unknown reasons more than 100 times, though.

Quote:
So it seems BOINC itself is telling E@H to stop very shortly after it has started.


I don't think so, as it would then say so, probably something along the lines of "application exited". It's not even getting to the "[normal]: INFO: Couldn't open checkpoint task_name" part. The application just mysteriously crashes.

ded1o1's log looks as if his application is going back on iterations. For instance, 9 is already logged, but it goes back to 8 upon the restart; 123 is already logged, but it goes back to 122. That could produce a restarting message.

mopel, could you please run with a cc_config.xml file? You can see here how to set one up.

Please add to it these flags:


1
1
1
1
1
1
1
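The flag names in the list above appear to have been stripped along with their XML tags; only the seven values of 1 survived. For anyone rebuilding such a file, the sketch below generates the general shape of a cc_config.xml with a log_flags section. The three flag names in it are hypothetical examples chosen for illustration, not Jord's original seven, and should be checked against the BOINC client configuration documentation for the installed client version.

```python
import xml.etree.ElementTree as ET

# Hypothetical example flags, for illustration only: the seven flag names in
# the post above did not survive. Check names against the BOINC client
# configuration documentation for your client version.
EXAMPLE_FLAGS = ["task_debug", "cpu_sched", "cpu_sched_debug"]


def build_cc_config(flags) -> str:
    """Build a cc_config.xml body that switches on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"  # 1 = enable this flag
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Save the output as cc_config.xml in the BOINC data directory
    # (C:\Programme\BOINC on mopel's machine, per his startup log).
    print(build_cc_config(EXAMPLE_FLAGS))
```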

Please exit BOINC, navigate to your BOINC directory and rename stdoutdae.txt to stdoutdae.old
Restart BOINC. It'll read the cc_config.xml file and start logging away. You'll get a lot of messages from these flags. Let it run for a while, about 5 to 10 minutes.
Exit BOINC and go back to your BOINC directory.
Depending on the size of the stdoutdae.txt file either post the contents here, or email the file to me. I'll PM you with my email address.

I prefer you email it... especially if the file size is over 50KB, which it can easily do. Then I'll make sure it'll get to the admins and moderators here.
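The log-rotation step and the "over 50KB" rule above can be sketched as two small helpers. Both are hypothetical (not part of BOINC), must only be run while BOINC is exited, and assume "50KB" means 50 * 1024 bytes; 50,000 bytes would be an equally reasonable reading.

```python
from pathlib import Path

# Assumption: "50KB" read as 50 * 1024 bytes.
EMAIL_THRESHOLD_BYTES = 50 * 1024


def rotate_log(boinc_dir) -> bool:
    """Rename stdoutdae.txt to stdoutdae.old, replacing any older .old file.

    Run only while BOINC is exited. Returns False if there was no log to rotate.
    """
    log = Path(boinc_dir) / "stdoutdae.txt"
    if not log.exists():
        return False
    log.replace(log.with_suffix(".old"))
    return True


def should_email(boinc_dir) -> bool:
    """True if the fresh stdoutdae.txt is too large to paste into a post."""
    size = (Path(boinc_dir) / "stdoutdae.txt").stat().st_size
    return size > EMAIL_THRESHOLD_BYTES
```

Path.replace is used instead of a plain rename so that a stdoutdae.old left over from an earlier attempt does not make the rename fail on Windows.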

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 757630240
RAC: 1151917

RE: ded1o1's log looks as

Message 79309 in response to message 79308

Quote:

ded1o1's log looks as if his application is going back on iterations. For instance, 9 is already logged, but it goes back to 8 upon the restart; 123 is already logged, but it goes back to 122. That could produce a restarting message.

I think that is normal: the "9" in the log just states that the 9th sky position was calculated, but the checkpoints are made for the sky positions following the "c" at the line breaks ("c" for checkpoint, I guess).
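This reading can be tested against ded1o1's stderr.txt. Comparing each run with the "Progress: N" value of the next one gives an empirical rule: a restart resumes at one past the last skypoint index printed before the final "c". The helper below is a hypothetical sketch of that rule (not code from the Einstein@Home app); it assumes the progress format seen in this thread, including the "$Revision ..." banner line, whose digits it ignores.

```python
import re


def resume_point(run_text: str):
    """Predict where the next restart should resume, from one run's progress output.

    Empirical rule read off ded1o1's stderr.txt: a restart resumes at one past
    the last skypoint index printed before the final "c". Returns None if the
    run never reached the "Progress:" line (stopped during initialization).
    """
    m = re.search(r"Progress: (\d+),(.*)", run_text, re.S)
    if m is None:
        return None
    last_seen = resume = int(m.group(1))
    rest = m.group(2)
    # A killed run runs straight into the next start's timestamp; cut there.
    rest = re.split(r"\d{4}-\d{2}-\d{2} ", rest, maxsplit=1)[0]
    # Drop the "$Revision: ... SCV:9, SCTRIM:8" banner; its digits are not indices.
    rest = "\n".join(ln for ln in rest.splitlines() if "$Revision" not in ln)
    for tok in re.findall(r"\d+|c", rest):
        if tok == "c":
            # Checkpoint written: a restart would pick up after last_seen.
            resume = last_seen + 1
        else:
            last_seen = int(tok)
    return resume
```

Fed the runs from ded1o1's log, this reproduces each following "Progress:" value (8, 86, 122, 148, ...), so the apparent step back from 9 to 8 or from 123 to 122 is consistent with resuming from the last checkpoint rather than with losing checkpointed work.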

CU
Bikeman

ded1o1
Joined: 29 Sep 07
Posts: 3
Credit: 10183773
RAC: 0

RE: Can either of you, or

Message 79310 in response to message 79304

Quote:
Can either of you, or both, please navigate to your BOINC directory, then to the slots directory and find the slot that Einstein is running from and post the contents of the stderr.txt file that's in there?

Still seeing a lot of restarts here; I also have 2 Einstein tasks that have been running for over 24 hours and are still far from complete. Normally Einstein tasks do not take this long: in a typical 24-hour period this PC would complete at least 2 Einstein tasks as well as many SETI tasks.

Since Einstein version S5R3_4.26 was installed, SETI tasks do not seem to be getting the opportunity to run as often as they should.

While progress has not stopped entirely in the last 24 hours, there have been many occasions when CPU usage dropped to 50% of what I'd expect for long periods, and it often dropped to 0% for periods ranging from 5 minutes to half an hour.

I will continue to keep a close eye on this board in the hope that someone finds a fix, and I'll be particularly interested to see whether mopel's debug results shed some light.

In the meantime will post a couple of links to my latest stderr.txt files (they are too big to post here).

Einstein stderr.txt slot 1
http://www.megaupload.com/?d=SWJ2DM0Z

Einstein stderr.txt slot 2
http://www.megaupload.com/?d=03LLCEQL

mopel
Joined: 2 Sep 05
Posts: 6
Credit: 21991221
RAC: 0

RE: RE: The log indicates

Message 79311 in response to message 79308

Quote:
Quote:
The log indicates that frequently the app is terminated before it can finish this phase and even begin to carry on from its last checkpoint. I don't think the app crashes by itself, BOINC would notice that and would complain badly.

BOINC will completely stop the application when it's stopped for unknown reasons more than 100 times, though.

Quote:
So it seems BOINC itself is telling E@H to stop very shortly after it has started.

I don't think so, as it would then say so, probably something along the lines of "application exited". It's not even getting to the "[normal]: INFO: Couldn't open checkpoint task_name" part. The application just mysteriously crashes.

ded1o1's log looks as if his application is going back on iterations. For instance, 9 is already logged, but it goes back to 8 upon the restart; 123 is already logged, but it goes back to 122. That could produce a restarting message.

mopel, could you please run with a cc_config.xml file? You can see here how to set one up.

Please add to it these flags:


1
1
1
1
1
1
1

Please exit BOINC, navigate to your BOINC directory and rename stdoutdae.txt to stdoutdae.old.
Restart BOINC. It'll read the cc_config.xml file and start logging away. You'll get a lot of messages from these flags. Let it run for a while, about 5 to 10 minutes.
Exit BOINC and go back to your BOINC directory.
Depending on the size of the stdoutdae.txt file either post the contents here, or email the file to me. I'll PM you with my email address.

I'd prefer you email it, especially if the file is over 50KB, which it can easily be. Then I'll make sure it gets to the admins and moderators here.

Hi, many thanks for all the ideas and help, well appreciated! Jord, I'll send you my logs separately from this thread. While watching the debug output, I tested what would happen with a CPU usage limit of 100.000000 instead of 60.000000, and now both threads were able to make progress beyond 0.00% :-) It seems to work around my problem, but it doesn't really fix it :-/ I'm a bit unhappy with 100% because the temperature and the power consumption are high all the time. I used to run a 60% limit on two older PCs without issues, and a 60% limit on a new secondary XP-64 partition of the same box, with BOINC running fine (I think). But it puzzles me why the CPU limit breaks the Einstein jobs on my old, repaired XP Home partition. Something in the XP Home environment is not OK and seems to break the BOINC scheduler when a CPU limit is set, although the XP event log looks clean. Hmm...
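The log-collection steps Jord describes (rename stdoutdae.txt while BOINC is shut down, do a short debug run, then decide whether to post or email the result) can be scripted. The following is a minimal Python sketch, not part of BOINC: the helper names `rotate_log` and `should_email_log` are hypothetical, the 50KB cut-off comes from the thread and is treated here as 50 × 1024 bytes, and it should only be run while BOINC is exited so the client is not holding the log open.

```python
from pathlib import Path

# Cut-off mentioned in the thread: email the log rather than post it if it
# is over ~50KB. Interpreting "50KB" as 50 * 1024 bytes is an assumption.
EMAIL_THRESHOLD_BYTES = 50 * 1024


def rotate_log(boinc_dir: Path) -> bool:
    """Rename stdoutdae.txt to stdoutdae.old so the next BOINC run starts a fresh log.

    Hypothetical helper; run it only while BOINC is exited.
    Returns False if there was no log to rotate.
    """
    log = boinc_dir / "stdoutdae.txt"
    if not log.exists():
        return False
    # Path.replace overwrites an existing stdoutdae.old, even on Windows,
    # where a plain rename onto an existing file would fail.
    log.replace(boinc_dir / "stdoutdae.old")
    return True


def should_email_log(boinc_dir: Path) -> bool:
    """After the 5-10 minute debug run, decide between posting and emailing the log."""
    size = (boinc_dir / "stdoutdae.txt").stat().st_size
    return size > EMAIL_THRESHOLD_BYTES
```

For the data directory shown in the startup log at the top of this thread, that would mean calling `rotate_log(Path(r"C:\Programme\BOINC"))` before restarting BOINC, and `should_email_log` on the same path after exiting it again.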

mopel
Joined: 2 Sep 05
Posts: 6
Credit: 21991221
RAC: 0

RE: RE: RE: The log

Message 79312 in response to message 79311

Now the fresh XP-64 installation has had the same problem since yesterday. I haven't installed anything there or changed any settings in the last few weeks, but suddenly BOINC began restarting both jobs frequently, and I think there is no longer any overall progress. The BOINC function that limits CPU usage below 100% seems to be broken. If I can't sort it out, I'll look for a version older than BOINC 5.10.30 :-P
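The observation in this thread is that the jobs advance at 100% but not at 60%. A toy duty-cycle model shows what, conceptually, changes between those two settings. It is an assumption, not a description of BOINC 5.10.30's actual code: it supposes the limit is enforced by repeatedly suspending and resuming the running task, and the one-second period is purely illustrative. The function names are hypothetical.

```python
def duty_cycle(limit_percent: float, period_s: float = 1.0) -> tuple[float, float]:
    """Return (running, suspended) seconds within one throttling period.

    Hypothetical model: the task runs for limit_percent of each period and
    is suspended for the rest. The 1-second default period is an assumption.
    """
    if not 0 < limit_percent <= 100:
        raise ValueError("limit_percent must be in (0, 100]")
    running = period_s * limit_percent / 100.0
    return running, period_s - running


def suspensions_per_hour(limit_percent: float, period_s: float = 1.0) -> int:
    """Count suspend/resume cycles per hour under the same hypothetical model.

    At 100% the task is never suspended, so there are no cycles at all.
    """
    _, suspended = duty_cycle(limit_percent, period_s)
    if suspended == 0:
        return 0
    return int(3600 / period_s)
```

Under these assumptions, a 60% limit means roughly 0.6 s running and 0.4 s suspended in every second, or 3,600 suspend/resume cycles per hour, whereas 100% means no suspensions at all. That would fit the workaround, since raising the limit to 100% takes the throttle out of the picture entirely, but the thread itself does not establish that this is the cause.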
