Notebook failures

WayneKFord
Joined: 21 Aug 05
Posts: 25
Credit: 1424120
RAC: 0
Topic 191625

I have three computers working on Einstein: two desktops and a Dell i600m notebook. They each run the same version of the Einstein application, which seems to be S5 R1.402.

The notebook rarely returns a valid result. The id for this computer is 393166. The log file is full of info, none of which makes sense to me. The desktops are pretty successful.

These computers also run other projects and have no 'trouble'.

Any ideas? Could someone who can help take a look at the log files?

Scott Brown
Joined: 9 Feb 05
Posts: 38
Credit: 215235
RAC: 0

Notebook failures

Quote:

I have three computers working on Einstein: two desktops and a Dell i600m notebook. They each run the same version of the Einstein application, which seems to be S5 R1.402.

The notebook rarely returns a valid result. The id for this computer is 393166. The log file is full of info, none of which makes sense to me. The desktops are pretty successful.

These computers also run other projects and have no 'trouble'.

Any ideas? Could someone who can help take a look at the log files?

I am certainly not an expert at reading the log files, so perhaps someone with greater knowledge will also chime in...

But I noticed that all your results seem to begin with an error reading or finding the checkpoint file. You might take a look at this to make sure something simple isn't going on (such as a glitch that made the file or directory read-only, etc.).
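For anyone who wants to check the read-only idea quickly, here is a minimal sketch in Python. It only inspects the owner-write permission bit (on Windows, Python derives this from the read-only attribute), so it cannot rule out other causes such as another program locking the file. The function name `find_read_only` is made up for illustration, and the directory you pass in is whatever folder your BOINC installation uses for its data.

```python
import stat
from pathlib import Path


def find_read_only(directory):
    """Return every path under `directory` (and the directory itself)
    whose owner-write permission bit is cleared."""
    root = Path(directory)
    # Check the directory itself first, then everything beneath it.
    candidates = [root] + sorted(root.rglob("*"))
    read_only = []
    for path in candidates:
        mode = path.stat().st_mode
        if not mode & stat.S_IWUSR:
            read_only.append(path)
    return read_only


# Example (hypothetical path, adjust to your own installation):
# for p in find_read_only(r"C:\path\to\BOINC"):
#     print("read-only:", p)
```

An empty result means nothing under that folder is marked read-only, so the checkpoint problem would have to come from somewhere else.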

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

Quote:

Any ideas? Could someone who can help take a look at the log files?

Quote:


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0040BF6A read attempt to address 0x2B6B2830

Engaging BOINC Windows Runtime Debugger...

Maybe heat. The usual advice is to run memtest and prime95.

Michael

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2821422972
RAC: 920902

I seem to remember a discussion some time ago about computer suspend/hibernate functions: IIRC, the hypothesis was that if Windows was fully powered down, applications such as BOINC were given plenty of time to flush their state files to disk before closedown; but if Windows went to suspend/hibernate, it used an accelerated procedure which didn't give BOINC enough time.

Your problem is with your notebook, which I guess is more likely to be suspended - could that be the problem?

If you use the suspend function, you could try closing BOINC manually (a full file|exit, not just hiding the window) at the end of your work sessions for a few days. If you get fewer errors/more credit, then all you have to do is find a way of automating the process!

WayneKFord
Joined: 21 Aug 05
Posts: 25
Credit: 1424120
RAC: 0

Message 43143 in response to message 43142

Quote:

I seem to remember a discussion some time ago about computer suspend/hibernate functions: IIRC, the hypothesis was that if Windows was fully powered down, applications such as BOINC were given plenty of time to flush their state files to disk before closedown; but if Windows went to suspend/hibernate, it used an accelerated procedure which didn't give BOINC enough time.

I should also say that the last several wu's completed (and invalidated) were computed over several days when I never turned the beast off (used as a desktop most of the time.)

Your problem is with your notebook, which I guess is more likely to be suspended - could that be the problem?

If you use the suspend function, you could try closing BOINC manually (a full file|exit, not just hiding the window) at the end of your work sessions for a few days. If you get fewer errors/more credit, then all you have to do is find a way of automating the process!

I think there may be a thread of truth to this theory, because I frequently see a comment in the event logs (from memory now) about some app not releasing some part of the registry in time, and that the registry will somehow be restored later (sorry for not being precise here). That app always has something to do with BOINC (I run it as a service, not a screen saver).

Regarding the memtest, etc., I've done that and never have any trouble. Several other projects also run on this notebook and don't report the same type of issue. (I actually posted this complaint about 12 months ago, but a lot has changed since then: I am now getting almost all my Einstein wu's invalidated.)

More comments are welcome. When my queue flushes on the other projects I expect I will try to re-install the einstein stuff from scratch to see if something is corrupted. Too bad all that diagnostic stuff isn't useful.

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

Message 43144 in response to message 43143

I have never encountered any problems or crashes with either SETI or E@H and very rarely get any invalid results on my laptop (although it is a relatively slow Celeron and I use hibernate/standby a lot). If you are sure the problem is not temperature-related I would try switching to not leaving WUs in memory. I guess that might do the trick as your WUs will be saved on the hard disk before you shut down. I'm afraid I am not sure if this works as I have never used a different setting (because I always seem to have too little RAM anyway, using a shared RAM gfx card and so on) but as I said, BOINC works fine for me the way it is...

WayneKFord
Joined: 21 Aug 05
Posts: 25
Credit: 1424120
RAC: 0

Here is some more data. After crunching successfully for a while (completing wu's), the notebook had about 13 compute errors in a row. If you look at the sequence, the first failed unit made it through 3470 sec before failing (log below), but the subsequent 12 failed right out of the box. The notebook is shared with SETI, and I think the Einstein wu's were started one after another because the SETI queue was exhausted. See computer 393166.

2006-08-17 06:28:02.7900 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R1_4.24_windows_intelx86.exe'.
2006-08-17 06:28:02.8000 [normal]: Started search at lalDebugLevel = 0
2006-08-17 06:28:03.6813 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-08-17 06:28:03.6813 [normal]: No usable checkpoint found, starting from beginning.
Detected CPU type 1
small x
small x
small x
small x
small x
ERROR! sftIndex = -2147483648 < 0 in TestLALDemod run 0
alpha=51, xTemp=542621.64819343563000000, Dterms=16, ifmin=542535
Level 0: $Id: ComputeFStatistic.c,v 1.371 2006/06/09 12:48:58 reinhard Exp $
Function call `TestLALDemod(status, &Fstat, SFTData, DemodParams)' failed.
file ComputeFStatistic.c, line 958
2006-08-17 07:27:36.4286 [normal]:
Level 1: $Id: CFSLALDemod_SSEgas.c,v 1.6 2006/07/28 17:01:40 bema Exp $
2006-08-17 07:27:36.4286 [normal]: Status code 3: Invalid input
2006-08-17 07:27:36.4286 [normal]: function TestLALDemod, file FDS_isolated/CFSLALDemod_SSEgas.c, line 190
2006-08-17 07:27:36.4286 [CRITICAL]: BOINC_ERR_EXIT(): now calling boinc_finish()
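For picking the failure lines out of a long log like the one above, a rough Python sketch follows. It assumes the timestamped lines have the shape `date time [level]: message` as in this excerpt, and it treats untimestamped lines beginning with `ERROR` as failures too. The names `flag_lines` and `mentions_int32_min` are made up for illustration. One thing worth noting: the reported `sftIndex` value of -2147483648 is exactly -2^31, the most negative value a 32-bit signed integer can hold, which may suggest an overflowed or otherwise invalid conversion rather than a meaningful index. That is only a hint, not a diagnosis.

```python
import re

# Matches lines such as:
# 2006-08-17 07:27:36.4286 [CRITICAL]: BOINC_ERR_EXIT(): now calling boinc_finish()
STAMPED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[(\w+)\]:\s*(.*)$"
)

# -2147483648, the most negative 32-bit signed integer.
INT32_MIN = -2**31


def flag_lines(log_text):
    """Return (timestamp, level, message) tuples for lines that look like failures.

    Timestamped lines are flagged when their level is anything other than
    'normal'; untimestamped lines are flagged when they start with 'ERROR'.
    """
    flagged = []
    for line in log_text.splitlines():
        match = STAMPED.match(line)
        if match:
            stamp, level, message = match.groups()
            if level.upper() != "NORMAL":
                flagged.append((stamp, level, message))
        elif line.startswith("ERROR"):
            flagged.append((None, "ERROR", line))
    return flagged


def mentions_int32_min(line):
    """True if the line contains the 32-bit signed minimum as text."""
    return str(INT32_MIN) in line
```

Run against the excerpt above, this would flag the `ERROR! sftIndex` line and the final `[CRITICAL]` line. The `Status code 3: Invalid input` line is logged at level `normal`, so this simple filter does not catch it.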

Mahray
Joined: 11 Nov 04
Posts: 43
Credit: 95188524
RAC: 258884

Message 43146 in response to message 43145

Do you have any defrag software running? I've had problems in the past with it trying to move Boinc files when they weren't necessarily being used, but still open (or something like that, I know Boinc really doesn't like being defragged while running).
