I have three computers working on einstein: two desktops and a Dell i600m notebook. They each run the same version of einstein, which seems to be S5 R1.402.
The notebook rarely returns a valid result. The id for this computer is 393166. The log file is full of info, none of which makes sense to me. The desktops are pretty successful.
These computers also run other projects and have no 'trouble'.
Any ideas? Could someone who knows how to read the log files take a look at them?
Copyright © 2024 Einstein@Home. All rights reserved.
Notebook failures
I am certainly not an expert at reading the log files, so perhaps someone with greater knowledge will also chime in...
But I noticed that all your results seem to begin with an error reading or finding the checkpoint file. You might take a look at this to make sure something simple isn't going on (such as a glitch that made the file or directory read-only).
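For what it's worth, the read-only check can be scripted. This is only a minimal sketch in Python: the checkpoint name `Fstat.out.ckp` comes from the log output, but the function name `check_slot_dir` and the idea that you point it at the directory where the task runs are my own assumptions, not anything BOINC provides.

```python
import os
import stat
import tempfile


def check_slot_dir(slot_dir, checkpoint_name="Fstat.out.ckp"):
    """Return a list of human-readable problems found in slot_dir.

    slot_dir is whatever directory the science app runs in (an assumption;
    look it up in your own BOINC installation).
    """
    if not os.path.isdir(slot_dir):
        return [f"{slot_dir} is not a directory"]

    problems = []

    # Create and delete a scratch file: the most direct test of whether
    # the application could write a checkpoint in this directory.
    try:
        with tempfile.NamedTemporaryFile(dir=slot_dir):
            pass
    except OSError as exc:
        problems.append(f"cannot create files in {slot_dir}: {exc}")

    # If a checkpoint already exists, check that it is not read-only.
    ckp = os.path.join(slot_dir, checkpoint_name)
    if os.path.exists(ckp):
        mode = os.stat(ckp).st_mode
        if not mode & stat.S_IWUSR:
            problems.append(f"{ckp} is read-only")

    return problems
```

An empty list means nothing obvious is wrong with permissions; it does not prove the checkpoint is being written correctly.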
Maybe heat. The usual advice is to run memtest and prime95.
Michael
Team Linux Users Everywhere
I seem to remember a discussion some time ago about computer suspend/hibernate functions: IIRC, the hypothesis was that if Windows was fully powered down, applications such as BOINC were given plenty of time to flush their state files to disk before closedown; but if Windows went to suspend/hibernate, it used an accelerated procedure which didn't give BOINC enough time.
Your problem is with your notebook, which I guess is more likely to be suspended - could that be the problem?
If you use the suspend function, you could try closing BOINC manually (a full file|exit, not just hiding the window) at the end of your work sessions for a few days. If you get fewer errors/more credit, then all you have to do is find a way of automating the process!
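As a starting point for automating it, here is a rough sketch in Python. It assumes the BOINC command-line tool is available as `boinccmd` and accepts `--quit`; the tool's name and location vary between BOINC versions, and it may need to be run from the BOINC data directory so that it can authenticate with the client. The function names are made up for this example.

```python
import subprocess


def quit_command(boinccmd="boinccmd"):
    """Build the command line that asks a running BOINC client to exit.

    The executable name is an assumption; pass the full path if needed.
    """
    return [boinccmd, "--quit"]


def stop_boinc(run=subprocess.run, boinccmd="boinccmd"):
    """Ask the client to quit; return True if the command reported success."""
    result = run(quit_command(boinccmd), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Run this just before suspending the notebook.
    print("BOINC asked to quit:", stop_boinc())
```

You would still have to hook this into whatever you use to suspend the machine, and restart BOINC after resume.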
I think there may be a thread of truth to this theory, because I frequently see a comment in the event logs (from memory now) about some app not releasing some part of the registry in time, and that the registry will somehow be restored later (sorry for not being precise here). That app is always something to do with BOINC (I run it as a service, not a screen saver).
Regarding the memtest, etc., I've done that and never have any trouble. Several other projects also run on this notebook and don't report the same type of issue. (I actually posted this complaint about 12 months ago, but a lot has changed over time in that I am now getting almost all my einstein wu's invalidated.)
More comments are welcome. When my queue flushes on the other projects I expect I will try to re-install the einstein stuff from scratch to see if something is corrupted. Too bad all that diagnostic stuff isn't useful.
I have never encountered any problems or crashes with either SETI or E@H and very rarely get any invalid results on my laptop (although it is a relatively slow Celeron and I use hibernate/standby a lot). If you are sure the problem is not temperature-related I would try switching to not leaving WUs in memory. I guess that might do the trick as your WUs will be saved on the hard disk before you shut down. I'm afraid I am not sure if this works as I have never used a different setting (because I always seem to have too little RAM anyway, using a shared RAM gfx card and so on) but as I said, BOINC works fine for me the way it is...
Here is some more data. After crunching successfully for a while (completing wu's), the notebook had about 13 compute errors in a row. If you look at the sequence, the first failed unit made it through 3470 sec before failing (log below), but the subsequent 12 failed right out of the box. The notebook is shared with seti, and I think the einstein wu's were started one after another because the seti queue was exhausted. See computer 393166.
2006-08-17 06:28:02.7900 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_S5R1_4.24_windows_intelx86.exe'.
2006-08-17 06:28:02.8000 [normal]: Started search at lalDebugLevel = 0
2006-08-17 06:28:03.6813 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-08-17 06:28:03.6813 [normal]: No usable checkpoint found, starting from beginning.
Detected CPU type 1
small x
small x
small x
small x
small x
ERROR! sftIndex = -2147483648 < 0 in TestLALDemod run 0
alpha=51, xTemp=542621.64819343563000000, Dterms=16, ifmin=542535
Level 0: $Id: ComputeFStatistic.c,v 1.371 2006/06/09 12:48:58 reinhard Exp $
Function call `TestLALDemod(status, &Fstat, SFTData, DemodParams)' failed.
file ComputeFStatistic.c, line 958
2006-08-17 07:27:36.4286 [normal]:
Level 1: $Id: CFSLALDemod_SSEgas.c,v 1.6 2006/07/28 17:01:40 bema Exp $
2006-08-17 07:27:36.4286 [normal]: Status code 3: Invalid input
2006-08-17 07:27:36.4286 [normal]: function TestLALDemod, file FDS_isolated/CFSLALDemod_SSEgas.c, line 190
2006-08-17 07:27:36.4286 [CRITICAL]: BOINC_ERR_EXIT(): now calling boinc_finish()
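Since these logs are hard to read by eye, here is a small Python sketch that pulls out the wall-clock span and the lines that look like failures. The timestamp format is taken from the output above; what counts as a "failure" line (a `[CRITICAL]` level, a `Status code` message, a line starting with `ERROR!`, or an unstamped line containing `failed`) is my own guess, not an official list.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2006-08-17 07:27:36.4286 [normal]: Status code 3: Invalid input
STAMPED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.(\d+) \[(\w+)\]:\s?(.*)$"
)


def summarize(log_text):
    """Return (elapsed_seconds, failure_lines) for one task's log output.

    elapsed_seconds is the span between the earliest and latest timestamp,
    at whole-second resolution, or None if no timestamps were found.
    """
    stamps, failures = [], []
    for line in log_text.splitlines():
        m = STAMPED.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
            if m.group(3) == "CRITICAL" or "Status code" in m.group(4):
                failures.append(line)
        elif line.startswith("ERROR!") or "failed" in line:
            # Unstamped lines written directly by the science code.
            failures.append(line)
    elapsed = (max(stamps) - min(stamps)).total_seconds() if stamps else None
    return elapsed, failures
```

Run over the excerpt above, this separates the `ERROR! sftIndex` line and the `Status code 3: Invalid input` message from the routine checkpoint notices, and shows the task ran for just under an hour of wall-clock time before exiting.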
Do you have any defrag software running? I've had problems in the past with it trying to move BOINC files that weren't necessarily being used but were still open (or something like that; I know BOINC really doesn't like being defragged while running).