I had one of my hosts throw 4 of these today. Not sure why, as they are normally very reliable.
Stderr snippet
2013-05-29 20:50:44.0335 (2576) [CRITICAL]: Required frequency-bins [871757, 871772] not covered by SFT-interval [872378, 872862]
[Parameters: alpha:0, Dphi_alpha:8.717647e+005, Tsft:1.800000e+003, *Tdot_al:9.999694e-001]
XLAL Error - LocalXLALComputeFaFb (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/FDS_isolated/OptimizedCFS/LocalComputeFstat.c:554): Input domain error
LocalXALComputeFaFb() failed
Error[1] 5: function LocalComputeFStat, file /home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/FDS_isolated/OptimizedCFS/LocalComputeFstat.c, line 338, $Id$
ABORT: XLAL function call failed
XLALComputeExtraStatsSemiCoherent, line 360 : Failed call to LAL function ComputeFStat(). statusCode=5
XLAL Error - XLALComputeExtraStatsSemiCoherent (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/LineVeto.c:361): Internal function call failed: Input domain error
Error in function XLALComputeExtraStatsForToplist, line 220 : Failed call to XLALComputeLineVetoSemiCoherent().
XLAL Error - XLALComputeExtraStatsForToplist (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/LineVeto.c:221): Internal function call failed: Input domain error
XLAL Error - MAIN (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:1814): Check (XLAL_SUCCESS == XLALComputeExtraStatsForToplist ( semiCohToplist, "GCTtop", &stackMultiSFT, &stackMultiNoiseWeights, &stackMultiDetStates, &CFparams, refTimeGPS, uvar_SignalOnly, uvar_outputSingleSegStats )) failed
XLAL Error - MAIN (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:1814): XLALComputeExtraStatsForToplist() failed with xlalErrno = 1057.
XLAL Error - MAIN (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:1814): Invalid pointer
2013-05-29 20:50:44.0491 (2576) [CRITICAL]: ERROR: MAIN() returned with error '-1'
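The CRITICAL line at the top of that log says the F-statistic code needed SFT frequency bins that the loaded SFT data did not contain. Below is a minimal sketch of that range check using the numbers from the log. It assumes bin k corresponds to frequency k / Tsft (the usual 1/Tsft bin spacing of an SFT). The helper names are hypothetical, and this is an illustration, not the LALSuite implementation:

```python
# Hypothetical helpers illustrating the "Required frequency-bins [...] not
# covered by SFT-interval [...]" check. Assumes SFT bin k sits at k / Tsft Hz.
TSFT = 1800.0  # seconds, from "Tsft:1.800000e+003" in the log


def bin_to_hz(k: int, tsft: float = TSFT) -> float:
    """Frequency of SFT bin k, assuming a bin spacing of 1/Tsft."""
    return k / tsft


def covered(required: tuple, available: tuple) -> bool:
    """True if the required bin range lies entirely inside the available one."""
    return available[0] <= required[0] and required[1] <= available[1]


required = (871757, 871772)   # "Required frequency-bins" from the log
available = (872378, 872862)  # "SFT-interval" from the log

print("covered:", covered(required, available))
print(f"required:  {bin_to_hz(required[0]):.3f}-{bin_to_hz(required[1]):.3f} Hz")
print(f"available: {bin_to_hz(available[0]):.3f}-{bin_to_hz(available[1]):.3f} Hz")
print("shortfall:", available[0] - required[0], "bins")
```

With these numbers the required bins sit 621 bins (roughly 0.35 Hz) below the start of the available SFT band, so the check fails and the task aborts.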
Copyright © 2024 Einstein@Home. All rights reserved.
GW LineVeto (extended) errors
It's now trashed 49 work units. My other hosts seem fine, so I will remove it from Einstein for the time being. Time to run some diagnostics on it.
BOINC blog
Interesting, I seem to be
Interesting, I seem to be having the same problem with this host (6550900); sample stderr output is here. So far, since yesterday, 40 tasks have failed in the same way.
I was beginning to think perhaps I had picked up a bad work unit or units, but so far a couple of my earlier errors have been resent, completed and validated by others, so I guess that idea is out of the window!
As it is, I am not sure how to proceed with a diagnosis. It would be much easier if it weren't for the fact that, whilst getting these errors, I am successfully completing other GW units.
RE: Interesting, I seem to
Yes, I have similar. Some work units work (which makes me suspect faulty app/work units) and others don't, on the same host.
It just trashed another 5 after I'd been running diagnostics on it for the last 6 hours. I did find a failed CD/DVD burner and some disk errors, but nothing else. They all fail within 30 seconds. Like yours, my host is a Win7 x64 box (with 8 GB of RAM) and has plenty of disk space.
My next thought is to detach from and reattach to the project to clean out the project directories. It was using 3.2 GB of space just for Einstein.
RE: My next thoughts are to
Well, that didn't help. It just trashed 10 more after doing that, so it looks like it will have to stay off Einstein.
Update
It seems the ones failing are always the 8th task started. The host has 7 others running successfully at the same time, all GW LineVeto. When the 8th WU tries to run, it crashes within 30 seconds.
As an experiment, I have a bunch suspended (after download, so they haven't started). As I release each one, the 8th one crashes while the other 7 continue to run. It looks like we can't run 8 of these at the same time, for whatever reason.
RE: As an experiment I have
Unfortunately, my affected host is at work and I am not currently able to access it directly, so I will have to wait until tomorrow. However, I do know that the machine does not run all 8 cores on GW tasks (or any other CPU tasks); if I remember correctly, I have perhaps 2 cores freed to feed the GPU and its BRP4 tasks...
I wonder if the problem started with the introduction of BRP5 tasks on our hosts?
I am just guessing and clutching at straws, but it would (without closer inspection) fit nicely with the issues on my host; more investigating needed, methinks.
RE: RE: As an experiment
I don't have a GPU in mine (well, it's an on-chip Intel one), so no BRP tasks are running at all, which is probably why I get so many LineVeto tasks now.
RE: Its now trashed 49 work
This one has a lot of validate errors:
http://einsteinathome.org/host/4871586/tasks&offset=0&show_names=1&state=4&appid=0
On the other one, just set "use at most 90% of the processors" in your BOINC client settings and see how that works.
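On an 8-core host like the ones described above, a 90% setting would cap CPU work at 7 concurrent tasks, which lines up with the observation that only the 8th concurrent LineVeto task crashes. Below is a rough sketch of that arithmetic. It assumes the client truncates the product toward zero and never goes below one CPU. That rounding rule is an assumption, not taken from the BOINC source, and `usable_cpus` is a hypothetical name:

```python
import math


def usable_cpus(n_cpus: int, max_ncpus_pct: float) -> int:
    """Approximate CPU count BOINC would use for CPU tasks.

    Assumption: n_cpus * pct / 100 is truncated toward zero and clamped to a
    minimum of one CPU. This mirrors the "use at most N% of the processors"
    preference but is a sketch, not BOINC's actual code.
    """
    return max(1, math.floor(n_cpus * max_ncpus_pct / 100))


# An 8-core host at a few example settings.
for pct in (100, 90, 75):
    print(f"{pct}% -> {usable_cpus(8, pct)} concurrent CPU tasks")
```

The clamp to one CPU keeps very low percentages on small hosts from disabling CPU work entirely.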
Just to update, my host
Just to update, my host appears to have righted itself; there have been no new errors since the early hours of 2nd June.