GW LineVeto (extended) errors

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0
Topic 196975

One of my hosts threw 4 of these today. I'm not sure why, as they are normally very reliable.

Stderr snippet
2013-05-29 20:50:44.0335 (2576) [CRITICAL]: Required frequency-bins [871757, 871772] not covered by SFT-interval [872378, 872862]
[Parameters: alpha:0, Dphi_alpha:8.717647e+005, Tsft:1.800000e+003, *Tdot_al:9.999694e-001]
XLAL Error - LocalXLALComputeFaFb (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/FDS_isolated/OptimizedCFS/LocalComputeFstat.c:554): Input domain error

LocalXALComputeFaFb() failed
Error[1] 5: function LocalComputeFStat, file /home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/FDS_isolated/OptimizedCFS/LocalComputeFstat.c, line 338, $Id$
ABORT: XLAL function call failed
XLALComputeExtraStatsSemiCoherent, line 360 : Failed call to LAL function ComputeFStat(). statusCode=5

XLAL Error - XLALComputeExtraStatsSemiCoherent (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/LineVeto.c:361): Internal function call failed: Input domain error

Error in function XLALComputeExtraStatsForToplist, line 220 : Failed call to XLALComputeLineVetoSemiCoherent().

XLAL Error - XLALComputeExtraStatsForToplist (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/LineVeto.c:221): Internal function call failed: Input domain error
XLAL Error - MAIN (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:1814): Check (XLAL_SUCCESS == XLALComputeExtraStatsForToplist ( semiCohToplist, "GCTtop", &stackMultiSFT, &stackMultiNoiseWeights, &stackMultiDetStates, &CFparams, refTimeGPS, uvar_SignalOnly, uvar_outputSingleSegStats )) failed
XLAL Error - MAIN (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:1814): XLALComputeExtraStatsForToplist() failed with xlalErrno = 1057.

XLAL Error - MAIN (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:1814): Invalid pointer
2013-05-29 20:50:44.0491 (2576) [CRITICAL]: ERROR: MAIN() returned with error '-1'
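For anyone trying to read the first [CRITICAL] line: it says the block of frequency bins the search needed lies entirely outside the bins present in the SFT data. Below is a minimal Python sketch of that kind of interval check. It assumes closed bin intervals and the usual SFT convention that bin k sits at frequency k / Tsft (Tsft = 1800 s from the Parameters line). The function names are hypothetical, and this is not the actual LALSuite code:

```python
def bins_covered(required, available):
    """True if the closed bin interval `required` lies entirely inside `available`.

    Both arguments are (first_bin, last_bin) tuples. Hypothetical helper,
    not a LALSuite function.
    """
    req_lo, req_hi = required
    have_lo, have_hi = available
    return have_lo <= req_lo and req_hi <= have_hi


def gap_bins(required, available):
    """Number of bins separating two disjoint intervals (0 if they overlap)."""
    return max(0, available[0] - required[1], required[0] - available[1])


def bin_to_hz(k, tsft):
    """Frequency of SFT bin k, assuming a bin spacing of 1 / tsft Hz."""
    return k / tsft


if __name__ == "__main__":
    # Values copied from the [CRITICAL] line and the Parameters line of the log.
    required = (871757, 871772)
    sft = (872378, 872862)
    tsft = 1800.0

    print("covered:", bins_covered(required, sft))
    print("gap in bins:", gap_bins(required, sft))
    print("required: %.3f-%.3f Hz" % (bin_to_hz(required[0], tsft), bin_to_hz(required[1], tsft)))
    print("SFT data: %.3f-%.3f Hz" % (bin_to_hz(sft[0], tsft), bin_to_hz(sft[1], tsft)))
```

With the numbers from the log, the required bins end 606 bins (roughly 0.34 Hz) below the start of the SFT interval, around 484.3 Hz versus roughly 484.65 to 484.92 Hz, so the check fails and the task aborts with "Input domain error".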

Links to WUs: One, Two, Three, Four

MarkJ

It's now trashed 49 work units. My other hosts seem fine, so I will remove this one from Einstein for the time being. Time to run some diagnostics on it.

Gavin
Joined: 21 Sep 10
Posts: 191
Credit: 40644337738
RAC: 1


Interesting, I seem to be having the same problem with this host, 6550900 (sample stderr output here). So far, since yesterday, 40 tasks have failed in the same way.

I was beginning to think I had perhaps picked up a bad work unit or units, but a couple of my earlier errors have since been resent, completed and validated by others, so that idea is out of the window!

As it is, I am not sure how to proceed with a diagnosis. It would be much easier were it not for the fact that, while getting these errors, I am successfully completing other GW units.

MarkJ

Quote:

Interesting, I seem to be having the same problem with this host 6550900 and sample stderr output here So far, since yesterday, 40 tasks have failed in the same way.

I was beginning to think perhaps I had picked up a bad work unit or units but so far a couple of my earlier errors have been resent, completed and validated by others so I guess that's that idea out of the window!

As it is I am not sure how to progress with a diagnosis, it would be much easier if it wasn't for the fact that whilst getting these errors I am successfully completing other GW units.

Yes, I have similar. Some work units work (which makes me suspect a faulty app or faulty work units) and others don't, on the same host.

It just trashed another 5 after I'd been running diagnostics on it for the last 6 hours. I did find a failed CD/DVD burner and some disk errors, but nothing else. They all fail within 30 seconds. Like yours, my host is a Win7 x64 box (with 8 GB of RAM) and plenty of disk space.

My next thought is to detach and reattach to the project to clean out the project directories. It was using 3.2 GB of space just for Einstein.

MarkJ

Quote:
My next thoughts are to detach/reattach to the project to clean out the project directories. It was using 3.2Gb of space just for Einstein

Well, that didn't help. It just trashed 10 more after doing that, so it looks like it will have to stay off Einstein.

Update
It seems the one that fails is always the 8th to start. The host has 7 others running successfully at the same time, all GW LineVeto. When the 8th WU tries to run, it crashes within 30 seconds.

As an experiment, I have a bunch suspended (after download, so they haven't started). As I release each one, the 8th will crash while the other 7 continue to run. It looks like we can't run 8 of these at the same time, for whatever reason.

Gavin

Quote:
As an experiment I have a bunch suspended (after download, so they haven't started). As I release each one the 8th one will crash while the other 7 continue to run. It looks like we can't run 8 of these at the same time for whatever reason.

Unfortunately, my affected host is at work and I can't currently access it directly, so I will have to wait until tomorrow. However, I do know that the machine does not run GW tasks (or any other CPU tasks) on all 8 cores; if I remember correctly, I have perhaps 2 cores freed to feed the GPU and its BRP4 tasks...

I wonder if the problem started with the introduction of BRP5 tasks on our hosts?
I am just guessing and clutching at straws, but (without closer inspection) it would fit nicely with the issues on my host. More investigating needed, methinks.

MarkJ

Quote:
Quote:
As an experiment I have a bunch suspended (after download, so they haven't started). As I release each one the 8th one will crash while the other 7 continue to run. It looks like we can't run 8 of these at the same time for whatever reason.

Unfortunately, my affected host is at work and I am not currently able to access it directly so will have to wait until tomorrow. However, I do know that that machine does not run all 8 cores on GW tasks (or any other CPU tasks), if I remember correctly I have perhaps 2 cores freed to feed the GPU and its BRP4 tasks...

I wonder if the problem started with the introduction of BRP5 tasks on our hosts?
I am just guessing and clutching at straws, but it would (without closer inspection) fit nicely with the issues in my host, more investigating needed methinks.

I don't have a GPU in mine (well, it's an on-chip Intel one), so no BRP tasks are running at all, which is probably why I get so many LineVeto tasks now.

Beyond
Joined: 28 Feb 05
Posts: 121
Credit: 2374546212
RAC: 5713652


Quote:
Its now trashed 49 work units. My other hosts seem fine so I will remove it from Einstein for the time being. Time to run some diagnostics on it.


This one has a lot of validate errors:

http://einsteinathome.org/host/4871586/tasks&offset=0&show_names=1&state=4&appid=0

On the other one, just set "use at most 90% of the processors" in your BOINC client settings and see how that works.
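To see what that setting would mean on an 8-thread host like the one above, here is a rough sketch. It assumes the BOINC client applies the percentage to the logical CPU count, rounds down to a whole number (never fewer than one), and runs one single-threaded GW LineVeto task per CPU. `usable_cpus` is a made-up helper, not a BOINC API:

```python
import math


def usable_cpus(ncpus, max_ncpus_pct):
    """Rough number of CPUs a BOINC client will schedule CPU tasks on.

    Assumption (not taken from BOINC source): the "use at most N% of the
    processors" preference is applied to the logical CPU count, rounded
    down, with a minimum of one CPU.
    """
    return max(1, math.floor(ncpus * max_ncpus_pct / 100))


if __name__ == "__main__":
    # Assumes one task per CPU, so this is also the number of concurrent tasks.
    for pct in (100, 90, 75, 50):
        print(f"{pct:3d}% of 8 CPUs -> {usable_cpus(8, pct)} concurrent CPU tasks")
```

Under those assumptions, 90% of 8 CPUs gives 7, which would keep the host at the 7 concurrent tasks that ran fine in the suspend/release experiment described earlier in the thread.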

Gavin

Just to update: my host appears to have righted itself. There have been no new errors since the early hours of 2 June.
