Critical: XLAL Error

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 517446849

RAC: 434776

8 Feb 2013 22:47:49 UTC

Topic 196798

I have had some failing LV SSE WUs in the last 2 days; one of them is http://einsteinathome.org/task/343005413

Stderr output says:
2013-02-07 16:57:35.7508 (1520) [normal]: Recalculating statistics for the final toplist...
2013-02-07 16:57:35.7518 (1520) [CRITICAL]: Required frequency-bins [805088, 805103] not covered by SFT-interval [909995, 910486]
[Parameters: alpha:0, Dphi_alpha:8.050956e+005, Tsft:1.800000e+003, *Tdot_al:1.000073e+000]
XLAL Error - LocalXLALComputeFaFb (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/FDS_isolated/OptimizedCFS/LocalComputeFstat.c:554): Input domain error

LocalXALComputeFaFb() failed
Error[1] 5: function LocalComputeFStat, file /home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/FDS_isolated/OptimizedCFS/LocalComputeFstat.c, line 338, $Id$
ABORT: XLAL function call failed
XLALComputeExtraStatsSemiCoherent, line 360 : Failed call to LAL function ComputeFStat(). statusCode=5

XLAL Error - XLALComputeExtraStatsSemiCoherent (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/LineVeto.c:361): Internal function call failed: Input domain error

Error in function XLALComputeExtraStatsForToplist, line 220 : Failed call to XLALComputeLineVetoSemiCoherent().

XLAL Error - XLALComputeExtraStatsForToplist (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/LineVeto.c:221): Internal function call failed: Input domain error
XLAL Error - MAIN (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:1814): Check (XLAL_SUCCESS == XLALComputeExtraStatsForToplist ( semiCohToplist, "GCTtop", &stackMultiSFT, &stackMultiNoiseWeights, &stackMultiDetStates, &CFparams, refTimeGPS, uvar_SignalOnly, uvar_outputSingleSegStats )) failed
XLAL Error - MAIN (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:1814): XLALComputeExtraStatsForToplist() failed with xlalErrno = 1057.

XLAL Error - MAIN (/home/jenkins/workspace/workspace/EAH-GW-S6LV1/SLAVE/MINGW32/TARGET/windows-x86/EinsteinAtHome/source/lalsuite/lalapps/src/pulsar/GCT/HierarchSearchGCT.c:1814): Invalid pointer
2013-02-07 16:57:35.7608 (1520) [CRITICAL]: ERROR: MAIN() returned with error '-1'
FPU status flags: COND_0 PRECISION
2013-02-07 16:57:35.7618 (1520) [normal]: done. calling boinc_finish(-1).
16:57:35 (1520): called boinc_finish

They all show similar errors, and it happens on one machine only (Intel i3, Host ID 5394959). I tried some LHC WUs and they finish without error; I also have no trouble using Windows.
Does this point to a hardware problem or a software problem?

Nobody316

Joined: 14 Jan 13

Posts: 141

Credit: 2008126

RAC: 0

Critical: XLAL Error

9 Feb 2013 7:05:20 UTC

Message 114784

http://boincfaq.mundayweb.com/index.php?view=210&language=1

See if this helps. If I understand it right, I think it may be a data problem. I don't know for sure, but I hope the link helps.

PC setup MSI-970A-G46 AMD FX-8350 8 core OC'd 4.45GHz 16GB ram PC3-10700 Geforce GTX 650Ti Windows 7 x64 Einstein@Home

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 517446849

RAC: 434776

RE: http://boincfaq.mundayw

9 Feb 2013 8:22:20 UTC

Message 114786 in response to message 114784

Quote:

http://boincfaq.mundayweb.com/index.php?view=210&language=1

See if this helps. If I understand it right, I think it may be a data problem. I don't know for sure, but I hope the link helps.

Thanks for the link; it looks like something similar happened earlier.
I hope for a response from BM or Bikeman. Anyway, I will run some tests on my hardware to find out whether there is faulty RAM or a disk problem. I'm a little concerned because it happens on one machine only and no one else is reporting these problems, which points to a problem on my side. On the other hand, a burn-in test did not show any problem.
I'll try another mix of projects and watch the machine carefully.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4332

Credit: 252237655

RAC: 34871

The workunit you pointed to

9 Feb 2013 10:51:24 UTC

Message 114787

The workunit you pointed to was successfully completed by at least one other (Linux) host, so it can hardly be a systematic error in the application or in the workunit setup.

The error means that the frequency range the program tries to analyze is not contained in the data that was read from disk.

So there are essentially two possibilities: The data on disk is corrupted or can't be read (completely), or the program wrongly calculates the data it needs. If there is no problem in the application (version) itself, the latter is usually the result of some calculations going wrong in the floating point unit (FPU) of the CPU ("usually" here means that I have yet to see a single case where this wasn't the problem).
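To make the error message concrete, here is a minimal sketch (not part of the E@H application) that parses the CRITICAL line from the stderr output in the first post and checks whether the requested bins lie inside the loaded SFT interval. The function names are made up for illustration, and the conversion to Hz assumes the usual SFT convention that bins are spaced 1/Tsft apart, with Tsft taken from the [Parameters: ...] line.

```python
import re

# Pattern for the CRITICAL line quoted in the first post.
COVERAGE_RE = re.compile(
    r"Required frequency-bins \[(\d+), (\d+)\] "
    r"not covered by SFT-interval \[(\d+), (\d+)\]"
)

def parse_coverage_error(line):
    """Return ((req_lo, req_hi), (sft_lo, sft_hi)), or None if no match."""
    m = COVERAGE_RE.search(line)
    if m is None:
        return None
    req_lo, req_hi, sft_lo, sft_hi = (int(g) for g in m.groups())
    return (req_lo, req_hi), (sft_lo, sft_hi)

def is_covered(required, available):
    """True if the required bin range lies entirely inside the available one."""
    return available[0] <= required[0] and required[1] <= available[1]

def bin_to_hz(bin_index, tsft_seconds):
    """Assumption: bin spacing is 1/Tsft, so frequency = bin / Tsft."""
    return bin_index / tsft_seconds

line = ("2013-02-07 16:57:35.7518 (1520) [CRITICAL]: Required frequency-bins "
        "[805088, 805103] not covered by SFT-interval [909995, 910486]")
required, available = parse_coverage_error(line)
tsft = 1800.0  # Tsft:1.800000e+003 from the [Parameters: ...] line

print(is_covered(required, available))  # False
print(bin_to_hz(required[0], tsft), bin_to_hz(available[0], tsft))
```

Under that assumption the program asked for data near 447.3 Hz while the loaded SFTs start near 505.6 Hz, so the two ranges are not even close. That is consistent with either wrong data on disk or a miscalculated frequency.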

To get rid of possibly bad data I'd recommend resetting the project. In older BOINC Clients this won't delete and re-download the "sticky" files, so you'd better detach and re-attach there. Apart from that, try to monitor the CPU temperature.

Einstein@Home uses hand-coded assembler routines for its core computation. It might well be that of all BOINC projects E@H puts the most stress on the floating-point units (x87 FPU and SSE). I don't know what you used for burn-in, but most such procedures don't use the FPUs as much as E@H does.
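The idea behind an FPU-oriented check can be sketched as follows: run the same deterministic floating-point calculation many times and compare the results bit for bit, since healthy hardware must always return identical bits. This is only an illustration of the principle with made-up function names. Interpreted Python loads the FPU far less than E@H's hand-coded routines, so a clean run here proves very little, while any mismatch would be a strong hint of a hardware fault.

```python
import math
import struct

def fp_workload(n):
    """Deterministic floating-point loop; same input must give the same bits."""
    acc = 0.0
    for k in range(1, n + 1):
        # Denominator is always >= 1, so there is no division by zero.
        acc += math.sin(k * 0.001) * math.sqrt(k) / (1.0 + math.cos(k * 0.002) ** 2)
    return acc

def bits(x):
    """Exact 64-bit pattern of a double, so the comparison is bit-for-bit."""
    return struct.pack("<d", x)

def count_mismatches(rounds, n):
    """Repeat the workload and count rounds that differ from the first result."""
    reference = bits(fp_workload(n))
    return sum(1 for _ in range(rounds) if bits(fp_workload(n)) != reference)

if __name__ == "__main__":
    bad = count_mismatches(rounds=20, n=50_000)
    print("mismatching rounds:", bad)
```

Running several copies at once, one per core, while watching the temperature sensors would come closer to the load a BOINC host sees.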

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 517446849

RAC: 434776

RE: To get rid of possibly

9 Feb 2013 18:39:28 UTC

Message 114788 in response to message 114787

Quote:

To get rid of possibly bad data I'd recommend resetting the project. In older BOINC Clients this won't delete and re-download the "sticky" files, so you'd better detach and re-attach there. Apart from that, try to monitor the CPU temperature.

Einstein@Home uses hand-coded assembler routines for its core computation. It might well be that of all BOINC projects E@H puts the most stress on the floating-point units (x87 FPU and SSE). I don't know what you used for burn-in, but most such procedures don't use the FPUs as much as E@H does.

BM

BM,

many thanks for that info; perfect support from the devs, as usual.

I use SiSoft Sandra; the i3 was working all day on burn-in with no errors.
I already did a reset (detach and reattach) the last time I encountered problems; this did not help.
Your explanation about SSE could explain why Windows sees no problem: SSE is rarely used there. Heat? It's my only system not running a CPU cooler with heat pipes, just the original one.
It's easy to change this.

Alexander

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 517446849

RAC: 434776

If someone is interested,

16 Feb 2013 20:55:30 UTC

Message 114789

If someone is interested, here is the result:
it was a mainboard fault.
After reducing the system to the minimum required parts, the problem was still there. Exchanging the power supply and RAM with another PC did not help either.
I bought a new CPU, an i3-3220.
Grrr .. the same problem.
I also did a restore of Windows. No change.
I started SiSoft Sandra without running any tests, just monitoring all sensors that can be monitored. The ICH chip ran hot, more than 85 °C.

So I bought a new mainboard, one with two PCIe 3.0 x16 slots that both run at x8 when both are used, and an HD7870XT Boost (Tahiti, 1536 shaders).
I have also moved my GTX 550 Ti to that system, enabled 3 CPUs, and am running 2 WUs together on both GPUs.
With Win7-64, BM 7.0.52, and CCC 13.2 beta 5, the system seems to run stable, producing 6 GPU results per hour and also running 2 CPU WUs.
GPU-Z and AMD Trixx both show a GPU usage of 1 - 3 %, so most likely an update is required to support that GPU.

https://dl.dropbox.com/u/50246791/cpu-gpu%20multiuse%20system.PNG

In this configuration I have the following crunching times:
nvidia 3518 sec
amd 1871 sec
grp 1.04 5566 sec
GW s6 ext 14890 sec
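
As a rough cross-check of the "6 GPU results per hour" figure, the two GPU times above can be turned into a throughput estimate. This sketch assumes that each GPU runs 2 WUs at once and that the listed times are per task in that mode; neither is stated explicitly above.

```python
def results_per_hour(seconds_per_task, concurrent_tasks):
    """Throughput of one device that runs several tasks side by side."""
    return concurrent_tasks * 3600.0 / seconds_per_task

# Per-task GPU crunching times in seconds, taken from the list above.
gpu_times = {"nvidia": 3518, "amd": 1871}
CONCURRENT_PER_GPU = 2  # assumption: 2 WUs at a time on each GPU

total = sum(results_per_hour(t, CONCURRENT_PER_GPU) for t in gpu_times.values())
print(round(total, 2))  # 5.89
```

That comes out just under 6 results per hour, which matches the figure quoted for this configuration.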

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 782539108

RAC: 1209899

Hi! Thanks for letting us

19 Feb 2013 13:04:37 UTC

Message 114790

Hi!

Thanks for letting us know about the cause of the problem. Isn't E@H a nice stress test? ;-)

Cheers
HB

Forums › Problems and Bug Reports
