Computation error

jhaering
Joined: 24 Nov 05
Posts: 3
Credit: 17821921
RAC: 0
Topic 215065

Hi,

What's going on?

Task 757003670

Name: h1_0427.00_O2C02Cl1In0__O2AS20-500_427.15Hz_999_0
Job ID: 353550548
Created: 16 May 2018 14:21:20 GMT
Sent: 16 May 2018 14:21:20 GMT
Report deadline: 30 May 2018 14:21:20 GMT
Received: 16 May 2018 14:23:47 GMT
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 11 (0x0000000B) Unknown error code
Computer: 12647126
Run time (sec): 0.00
CPU time (sec): 0.00
Peak working set size (MB): 0
Peak swap size (MB): 0
Peak disk usage (MB): 0
Validation state: Invalid
Granted credit: 0
Application: Continuous Gravitational Wave search O2 All-Sky v1.01 x86_64-pc-linux-gnu

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110021098220
RAC: 22706445

Sig 11 (segmentation violation) errors mean that the program tried to access a memory location that was not assigned or not part of its address space.  They can be either hardware errors or software flaws.  The trick is to determine which.
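To see why the exit status "11" in the task report points to a segmentation violation, here is a minimal Python sketch that looks up the symbolic name for a signal number. It assumes a Linux host (signal numbers are platform specific), and the helper name `describe_signal` is made up for this illustration.

```python
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name of a signal number, or a fallback string."""
    try:
        # signal.Signals is an enum of the signals known on this platform.
        return signal.Signals(number).name
    except ValueError:
        return f"unknown signal {number}"


if __name__ == "__main__":
    # 11 is the exit status shown in the failed task report.
    print(describe_signal(11))
```

On an x86-64 Linux system this should print SIGSEGV, which matches the "Sig 11" reading of the report.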

If it was a bug in the app, it would likely show up in many other systems running the same app.  In your case, you were successfully returning tasks right up to 16th May, after which there was a series of 88 straight errors.  My guess is that some part of the hardware (possibly RAM) failed at that point.  By the look of things, the number of errors has caused your daily task allocation to now be down to 1 per core per day.  I'm not saying it's definitely hardware - it's still possible that your system is triggering something in the app that others aren't.

If you haven't rebooted since the problem started, that would be a first step.  If errors continue, you should try replacing components, particularly RAM.  If you don't have spare RAM but you have enough installed, you could try using half the sticks, and then reverse the selection, to see if one half works but the other doesn't.  It would also be worth trying a different PSU if you had access to one.

I got interrupted whilst composing this reply, which I started about 6 hours ago.  I've just had another look at your tasks list and can see two successfully completed tasks returned in the meantime (perhaps the problem is resolved?).  At the least, this should have quadrupled your daily task limit.  You have no 'in progress' tasks, so you should be able to get some more if you cancel any 'backoff' in communications by 'updating' the project in BOINC Manager.

Did the machine correct itself or did you mutter some magic incantation? :-).  It would be interesting to know how the problem was resolved (if in fact it really is resolved!).

 

Cheers,
Gary.

jhaering
Joined: 24 Nov 05
Posts: 3
Credit: 17821921
RAC: 0

Hi,

I rebooted, without success. I think the

Continuous Gravitational Wave search O2 All-Sky v1.01 x86_64-pc-linux-gnu

tasks are the problem.

The others are ok.

Greetings

Jürgen

 

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110021098220
RAC: 22706445

jhaering wrote:
Tasks are the problem.

I'm sorry, but I disagree.

If you look on the website and click on the Workunit ID for any of your failed tasks - here is a random example - you will see the different computers that processed the same task.  In this example, there are two other computers that have both successfully processed (and validated) the task.  Your computer is the one that failed.  If the task is the problem, why do other computers not have a problem?

There is no guarantee, but it seems rather likely that there is some sort of problem with your hardware (or its configuration).  The big clue is that the problem started suddenly at a particular point in time.  Did anything change (software or hardware) at about that time?  If not, it seems rather likely that something has gone wrong with some part of your hardware, and trying out different RAM modules would be one of the first things to look at.

Quote:
The others are ok.

I'm not sure what "others" refers to.  Do you mean other computers?  Do you mean other projects?  Do you mean other searches at this project?  You do have two recent validated tasks for the FGRP search.  Perhaps you also have no problems at other projects.  If so, it doesn't mean there is no hardware problem.  This is why I'm suggesting you check RAM.  I believe that GW tasks use a lot of memory.  If you have a fault in a particular memory location, it might be just the GW tasks that happen to use that particular location.

 

Cheers,
Gary.

alanb1951
Joined: 28 Nov 16
Posts: 18
Credit: 651033134
RAC: 350525

Is the Continuous Gravitational Wave search program statically linked?  (I only run Einstein GPU work, so I don't have a copy of the binary to look at...)

If it is, it might be linked with an older set of C libraries that use a now-deprecated mechanism (vsyscall) to access certain system calls.  Most recent kernels have disabled vsyscall emulation, and if that's the case you'll get SIGSEGV faults if the program tries to access (for example) certain system time functions...

Jürgen, that might explain why your system with a 4.9 kernel seems happy with these tasks but your 4.15 kernel machine doesn't! You could try adding the kernel boot option vsyscall=emulate at startup and seeing if it makes any difference.  (Of course, if this is the problem but the kernel has been built without the emulation available at all, it won't help...)

By the way, vsyscall is regarded as a security risk nowadays (do a search for details...) so the correct solution is for applications to be built either with dynamic linking or with newer static libraries that know about both vsyscall and its replacement...
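A rough way to check how a given machine is set up is to look at the kernel command line and at a process memory map. The Python sketch below is only an illustration: the helper names are made up, and it assumes a Linux system where /proc/cmdline and /proc/self/maps are readable and where an enabled vsyscall page shows up as a "[vsyscall]" entry in the map.

```python
from pathlib import Path
from typing import Optional


def vsyscall_mode_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value of a vsyscall= boot option, or None if absent."""
    for token in cmdline.split():
        if token.startswith("vsyscall="):
            return token.split("=", 1)[1]
    return None


def has_vsyscall_page(maps_text: str) -> bool:
    """True if a process memory map lists the legacy [vsyscall] page."""
    return any(
        line.rstrip().endswith("[vsyscall]")
        for line in maps_text.splitlines()
    )


if __name__ == "__main__":
    # Assumption: both files exist, as on a typical Linux system.
    cmdline = Path("/proc/cmdline").read_text()
    maps_text = Path("/proc/self/maps").read_text()
    mode = vsyscall_mode_from_cmdline(cmdline)
    print("boot option:", mode or "not set (kernel default applies)")
    print("[vsyscall] page mapped:", has_vsyscall_page(maps_text))
```

If no boot option is set and no "[vsyscall]" page is mapped, the kernel default is probably to have it disabled, which would fit the SIGSEGV theory above.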

Hope this helps - Al.

P.S. I use XUbuntu 16.04 with various kernels, and all of them up to 4.13 seem to have vsyscall emulation on by default.  I also had a look at an XUbuntu 18.04 with kernel 4.15 and that seems to have it enabled too...  However, I know some other distributions have disabled it.

 

jhaering
Joined: 24 Nov 05
Posts: 3
Credit: 17821921
RAC: 0

Hi,

Thanks for the help.

The solution was to set the following setting to "yes":

Run Linux app versions built with LIBC 2.15: yes / no
This ensures compatibility with new Linux systems that have virtual syscalls disabled, but breaks compatibility with older systems with (G)LIBC prior to 2.15
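Since that setting breaks compatibility with systems whose (G)LIBC is older than 2.15, it may be worth checking the installed version before switching it on. This is a minimal sketch using Python's standard `platform.libc_ver()`; the helper names `parse_version` and `meets_minimum` are made up for the illustration, and the 2.15 threshold is simply taken from the setting's description.

```python
import platform
from itertools import takewhile


def parse_version(text: str) -> tuple:
    """Turn a version string such as '2.27' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        # Keep only the leading digits of each piece ('17-r1' -> 17).
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(version: str, minimum: str = "2.15") -> bool:
    """True if 'version' is at least 'minimum' (compared numerically)."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(f"glibc {version}, at least 2.15: {meets_minimum(version)}")
    else:
        print("could not detect a glibc version on this system")
```

The comparison is done on integer tuples rather than strings, so that 2.9 is correctly treated as older than 2.15.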
 
Greetings
 
Juergen

 
