Client Errors of S5R2/S5R3 Apps

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 33128936
RAC: 1108

Message 71146 in response to message 71145

You should read the GNU/Linux S5R3 App 4.09 available for Beta test thread and consider trying the Beta App.

Quote:

I also had one result with a signal 11 error, running Ubuntu Linux 6.06. It failed during a time when I was having internet issues. First, the xtremlab site was down, so I turned communications on and off a few times, then I lost my internet connection completely for a while.

I was wondering if it would be feasible to respond to some errors of this type by restarting at the last checkpoint. If that were done, there would need to be some way to ensure that it didn't restart repeatedly. Perhaps the task could be suspended, with an option for the user to restart or abort it. I don't think it would be a good idea to require input to make the decision, though. Perhaps it would be aborted automatically if it happened more than once, or more than once without significant progress since the last checkpoint.

If anyone has more to add about possible causes of the error, I would be interested in hearing them too. For now, I'm assuming it was just a fluke.

http://einsteinathome.org/task/87532936

5.4.9

process got signal 11
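The quoted suggestion (restart from the last checkpoint after a crash, but abort automatically if the task keeps failing without significant progress) can be sketched as a small decision function. This is a hypothetical Python illustration, not how the BOINC client or the Einstein@Home app actually behave; `TaskState`, `on_crash` and the 1% `MIN_PROGRESS` threshold are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical threshold: progress (as a fraction of the whole task) that
# counts as "significant" between two crashes.
MIN_PROGRESS = 0.01


@dataclass
class TaskState:
    """Hypothetical per-task bookkeeping kept across crashes."""
    failures_without_progress: int = 0
    progress_at_last_failure: float = -1.0  # -1.0 means "no crash yet"


def on_crash(state: TaskState, checkpoint_progress: float, max_retries: int = 1) -> str:
    """Decide what to do after a crash such as signal 11.

    checkpoint_progress is the fraction done (0.0..1.0) stored in the last
    checkpoint. Returns "restart" (resume from that checkpoint) or "abort".
    """
    # Significant progress since the previous crash resets the failure counter.
    if checkpoint_progress - state.progress_at_last_failure >= MIN_PROGRESS:
        state.failures_without_progress = 0
    state.failures_without_progress += 1
    state.progress_at_last_failure = checkpoint_progress
    return "restart" if state.failures_without_progress <= max_retries else "abort"


if __name__ == "__main__":
    s = TaskState()
    print(on_crash(s, 0.50))  # first crash: restart from checkpoint
    print(on_crash(s, 0.50))  # second crash, no new progress: abort
```

With `max_retries=1` this corresponds to "aborted automatically if it happened more than once without significant progress", and no user input is needed to reach the decision.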


Conan
Joined: 19 Jun 05
Posts: 172
Credit: 8286697
RAC: 7142

This is probably the first error I have had on Einstein, or at least one of the first (that I can recall).
This WU has an error I have seen a lot on other projects: Error 161, Xfer Error.

The computer running this is an AMD X2 4800+ at standard clock speed running Win XP. Previous WUs in this batch all ran faster than this one (by 5,000 to 10,000 seconds). This one seems to have finished (claimed credit is correct) but failed on file upload, so it gets no cigar.

rhb
Joined: 15 Aug 06
Posts: 6
Credit: 1287768
RAC: 0

Message 71148 in response to message 71146

Quote:

You should read the GNU/Linux S5R3 App 4.09 available for Beta test thread and consider trying the Beta App.

Quote:

I also had one result with a signal 11 error

http://einsteinathome.org/task/87532936

5.4.9

process got signal 11


I had another failure. Signal 11 again.

http://einsteinathome.org/task/87787188

Interestingly, I had more problems with my internet connection today. If I have any more errors, I may try out the beta version.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250505763
RAC: 34713

Quote:
Could you make an overview post of what bugs remain? Is it mostly the input domain issues, and is that with the XLAL code???


Looking at the latest error rates from recent Apps (4.15), I doubt even more than before that this is actually a bug in the program; it looks more like a problem with a few machines. Some of them might be overclocked, some may experience transient heat or other problems. I don't know, but the overall rate for this error has fallen below one percent.
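The reasoning that a low overall error rate concentrated on a few hosts points to hardware rather than to a program bug can be made concrete with a small calculation. The sketch below is a hypothetical Python illustration; the `(host_id, failed)` record format and the sample numbers are made up for the example and are not Einstein@Home data.

```python
from collections import Counter
from typing import Iterable, Tuple


def error_stats(results: Iterable[Tuple[str, bool]], top_k: int = 3) -> Tuple[float, float]:
    """Return (overall error rate, share of all errors on the top_k worst hosts).

    results is an iterable of (host_id, failed) pairs, one per returned task.
    """
    results = list(results)
    errors = Counter(host for host, failed in results if failed)
    total_errors = sum(errors.values())
    rate = total_errors / len(results) if results else 0.0
    # Errors contributed by the top_k hosts with the most failures.
    worst = sum(count for _, count in errors.most_common(top_k))
    share = worst / total_errors if total_errors else 0.0
    return rate, share


if __name__ == "__main__":
    # Made-up sample: 1000 tasks from 200 hosts, 6 errors, 5 of them on host "A".
    sample = [(f"h{i % 200}", False) for i in range(994)]
    sample += [("A", True)] * 5 + [("B", True)]
    rate, share = error_stats(sample, top_k=1)
    print(f"error rate {rate:.1%}, share on worst host {share:.0%}")
    # error rate 0.6%, share on worst host 83%
```

A rate below one percent with most errors on a handful of hosts fits the "few machines" explanation; errors spread evenly over many hosts would point more toward the program itself.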

There is still something in the Linux Apps that causes segfaults; I'd mostly expect it to be a bug remaining in the current BOINC library.

The DLL load problem on Windows still affects us noticeably. With our 4.15 App we narrowed it down to, most probably, KERNEL32.DLL, and we're currently investigating the causes named in a Microsoft Knowledge Base article, but there's not much more I can do in the App code to track this down.

Judging from individual computing errors, the most "unreliable" platform is MacOS PPC, apparently due to an occasional(!) "invalid instruction" (signal 4). This looks more like a problem in the build process than in the app code, but it needs to be fixed anyway.
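For reference, on Linux and Mac OS X signal 11 is SIGSEGV (segmentation fault) and signal 4 is SIGILL (illegal instruction). The Python sketch below decodes these numbers the way a parent process sees them; `describe_exit` and `run_and_crash` are hypothetical helper names, and the message format only imitates the client's "process got signal 11" line.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short, BOINC-style message.

    On POSIX, subprocess reports a child killed by signal N as returncode -N.
    """
    if returncode < 0:
        signum = -returncode
        try:
            name = signal.Signals(signum).name  # e.g. 11 -> "SIGSEGV"
        except ValueError:
            name = "unknown signal"
        return f"process got signal {signum} ({name})"
    return f"process exited with code {returncode}"


def run_and_crash(signum: int) -> int:
    """Start a child Python process that sends itself `signum`; return its returncode."""
    code = f"import os; os.kill(os.getpid(), {int(signum)})"
    return subprocess.run([sys.executable, "-c", code]).returncode


if __name__ == "__main__":
    print(describe_exit(-11))  # process got signal 11 (SIGSEGV)
    print(describe_exit(-4))   # process got signal 4 (SIGILL)
    if sys.platform.startswith("linux"):
        print(describe_exit(run_and_crash(11)))
```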

BM

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

Message 71150 in response to message 71149

Quote:
Quote:
Could you make an overview post of what bugs remain? Is it mostly the input domain issues, and is that with the XLAL code???

Looking at the latest error rates from recent Apps (4.15) I doubt even more than before that this is actually a bug in the program, but rather a problem of a few machines. Some of them might be overclocked, some may experience some transient heat or other problem - I don't know, but the overall rate for this error has fallen below a percent.

Well, you do know that I'm overclocked by quite a bit (2750 MHz @ 1.6 V vs. 2200 MHz @ 1.4 V), but it is very stable. I had a couple of invalid units at LHC a few weeks ago, but no problems since. LHC is very sensitive to FP variances, even between processors, so it may have happened anyway...

My whole point? I have only had 2 errors here with S5R3, and both of those had to do with power failures. My only "complaint" is the variation of runtimes...

As for the DLL loading issue, it doesn't look like you'd be able to do anything about it from the application side, other than perhaps making the app as lean as possible and making sure that memory is released as soon as it is no longer needed. If only C/C++ had garbage collection ;)

Do people contact you about these errors, or do you just see them in logs?

Brian

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 727422027
RAC: 1224352

We are not alone: other projects seem to suffer from similar problems, which have now led to a code change in BOINC that handles the DLL issue differently, according to this discussion on QMC.

Let's hope this helps.
CU
H-B

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250505763
RAC: 34713

Message 71152 in response to message 71151

Quote:
We are not alone, other projects seem to suffer from similar problems, which have now led to a code change in boinc that handles the DLL issue differently, according to this discussion on QMC


1. I'm not sure whether that change has been made in the current 5.10 Core Client branch, or in the 6.x branch, which is also already under development.

2. This may fix the problem for Windows Vista shutdown. However, we see the problem on all versions of Windows, roughly in the same proportions as in our hosts database, so most frequently on XP SP2. Its main cause is most likely a full desktop heap, and as Rom's note correctly states, "Only a reboot can fix the desktop heap." A reboot is not something the client should trigger automatically.

BM

josep
Joined: 9 Mar 05
Posts: 63
Credit: 1156542
RAC: 0

Another signal 11 error for me, after one month of absolutely correct results:

http://einsteinathome.org/task/88138694

And it also happened after a temporary failure of my DSL connection, exactly the same situation as in my previous post here one month ago. Yesterday the computer ran for several hours without Internet access, repeatedly trying to contact the scheduler without success.

Today I manually reset the DSL router, and then the result was reported with a signal 11 error.

I use the linux 4.14 beta app, running on OpenSuSE 10.2

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 727422027
RAC: 1224352

So, what does the core client need desktop heap space for anyway?

Screensaver activation? Tray icon display? That's all non-essential stuff, and I would not mind if the BOINC core client carried on regardless when this desktop stuff is unavailable because of a heap shortage. Maybe this fix can help on XP as well, after all.

CU

H-B

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250505763
RAC: 34713

Message 71155 in response to message 71154

Quote:
So, what does the core client need desktop heap space for anyway?


Every (non-console) Windows App needs a portion of the "Desktop Heap"; without it, KERNEL32.DLL and USER32.DLL will fail to load. In particular, this applies to the BOINC Apps started by the client. If the Desktop Heap is full, the App can't be started, which results in a Client Error indicating a DLL load problem.
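The desktop heap sizes are configured by the `SharedSection=` entry inside the `Windows` value under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems`, the setting discussed in Microsoft's desktop heap Knowledge Base articles. As a minimal sketch, assuming that layout (comma-separated sizes in KB: common heap, interactive desktop heap, optional non-interactive desktop heap), the Python below extracts the sizes. `parse_shared_section` and `read_shared_section` are hypothetical names; the registry read works only on Windows, and nothing is modified.

```python
import re
import sys
from typing import Dict, Optional

# Layout assumed from Microsoft's desktop heap KB articles:
# SharedSection=<common KB>,<interactive desktop KB>[,<non-interactive desktop KB>]
_SHARED_SECTION = re.compile(r"SharedSection=(\d+),(\d+)(?:,(\d+))?")


def parse_shared_section(windows_value: str) -> Optional[Dict[str, Optional[int]]]:
    """Extract the desktop heap sizes (in KB) from the registry 'Windows' value string."""
    m = _SHARED_SECTION.search(windows_value)
    if m is None:
        return None
    common, interactive, noninteractive = m.groups()
    return {
        "common_kb": int(common),
        "interactive_kb": int(interactive),
        # Heap per desktop of a non-interactive window station (e.g. services).
        "noninteractive_kb": int(noninteractive) if noninteractive else None,
    }


def read_shared_section() -> Optional[Dict[str, Optional[int]]]:
    """Read and parse the live setting (Windows only)."""
    import winreg  # standard library, available only on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "Windows")
    return parse_shared_section(value)


if __name__ == "__main__":
    if sys.platform == "win32":
        print(read_shared_section())
    else:
        # Abbreviated example string; 1024,3072,512 are the usual Windows XP defaults.
        example = r"%SystemRoot%\system32\csrss.exe SharedSection=1024,3072,512 Windows=On"
        print(parse_shared_section(example))
```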

BM