Client Error

Ed1934158
Joined: 10 Nov 04
Posts: 62
Credit: 14481483
RAC: 0

I have the same problem now.

I have the same problem now. I installed the new Ubuntu version, gave the computer the same name, and client errors started. GPUGRID@home and Rosetta@home are working fine, but Einstein@home just gives client errors. That computer also worked fine before, with the same BOINC version 6.45. I have another computer with the same version of Ubuntu and BOINC, and there is no problem.

Could it be that Einstein@home "thinks" the computer wasn't reinstalled, while in fact the applications that run the tasks are not downloaded, so it downloads units, has nothing to do the work with, and assumes that an error happened?

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

RE: I installed the new

Message 88846 in response to message 88845

Quote:
I installed the new Ubuntu version


If that is Ubuntu 9.04 (Jaunty), then be aware that the OS itself has been reported as the culprit of errors on other BOINC (and non-BOINC) projects.

But your error is "process exited with code 255 (0xff, -1)", which isn't a download error but more likely an error caused by a misplaced library. The first thing to check is whether you have all the 32bit compatibility libraries in place. Einstein doesn't supply a 64bit application, only 32bit, and still not all Linux distros supply the 32bit compatibility (ia32) libs by default.

Are all your libs included in the environment (env) path?
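
As a side note on reading that error line: the three numbers in "255 (0xff, -1)" are the same 8-bit value written three ways. A minimal sketch, assuming only the low byte of the exit status is reported (as is the case for process exit codes on Linux); the function name is made up for illustration:

```python
def as_signed_byte(status: int) -> int:
    """Interpret the low 8 bits of an exit status as a signed value."""
    low = status & 0xFF  # assumption: only the low byte is reported
    return low - 256 if low >= 128 else low

# 255 decimal, 0xff hex, and -1 signed are one and the same byte
print(hex(255), as_signed_byte(255))
```

So the code itself says little more than "the application gave up with -1"; it does not point to a download problem.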

Ed1934158
Joined: 10 Nov 04
Posts: 62
Credit: 14481483
RAC: 0

RE: RE: I installed the

Message 88847 in response to message 88846

Quote:
Quote:
I installed the new Ubuntu version

If that is Ubuntu 9.04 (Jaunty), then be aware that the OS itself has been reported as the culprit of errors on other BOINC (and non-BOINC) projects.

But your error is "process exited with code 255 (0xff, -1)", which isn't a download error but more likely an error caused by a misplaced library. The first thing to check is whether you have all the 32bit compatibility libraries in place. Einstein doesn't supply a 64bit application, only 32bit, and still not all Linux distros supply the 32bit compatibility (ia32) libs by default.

Are all your libs included in the environment (env) path?


I really don't know. Except that the 3 other systems on which I run BOINC/Einstein on Jaunty 64bit work without any problems. I'll try to install an older version from the repository, and run the 6.45 one for GPUGRID.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 715192062
RAC: 941040

RE: RE: Are all your

Message 88848 in response to message 88847

Quote:
Quote:

Are all your libs included in the environment (env) path?


I really don't know. Except that the 3 other systems on which I run BOINC/Einstein on Jaunty 64bit work without any problems.

There's an easy test: go to the directory where the Einstein executables are. Depending on whether you installed BOINC yourself from a tar.gz or it is part of your distribution, this might be something like

~/BOINC/projects/einstein.phys.uwm.edu

or

/var/lib/boinc-client/projects/einstein.phys.uwm.edu

then do

ldd einstein_S5R5_1.01_i686-pc-linux-gnu_2

The output will show whether all necessary libraries can be found or not.
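
If you'd rather not read the output by eye, here is a minimal sketch that picks out the unresolved entries. It assumes ldd marks them with "not found" after the "=>" arrow; the helper names and the sample output are made up for illustration and are not from a real run:

```python
import subprocess

def missing_libraries(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing

def check_binary(path: str) -> list[str]:
    """Run ldd on the given executable and list its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)

# Made-up sample output, for illustration only
SAMPLE = """\
\tlinux-gate.so.1 =>  (0xffffe000)
\tlibstdc++.so.6 => not found
\tlibm.so.6 => /lib32/libm.so.6 (0xf7d00000)
\tlibc.so.6 => /lib32/libc.so.6 (0xf7ba0000)
"""
print(missing_libraries(SAMPLE))
```

On the real machine you would call check_binary() with the executable name from the ldd command above; an empty list means every library was resolved.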

CU
Bikeman

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1886
Credit: 1399104734
RAC: 1128238

(Gundolf, I was not looking

(Gundolf, I was not looking for the advice/information from a person who only has Credit: 130,159 in 4 years here that imagines things when he sees red text....maybe I was whispering)
_____________________________________________________

These as you can tell are the same PCs I have been using doing Einstein and LHC for many years now and as I said they have no problem running LHC, Seti, or the new Radio Pulsars

Only does this with the S5R5's and I can sit there and "reset" over and over but they still just sit there after downloading unless I just happen to get lucky and pull up a Radio Pulsar unit on one or 2 on the dual processor machine.

I used the new version and older ones downloaded only from here http://boinc.berkeley.edu/download.php

Never tested any other version.

And it was working for quite a while on the P4 2.5 and I have used that one since I started here over 4 years ago and the dual for about 3 years.

Nothing was changed on either pc and they have no problems doing anything other than running the S5R5's

Now a few hours ago the dual finally loaded 2 of them and has been running them for over 7 hours and they appear to be ok and maybe when they send them back I will get more to work on this one at least........but they refuse to work on the P2 2.5

The dual sent back close to 50 units with "client error" before getting these 2 running right now.

And as you can see before this started on May 7th it had no problem getting 462,001 total credit (and the P4 2.5 got over 266,000 credits)

Once again none of my 4 machines ever have a problem doing any other WUs other than the S5R5's

I have been doing this on my personal PCs since I started with Seti Classic in 2000 and I tend to never ask anyone for any help until I have gone over everything several times myself.

So all I can say now is if there are no tips or advice on this then no reply works better than some complaining about colored text and in all my years here I am about the only one I see that doesn't just type tiny black text on a blank page all in one continuous paragraph.

(you can also read my profile page and see who will be happy about this in the last paragraph since I won't continue spending thousands of dollars building these machines just to do this if it won't work.......so much for my dreams of a dual-quad project lately)


adrianxw
Joined: 21 Feb 05
Posts: 242
Credit: 322654862
RAC: 0

RE: And it is quite

Quote:
And it is quite annoying to read posts that are shouted with bold, red and size 16. So, I refrain from further suggestions to resolving your problems.

Likewise, and similarly ignored. You're not doing yourself a favour, MAGIC.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2951396942
RAC: 690512

RE: (Gundolf, I was not

Message 88851 in response to message 88849

Quote:
(Gundolf, I was not looking for the advice/information from a person who only has Credit: 130,159 in 4 years here that imagines things when he sees red text....maybe I was whispering)


And I find that the knowledge shown by people's words is far more important than either the size of the type or the total of their credit - though you're welcome to compare mine.

Also, could you please set your stats box as a true signature rather than an image in the body text - it's much kinder for dial-up users, who can turn off images they don't need.

Once the presentational issues have been disposed of, maybe someone will address your computation problems, which as Gundolf says seem to be particular to your machine - your wingmen complete the same tasks successfully.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

I know, I haven't got enough

Message 88852 in response to message 88849

I know, I haven't got enough credit to be able to answer you, but hear me out anyway. Just look for clues in Result '(result)' exited with zero status but no 'finished' file/Task (task) exited with a DLL initialization error/Unrecoverable error for result (task) (too many exit(0)s).

Of course, as long as you compare amount of credits with the knowledge that someone has about BOINC and its quirks and errors, (Is IQ calculated based on BOINC credits these days?), you won't get very far around here.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

RE: (Gundolf, I was not

Message 88853 in response to message 88849

Quote:

(Gundolf, I was not looking for the advice/information from a person who only has Credit: 130,159 in 4 years here that imagines things when he sees red text....maybe I was whispering)
_____________________________________________________

These as you can tell are the same PCs I have been using doing Einstein and LHC for many years now and as I said they have no problem running LHC, Seti, or the new Radio Pulsars

Only does this with the S5R5's and I can sit there and "reset" over and over but they still just sit there after downloading unless I just happen to get lucky and pull up a Radio Pulsar unit on one or 2 on the dual processor machine.

I used the new version and older ones downloaded only from here http://boinc.berkeley.edu/download.php

Never tested any other version.

And it was working for quite a while on the P4 2.5 and I have used that one since I started here over 4 years ago and the dual for about 3 years.

Nothing was changed on either pc and they have no problems doing anything other than running the S5R5's

Now a few hours ago the dual finally loaded 2 of them and has been running them for over 7 hours and they appear to be ok and maybe when they send them back I will get more to work on this one at least........but they refuse to work on the P2 2.5

The dual sent back close to 50 units with "client error" before getting these 2 running right now.

And as you can see before this started on May 7th it had no problem getting 462,001 total credit (and the P4 2.5 got over 266,000 credits)

Once again none of my 4 machines ever have a problem doing any other WUs other than the S5R5's

I have been doing this on my personal PCs since I started with Seti Classic in 2000 and I tend to never ask anyone for any help until I have gone over everything several times myself.

So all I can say now is if there are no tips or advice on this then no reply works better than some complaining about colored text and in all my years here I am about the only one I see that doesn't just type tiny black text on a blank page all in one continuous paragraph.

(you can also read my profile page and see who will be happy about this in the last paragraph since I won't continue spending thousands of dollars building these machines just to do this if it won't work.......so much for my dreams of a dual-quad project lately)

Well, Adrian, Richard, and Ageless pretty much covered the etiquette issues.

From a technical POV, and given your stated background, I have to assume you have been living under a rock as far as BOINC is concerned.

Otherwise, you would have known there are critical loops and some blocking IO routines in BOINC CC's (even to this day), especially in the parts which interface to the host OS and outside world.

How does that apply here you might ask? Well, what it means is there are circumstances and conditions which are somewhat unique to each host which can result in the CC effectively 'stalling' from time to time. When this happens, it prevents the CC from being able to send the required 'heartbeats' to the running science applications and causes them to make a 'forced' exit (just like they are supposed to do to prevent them from running as orphaned processes according to the BOINC framework guidelines).
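
To make the mechanism concrete, here is a toy model of the application side. This is only a sketch of the idea described above, not BOINC's actual code; the class name and the 30 second timeout are assumptions chosen for illustration:

```python
class HeartbeatWatchdog:
    """Toy model of a science app that exits when client heartbeats stop."""

    def __init__(self, timeout_s: float = 30.0):  # timeout value is an assumption
        self.timeout_s = timeout_s
        self.last_heartbeat = 0.0

    def heartbeat(self, now: float) -> None:
        """Record the time of the latest heartbeat from the client."""
        self.last_heartbeat = now

    def should_exit(self, now: float) -> bool:
        """True once the client has been silent for longer than the timeout."""
        return now - self.last_heartbeat > self.timeout_s

dog = HeartbeatWatchdog(timeout_s=30.0)
dog.heartbeat(now=0.0)
print(dog.should_exit(now=10.0))  # client still responsive
print(dog.should_exit(now=45.0))  # client stalled past the timeout
```

The point is that the application cannot tell a stalled client from a dead one: in both cases the heartbeats stop, so it exits.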

Several sources which are known to be able to cause this problem are unusual/unexpected delays and/or failures in disk IO, network IO, and/or memory IO. Empirical data from the field has shown the following:

1.) AV software which puts an exclusive lock on files while scanning them may cause a forced exit (or worse).

2.) Some types of network communication delays may result in a forced exit.

3.) Situations which result in 'sluggish' Disk IO may result in a forced exit.

4.) Memory IO problems typically result in a fatal error abort, but I suppose it's theoretically possible to have it just result in a forced exit.

The reason you are getting an abort is that there is a limit of ~100 forced exits. IIRC, the CC keeps a running tally of the number of times it has to restart the application for unusual reasons. Unusual in this case means the CC itself didn't tell the application to stop. Apparently and unfortunately, the CC doesn't have the ability to recognize that it is the one which messed up by not sending heartbeats when it should have.
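
A rough sketch of that bookkeeping, going only by the description above. This is not the CC's real code; the names are made up, and the limit of exactly 100 is an assumption standing in for the "~100" mentioned:

```python
MAX_UNEXPECTED_EXITS = 100  # assumption: stands in for the "~100" limit

class TaskRecord:
    """Toy model of the per-task tally of unexpected application exits."""

    def __init__(self, name: str):
        self.name = name
        self.unexpected_exits = 0
        self.aborted = False

    def on_exit(self, requested_by_client: bool) -> None:
        """Count an exit unless the client itself asked the app to stop."""
        if requested_by_client or self.aborted:
            return
        self.unexpected_exits += 1
        if self.unexpected_exits >= MAX_UNEXPECTED_EXITS:
            self.aborted = True  # task is given up as an error

task = TaskRecord("example_task")
for _ in range(99):
    task.on_exit(requested_by_client=False)
task.on_exit(requested_by_client=True)   # a normal stop is not counted
print(task.aborted)
task.on_exit(requested_by_client=False)  # the 100th unexpected exit
print(task.aborted)
```

Note that nothing in this tally distinguishes an exit caused by a faulty application from one caused by the client missing its own heartbeats, which is the weakness described above.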

The severity of the problem ranges from just being merely annoying (by clogging up the logs) to rendering the host virtually unserviceable as a cruncher (as in your case).

Oh and BTW, the P4 and PD ARE exhibiting the exact same problem on SAH as they are here. I guess you just didn't bother to 'listen' to what the CC was trying to tell you, or weren't able to connect the dots. Also, there have been changes to your hosts you have forgotten to take into account: Windows Service Packs and updates. I can tell you for a fact from my own personal experience that with the updates for 2K since SP4, and starting with SP3 on XP, forced app exits due to lost heartbeats from the CC have become more prevalent (they were fairly rare before that), and this affects both 5.x and 6.x CCs.

Alinator
