GNU/Linux S5R3 "power users" App 4.21 available

rroonnaalldd
Joined: 12 Dec 05
Posts: 116
Credit: 537221
RAC: 0

My last WU's died ever with

My last WUs always died with Signal 11 under BOINC 5.10.21/28.
Today I switched to BOINC 5.10.30, and this WU dies with Signal 34.
In the BOINC log I found this:
file projects/einstein.phys.uwm.edu/einstein_S5R3_4.02_i686-pc-linux-gnu not found
file projects/einstein.phys.uwm.edu/einstein_S5R3_4.02_i686-pc-linux-gnu.so not found
Within a few seconds of starting the Einstein WU:
Starting einstein
Starting einstein using einstein_S5R3 version 4.21
Computation for task einstein finished
Output file einstein for task einstein absent

I don't know why. I've also seen multiple Signal 11 errors on Spinhenge WUs. My other projects (Simap, QMC, Seti, Rieselsieve, Chess960, LHC) run without problems.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244933143
RAC: 16332

RE: My last WU's died ever

Message 76393 in response to message 76392

Quote:

My last WUs always died with Signal 11 under BOINC 5.10.21/28.
Today I switched to BOINC 5.10.30, and this WU dies with Signal 34.
In the BOINC log I found this:
file projects/einstein.phys.uwm.edu/einstein_S5R3_4.02_i686-pc-linux-gnu not found
file projects/einstein.phys.uwm.edu/einstein_S5R3_4.02_i686-pc-linux-gnu.so not found
Within a few seconds of starting the Einstein WU:
Starting einstein
Starting einstein using einstein_S5R3 version 4.21
Computation for task einstein finished
Output file einstein for task einstein absent

I don't know why. I've also seen multiple Signal 11 errors on Spinhenge WUs. My other projects (Simap, QMC, Seti, Rieselsieve, Chess960, LHC) run without problems.


Thanks for the report.

The "file ... 4.02 ... not found" messages could safely be ignored if the 4.21 App were working on your machine. However, this doesn't seem to be the case: the App got a "signal 4", which is an "illegal instruction". There shouldn't be anything in the App that a Core2 CPU can't handle. I'd suggest downloading the archive again and checking the md5 checksum before unpacking it again (overwriting the old files). You may also want to let the client get the 4.02 App files or manually download them to get rid of the error messages.
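
For anyone who would rather script the checksum comparison than eyeball it, here is a minimal sketch in Python. The function names and the idea of passing the published checksum in as a string are assumptions made for illustration; the thread does not say where or in what form the reference md5 is published.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected_md5):
    """Compare the computed digest with a published one (hypothetical input)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

If `archive_is_intact` returns False, the download was most likely corrupted and should be fetched again before unpacking.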

The references to the other projects are helpful, thanks! Probably the Apps of Spinhenge, like the one of Einstein, are built with a 'bleeding edge' version of the BOINC library, while the other projects use older versions.

BM


Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

I'm getting the same error

I'm getting the same error about the 4.02 app not being found, but since some of my WUs get crunched successfully and the problems don't occur directly after that error message I don't think that, in my case, it is related to the signal 11.
I'm running BOINC 5.3.31 now, luckily most of my projects seem to have switched to fixed credit so it won't even hurt me in the way of getting credit. We'll see if the older client makes any difference.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109409614529
RAC: 35121326

RE: I'm getting the same

Message 76395 in response to message 76394

Quote:
I'm getting the same error about the 4.02 app not being found....

It's not really an error message - rather an information message.

You are crunching with the 4.21 app using the app_info.xml file created by Bernd and distributed as part of the 4.21 package. Bernd has no way of knowing what "brands" of tasks may be in your cache of work at the time you decide to switch to the 4.21 app. Depending on what previous app (beta or otherwise) you may have been running at the time you decided to switch to 4.21, it is possible that you could have tasks "branded" with any one of quite a few different earlier versions.

Most of the earlier versions (all except 4.02 if I recall correctly) have compatible checkpoint file formats so 4.21 can handle partially completed tasks from all earlier versions except 4.02. The app_info.xml file therefore has to list the 4.02 version as being "required" in case you just happened to have 4.02 branded work in your cache (extremely unlikely, surely - probably impossible these days).

So, if you don't have any 4.02 tasks in your cache, you won't ever have a problem, but you will be informed that 4.02 is a specified app and you don't actually have it. That's why I called it an information message rather than an error message. It would become an error message if the project suddenly stopped issuing the "current" work and reverted to issuing only work that was branded as requiring the 4.02 app - something that's just not going to happen :).
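
To see which files an app_info.xml names but the project folder lacks (which is what triggers the "not found" lines), something like the following sketch could be used. It assumes the usual anonymous-platform layout in which each file is declared as `<file_info><name>...</name></file_info>` and that the file is well-formed XML; the function name and arguments are made up for this example.

```python
import os
import xml.etree.ElementTree as ET


def missing_app_files(app_info_path, project_dir):
    """Return file names declared in app_info.xml that are absent from project_dir."""
    root = ET.parse(app_info_path).getroot()
    missing = []
    for info in root.iter("file_info"):
        name = info.findtext("name")
        if name is None:
            continue  # skip malformed entries without a <name>
        name = name.strip()
        if not os.path.exists(os.path.join(project_dir, name)):
            missing.append(name)
    return missing
```

An empty list would mean every declared file is present, so the client should have nothing to report at startup.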

EDIT:
Everybody running the 4.21 app is going to see this same information message every time they start BOINC. The way to avoid the message (mentioned by Bernd if you read carefully) is to place a copy of the 4.02 .exe in the project folder with the 4.21 app. You can only do that if you kept a copy of that file as there's no visible link to old executables AFAIK.

I actually have a copy but I regard the message as so unimportant that I haven't bothered to install it to make the message go away :).

EDIT2:
1. You can get 4.02 manually - see next post
2. Removed reference to .pdb file - my brain was stuck in Windows mode for some reason :).

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109409614529
RAC: 35121326

RE: You may also want to

Message 76396 in response to message 76393

Quote:
You may also want to let the client get the 4.02 App files or manually download them to get rid of the error messages.

Whilst the app_info.xml mechanism is in place, the local client cannot get any app files - it's against the rules :). If you delete app_info.xml, you won't get 4.02 as the current default version is 4.20. The only way I could imagine the client getting 4.02 would be to do a dodgy edit to "rebrand" a task already in your cache and I'm certainly not suggesting that.

Of course manual downloading is possible, but only if you know where to look, since there is no directly visible, easy-to-follow link on the website pointing you to the page where older versions of files reside. I didn't even know it was possible until just now, when I decided to look up the URLs for current files in client_state.xml and then check whether all the old stuff is there as well. I suppose I should have realised that it would all be there :).
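
The lookup in client_state.xml can be sketched roughly as below. This assumes each file is described by a `<file_info>` block containing a `<name>` and one or more `<url>` (or, in other client versions, `<download_url>`) elements; a regular expression is used rather than an XML parser in case the file is not strictly well-formed. The function name is hypothetical.

```python
import re

FILE_INFO_RE = re.compile(r"<file_info>(.*?)</file_info>", re.S)
NAME_RE = re.compile(r"<name>([^<]+)</name>")
# Assumption: URLs appear as <url> or <download_url>, depending on client version.
URL_RE = re.compile(r"<(?:download_)?url>([^<]+)</(?:download_)?url>")


def download_urls(client_state_text):
    """Map each file name to the list of URLs recorded for it."""
    urls = {}
    for block in FILE_INFO_RE.findall(client_state_text):
        name = NAME_RE.search(block)
        found = [u.strip() for u in URL_RE.findall(block)]
        if name and found:
            urls[name.group(1).strip()] = found
    return urls
```

Stripping the file name from one of the returned URLs gives the directory to browse for older files.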

Cheers,
Gary.

th3
Joined: 24 Aug 06
Posts: 208
Credit: 2208434
RAC: 0

Had network problems and got

Had network problems and got a load of Signal 11 with 4.21 on this host:
http://einsteinathome.org/host/1085263

Other than that the app is nice; I got some sub 23,000 results when overclocked to 2.8GHz (Intel E2140 dual core 1.6GHz 1MB cache). The spread in computing times is not as big as what I have seen on some AMD hosts in this thread, but that could still change; I don't have that many successful results yet.

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

It definitely doesn't like

It definitely doesn't like the loss of the network. My ISP cut me off (their own fault) at about 5:20 this morning (don't even ask why I was awake).

Here's the log from the daemon.

2008-01-09 05:17:38 [Hydrogen@Home] Scheduler RPC succeeded [server version 601]
2008-01-09 05:17:38 [Hydrogen@Home] Deferring communication for 2 hr 36 min 15 sec
2008-01-09 05:17:38 [Hydrogen@Home] Reason: no work from project
2008-01-09 05:20:30 [Einstein@Home] Deferring communication for 1 min 0 sec
2008-01-09 05:20:30 [Einstein@Home] Reason: Unrecoverable error for result h1_0724.60_S5R2__174_S5R3a_1 (process got signal 11)
2008-01-09 05:20:30 [Einstein@Home] Computation for task h1_0724.60_S5R2__174_S5R3a_1 finished
2008-01-09 05:20:30 [Einstein@Home] Output file h1_0724.60_S5R2__174_S5R3a_1_0 for task h1_0724.60_S5R2__174_S5R3a_1 absent
2008-01-09 05:21:10 [PrimeGrid] Sending scheduler request: To fetch work
2008-01-09 05:21:10 [PrimeGrid] Requesting 864 seconds of new work
2008-01-09 05:21:10 [Einstein@Home] Deferring communication for 1 min 0 sec
2008-01-09 05:21:10 [Einstein@Home] Reason: Unrecoverable error for result h1_0724.60_S5R2__173_S5R3a_0 (process got signal 11)
2008-01-09 05:21:10 [Einstein@Home] Computation for task h1_0724.60_S5R2__173_S5R3a_0 finished
2008-01-09 05:21:10 [Einstein@Home] Output file h1_0724.60_S5R2__173_S5R3a_0_0 for task h1_0724.60_S5R2__173_S5R3a_0 absent
2008-01-09 05:22:30 [---] Project communication failed: attempting access to reference site
2008-01-09 05:22:30 [PrimeGrid] Scheduler request failed: couldn't resolve host name
2008-01-09 05:22:30 [PrimeGrid] Deferring communication for 1 min 0 sec
2008-01-09 05:22:30 [PrimeGrid] Reason: scheduler request failed
2008-01-09 05:23:50 [---] Access to reference site failed - check network connection or proxy configuration.
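
For anyone collecting these from a longer log, a small sketch like the one below pulls out the failed tasks and their signal numbers. It assumes the message wording shown in the log above ("result NAME (process got signal N)"); the function names are invented for the example. On Linux, signal 11 is SIGSEGV (segmentation fault) and signal 4 is SIGILL (illegal instruction).

```python
import re
import signal

# Matches the "Unrecoverable error for result ... (process got signal N)" wording.
SIGNAL_RE = re.compile(r"result (\S+) \(process got signal (\d+)\)")


def failed_tasks(log_text):
    """Return a list of (task_name, signal_number) found in the log text."""
    hits = []
    for line in log_text.splitlines():
        match = SIGNAL_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


def signal_name(number):
    """Translate a signal number to its name on this platform, if known."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown"
```

Signal numbering is platform-specific, so `signal_name` should be run on the same kind of system that produced the log.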

http://einsteinathome.org/task/90742227
http://einsteinathome.org/task/90727597

http://einsteinathome.org/host/1086583

The stock Windows app survived the loss of network.

http://einsteinathome.org/host/1086583

Kathryn :o)

Einstein@Home Moderator

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752987842
RAC: 1376008

RE: It definitely doesn't

Message 76399 in response to message 76398

Quote:

It definitely doesn't like the loss of the network. My ISP cut me off (their own fault) at about 5:20 this morning (don't even ask why I was awake).
.....
http://einsteinathome.org/host/1086583

The stock Windows app survived the loss of network.

http://einsteinathome.org/host/1086583


Kathryn,

You're definitely posting too early in the morning (or suffering from lack of sleep) - both of those are the same host ID!

But since you mention the stock Windows app - you must, by definition, be running a different build of BOINC too. Is it the app, or BOINC, that causes the unrecoverable error when the network goes down?

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

RE: RE: It definitely

Message 76400 in response to message 76399

Quote:
Quote:

It definitely doesn't like the loss of the network. My ISP cut me off (their own fault) at about 5:20 this morning (don't even ask why I was awake).
.....
http://einsteinathome.org/host/1086583

The stock Windows app survived the loss of network.

http://einsteinathome.org/host/1086583


Kathryn,

You're definitely posting too early in the morning (or suffering from lack of sleep) - both of those are the same host ID!

But since you mention the stock Windows app - you must, by definition, be running a different build of BOINC too. Is it the app, or BOINC, that causes the unrecoverable error when the network goes down?

A serious lack of sleep and a splitting headache from the kids screaming earlier today. I should go to bed soon.

http://einsteinathome.org/host/1086585 is the Windows host and http://einsteinathome.org/host/1086583 is the Linux host.

They are running different versions of the core client. Windows is 5.10.30 (I think) installed as a service and Linux is 5.10.21 installed via rpm as a system daemon.

As far as it being the app or the client, I'm not sure. I saw similar reports of work crashing upon network loss at ABC. But I'm not sure if they ended with signal 11s.

The Linux host is wired, so I can easily pull the network cable for testing purposes.

Kathryn :o)

Einstein@Home Moderator

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244933143
RAC: 16332

RE: They are running

Message 76401 in response to message 76400

Quote:
They are running different versions of the core client. Windows is 5.10.30 (I think) installed as a service and Linux is 5.10.21 installed via rpm as a system daemon.


Does this run BOINC and the App as root? (Try "ps -ef | grep einstein" or similar.)
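
The same check can be scripted. The sketch below works on the text of a `ps -ef` listing and assumes the common eight-column layout (UID PID PPID C STIME TTY TIME CMD); the function names are hypothetical.

```python
def matching_processes(ps_output, pattern):
    """Return (user, pid, command) for `ps -ef` rows whose command contains pattern."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)  # the 8th field keeps the full command line
        if len(fields) < 8:
            continue
        user, pid, command = fields[0], fields[1], fields[7]
        if pattern in command:
            rows.append((user, pid, command))
    return rows


def any_as_root(rows):
    """True if any of the matched processes is owned by root."""
    return any(user == "root" for user, _pid, _command in rows)
```

The listing itself could be obtained with `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout` and passed in with a pattern such as "einstein" or "boinc".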

BM

