Killing Processes with Manual Refresh

Rauch Christian
Joined: 1 Mar 07
Posts: 8
Credit: 109429
RAC: 0
Topic 192613

I noticed that when I update manually (by clicking the Update button, labelled "Aktualisieren" in the German client), running e@h processes are killed and filed with a Compute Error :(

This has happened twice here; as it's an X2 running two tasks at a time, that makes 4 lost work units.

The log looks like this:

So 22 Apr 2007 13:19:52 CEST|Einstein@Home|Sending scheduler request: Requested by user
So 22 Apr 2007 13:19:52 CEST|Einstein@Home|Reporting 2 tasks
So 22 Apr 2007 13:20:08 CEST|Einstein@Home|Scheduler RPC succeeded [server version 509]
So 22 Apr 2007 13:20:08 CEST|Einstein@Home|Deferring communication for 1 min 0 sec
So 22 Apr 2007 13:20:08 CEST|Einstein@Home|Reason: requested by project
So 22 Apr 2007 13:20:09 CEST|Einstein@Home|Deferring communication for 1 min 0 sec
So 22 Apr 2007 13:20:09 CEST|Einstein@Home|Reason: Unrecoverable error for result h1_0236.00_S5R2__75_S5R2c_1 (process got signal 11)
So 22 Apr 2007 13:20:09 CEST|Einstein@Home|Computation for task h1_0236.00_S5R2__75_S5R2c_1 finished
So 22 Apr 2007 13:20:09 CEST|Einstein@Home|Output file h1_0236.00_S5R2__75_S5R2c_1_0 for task h1_0236.00_S5R2__75_S5R2c_1 absent
So 22 Apr 2007 13:20:10 CEST|Einstein@Home|Deferring communication for 1 min 0 sec
So 22 Apr 2007 13:20:10 CEST|Einstein@Home|Reason: Unrecoverable error for result h1_0236.00_S5R2__74_S5R2c_0 (process got signal 11)
So 22 Apr 2007 13:20:10 CEST|Einstein@Home|Computation for task h1_0236.00_S5R2__74_S5R2c_0 finished
So 22 Apr 2007 13:20:10 CEST|Einstein@Home|Output file h1_0236.00_S5R2__74_S5R2c_0_0 for task h1_0236.00_S5R2__74_S5R2c_0 absent
So 22 Apr 2007 13:21:11 CEST|Einstein@Home|Sending scheduler request: To fetch work
So 22 Apr 2007 13:21:11 CEST|Einstein@Home|Requesting 172800 seconds of new work, and reporting 2 completed tasks
So 22 Apr 2007 13:21:32 CEST|Einstein@Home|Scheduler RPC succeeded [server version 509]
So 22 Apr 2007 13:21:32 CEST|Einstein@Home|Deferring communication for 1 min 0 sec
So 22 Apr 2007 13:21:32 CEST|Einstein@Home|Reason: requested by project
So 22 Apr 2007 13:21:34 CEST|Einstein@Home|Starting h1_0236.00_S5R2__70_S5R2c_1
So 22 Apr 2007 13:21:34 CEST|Einstein@Home|Starting task h1_0236.00_S5R2__70_S5R2c_1 using einstein_S5R2 version 414
So 22 Apr 2007 13:21:34 CEST|Einstein@Home|Starting h1_0236.00_S5R2__69_S5R2c_0
So 22 Apr 2007 13:21:34 CEST|Einstein@Home|Starting task h1_0236.00_S5R2__69_S5R2c_0 using einstein_S5R2 version 414

So does anyone know why this happens?

Edit: The OS is openSUSE 10.2 and BOINC is 5.8.15.
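
For anyone who wants to pull the crashed tasks out of a log like the one above, here is a minimal Python sketch. It assumes the `timestamp|project|message` layout and the "Unrecoverable error" wording exactly as they appear in this post; the function names `failed_tasks` and `signal_name` are made up for illustration and are not part of BOINC. On Linux, signal 11 is SIGSEGV (a segmentation fault), which the second helper looks up.

```python
import re
import signal

# Matches the client's error message as it appears in the log above.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<task>\S+) "
    r"\(process got signal (?P<sig>\d+)\)"
)


def failed_tasks(log_text):
    """Return (timestamp, project, task, signal_number) for each crashed task."""
    failures = []
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # timestamp | project | message
        if len(parts) != 3:
            continue  # not a client log line in the assumed layout
        timestamp, project, message = parts
        match = ERROR_RE.search(message)
        if match:
            failures.append(
                (timestamp, project, match.group("task"), int(match.group("sig")))
            )
    return failures


def signal_name(number):
    """Map a signal number to its name on the current platform, if it has one."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown"


# Four lines copied from the log above.
sample = """\
So 22 Apr 2007 13:20:08 CEST|Einstein@Home|Scheduler RPC succeeded [server version 509]
So 22 Apr 2007 13:20:09 CEST|Einstein@Home|Reason: Unrecoverable error for result h1_0236.00_S5R2__75_S5R2c_1 (process got signal 11)
So 22 Apr 2007 13:20:09 CEST|Einstein@Home|Computation for task h1_0236.00_S5R2__75_S5R2c_1 finished
So 22 Apr 2007 13:20:10 CEST|Einstein@Home|Reason: Unrecoverable error for result h1_0236.00_S5R2__74_S5R2c_0 (process got signal 11)
"""

for timestamp, project, task, sig in failed_tasks(sample):
    print(f"{timestamp}  {task}  signal {sig} ({signal_name(sig)})")
```

The regular expression only knows the one message format shown here, so other client versions may need a different pattern.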

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

I think 5.8.15 is not a "recommended version" on Linux; maybe that has something to do with the problem? I'm just guessing here, but you might want to try upgrading or downgrading. That's the best I can come up with, since luckily I have never come across that error myself, on either Windows or Linux...

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 753891392
RAC: 1169754

RE: I noticed here, that,

Same here, also on BOINC 5.8.15. It never happened during S5R1 or S5RI, so this problem seems to be new with the S5R2 client.

I will update to 5.8.17 now and see whether this helps.

CU

BRM

Rauch Christian
Joined: 1 Mar 07
Posts: 8
Credit: 109429
RAC: 0

RE: I will update to

Message 62763 in response to message 62762

Quote:

I will update to 5.8.17 now and see whether this helps.

Did so too. Now I'm waiting for some WUs to complete, to see whether manual reporting crashes the running processes again.

Rauch Christian
Joined: 1 Mar 07
Posts: 8
Credit: 109429
RAC: 0

RE: RE: I will update to

Message 62764 in response to message 62763

Quote:
Quote:

I will update to 5.8.17 now and see whether this helps.

Did so too, now I'm waiting for some WUs to complete, let's see, if manual reporting crashes the running processes again.

And it's getting worse.

Mo 23 Apr 2007 12:46:24 CEST||General prefs: no separate prefs for work; using your defaults
Mo 23 Apr 2007 12:46:25 CEST|Einstein@Home|Restarting task h1_0236.00_S5R2__68_S5R2c_0 using einstein_S5R2 version 414
Mo 23 Apr 2007 12:46:25 CEST|Einstein@Home|Restarting task h1_0236.00_S5R2__67_S5R2c_0 using einstein_S5R2 version 414
Mo 23 Apr 2007 16:42:55 CEST|Einstein@Home|Sending scheduler request: To fetch work
Mo 23 Apr 2007 16:42:55 CEST|Einstein@Home|Requesting 9 seconds of new work
Mo 23 Apr 2007 16:43:00 CEST|Einstein@Home|Scheduler RPC succeeded [server version 509]
Mo 23 Apr 2007 16:43:00 CEST|Einstein@Home|Deferring communication for 1 min 0 sec
Mo 23 Apr 2007 16:43:00 CEST|Einstein@Home|Reason: requested by project
Mo 23 Apr 2007 16:43:02 CEST|Einstein@Home|Deferring communication for 1 min 0 sec
Mo 23 Apr 2007 16:43:02 CEST|Einstein@Home|Reason: Unrecoverable error for result h1_0236.00_S5R2__68_S5R2c_0 (process got signal 11)
Mo 23 Apr 2007 16:43:02 CEST|Einstein@Home|Computation for task h1_0236.00_S5R2__68_S5R2c_0 finished
Mo 23 Apr 2007 16:43:02 CEST|Einstein@Home|Output file h1_0236.00_S5R2__68_S5R2c_0_0 for task h1_0236.00_S5R2__68_S5R2c_0 absent
Mo 23 Apr 2007 16:43:02 CEST|Einstein@Home|Starting h1_0236.00_S5R2__65_S5R2c_1
Mo 23 Apr 2007 16:43:02 CEST|Einstein@Home|Starting task h1_0236.00_S5R2__65_S5R2c_1 using einstein_S5R2 version 414
Mo 23 Apr 2007 16:43:03 CEST|Einstein@Home|Deferring communication for 1 min 0 sec
Mo 23 Apr 2007 16:43:03 CEST|Einstein@Home|Reason: Unrecoverable error for result h1_0236.00_S5R2__67_S5R2c_0 (process got signal 11)
Mo 23 Apr 2007 16:43:03 CEST|Einstein@Home|Computation for task h1_0236.00_S5R2__67_S5R2c_0 finished
Mo 23 Apr 2007 16:43:03 CEST|Einstein@Home|Output file h1_0236.00_S5R2__67_S5R2c_0_0 for task h1_0236.00_S5R2__67_S5R2c_0 absent
Mo 23 Apr 2007 16:44:04 CEST|Einstein@Home|Sending scheduler request: To fetch work
Mo 23 Apr 2007 16:44:04 CEST|Einstein@Home|Requesting 86400 seconds of new work, and reporting 2 completed tasks
Mo 23 Apr 2007 16:44:25 CEST|Einstein@Home|Scheduler RPC succeeded [server version 509]
Mo 23 Apr 2007 16:44:25 CEST|Einstein@Home|Deferring communication for 1 min 0 sec
Mo 23 Apr 2007 16:44:25 CEST|Einstein@Home|Reason: requested by project
Mo 23 Apr 2007 16:44:28 CEST|Einstein@Home|Starting h1_0236.00_S5R2__64_S5R2c_0
Mo 23 Apr 2007 16:44:28 CEST|Einstein@Home|Starting task h1_0236.00_S5R2__64_S5R2c_0 using einstein_S5R2 version 414

I did NOT press the update button!
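
To check how closely the crashes follow a scheduler contact, regardless of who triggered it, the timestamps can be compared. The following is a rough Python sketch under the same assumptions about the log layout as in this thread; `parse_time` and `seconds_after_rpc` are hypothetical helper names. It drops the weekday and timezone fields and relies on English month abbreviations such as "Apr", so a log containing German month names like "Mai" or "Okt" would need extra handling.

```python
from datetime import datetime


def parse_time(stamp):
    """Parse 'Mo 23 Apr 2007 16:43:00 CEST', ignoring weekday and timezone."""
    fields = stamp.split()
    return datetime.strptime(" ".join(fields[1:5]), "%d %b %Y %H:%M:%S")


def seconds_after_rpc(log_text):
    """For each crash line, return seconds since the last successful scheduler RPC."""
    last_rpc = None
    delays = []
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # timestamp | project | message
        if len(parts) != 3:
            continue
        stamp, _project, message = parts
        if message.startswith("Scheduler RPC succeeded"):
            last_rpc = parse_time(stamp)
        elif "process got signal" in message and last_rpc is not None:
            delays.append((parse_time(stamp) - last_rpc).total_seconds())
    return delays


# Lines copied from the log above.
sample = """\
Mo 23 Apr 2007 16:42:55 CEST|Einstein@Home|Sending scheduler request: To fetch work
Mo 23 Apr 2007 16:43:00 CEST|Einstein@Home|Scheduler RPC succeeded [server version 509]
Mo 23 Apr 2007 16:43:02 CEST|Einstein@Home|Reason: Unrecoverable error for result h1_0236.00_S5R2__68_S5R2c_0 (process got signal 11)
Mo 23 Apr 2007 16:43:03 CEST|Einstein@Home|Reason: Unrecoverable error for result h1_0236.00_S5R2__67_S5R2c_0 (process got signal 11)
"""

print(seconds_after_rpc(sample))
```

Small, consistent delays across several incidents would support the idea that the scheduler contact itself triggers the crash, though a log comparison like this cannot prove the cause.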

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 753891392
RAC: 1169754

RE: RE: RE: I will

Message 62765 in response to message 62764

Quote:
Quote:
Quote:

I will update to 5.8.17 now and see whether this helps.

Did so too, now I'm waiting for some WUs to complete, let's see, if manual reporting crashes the running processes again.

And it's getting worse.

...
I did NOT press the update button!


But it seems that BOINC itself wanted to update to get new work. It looks like the S5R2 client doesn't like that :-(.

In the debugging output of that WU you'll probably find some message about an assertion failure for glutTimerFunc(), a function from an OpenGL utility lib. Strangely enough, I wasn't even using the screen-saver / visualization window at all when I got this.

CU
BRM

Rauch Christian
Joined: 1 Mar 07
Posts: 8
Credit: 109429
RAC: 0

RE: In the debugging

Message 62766 in response to message 62765

Quote:


In the debugging output of that WU you'll probably find some message about an assertion failure for glutTimerFunc() , a function from an OpenGL utility lib. Strange enough, I wasn't even using the screen-saver / visualization window at all when I got this.

CU
BRM

Hm, I cannot see anything like that, just dozens of lines saying "SIGABRT: abort called". Grepping through the whole BOINC directory does not show anything either, apart from 2 matches in the e@h binaries (one in each).

One of those WUs is this one

And I was not running the visualization window either, only the advanced manager view.
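
A search like the one described above, covering both text files and binaries, could look roughly like this in Python. This is only a sketch: `files_containing` is a made-up helper, and it reads each file fully into memory, which is fine for a small client directory but not for very large files.

```python
import os


def files_containing(root, needle):
    """Return sorted paths under root whose raw bytes contain the ASCII string needle."""
    needle_bytes = needle.encode("ascii")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Binary mode, so executables and libraries are searched too.
                with open(path, "rb") as handle:
                    if needle_bytes in handle.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(hits)
```

It would be called as `files_containing("/path/to/BOINC", "glutTimerFunc")`, where the directory is a placeholder for wherever the client is installed.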

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 753891392
RAC: 1169754

RE: RE: In the debugging

Message 62767 in response to message 62766

Quote:
Quote:


In the debugging output of that WU you'll probably find some message about an assertion failure for glutTimerFunc() , a function from an OpenGL utility lib. Strange enough, I wasn't even using the screen-saver / visualization window at all when I got this.

CU
BRM

hm, I cannot see something like this, just dozens of lines "SIGABRT: abort called". grepping through the whole Boinc directory does not show anything, too,
only 2 places found in the e@h binaries(one each).

One of those WUs is this one

And I was not running the visualization window too, only the advanced manager view.

This is one of my failed WUs

http://einsteinathome.org/task/83371387

Also only running the manager in advanced view.

I was able to complete one S5R2 run, though. That one was on a server that doesn't even have OpenGL installed, so... just speculating here. I tried to disable graphics by moving the *.so away, but BOINC is just too smart and re-installs it from the server.

Anyway, I don't need the graphics. How can I disable them for good in a safe way?

CU

BRM

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

Maybe they are trying to force people NOT to use the update button, because it causes database slowdowns? Just a theory.

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

I don't think so. They may not like people updating too frequently, but trashing whole WUs and losing their data would hurt them just as much as the crunchers. They'd never get any science done with an approach like that, and they'd annoy the crunchers, which is also not a good thing ;-)

Rauch Christian
Joined: 1 Mar 07
Posts: 8
Credit: 109429
RAC: 0

RE: Maybe they are trying

Message 62770 in response to message 62768

Quote:
Maybe they are trying to force people NOT to use the update button, because it causes database slowdowns? Just a theory.

And a wrong one, see my other post: there e@h refreshed on its own and still killed my processes.
