Some recent change causes OSX 10.9 on PPC4 to hang in a system call

Richard Schumacher
Joined: 8 Aug 06
Posts: 32
Credit: 14626275
RAC: 30648
Topic 193060

When this happens no progress is made in the run. I have to quit Boinc and re-start it to un-hang whatever-it-is; suspending the run is not sufficient. This has happened thrice in the past six weeks.

Has this been seen before? What if anything should I do to diagnose it further?

6dj72cn8
Joined: 24 Jan 06
Posts: 24
Credit: 13321065
RAC: 0

Some recent change causes OSX 10.9 on PPC4 to hang in a system c

I had one work unit hang about a week ago using a G5, Mac OS 10.4.10, Boinc 5.8.17, App PPC 4.34. The processor appeared to continue at full steam but no progress had been made for five hours so apparently it was caught in a loop. First time I've ever seen that. Restarting Boinc fixed it. I check it every couple of hours now.

So, yes, I have seen it before but have nothing else useful to offer.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 714563935
RAC: 923079

RE: When this happens no

Quote:

When this happens no progress is made in the run. I have to quit Boinc and re-start it to un-hang whatever-it-is; suspending the run is not sufficient. This has happened thrice in the past six weeks.

Has this been seen before? What if anything should I do to diagnose it further?

If this happens again, it would be useful to know which of the processes consumes all the CPU (you could use "top" in terminal to find out):

a) The "einstein*..." science app?
b) The "boinc" core client
c) some other process (which would also have the net effect of freezing einstein and any other BOINC science app)
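For a one-shot answer instead of watching "top", a minimal sketch could read `ps -A -o pcpu,comm` and sort the output into the three cases above. It assumes the science app's executable name starts with "einstein" and that the core client is named "boinc"; both names are assumptions, not confirmed in this thread. Note that the %CPU figure from ps is a smoothed value, so it is only a rough equivalent of what top shows.

```python
import subprocess


def parse_ps(output: str) -> list[tuple[float, str]]:
    """Parse `ps -A -o pcpu,comm` text into (cpu_percent, command) pairs."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the "%CPU COMM" header
        parts = line.strip().split(None, 1)  # command paths may contain spaces
        if len(parts) != 2:
            continue
        try:
            cpu = float(parts[0])
        except ValueError:
            continue
        rows.append((cpu, parts[1]))
    return rows


def classify(command: str) -> str:
    """Map a command onto case a (science app), b (core client) or c (other).

    Assumed names: science app starts with "einstein", core client is "boinc".
    """
    name = command.rsplit("/", 1)[-1].lower()
    if name.startswith("einstein"):
        return "a"
    if name == "boinc":
        return "b"
    return "c"


def top_consumer(output: str) -> tuple[float, str, str]:
    """Return (cpu_percent, command, case) for the busiest process."""
    cpu, cmd = max(parse_ps(output))
    return cpu, cmd, classify(cmd)


if __name__ == "__main__":
    text = subprocess.run(
        ["ps", "-A", "-o", "pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(top_consumer(text))
```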

CU

BRM

Richard Schumacher
Joined: 8 Aug 06
Posts: 32
Credit: 14626275
RAC: 30648

RE: If this happens again,

Message 71570 in response to message 71569

Quote:

If this happens again, it would be useful to know which of the processes consumes all the CPU (you could use "top" in terminal to find out):

a) The "einstein*..." science app?


Einstein_S5R2_4.

6dj72cn8
Joined: 24 Jan 06
Posts: 24
Credit: 13321065
RAC: 0

In my case I had two Einstein

Message 71571 in response to message 71570

In my case I had two Einstein WUs running. Only one of the two got stuck.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250193236
RAC: 35038

Some even more helpful

Some even more helpful information would be the end of stderr (/Library/Application Support/BOINC Data/slots/*/stderr.txt) of the task while it is stuck, and especially whether the output changes (it might be that just the display of progress is broken). And I would be really thankful for a Shark tree (shark is a performance analysis tool that is installed with Xcode or CHUD tools) to see what function keeps the CPU busy without progress.
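A minimal sketch of the "does the output change?" check: record the size and modification time of every slot's stderr.txt, wait a while, and report the files that did not change. The glob pattern is the path quoted above; the 10-minute interval and the helper names are illustrative assumptions. An unchanged file suggests, but does not prove, that the task is stuck.

```python
import glob
import os
import time

# Path quoted in the post above; adjust if your BOINC data directory differs.
SLOTS_GLOB = "/Library/Application Support/BOINC Data/slots/*/stderr.txt"


def snapshot(paths):
    """Record (size, mtime) for each file that exists."""
    result = {}
    for path in paths:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            continue
        result[path] = (st.st_size, st.st_mtime)
    return result


def unchanged(before, after):
    """Return the paths whose size and mtime are identical in both snapshots."""
    return sorted(p for p in before if after.get(p) == before[p])


def tail(path, n=3):
    """Return the last n lines of a text file."""
    with open(path, "r", errors="replace") as f:
        return f.read().splitlines()[-n:]


if __name__ == "__main__":
    paths = glob.glob(SLOTS_GLOB)
    first = snapshot(paths)
    time.sleep(600)  # assumed interval: a working task should write within 10 min
    for path in unchanged(first, snapshot(paths)):
        print("no new stderr output:", path)
        print("\n".join(tail(path)))
```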

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250193236
RAC: 35038

RE: And I would be really

Message 71573 in response to message 71572

Quote:
And I would be really thankful for a Shark tree (shark is a performance analysis tool that is installed with Xcode or CHUD tools) to see what function keeps the CPU busy without progress.


BTW: Probably VTune could do the same on Linux and Windows for "stuck" tasks. All current Apps are compiled with debugging symbols.

BM

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 714563935
RAC: 923079

RE: RE: And I would be

Message 71574 in response to message 71573

Quote:
Quote:
And I would be really thankful for a Shark tree (shark is a performance analysis tool that is installed with Xcode or CHUD tools) to see what function keeps the CPU busy without progress.

BTW: Probably VTune could do the same on Linux and Windows for "stuck" tasks. All current Apps are compiled with debugging symbols.

BM


Not sure (my evaluation key has expired), but I believe VTune under Windows requires a local PDB file and won't download it automatically from the symstore. But there was a download link, right?

CU

H-B

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250193236
RAC: 35038

RE: Not sure (my evaluation

Message 71575 in response to message 71574

Quote:
Not sure (my evaluation key has expired), but I believe VTune under Windows requires a local PDB file and won't download it automatically from the symstore. But there was a download link, right?


For 4.33 the PDB is distributed with the App again.

BM

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 714563935
RAC: 923079

RE: RE: Not sure (my

Message 71576 in response to message 71575

Quote:
Quote:
Not sure (my evaluation key has expired), but I believe VTune under Windows requires a local PDB file and won't download it automatically from the symstore. But there was a download link, right?

For 4.33 the PDB is distributed with the App again.

BM

Ah, I see... I'm still running the beta :-). I'll switch when the dry-run point falls at a time of day when I can delete app_info.xml without losing too much work.

CU

BRM

6dj72cn8
Joined: 24 Jan 06
Posts: 24
Credit: 13321065
RAC: 0

RE: Some even more helpful

Message 71577 in response to message 71572

Quote:

Some even more helpful information would be the end of stderr (/Library/Application Support/BOINC Data/slots/*/stderr.txt) of the task while it is stuck, and especially whether the output changes (it might be that just the display of progress is broken). And I would be really thankful for a Shark tree (shark is a performance analysis tool that is installed with Xcode or CHUD tools) to see what function keeps the CPU busy without progress.

BM

Got a stuck one now, Bernd. The display in Boinc Manager has not altered for over 60 minutes, nor have the stderr or .cpt files updated in that time. Nevertheless, it is maxing out *both* CPUs. Interesting. It only maxes out one when it is actually working.

End of stderr.txt reads:
44669, 44670, c
44671, 44672, 44673, 44674, 44675, 44676, 44677, 44678, 44679, 44680, 44681, 44682, 44683, 44684, 44685, 44686, 44687, 44688, 44689, 44690, 44691, 44692, 44693, 44694, 44695, 44696, 44697, 44698, 44699, 44700, 44701, c
44702, 44703, 44704, 44705, 44706,
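The stderr tail above is a comma-separated run of increasing integers with occasional "c" markers (reading "c" as a checkpoint marker is an assumption; the thread does not say). A minimal sketch that extracts the last counter from such a tail, so two readings taken some minutes apart can be compared; the function names are hypothetical.

```python
import re


def last_counter(stderr_tail: str):
    """Return the last integer in a stderr tail, or None if there is none."""
    numbers = re.findall(r"\d+", stderr_tail)
    return int(numbers[-1]) if numbers else None


def is_advancing(earlier: str, later: str) -> bool:
    """True only if the later tail ends on a higher counter than the earlier one."""
    a, b = last_counter(earlier), last_counter(later)
    return a is not None and b is not None and b > a
```

With the tail quoted above, both readings taken an hour apart would end on 44706, so `is_advancing` would return False.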

End of file h1_0543.40_S5R2__145_S5R2c_1_0 reads:
543.5073357382 3.712175 0.2344938 -9.7777e-11 5.0811
543.5073424488 3.725723 0.2344938 2.568e-11 5.15934
543.5073625804 3.739271 0.2344938 -3.4469e-10 5.53302
543.5073625804 3.752819 0.2344938 -9.7777e-11 5.38567

File h1_0543.40_S5R2__145_S5R2c_1_0.cpt reads:
3.712175,0.234494,44702,72482,41356798,829571

Part of client_state.xml reads:

http://einstein.phys.uwm.edu/
h1_0543.40_S5R2__145_S5R2c_1
1
434
1
2
81639.828535
0.616788
81648.509865
80920576.000000
39030784.000000
39030784.000000
0.000000

1

Shark was a disappointment. I've installed CHUD 4.4.4 but although Shark appears to take a sample, at the end of processing it does not display an output window. I reinstalled but with the same result. Any advice?

Activity Monitor shows Einstein making a lot of Mach system calls, but I don't know how to interpret the significance (if any) of that.

In Boinc Manager I hit Suspend (the task showed as Suspended, but the CPUs didn't stop), and then Resume. The task then showed as Waiting to Run (although still with full CPU activity). The checkpoint file did not update.

During all of this a Rosetta WU finished and uploaded so Boinc Manager appears working OK, at least in part.

Mac G5, 2GB, Boinc 5.8.17, App 4.34
