Unusually short crunch time

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73,516,529
RAC: 0

Message 80834 in response to message 80832

Quote:
Quote:
Quote:

If possible, dig through the client logs and see when this task was started and finished ("wall clock time"). Correctly counting CPU time isn't trivial on Linux, and problems with it are common.

BM

Okay, I can try that. In the meantime, here's another one that looks abnormal. This time, my wingman finished first, and has an unusually short runtime for his 3500+ machine. (Mine in this case is the Xeon, which hasn't started crunching that one yet.)

A second strange one.

I think this "second strange one" can be neglected. Look at the other results from that host: most claim 0 seconds crunch time, and it's also using a very old BOINC client.

But that first result looks strange indeed.

CU
Bikeman

Yeah, I see that now. I hadn't bothered to look before.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,492
Credit: 63,783,634,432
RAC: 53,698,978

Message 80835 in response to message 80831

Quote:
I found some '1155.00 sec' times in the results of my Kentsfield.
But the BOINC Manager showed about 2.5-3 hours for them.
I haven't found the reason for this behaviour yet.

If you look at the stderr.out for all the 1155.00 sec results, you will see that each one was restarted at least once (and some several times) during the execution. The ones that are being reported at around 11K secs were not stopped and restarted at any stage (or so it seems from a pretty hasty examination).

At first I saw the very large number of skypoints that were being "redone" when the task resumed from a checkpoint and thought that was an error. Then I realised that your app on steroids is doing an unbelievably large number of skypoints per checkpoint, and so will have to repeat an awful lot even if you stopped it just 30 secs after a checkpoint :).

Cheers,
Gary.
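A rough sketch of the point about redone skypoints, for anyone following along: work done since the last checkpoint is lost on a restart, so a faster app repeats more skypoints for the same stop delay. The function name and the rates below are made up for illustration; they are not taken from the app or from any stderr.out.

```python
def skypoints_redone(seconds_since_checkpoint: float, skypoints_per_second: float) -> int:
    """Estimate skypoints that must be recomputed after resuming from a checkpoint.

    Assumes a constant processing rate, which is a simplification.
    """
    if seconds_since_checkpoint < 0 or skypoints_per_second < 0:
        raise ValueError("inputs must be non-negative")
    return int(seconds_since_checkpoint * skypoints_per_second)


# Hypothetical rates: a stock app versus a heavily optimised one,
# both stopped 30 seconds after their last checkpoint.
print(skypoints_redone(30, 2.0))   # 60
print(skypoints_redone(30, 40.0))  # 1200
```

This only illustrates why the redone count can look alarming; it says nothing about why the reported CPU time came out as 1155.00 sec.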

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4,527,270
RAC: 0

Message 80836 in response to message 80835

Quote:

If you look at the stderr.out for all the 1155.00 sec results, you will see that each one was restarted at least once (and some several times) during the execution. The ones that are being reported at around 11K secs were not stopped and restarted at any stage (or so it seems from a pretty hasty examination).

At first I saw the very large number of skypoints that were being "redone" when the task resumed from a checkpoint and thought that was an error. Then I realised that your app on steroids is doing an unbelievably large number of skypoints per checkpoint, and so will have to repeat an awful lot even if you stopped it just 30 secs after a checkpoint :).

Oh. You are right! Thanks.

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73,516,529
RAC: 0

Here's another strange-looking result, from the same P-III 667 as before.

Second strange result

I checked the log, and it shows that this workunit started processing at 07:18, and finished at 08:21 the same morning.
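For comparing against the reported CPU time, the wall clock time from those two log entries can be worked out like this. It is a minimal sketch that assumes the timestamps are plain HH:MM strings and that the run crossed midnight at most once; the helper name is made up for the example.

```python
from datetime import datetime, timedelta


def wall_clock_elapsed(start: str, finish: str) -> timedelta:
    """Return the elapsed time between two HH:MM timestamps."""
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(finish, fmt)
    if t1 < t0:
        # Assumption: the task ran past midnight exactly once.
        t1 += timedelta(days=1)
    return t1 - t0


elapsed = wall_clock_elapsed("07:18", "08:21")
print(elapsed)                       # 1:03:00
print(int(elapsed.total_seconds()))  # 3780
```

So the log shows about 63 minutes, or 3780 seconds, of wall clock time for this workunit.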

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 468,997,165
RAC: 60,034

Keep a safe distance from that black hole to avoid spacetime disturbances :-)

Seriously, thanks for the report!

CU
Bikeman

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73,516,529
RAC: 0

Message 80839 in response to message 80838

Quote:

Keep a safe distance from that black hole to avoid spacetime disturbances :-)

Seriously, thanks for the report!

CU
Bikeman

Hmmmm. . .

Are you saying that I could be entering "The Twilight Zone"? Or, maybe the LHC has fired up without anyone being informed.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0

Message 80840 in response to message 80837

Quote:

Here's another strange-looking result, from the same P-III 667 as before.

Second strange result

I checked the log, and it shows that this workunit started processing at 07:18, and finished at 08:21 the same morning.

Just out of curiosity, did you happen to note what the first strange one was finally granted? I missed catching it when it cleared, and the second one probably won't be turned in by the wingman for a week to ten days or so.

Alinator

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73,516,529
RAC: 0

Message 80841 in response to message 80840

Quote:
Quote:

Here's another strange-looking result, from the same P-III 667 as before.

Second strange result

I checked the log, and it shows that this workunit started processing at 07:18, and finished at 08:21 the same morning.

Just out of curiosity, did you happen to note what the first strange one was finally granted? I missed catching it when it cleared, and the second one probably won't be turned in by the wingman for a week to ten days or so.

Alinator

The first strange one was granted full credit, and validated fine for both my wingman and me.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0

Message 80842 in response to message 80841

Quote:
Quote:
Quote:

Here's another strange-looking result, from the same P-III 667 as before.

Second strange result

I checked the log, and it shows that this workunit started processing at 07:18, and finished at 08:21 the same morning.

Just out of curiosity, did you happen to note what the first strange one was finally granted? I missed catching it when it cleared, and the second one probably won't be turned in by the wingman for a week to ten days or so.

Alinator

The first strange one was granted full credit, and validated fine for both my wingman and me.

OK, although it's still a mystery why the app all of a sudden stopped checkpointing and logging run time.

This definitely serves to show that quasi-fixed, server-side scoring has its advantages if the project work lends itself to using it. ;-)

Alinator

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73,516,529
RAC: 0

Well folks, I've solved the mystery. It turns out that I neglected to install NTP on this machine, and the system clock was losing time like crazy. I didn't notice until I untarred the new beta app and received a bunch of error messages saying that the files had timestamps from far in the future.

Why Ubuntu doesn't have NTP installed by default, I'll never understand.
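For anyone who wants to check their own box for the same symptom, here is a small sketch that lists files whose modification time is ahead of the system clock. A pile of hits on freshly unpacked files suggests the local clock is running behind. The function name, the one-second tolerance, and the choice of directory are all assumptions for the example, not anything BOINC provides.

```python
import time
from pathlib import Path


def files_from_the_future(root, now=None, tolerance=1.0):
    """Return (path, seconds_ahead) for files under root dated after `now`.

    `now` defaults to the current system time; `tolerance` (seconds) ignores
    tiny differences caused by filesystem timestamp rounding.
    """
    now = time.time() if now is None else now
    ahead = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            mtime = path.stat().st_mtime
            if mtime > now + tolerance:
                ahead.append((str(path), mtime - now))
    return sorted(ahead)


if __name__ == "__main__":
    # Scan the current directory, e.g. where the beta app was unpacked.
    for name, seconds in files_from_the_future("."):
        print(f"{name}: {seconds:.0f} s ahead of the system clock")
```

This only detects the symptom; fixing it still means syncing the clock, for example with NTP as described above.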
