progress reporting oddity on FGRPB1

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7229594872
RAC: 1152065
Topic 198404

I've now completed two FGRPB1 units on Stoll7. I did not watch them continuously, but I got an impression on the first one, strongly reinforced by somewhat closer observation of the second: the reported progress advanced steadily toward a projected completion at something like 10 hours (elapsed or CPU, nearly the same on this host), yet actual completion came at something under six hours.

FGRPB1 unit 1
FGRPB1 unit 2
On the larger scale of issues, even if this is consistently true it is a minor problem. Maybe it is not a problem at all, if these two WU in fact took some sort of "early exit" not available to more typical WU samples. The log appears to split work between a semicoherent stage which may be standardized, and followup, which may be less consistent. So perhaps mine just had less followup work than the progress report assumes.

I'm mostly posting this in hopes of attracting comment from other early FGRPB1 completion people--either confirming or denying my tentative observation.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0


Quote:
I'm mostly posting this in hopes of attracting comment from other early FGRPB1 completion people--either confirming or denying my tentative observation.

Yes, I see similar behaviour: an estimated time of 19:08 hrs, but tasks completing in 3-4 hrs.

boinccmd --get_tasks shows on this host here

   name: LATeah0001L_16.0_0_-7e-11_36_1
   active_task_state: UNINITIALIZED
   app version num: 0
   checkpoint CPU time: 0.000000
   current CPU time: 0.000000
   fraction done: 0.000000
   estimated CPU time remaining: 68882.598257

   name: LATeah0001L_16.0_0_-7.7e-11_0_0
   active_task_state: EXECUTING
   app version num: 100
   checkpoint CPU time: 8855.802000
   current CPU time: 9221.422000
   fraction done: 0.456227
   estimated CPU time remaining: 58136.419302
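To put numbers on the gap between the two kinds of estimate, here is a minimal sketch that reads the fields of the running task above and compares the total implied by BOINC's "estimated CPU time remaining" with a plain linear extrapolation from "fraction done". The helper names `parse_task` and `estimate_totals` are made up for this illustration, and the linear extrapolation assumes that the reported fraction tracks real work at a constant rate, which is exactly what this thread calls into question.

```python
# Hypothetical helpers (not part of BOINC): parse the "key: value" lines
# printed by `boinccmd --get_tasks` for a single task.
def parse_task(text):
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(": ")
        fields[key] = value
    return fields


def estimate_totals(fields):
    """Return (total implied by BOINC, total from linear extrapolation), in seconds."""
    cpu = float(fields["current CPU time"])
    frac = float(fields["fraction done"])
    remaining = float(fields["estimated CPU time remaining"])
    boinc_total = cpu + remaining
    # Assumption: progress is proportional to CPU time spent so far.
    linear_total = cpu / frac if frac > 0 else None
    return boinc_total, linear_total


# Values copied from the running task quoted in this post.
SAMPLE = """
name: LATeah0001L_16.0_0_-7.7e-11_0_0
active_task_state: EXECUTING
app version num: 100
checkpoint CPU time: 8855.802000
current CPU time: 9221.422000
fraction done: 0.456227
estimated CPU time remaining: 58136.419302
"""

boinc_total, linear_total = estimate_totals(parse_task(SAMPLE))
print(f"Total implied by BOINC estimate: {boinc_total / 3600:.2f} h")
print(f"Total by linear extrapolation:   {linear_total / 3600:.2f} h")
```

For this one snapshot the two totals differ by more than a factor of three, so whatever drives the remaining-time figure, it is not simply current CPU time divided by fraction done.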

There seem to be a few more tasks in the wild now.

The GPU can go hungry for the moment...

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117773115329
RAC: 34779644


Quote:
... the progress reporting seemed to move in a steady progression toward a possible completion ...


Are you seeing this steady progression every second?

I've just looked at examples crunching on two machines. One has a very old version of BOINC, from before 'simulated' progress was introduced for the period before the first checkpoint is written. This particular task has been running for 8 hours and is still showing 0.000% progress. The task properties show that a checkpoint was written around 20 mins ago. I saw another example of this exact behaviour on a different machine last night (no 'progress' after many hours) and when checking again later, the task had been completed, returned and validated, as can be seen here.

On the other machine, which does have 'simulated' progress, the task has been running for 8.5 hours and the progress counter (still ticking over every second) is showing just over 60%. Once again, the task properties show checkpoints being written with gaps of at least 15-20 mins, which is normal.

Both of these examples of odd behaviour for the progress counter seem to be indicating that BOINC is not 'aware' of the fact that checkpoints are being written. It is either showing no progress or continuing with simulated progress - depending on BOINC version - until the completion of the task intervenes and sorts things out. This is just a guess on my part. I'll need to observe the finishing of tasks to see what really happens.

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7229594872
RAC: 1152065


Quote:
Are you seeing this steady progression every second?

Gary,

Pretty frequent, but I'm unsure as to "every second". It certainly was not waiting lots of minutes and then doing a big jump update, nor was it routinely displaying round numbers after such a jump.

I can't look now, as I completed those two tasks, and all 600 WUs so far released to the users are already shipped elsewhere.

The host in question runs BOINC 7.6.22 on Windows 7.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7229594872
RAC: 1152065


My host Stoll7 which is set to receive this work happened to get a re-issued task a few hours ago, and I watched it a little more carefully than before.

I was watching from another machine using BOINCTasks, so can't vouch for just how frequently estimated progress was updating on the host, but at every sampling interval (about every six seconds), it advanced.

When I first spotted the unit in process, it had run for one and a quarter hours (at about 96% CPU), indicated progress of 20.514%, and showed a time remaining estimate which implied a total run time of 5:37:38.

As I watched it, the progress reported moved along steadily, and the implied total run time grew steadily. At my last in-progress observation, 3:28:35 had elapsed, reported progress was 46.825%, and estimated time remaining was 4:21:44, for an implied total of 7:50:19 (the highest).

When next I visited, the unit was finished, having actually required 4:02:52 of elapsed time.

As the estimated time remaining actually grew between an observation at 1:15 to one taken at 1:36, this is not quite so simple as grinding along at an assumed rate of advance until reality sets in at the very end. But it is not in the range of usual behavior I have observed here.
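As a quick check on the arithmetic, this sketch recomputes the implied total from the last in-progress observation and compares it with the actual run time. The helper names `to_seconds` and `to_hms` are made up for this illustration, and the "actual fraction complete" figure assumes the work proceeded at a uniform rate over the whole run, which may not hold if the followup stage behaves differently from the semicoherent stage.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert seconds back to an 'h:mm:ss' string."""
    seconds = int(round(seconds))
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


# Figures from the last in-progress observation and from the finished task.
elapsed = to_seconds("3:28:35")
remaining = to_seconds("4:21:44")
actual = to_seconds("4:02:52")
reported_progress = 0.46825

implied_total = elapsed + remaining
print("Implied total run time:", to_hms(implied_total))  # 7:50:19
print(f"Implied / actual:       {implied_total / actual:.2f}")

# Assumption: uniform work rate, so elapsed / actual approximates true progress.
true_progress = elapsed / actual
print(f"Reported progress:      {reported_progress:.1%}")
print(f"Elapsed / actual:       {true_progress:.1%}")
```

Under that uniform-rate assumption the task was roughly 86% through its eventual run time when it was reporting under 47%, and the implied total was nearly double the actual one.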
