How do I monitor task elapsed time in real time?

cecht
Joined: 7 Mar 18
Posts: 1,434
Credit: 2,471,511,595
RAC: 796,652

Gary Roberts wrote:
On a whim, I tried 'suspending' the stuck task. To my surprise, a new task sprang to life and there were 2 tasks, both progressing at normal speed. I 're-enabled' the stuck task and when one of the other two finished, the previously stuck task started up (the % value dropped a little as the last checkpoint was reloaded) and it went on to complete as normal. To my mind, this can't be the same problem that was around a few years ago.

Just to follow up: while running O3ASE tasks at 2X concurrency on an RX 5600 XT host, I have so far seen one task stall out of 191 valid and 10 invalid completions. I noticed it when its elapsed time had reached about 1.5 hours. After I suspended it for a few seconds, it resumed with a new, short (normal) elapsed time and completed shortly thereafter. Unfortunately I didn't note the task name, so I do not know whether it was one of the invalid tasks, though I suspect it was.

The point is that, at least for this card/system, a brief suspension is all that is needed to send a "stuck" task (one that has lost GPU processing) on its way to completion.
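For anyone who would rather not watch the task list by hand, the suspend-and-resume trick could be scripted roughly as below. This is only a sketch under assumptions, not something I have run against the stall itself: it assumes `boinccmd` is on the PATH, that `boinccmd --get_tasks` prints one `key: value` line per field, and that the elapsed-time field is labelled `elapsed time` (check your own output and adjust `ELAPSED_KEY`). The helper names and the one-hour limit are hypothetical; a task over the limit may simply be slow rather than stuck.

```python
import re
import subprocess
import time

# Assumed field label in `boinccmd --get_tasks` output; verify on your client.
ELAPSED_KEY = "elapsed time"


def parse_tasks(text):
    """Split `boinccmd --get_tasks`-style text into one dict per task."""
    tasks = []
    for line in text.splitlines():
        line = line.strip()
        if re.match(r"^\d+\) -+$", line):      # task separator, e.g. "1) -----------"
            tasks.append({})
        elif tasks and ": " in line:
            key, _, value = line.partition(": ")
            tasks[-1][key] = value
    return tasks


def find_stalled(tasks, limit_s):
    """Return tasks whose elapsed time (seconds) exceeds limit_s."""
    stalled = []
    for task in tasks:
        try:
            elapsed = float(task.get(ELAPSED_KEY, "0"))
        except ValueError:
            continue                            # unparseable value: skip the task
        if elapsed > limit_s:
            stalled.append(task)
    return stalled


def bump(task, pause_s=5):
    """Suspend a task for a few seconds, then resume it."""
    base = ["boinccmd", "--task", task["project URL"], task["name"]]
    subprocess.run(base + ["suspend"], check=True)
    time.sleep(pause_s)
    subprocess.run(base + ["resume"], check=True)


def check_once(limit_s=3600):
    """One pass: list tasks and bump any that exceed the limit."""
    out = subprocess.run(["boinccmd", "--get_tasks"],
                         capture_output=True, text=True, check=True).stdout
    for task in find_stalled(parse_tasks(out), limit_s):
        print("bumping", task["name"])
        bump(task)
```

Calling `check_once()` from a cron job or a simple loop every few minutes would cover the case I saw, provided the limit is set well above the normal run time for the host.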

Ideas are not fixed, nor should they be; we live in model-dependent reality.
