Long or non-running WU.

adrianxw
Joined: 21 Feb 05
Posts: 242
Credit: 322654862
RAC: 0
Topic 207304

Task LATeah0027L_1076.0_0_0.0_11987760_2 is running on here. It has been running for 13:48:57, is 25.617% complete, and shows 1d:16:06:54 remaining, a figure that is slowly increasing. I've suspended it for now. Is this stuck?
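
As a rough sanity check on those figures, here is a minimal Python sketch of a straight-line extrapolation from elapsed time and percent complete. That the BOINC client estimates remaining time this way is an assumption, not something stated in this thread, and `parse_duration` and `linear_remaining` are hypothetical helper names.

```python
def parse_duration(text):
    """Convert 'HH:MM:SS' or 'Nd:HH:MM:SS' (as quoted in the post) to seconds."""
    parts = text.split(":")
    days = 0
    if parts[0].endswith("d"):
        days = int(parts[0][:-1])
        parts = parts[1:]
    hours, minutes, secs = (int(p) for p in parts)
    return ((days * 24 + hours) * 60 + minutes) * 60 + secs


def linear_remaining(elapsed_s, fraction_done):
    """Remaining seconds if progress continued at the average rate so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1 - fraction_done) / fraction_done


elapsed = parse_duration("13:48:57")      # elapsed time from the post
shown = parse_duration("1d:16:06:54")     # remaining time from the post
estimate = linear_remaining(elapsed, 0.25617)
print(f"extrapolated: {estimate:.0f} s, displayed: {shown} s")
```

With these numbers the extrapolation lands within a few seconds of the displayed figure. That would be consistent with an estimate that simply scales with elapsed time, so it keeps growing while the percentage barely moves.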

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Try resuming it to see if it gets going. If that doesn't work, try restarting BOINC, and if that doesn't work either, try rebooting your computer.
If none of the above gets the task going then maybe something is wrong with it and it might be a good idea to abort it to get a new one.
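
The resume and abort steps can also be done from the command line. A minimal sketch, assuming the `boinccmd` tool that ships with BOINC and its `--task <project_url> <task_name> <op>` form. The project URL below is an assumption and should be replaced with the one your own client shows for Einstein@Home. The helper `task_command` is hypothetical and only builds the argument list; it does not run anything.

```python
# Assumption: use the project URL exactly as listed in your BOINC client.
PROJECT_URL = "https://einsteinathome.org/"

# Operations accepted by the --task form of boinccmd.
ALLOWED_OPS = ("suspend", "resume", "abort")


def task_command(project_url, task_name, op):
    """Build (but do not execute) a boinccmd call for a single task."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operation: {op}")
    return ["boinccmd", "--task", project_url, task_name, op]


task = "LATeah0027L_1076.0_0_0.0_11987760_2"  # task name from the first post
print(" ".join(task_command(PROJECT_URL, task, "resume")))
print(" ".join(task_command(PROJECT_URL, task, "abort")))
```

To actually run one of these, the list could be passed to `subprocess.run(..., check=True)` on the machine where the BOINC client is running.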

adrianxw
Joined: 21 Feb 05
Posts: 242
Credit: 322654862
RAC: 0

Hmmm, I let it go again, and it dropped back to 5+% and started running. It ran past the previous stop point and up to ~89% with 2:14 remaining, which then went up and down (2:14 - 2:15 - 2:14, etc.). It ran like that for a couple of minutes, then suddenly finished and uploaded.

I only look at this machine occasionally, so it might have wasted days of computation. I can't trust Einstein to be included on here any more.

<edit>

The next Einstein job also did the wobbly bit at 89+% complete, so that may be normal.

</edit>

adrianxw
Joined: 21 Feb 05
Posts: 242
Credit: 322654862
RAC: 0

This machine also showed the "pause" at 89.997% complete, so I doubt that is related to the issue.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117906371347
RAC: 34569678

adrianxw wrote:
This machine also showed the "pause" at 89.997% complete, so I doubt that is related to the issue.

I presume you are talking about a GPU task.

Please be aware that crunching of data is complete by 89.997%. At that point there is a follow-up stage which produces a "toplist" of the top 10 candidate signals, and there is no further progress in the % estimate until it jumps to 100% as the follow-up stage completes. The time for this follow-up is variable, perhaps up to about 2 min for the type of GPU you have. This is all perfectly normal.

I have quite a few Pitcairn series GPUs myself and on some of them (fairly infrequently - weeks to months apart) I see something similar to what you initially reported - elapsed time much longer than it should be, the % estimate continuing to tick over at a snail's pace, and the remaining time showing "--" having counted down to zero some considerable time ago.  It used to happen with BRP style tasks also so it's not just the current app.

Restarting the system invariably causes crunching to restart at the last checkpoint which is often many hours ago at a much lower % completed.  The tasks complete normally from that point onward and are validated.  I assume (but have no real idea) that the GPU / driver has crashed for whatever reason but the rest of the system (including CPU tasks) continues on as normal.   I can remember seeing elapsed times as much as 50+ hours before noticing the problem.  These days I have better monitoring so that it gets picked up quite quickly when it happens.
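
The kind of monitoring mentioned above could be approximated along these lines. This is a minimal sketch, not the setup described in the post: the function name, the sample format, and the thresholds are all assumptions. It flags a task whose percent complete has barely moved over a time window, while ignoring the normal pause at 89.997%.

```python
# From the thread: progress normally pauses here during the follow-up stage.
FOLLOWUP_PERCENT = 89.997


def looks_stalled(samples, window_s=1800, min_gain=0.5, tolerance=0.01):
    """Guess whether a task is stuck.

    samples: list of (elapsed_seconds, percent_done) tuples, oldest first.
    window_s, min_gain, tolerance: illustrative thresholds (assumptions).
    """
    if len(samples) < 2:
        return False
    latest_t, latest_pct = samples[-1]
    # Sitting at the follow-up plateau is expected, so do not flag it.
    if abs(latest_pct - FOLLOWUP_PERCENT) <= tolerance:
        return False
    # Use the most recent sample that is at least window_s older than the latest.
    baseline = None
    for t, pct in samples:
        if latest_t - t >= window_s:
            baseline = pct
        else:
            break
    if baseline is None:
        return False  # not enough history yet
    return latest_pct - baseline < min_gain


healthy = [(0, 0.0), (1800, 20.0), (3600, 40.0)]
crawling = [(0, 25.0), (1800, 25.1), (3600, 25.2)]
print(looks_stalled(healthy), looks_stalled(crawling))
```

The thresholds would need tuning to how long a healthy task takes on a given GPU. A task that sits at 89.997% for far longer than the couple of minutes described above would not be caught by this sketch.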

My hosts run in quite adverse conditions temperature-wise (ambient 36C in summer). That may be a contributing factor but not the prime factor, since I have very similar hosts that never show the problem. I suspect it may really be power quality. In the last 12 months I've recapped more than 30 PSUs which had swollen capacitors. I've also done a few motherboards with the same issue. The incidence rate of this problem has reduced quite a bit of late, even though the machines have all just been through a pretty hot summer.

Cheers,
Gary.
