Stagnant BRP4-file

astro-marwil

Joined: 28 May 05

Posts: 520

Credit: 436473365

RAC: 807659

11 Aug 2011 15:15:43 UTC

Topic 195897


Yesterday I noticed a stagnant BRP3 file. Normally these finish within 80-85 min, but this one had been running for 9:45 h and the elapsed time was still counting up, with progress stuck at 66.7% over the next 15 min. As GPU load was 0%, memory controller load just 7% and CPU load <0.01% (Process Explorer), I decided to abort this file. Since then everything has been running fine.

For me this is the first time. But how often does this happen to other participants? And is there no mechanism that watches the progress and aborts the task at some limit?

I lost the time in which 7 other files could have been crunched successfully. And it was just luck that I happened to look. It could have gone on much longer, even for some days, with a proportionally bigger loss.

What's your experience in this regard?

Kind regards
Martin

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 692131685

RAC: 60197

Stagnant BRP4-file

11 Aug 2011 15:49:27 UTC

Message 106324


Quote:

Yesterday I noticed a stagnant BRP3 file. Normally these finish within 80-85 min, but this one had been running for 9:45 h and the elapsed time was still counting up, with progress stuck at 66.7% over the next 15 min. As GPU load was 0%, memory controller load just 7% and CPU load <0.01% (Process Explorer), I decided to abort this file. Since then everything has been running fine.

For me this is the first time. But how often does this happen to other participants?

I had a similar problem once, which went away after I moved the PC to a cooler room. In my case, though, the CUDA task would lock up only right at the start (= at 0% progress), so your problem might be different.

Quote:

And is there no mechanism that watches the progress and aborts the task at some limit?

Yes, every BOINC task comes with a measure of its complexity (assigned by the work generator) that will translate (based on benchmark) to a maximum allowed runtime, based on the individual computer's speed. Because the benchmarks are not very reliable, projects tend to set this value very conservatively so that tasks may run VERY long before timing out. But eventually, they will time out.
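As a rough sketch of that relationship: if the complexity measure is expressed as an upper bound on floating-point operations and the benchmark as operations per second, the runtime limit is simply their quotient. The function name, parameter names and numbers below are hypothetical, chosen only for illustration; they are not taken from the BOINC source.

```python
def max_runtime_seconds(fpops_bound: float, benchmark_flops: float) -> float:
    """Runtime limit implied by an operation bound and a host benchmark.

    fpops_bound     -- hypothetical upper bound on floating-point operations
    benchmark_flops -- hypothetical benchmarked host speed in operations/second
    """
    if benchmark_flops <= 0:
        raise ValueError("benchmark_flops must be positive")
    return fpops_bound / benchmark_flops


# Illustrative numbers only: a generous bound on a modest host.
limit_s = max_runtime_seconds(fpops_bound=3.6e15, benchmark_flops=2.0e9)
print(f"limit: {limit_s / 3600:.0f} h")
```

A conservatively large bound or a pessimistic benchmark both push the limit up, which is why a task may run for a very long time before it is cut off.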

HBE

Gundolf Jahn

Joined: 1 Mar 05

Posts: 1079

Credit: 341280

RAC: 0

RE: I decided to abort this

11 Aug 2011 15:50:47 UTC

Message 106325


Quote:

I decided to abort this file.

You could have tried suspending and resuming the task to see whether it continued normally. Sometimes a restart of the BOINC client or a reboot helps too.

Quote:

And is there no mechanism installed watching the progress, aborting the task at some limit?

There is. After a runtime ten times the originally estimated time to completion, the task is aborted with an exit code of -177 "Maximum elapsed time exceeded".
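A minimal sketch of that rule, assuming the original estimate equals the typical 85 min runtime mentioned in the opening post. The function name is hypothetical; only the factor of ten and the exit code come from the text above.

```python
EXIT_TIME_LIMIT_EXCEEDED = -177  # exit code quoted above
TIMEOUT_FACTOR = 10              # "ten times the originally estimated time"


def check_task(elapsed_min: float, estimated_min: float):
    """Return the abort exit code once the limit is exceeded, else None."""
    if elapsed_min > TIMEOUT_FACTOR * estimated_min:
        return EXIT_TIME_LIMIT_EXCEEDED
    return None


estimate = 85            # minutes, assumed equal to the typical runtime
stuck_for = 9 * 60 + 45  # 9:45 h from the opening post, in minutes

print(check_task(stuck_for, estimate))      # still below the limit
print(check_task(10 * estimate + 1, estimate))  # just past the limit
```

Under these assumptions the limit is 850 min (14 h 10 min), so the task in the opening post, at 9:45 h, had not yet reached it.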

Regards,
Gundolf
[edit]Less than a minute difference! ;-)[/edit]

Computers aren't everything in life. (Just kidding)

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4275

Credit: 245502727

RAC: 11439

RE: There is. After a

11 Aug 2011 20:49:03 UTC

Message 106326 in response to message 106325


Quote:

There is. After a runtime ten times the originally estimated time to completion, the task is aborted with an exit code of -177 "Maximum elapsed time exceeded".

Note that (AFAIK) for this timeout the client still measures 'runtime' as CPU time. If the CUDA app runs 20x as fast as the CPU app on your system, this means that the task will time out only after 200x the normal execution time, in your case after roughly 11 to 12 days (200 x 80-85 min, if running 24h a day).
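The arithmetic can be checked with a few lines. This sketch takes the 80-85 min typical runtime from the opening post as the baseline, which is an assumption: the estimate the client actually uses may differ. The function name and default arguments are illustrative only.

```python
def timeout_days(normal_runtime_min: float,
                 timeout_factor: int = 10,
                 gpu_speedup: int = 20) -> float:
    """Wall-clock days until timeout if the limit is counted in CPU time.

    The 10x timeout applies to the CPU-app estimate, which is assumed to be
    gpu_speedup times the normal GPU runtime, giving 200x overall.
    """
    minutes = normal_runtime_min * timeout_factor * gpu_speedup
    return minutes / (60 * 24)


for runtime in (80, 85):
    print(f"{runtime} min baseline -> {timeout_days(runtime):.1f} days")
```

With these inputs the limit falls between about 11 and 12 days of continuous running, far too late to be useful as a watchdog for a stalled GPU task.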

Rechenkuenstler

Joined: 22 Aug 10

Posts: 138

Credit: 102567115

RAC: 0

RE: Yesterday I became

12 Aug 2011 9:19:01 UTC

Message 106327


Quote:

Yesterday I noticed a stagnant BRP3 file. Normally these finish within 80-85 min, but this one had been running for 9:45 h and the elapsed time was still counting up, with progress stuck at 66.7% over the next 15 min. As GPU load was 0%, memory controller load just 7% and CPU load <0.01% (Process Explorer), I decided to abort this file. Since then everything has been running fine.

For me this is the first time. But how often does this happen to other participants? And is there no mechanism that watches the progress and aborts the task at some limit?

I lost the time in which 7 other files could have been crunched successfully. And it was just luck that I happened to look. It could have gone on much longer, even for some days, with a proportionally bigger loss.

What's your experience in this regard?

Kind regards
Martin

I got the same problem, and it could be related to BRP4, because it didn't occur with BRP3 tasks.
Since BRP4 tasks do not give enough workload for my GPUs, I've added some new projects with GPU tasks. What happens is that when BOINC switches between the projects, the NVIDIA driver sometimes crashes and the screen goes black for a few seconds before it resumes. Significantly, GPU tasks stall only after such a driver crash. The problem is easily solved by restarting the BOINC client; the GPU tasks then resume from the last checkpoint.

The reason I think this could be related to BRP4 is that I have one machine running only Einstein and Milkyway GPU tasks, none of the new GPU projects. Even on this machine the same problem occurs, and it did NOT occur with BRP3 tasks.

NVIDIA Driver version is 275.33 on three machines
BOINC version is 2.12.26
OS is Windows 7 Ultimate x64 on two machines and Vista 32-bit on the third

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2142

Credit: 2788205465

RAC: 722333

RE: I got the same problem

12 Aug 2011 10:00:28 UTC

Message 106328 in response to message 106327


Quote:

I got the same problem, and it could be related to BRP4, because it didn't occur with BRP3 tasks.
Since BRP4 tasks do not give enough workload for my GPUs, I've added some new projects with GPU tasks. What happens is that when BOINC switches between the projects, the NVIDIA driver sometimes crashes and the screen goes black for a few seconds before it resumes. Significantly, GPU tasks stall only after such a driver crash. The problem is easily solved by restarting the BOINC client; the GPU tasks then resume from the last checkpoint.

The reason I think this could be related to BRP4 is that I have one machine running only Einstein and Milkyway GPU tasks, none of the new GPU projects. Even on this machine the same problem occurs, and it did NOT occur with BRP3 tasks.

NVIDIA Driver version is 275.33 on three machines
BOINC version is 2.12.26
OS is Windows 7 Ultimate x64 on two machines and Vista 32-bit on the third

Driver crashes at task switch could be related to the BOINC API issue we discussed here.

Driver 275.33 on Windows 7 is certainly in the vulnerable zone for that problem, which will only be fixed when the BRP app (the same app for BRP3 and BRP4) is modified and re-compiled against the new API, as described in AppCoprocessor at 'Cleanup on premature exit'.

Until then the only known workaround is to revert to the 266.xx series video drivers, for those cards which are supported by drivers from that era.

Rechenkuenstler

Joined: 22 Aug 10

Posts: 138

Credit: 102567115

RAC: 0

I have reinstalled the 266.xx

15 Aug 2011 8:14:54 UTC

Message 106329 in response to message 106328


I have reinstalled the 266.xx drivers on all machines. There is another problem with that driver. For reasons I cannot determine, BOINC sometimes doesn't request new BRP4 or CPU tasks, even if there are none left to crunch. The message is: Not reporting or requesting tasks.

After stopping and restarting the BOINC client, it immediately requests new tasks. My guess is that this still occurs when switching between the projects.
