Stuck BRP4?

Chris
Joined: 9 Apr 12
Posts: 61
Credit: 45056670
RAC: 0
Topic 196909

A week or so ago, I saw a BRP4 task that had run for 8 hours and still showed 0% done. I aborted it, and the rest worked fine.

Today I have one stuck at about 66.5% done, going on 6 hours. My GPU is running hotter than idle, but not at full-load temperature. The tasks normally take 40-45 minutes.

I see this in my logs for the running task:

Quote:
4/12/2013 3:42:06 PM | Einstein@Home | Starting task p2030.20121218.G175.18-00.46.C.b0s0g0.00000_2056_0 using einsteinbinary_BRP4 version 133 (BRP4cuda32nv301) in slot 0

I suspended this one, and the next task appears to be working as normal.

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1907
Credit: 1437506147
RAC: 1216810

Your GTX 650 could be OC'd a bit too much, and it could also be the temperature.

I have a GeForce GTX 650 Ti (2047MB) myself and had to mess around with the clock and voltage settings to get it to run its best. Tasks froze once in a while when I first started running it here.

What does your GPU-Z say?

Here is an example of mine (I am on this host right now)

Chris
Joined: 9 Apr 12
Posts: 61
Credit: 45056670
RAC: 0

Stock GPU. I did also get two errors this morning, but I woke up to find the computer off, so I have no idea what happened there. Temperature is actually down a bit from normal (CPU temp is up as the house temp comes up, so the case fans are running faster). I have had plenty of valid results while running as high as 65C.

Running a live task:
Clock 1058.2 MHz
Mem. 2499.4 MHz
Shader 2116.5 MHz
Temp. 56C
Fan Speed 24%
Mem Used 438MB
GPU load 85%
Controller load: 55%
VEL: 0
VDDC 1.1120V

So I resumed the task to check the actual GPU load, and it reset the time to 28 min and the % done began counting up again. I guess the real test will be whether it validates. Edit: uploaded and validated.

Conan
Joined: 19 Jun 05
Posts: 172
Credit: 8459254
RAC: 5683

I have a CPU version of one of these on my 64-bit Linux computer.
It has run for 20 hours 30 minutes, is at 43.802% and has been for a very long time, no progress is happening. It also has no Time to Completion, just a blank.

I suspended and resumed the work unit and am waiting to see whether it starts up normally again or continues with no progress.

Conan

mountkidd
Joined: 14 Jun 12
Posts: 177
Credit: 12719223088
RAC: 4957143

Quote:
It has run for 20 hours 30 minutes, is at 43.802% and has been for a very long time, no progress is happening...


I have experienced something similar on the GPU side: the task seems to be taking a nap, but the WU will eventually complete. My workaround has been to suspend all BOINC tasks, reboot the computer and resume BOINC. That seems to clear out the cobwebs, and once the in-progress tasks are completed, normal task completion times return.

Gord

Conan
Joined: 19 Jun 05
Posts: 172
Credit: 8459254
RAC: 5683

I ended up restarting BOINC Manager and Client.
This made me lose 13 hours on the Einstein work unit (dropped from 20 hours to 7) and 5 hours disappeared from the Albert work unit (from 17 hours to 12).
But the time to completion has been restored and the Einstein work unit at least is progressing normally now.

(I also had a Wildlife@home work unit do something similar which also reset with the restart).

Seems that for many hours these work units were actually doing nothing at all.
Bit of a mystery that.

Other projects have been working fine during this same time.

Conan
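
The symptom reported here (run time keeps growing while the percent done stays flat) can be checked mechanically. Below is a minimal Python sketch, assuming you have noted (elapsed seconds, fraction done) pairs for a task at intervals, for example by hand from BOINC Manager. The function name `is_stalled` and its default thresholds are hypothetical and not part of BOINC.

```python
from typing import List, Tuple


def is_stalled(
    samples: List[Tuple[float, float]],
    window_s: float = 3600.0,
    min_gain: float = 0.001,
) -> bool:
    """Return True if the fraction done gained less than min_gain
    over at least the last window_s seconds of elapsed time.

    samples: (elapsed_seconds, fraction_done) pairs, oldest first.
    """
    if len(samples) < 2:
        return False  # nothing to compare against
    latest_t, latest_f = samples[-1]
    # Walk backwards to the most recent sample that is at least
    # window_s older than the latest one.
    for t, f in reversed(samples[:-1]):
        if latest_t - t >= window_s:
            return (latest_f - f) < min_gain
    return False  # not enough history to decide
```

With a one-hour window, a task that has sat at 43.802% across several hourly readings is flagged, while one that is still gaining progress is not. A short history never triggers the check, which avoids flagging a task that has only just started.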

mikey
Joined: 22 Jan 05
Posts: 12769
Credit: 1850199167
RAC: 756650

Quote:

I ended up restarting BOINC Manager and Client.
This made me lose 13 hours on the Einstein work unit (dropped from 20 hours to 7) and 5 hours disappeared from the Albert work unit (from 17 hours to 12).
But the time to completion has been restored and the Einstein work unit at least is progressing normally now.

(I also had a Wildlife@home work unit do something similar which also reset with the restart).

Seems that for many hours these work units were actually doing nothing at all.
Bit of a mystery that.

Other projects have been working fine during this same time.

Conan

SOMETIMES a simple suspend, followed by a resume after a count of 5, will restart the unit without a full shutdown and restart of BOINC. I am NOT a programmer, but this happens rarely, yet often enough to make me think it could be a background program that grabs the resources and then doesn't give them back when it is done. Not releasing resources is common in Windows, but the problem happens in Linux too, and I thought it was better than that.
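
The suspend, wait, resume sequence described above can also be scripted. This is a minimal Python sketch, assuming the `boinccmd` command-line tool that ships with BOINC is on the PATH and that its `--task URL task_name {suspend|resume|abort}` form is available in your client version. The helper names, the 5-second default and the injected `run`/`sleep` parameters are illustrative assumptions, and the sketch does not address why a task stalls in the first place.

```python
import subprocess
import time
from typing import Callable, List


def task_command(project_url: str, task_name: str, op: str) -> List[str]:
    """Build the boinccmd argument list for one task operation."""
    if op not in ("suspend", "resume", "abort"):
        raise ValueError(f"unsupported task operation: {op!r}")
    return ["boinccmd", "--task", project_url, task_name, op]


def suspend_then_resume(
    project_url: str,
    task_name: str,
    pause_s: float = 5.0,
    run: Callable = subprocess.run,      # injectable so it can be dry-run
    sleep: Callable = time.sleep,
) -> None:
    """Suspend one task, wait pause_s seconds, then resume it."""
    run(task_command(project_url, task_name, "suspend"), check=True)
    sleep(pause_s)
    run(task_command(project_url, task_name, "resume"), check=True)
```

The project URL has to match the one the client reports (for example from `boinccmd --get_project_status`), and the task name is the one shown in the event log, such as `p2030.20121218.G175.18-00.46.C.b0s0g0.00000_2056_0` in the first post. Passing your own `run` and `sleep` callables lets you print the commands instead of executing them.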
