Progress bars stuck

mikey
Joined: 22 Jan 05
Posts: 12687
Credit: 1839093161
RAC: 3779

RE: RE: I'm sorry, but

Quote:
Quote:

I'm sorry, but I'm at a complete loss as to what you're trying to tell me :-).

Since FVIT hasn't responded, he has probably decided that E@H behaviour is not to his liking. He hasn't downloaded any further work and I suspect he may not even be listening to any suggestions being made.

However, since you responded to my post specifically, can you please enlighten me as to what I'm missing?


Basically his first two tasks made progress with no restarts, and had he not aborted his 2nd task it would probably have finished O.K.

The third task he was obviously watching when it started, and he didn't like it when BOINC showed progress and then jumped back to zero each time he caused the app to exit.
Basically, if he's going to watch the kettle come to the boil, it's going to take longer than he expects; he should go and do something else and not worry about it.
If he's got 'Leave tasks in memory while suspended?' set to no and he interrupts progress repeatedly, he's not going to make much progress on any project's apps.

BOINC showing progress before an app checkpoints is a curveball that the devs thought was a good idea, but it confuses volunteers.

Claggy

Could the repeated restarts be caused by the "While processor usage is less than" setting? I guess my question is whether his PC could be doing something that forces BOINC to quit and resume multiple times.

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

RE: RE: RE: I'm sorry,

Quote:
Quote:
Quote:

I'm sorry, but I'm at a complete loss as to what you're trying to tell me :-).

Since FVIT hasn't responded, he has probably decided that E@H behaviour is not to his liking. He hasn't downloaded any further work and I suspect he may not even be listening to any suggestions being made.

However, since you responded to my post specifically, can you please enlighten me as to what I'm missing?


Basically his first two tasks made progress with no restarts, and had he not aborted his 2nd task it would probably have finished O.K.

The third task he was obviously watching when it started, and he didn't like it when BOINC showed progress and then jumped back to zero each time he caused the app to exit.
Basically, if he's going to watch the kettle come to the boil, it's going to take longer than he expects; he should go and do something else and not worry about it.
If he's got 'Leave tasks in memory while suspended?' set to no and he interrupts progress repeatedly, he's not going to make much progress on any project's apps.

BOINC showing progress before an app checkpoints is a curveball that the devs thought was a good idea, but it confuses volunteers.

Claggy

Could the repeated restarts be caused by the "While processor usage is less than" setting? I guess my question is whether his PC could be doing something that forces BOINC to quit and resume multiple times.


No, it should stay in memory during 'Suspend work if CPU usage is above'; even GPU apps stay in memory during it.
(There is a bug with the current stock ATI/AMD Seti v7 app: it should stop during benchmarks and 'Suspend work if CPU usage is above' occasions, but stay in memory.
The x41zc Cuda app does stop (and stays in memory); the ATI/AMD Seti v7 app doesn't respond.)

Claggy

asphynx
Joined: 1 Aug 13
Posts: 1
Credit: 250931
RAC: 0

I've been having the same

I've been having the same issue, and I've aborted multiple Einstein@home tasks because I thought they were frozen and would never complete. It was really bizarre to me that I'd watch one proceed, "time remaining" counting down and "progress (%)" counting up every second or two... then it would reach a point where they just stopped and remained static. Every once in a while, the time remaining will drop a second, but then increase again (i.e. 17:37:34 drops to 17:37:33, but then goes back up to 17:37:34).

I have a task running right now that seems to be frozen (FGRP-SSE2)... you're saying that if I just let it chug, it'll eventually hit a checkpoint and update the stats? (This seems like a foolish way of programming it. Why can't the program keep estimating like it did before?)

Also, how does this fit in with the BOINC preferences setting
"tasks checkpoint to disk at most every [ ] seconds"?
(Mine is set for every 60 seconds... not sure if I ever changed that from default.)

For the record:
Windows 8.1, 64-bit
Toshiba Satellite E45-B
Intel Core i5-4210 CPU, 6 GB RAM
BOINC version 7.4.36 (x64)
I turn my computer on in the morning and let it run until I go to bed at night, so BOINC usually runs 12-15 hours at a time. I don't usually restart BOINC manually; if I close the window it minimizes to the system tray. I usually let my computer hibernate at night instead of doing a full shutdown, so it should still keep tasks in RAM.

mikey
Joined: 22 Jan 05
Posts: 12687
Credit: 1839093161
RAC: 3779

RE: Also, how does this

Quote:

Also, how does this fit in with the BOINC preferences setting
"tasks checkpoint to disk at most every [ ] seconds"?
(Mine is set for every 60 seconds... not sure if I ever changed that from default.)

This setting does not override the application's own checkpointing; it only says "don't checkpoint any more often than every 60 seconds." It does NOT say "do a checkpoint every 60 seconds." A lot of people have found that the hard drive gets used less if you change it to 900 seconds (15 minutes). Checkpoints really only matter if you shut down your PC mid-crunch or it crashes; they act as a recovery point when either happens, so you don't have to start the workunit all over again from the beginning.
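To make the "at most every N seconds" meaning concrete, here is a minimal sketch of the idea, not the real BOINC code. It assumes a made-up task that can only checkpoint at certain "safe points"; the function name `run_task` and all of its parameters are hypothetical. In the real client, as far as I know, the app asks a similar question through `boinc_time_to_checkpoint()`.

```python
def run_task(total_steps, step_seconds, safe_point_every, min_interval):
    """Simulate a task and return the simulated times (in seconds) at which
    checkpoints were written.

    safe_point_every: the app can only checkpoint every this many steps
                      (hypothetical stand-in for the app's own logic).
    min_interval:     the preference value, i.e. "checkpoint at most every
                      N seconds". It limits checkpoints, it never forces one.
    """
    checkpoints = []
    last_checkpoint = 0
    clock = 0
    for step in range(1, total_steps + 1):
        clock += step_seconds
        at_safe_point = step % safe_point_every == 0
        # A checkpoint happens only if the app is ready AND enough time has
        # passed since the previous one.
        if at_safe_point and clock - last_checkpoint >= min_interval:
            checkpoints.append(clock)
            last_checkpoint = clock
    return checkpoints


# An app that is only ready to checkpoint every 600 s still checkpoints
# every 600 s when the preference is 60 s; the preference does not speed it up.
print(run_task(3600, 1, 600, 60))    # [600, 1200, 1800, 2400, 3000, 3600]
# Raising the preference to 900 s skips every other opportunity.
print(run_task(3600, 1, 600, 900))   # [1200, 2400, 3600]
```

So a progress display that appears stuck for longer than 60 seconds is consistent with a 60-second setting: the app decides when it is able to checkpoint, and the preference only thins those writes out.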

MB Atlanos
Joined: 11 Feb 05
Posts: 30
Credit: 1758276
RAC: 0

I have a FGRP4-Beta task

I have a FGRP4-Beta task, http://einsteinathome.org/task/485568198, that has been stuck for days at 95.428%. Time remaining is about 36 min ± 1 min, and the task is no longer using the CPU.

Earlier FGRP4-Beta tasks ran normally.

The stderr.txt shows that, after finishing the follow-up phase and calling boinc_finish, the task restarts itself about every 3 minutes:

.
.
.
% checkpoint 28
% Time spent on semicoherent stage: 310764.0990s
% Writing semicoherent output file.

% Following up candidate number: 1
% Refining in S
% Following-up in P

% Following up candidate number: 2
% Refining in S
% Following-up in P

% Following up candidate number: 3
% Refining in S
% Following-up in P

% Following up candidate number: 4
% Refining in S
% Following-up in P

% Following up candidate number: 5
% Refining in S
% Following-up in P
% Writing follow-up output file.
FPU status flags: COND_1 PRECISION
21:31:43 (37365): [normal]: done. calling boinc_finish(0).
21:31:43 (37365): called boinc_finish
21:34:47 (51142): [normal]: This Einstein@home App was built at: Nov 25 2014 14:26:43

21:34:47 (51142): [normal]: Start of BOINC application 'hsgamma_FGRP4_1.05_x86_64-apple-darwin__FGRP4-Beta'.
21:34:47 (51142): [debug]: 2.1e+15 fp, 3.2e+09 fp/s, 657498 s, 182h38m17s98
command line: hsgamma_FGRP4_1.05_x86_64-apple-darwin__FGRP4-Beta --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0101E.dat --outputfile results.cand.out --alpha 2.38805042651 --delta -0.837362055916 --skyRadius 2.940791e-03 --ldiBins 15 --f0start 48 --f0Band 32 --firstSkyPoint 2912 --numSkyPoints 28 --f1dot -6.03e-10 --f1dotBand 1e-12 --df1dot 5.185796503e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 5 --cohFollow 5 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 55806 --f0orbit 0.005 --debug 1
output files: 'results.cand.out' '../../projects/einstein.phys.uwm.edu/LATeah0101E_80.0_2912_-6.02e-10_0_0' 'results.cand.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0101E_80.0_2912_-6.02e-10_0_1'
21:34:47 (51142): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
21:34:47 (51142): [normal]: WARNING: Resultfile '../../projects/einstein.phys.uwm.edu/LATeah0101E_80.0_2912_-6.02e-10_0_0' present - doing nothing
21:34:47 (51142): [debug]: Set up communication with graphics process.
mv: results.cand.out: No such file or directory
mv: results.cand.out: No such file or directory
mv: results.cand.out: No such file or directory
21:37:51 (51166): [normal]: This Einstein@home App was built at: Nov 25 2014 14:26:43

21:37:51 (51166): [normal]: Start of BOINC application 'hsgamma_FGRP4_1.05_x86_64-apple-darwin__FGRP4-Beta'.
21:37:51 (51166): [debug]: 2.1e+15 fp, 3.2e+09 fp/s, 657498 s, 182h38m17s98
command line: hsgamma_FGRP4_1.05_x86_64-apple-darwin__FGRP4-Beta --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0101E.dat --outputfile results.cand.out --alpha 2.38805042651 --delta -0.837362055916 --skyRadius 2.940791e-03 --ldiBins 15 --f0start 48 --f0Band 32 --firstSkyPoint 2912 --numSkyPoints 28 --f1dot -6.03e-10 --f1dotBand 1e-12 --df1dot 5.185796503e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 5 --cohFollow 5 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 55806 --f0orbit 0.005 --debug 1
output files: 'results.cand.out' '../../projects/einstein.phys.uwm.edu/LATeah0101E_80.0_2912_-6.02e-10_0_0' 'results.cand.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0101E_80.0_2912_-6.02e-10_0_1'
21:37:51 (51166): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
21:37:51 (51166): [normal]: WARNING: Resultfile '../../projects/einstein.phys.uwm.edu/LATeah0101E_80.0_2912_-6.02e-10_0_0' present - doing nothing
21:37:51 (51166): [debug]: Set up communication with graphics process.
mv: results.cand.out: No such file or directory
mv: results.cand.out: No such file or directory
mv: results.cand.out: No such file or directory
21:40:56 (51192): [normal]: This Einstein@home App was built at: Nov 25 2014 14:26:43

21:40:56 (51192): [normal]: Start of BOINC application 'hsgamma_FGRP4_1.05_x86_64-apple-darwin__FGRP4-Beta'.

Is there a way to finish this task gracefully and rescue the 13.5 hours of work? Suspending/resuming the task and restarting BOINC or the Mac did not help.

Mac mini 2010, OS X 10.9.5 with Boinc 7.4.36 http://einsteinathome.org/host/4198207
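For anyone who wants to check their own stderr.txt for the same restart loop, the sketch below pulls out the "Start of BOINC application" lines and prints the gap between consecutive starts. It only assumes the line format seen in the log above; the helper name `restart_gaps` is made up for this example, and it handles at most one midnight rollover between two starts.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# 21:34:47 (51142): [normal]: Start of BOINC application '...'.
START_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) \(\d+\): \[normal\]: Start of BOINC application"
)


def restart_gaps(log_text):
    """Return the seconds between consecutive application starts."""
    times = []
    for line in log_text.splitlines():
        match = START_RE.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    gaps = []
    for earlier, later in zip(times, times[1:]):
        delta = later - earlier
        if delta < timedelta(0):
            # The log has no dates, so assume the clock passed midnight once.
            delta += timedelta(days=1)
        gaps.append(int(delta.total_seconds()))
    return gaps


SAMPLE = """\
21:31:43 (37365): called boinc_finish
21:34:47 (51142): [normal]: Start of BOINC application 'hsgamma_FGRP4_1.05_x86_64-apple-darwin__FGRP4-Beta'.
21:37:51 (51166): [normal]: Start of BOINC application 'hsgamma_FGRP4_1.05_x86_64-apple-darwin__FGRP4-Beta'.
21:40:56 (51192): [normal]: Start of BOINC application 'hsgamma_FGRP4_1.05_x86_64-apple-darwin__FGRP4-Beta'.
"""

print(restart_gaps(SAMPLE))  # [184, 185]
```

Regular gaps of about three minutes with no checkpoint lines in between, as in this excerpt, point to the app being relaunched after it already finished rather than to slow progress.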

Mad Chemist
Joined: 9 May 07
Posts: 3
Credit: 2389344
RAC: 0

I've been running

I've been running LATeah0103E_80.0_832_-6.98e-10_2 for four days with the bar at 95.153% and the completion time cycling through a ten-minute range. The computer is on for about 10 hours a day. According to Activity Monitor, user CPU usage is running at about 2.80%, and it doesn't even show that this task is running. No other BOINC tasks are running. Should I wait a few more days before assuming that something is not working properly? This is the second task to do this in the last month. Too much time wasted!!

mikey
Joined: 22 Jan 05
Posts: 12687
Credit: 1839093161
RAC: 3779

RE: I've been running

Quote:
I've been running LATeah0103E_80.0_832_-6.98e-10_2 for four days with the bar at 95.153% and the completion time cycling through a ten-minute range. The computer is on for about 10 hours a day. According to Activity Monitor, user CPU usage is running at about 2.80%, and it doesn't even show that this task is running. No other BOINC tasks are running. Should I wait a few more days before assuming that something is not working properly? This is the second task to do this in the last month. Too much time wasted!!

Try suspending and then resuming the project. It could be that the unit got messed up along the way and is 'confused' as to how much work is really done. If that is the case, it should jump back to the last checkpoint and then go forward again. You could also just abort it and let someone else try crunching the unit.

Mad Chemist
Joined: 9 May 07
Posts: 3
Credit: 2389344
RAC: 0

Suspending didn't work, so

Suspending didn't work, so aborting.
Thanks for the suggestion

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

Another option is to just

Another option is to just restart BOINC. Suspending a task or the project won't necessarily free the app from memory (depending on BOINC preferences); restarting BOINC will free it from memory, allowing the task to continue from the last checkpoint.

Claggy
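As a rough illustration of the difference between suspending and restarting, here is a toy model and not BOINC's actual implementation. The `Task` class and its method names are hypothetical; the only behaviour it encodes is what is described in this thread: work since the last checkpoint survives a suspend only if the app stays in memory, and a client restart always resumes from the last checkpoint.

```python
class Task:
    """Toy model of a task's in-memory progress versus its checkpoint."""

    def __init__(self):
        self.progress = 0     # work done by the running process
        self.checkpoint = 0   # work safely written to disk

    def crunch(self, amount):
        self.progress += amount

    def write_checkpoint(self):
        self.checkpoint = self.progress

    def suspend(self, leave_in_memory):
        # With 'Leave tasks in memory while suspended?' set to no, the app
        # exits and everything since the last checkpoint is lost.
        if not leave_in_memory:
            self.progress = self.checkpoint

    def restart_client(self):
        # Restarting the client always unloads the app, so the task
        # resumes from the last checkpoint.
        self.progress = self.checkpoint


task = Task()
task.crunch(30)
task.write_checkpoint()
task.crunch(15)

task.suspend(leave_in_memory=True)
print(task.progress)   # 45: the app stayed loaded, nothing was lost

task.restart_client()
print(task.progress)   # 30: back to the last checkpoint
```

This is also why a suspend and resume may change nothing for a misbehaving app that is still loaded, while a restart gives it a clean start from the checkpoint.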

MB Atlanos
Joined: 11 Feb 05
Posts: 30
Credit: 1758276
RAC: 0

RE: Another option is to

Quote:

Another option is to just restart BOINC. Suspending a task or the project won't necessarily free the app from memory (depending on BOINC preferences); restarting BOINC will free it from memory, allowing the task to continue from the last checkpoint.

Claggy


Restarting BOINC or the Mac didn't solve this problem; I tried both back then. According to the stderr.txt, the WU finished, but the app keeps restarting every few minutes. See the log segment in my post five messages above.
