Gamma-ray Pulsar task at 100% but will not end

toulouse1712
Joined: 23 Aug 12
Posts: 2
Credit: 1253844
RAC: 0
Topic 207071

Hello,

I have a couple of these tasks running on my Mac Mini at the present time, and one of them is showing 100% under the Progress Column, but refuses to end. I have had this happen occasionally in the past, but usually end up Aborting the task concerned; I see this as a waste of my computer time, so would like to know how I might resolve this issue for the future.

Unless I can resolve this problem, I will no longer contribute to Einstein at Home. 

Thanks for any assistance

Mick

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110039757303
RAC: 22397083

toulouse1712 wrote:
I have a couple of these tasks running on my Mac Mini at the present time, and one of them is showing 100% under the Progress Column, but refuses to end. I have had this happen occasionally in the past, but usually end up Aborting the task concerned; I see this as a waste of my computer time, so would like to know how I might resolve this issue for the future.

Hi Mick,
Welcome to the Einstein forums.

A bit depends on how long you have waited for such tasks to finish.  The progress counter is approximate and can give a false impression of where things really are.  At the end of crunching there is a follow-up stage which creates a list of the top candidate 'signals' and this can take a variable amount of time.  If you have waited more than an hour or two to see if a task will finish, then there is likely to be some sort of issue.

The first thing to try is to stop and restart BOINC.  That will cause the task to be restarted from the last saved checkpoint.  If after a further period of time (say 1hr) nothing is changing, you could try a complete reboot.  It's unlikely to be some sort of 'bad task'.  It's more likely to be some sort of issue with how your OS is handling the workload.

I had a look at your tasks list on the website.  One thing that strikes me is the difference between CPU time and elapsed (run) time for completed tasks.  The difference is large (and variable) and indicates that there are significant other things happening on your machine.  For comparison, take a look at this tasks list.  It's a late 2009 vintage iMac with 8GB RAM.  It belongs to my daughter and does normal business office duties.  I just happen to be typing this reply on it at the moment :-).  It's slower than your machine (it's almost 9 years old) but there is little difference in the two times on it.  It would seem that its load from non-BOINC work is much less than your load.

BOINC is designed to 'get out of the way' when your machine needs to do other things.  This is what creates the difference between CPU time and run time (BOINC is still running but the CPU cycles are going elsewhere).  In trying to diagnose what is happening in your situation, you need to tell us what else (besides other BOINC projects) your machine does.  As you can see from my example, normal office duties don't need much CPU time.  At the moment this machine has at least 50 Firefox tabs open and a few other things going on and I don't even notice an impact from BOINC - and BOINC doesn't seem to be troubled by my other work.  I've never seen a task get 'stuck' on it and I use it quite regularly - I like the big screen :-)  - and underneath it's really Unix anyway :-).
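
To put a number on that gap, one rough measure is the ratio of CPU time to elapsed time for a finished task. Here is a minimal sketch in Python; the function name and the sample times are made up for illustration and are not taken from any real tasks list:

```python
def cpu_efficiency(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Fraction of the elapsed (run) time that the task actually spent on the CPU."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_seconds / elapsed_seconds


# Hypothetical figures, for illustration only.
busy_host = cpu_efficiency(cpu_seconds=18_000, elapsed_seconds=30_000)
quiet_host = cpu_efficiency(cpu_seconds=29_400, elapsed_seconds=30_000)
print(f"busy host:  {busy_host:.0%}")   # busy host:  60%
print(f"quiet host: {quiet_host:.0%}")  # quiet host: 98%
```

A ratio close to 100% suggests the task had the core largely to itself; a much lower or very variable ratio suggests other work is competing for the CPU.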

Quote:
Unless I can resolve this problem, I will no longer contribute to Einstein at Home.

The small number of staff here do an excellent job of keeping this project running very smoothly indeed.  The majority of 'issues' are really outside their area of control.  The rest of us are all volunteers who will make an effort to help people resolve these sorts of problems.  I'm not saying it's not some sort of 'project related' issue that is causing you angst.  I'm just pointing out that it's most likely to be something that you have to find and that other volunteers will be willing to help.  Of course, whether you leave or not is entirely up to you.  Putting the statement 'out there' doesn't do much good in helping you resolve the issue, though :-(.

Cheers,
Gary.

toulouse1712
Joined: 23 Aug 12
Posts: 2
Credit: 1253844
RAC: 0

Hello Gary, and thanks for your comments.

I've now taken a closer look at the problem that I am experiencing with the original task. What appears to be happening is that progress shows 100%, and elapsed time shows 08:36:00, counts up to 08:37:50, and then repeats this cycle endlessly, i.e. it resets to 08:36:00 and counts up again.

I have tried your suggestions about restarting BOINC and rebooting the machine, but neither these nor anything else that I have tried seems to make any difference to this particular task. This includes a reset of the project as well, I think.

The Mac Mini is not heavily loaded; it is used for web browsing and email, but not a lot else aside from BOINC tasks. I usually have 3 or 4 pages open in Safari, although thinking about it I do sometimes appear to have some issues with that browser. 

My closing comment in the original post was out of sheer frustration at not being able to resolve this issue myself. I think that, based upon your suggestions, I have done everything that has been recommended, but I still do not have a resolution to the problem. Multiple other tasks from this project have been run and completed quite successfully, so I don't know what else to try.

Thanks for any further suggestions

Mick

solling2
Joined: 20 Nov 14
Posts: 219
Credit: 1564139881
RAC: 36826

I have heard that virus scanners can unintentionally cause endless loops, but I don't know whether that's true.

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 127813162
RAC: 332116

We get reports from time to time about such tasks but we could not reproduce them. Based on your observation it seems that the task is failing somehow and restarts from the last checkpoint. To further analyze this we would need the complete contents of the slot directory of this particular task.

In the Tasks tab of the BOINC Manager you can click on Properties after selecting the stuck task and see which slot directory it is in. Then navigate with the Finder to "/Library/Application Support/BOINC Data/slots" where you should find the subdirectory shown in the properties. Please zip the contents of this directory and send it to us. I can provide you with a URL via private message where you can easily upload the zip.
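
If you prefer to script the zipping step rather than use the Finder, something along these lines should work. It is only a sketch: the helper name `zip_slot` and the archive name are hypothetical, the slot number has to be the one shown in the task's Properties, and depending on the permissions of the BOINC data directory you may need to run it with sufficient privileges.

```python
import shutil
from pathlib import Path

# Data directory as given above; the slot number passed in is a placeholder.
SLOTS_DIR = Path("/Library/Application Support/BOINC Data/slots")


def zip_slot(slot_number: int, dest_dir: Path, slots_dir: Path = SLOTS_DIR) -> Path:
    """Create <dest_dir>/slot_<n>.zip from the contents of one slot directory."""
    slot_dir = slots_dir / str(slot_number)
    if not slot_dir.is_dir():
        raise FileNotFoundError(f"no such slot directory: {slot_dir}")
    archive = shutil.make_archive(
        base_name=str(dest_dir / f"slot_{slot_number}"),  # ".zip" is appended
        format="zip",
        root_dir=str(slot_dir),  # archive the directory's contents
    )
    return Path(archive)
```

For example, `zip_slot(0, Path.home() / "Desktop")` would write `slot_0.zip` to the Desktop, assuming the stuck task is in slot 0.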

Nigel Garvey
Joined: 4 Oct 10
Posts: 50
Credit: 18342341
RAC: 55573

Christian Beer wrote:
To further analyze this we would need the complete contents of the slot directory of this particular task.

I had a similar task a couple of weeks ago which went into a loop half-way through and I had to abort it. I now have another looping at 100%. Would you like the slot directory for it too?

Mid-2009 MacBook Pro

Mac OS 10.11.6

NG

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 127813162
RAC: 332116

Nigel Garvey wrote:
Christian Beer wrote:
To further analyze this we would need the complete contents of the slot directory of this particular task.

I had a similar task a couple of weeks ago which went into a loop half-way through and I had to abort it. I now have another looping at 100%. Would you like the slot directory for it too?

Mid-2009 MacBook Pro

Mac OS 10.11.6

Yes. I am sending you a PM with a link for uploading the archive.

Thanks.

John I Prewitt
Joined: 31 May 12
Posts: 4
Credit: 259373174
RAC: 7006

Christian,

I also have this problem on my iMac 27-inch, Late 2015 with the latest version of Mojave (10.14.5).  The progress column shows 100%, the Elapsed column increases for a while and then resets its time by about 2 minutes. Current value is about 09:30:..  This goes on for hours.

This issue happens so often that for the time being, I now Abort all tasks for Gamma-Ray pulsar search #5 1.08 (FGRPSSE) Applications. I have a zip file associated with this task.  Would you like me to upload it?

John

John I Prewitt
Joined: 31 May 12
Posts: 4
Credit: 259373174
RAC: 7006

Christian,

I looked at the stderr.txt file in the slot folder for the task that is failing to complete and found the following.

% C 89 11
% Writing follow-up output file.
FPU status flags:
06:10:51 (13629): [normal]: done. calling boinc_finish(0).
06:10:51 (13629): called boinc_finish
06:13:53 (14009): [normal]: This Einstein@home App was built at: Jul 26 2017 12:06:48

06:13:53 (14009): [normal]: Start of BOINC application 'hsgamma_FGRP5_1.08_x86_64-apple-darwin__FGRPSSE'.
06:13:53 (14009): [debug]: 2.1e+15 fp, 5.7e+09 fp/s, 365926 s, 101h38m45s89
command line: hsgamma_FGRP5_1.08_x86_64-apple-darwin__FGRPSSE --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0056F.dat --alpha 4.7717882413 --delta -0.1998787927 --skyRadius 0.00306804779 --ldiBins 15 --f0start 1480 --f0Band 16 --firstSkyPoint 372801 --numSkyPoints 79 --f1dot -1.0e-13 --f1dotBand 1.0e-13 --df1dot 1.838485756e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 4194304.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 56265.0 --f0orbit 0.005 --freeRadiusFactor 2 --mismatch 0.15 --debug 1 -o LATeah0056F_1496.0_372801_0.0_0_0.out
output files: 'LATeah0056F_1496.0_372801_0.0_0_0.out' '../../projects/einstein.phys.uwm.edu/LATeah0056F_1496.0_372801_0.0_0_0' 'LATeah0056F_1496.0_372801_0.0_0_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0056F_1496.0_372801_0.0_0_1'
06:13:53 (14009): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
06:13:53 (14009): [normal]: WARNING: Resultfile '../../projects/einstein.phys.uwm.edu/LATeah0056F_1496.0_372801_0.0_0_1' present - doing nothing
06:13:53 (14009): [debug]: Set up communication with graphics process.
mv: LATeah0056F_1496.0_372801_0.0_0_0.out: No such file or directory
mv: LATeah0056F_1496.0_372801_0.0_0_0.out: No such file or directory
06:16:57 (14042): [normal]: This Einstein@home App was built at: Jul 26 2017 12:06:48

06:16:57 (14042): [normal]: Start of BOINC application 'hsgamma_FGRP5_1.08_x86_64-apple-darwin__FGRPSSE'.
06:16:57 (14042): [debug]: 2.1e+15 fp, 5.7e+09 fp/s, 365926 s, 101h38m45s89
command line: hsgamma_FGRP5_1.08_x86_64-apple-darwin__FGRPSSE --inputfile ../../projects/einstein.phys.uwm.edu/LATeah0056F.dat --alpha 4.7717882413 --delta -0.1998787927 --skyRadius 0.00306804779 --ldiBins 15 --f0start 1480 --f0Band 16 --firstSkyPoint 372801 --numSkyPoints 79 --f1dot -1.0e-13 --f1dotBand 1.0e-13 --df1dot 1.838485756e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 4194304.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 56265.0 --f0orbit 0.005 --freeRadiusFactor 2 --mismatch 0.15 --debug 1 -o LATeah0056F_1496.0_372801_0.0_0_0.out
output files: 'LATeah0056F_1496.0_372801_0.0_0_0.out' '../../projects/einstein.phys.uwm.edu/LATeah0056F_1496.0_372801_0.0_0_0' 'LATeah0056F_1496.0_372801_0.0_0_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah0056F_1496.0_372801_0.0_0_1'
06:16:57 (14042): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
06:16:57 (14042): [normal]: WARNING: Resultfile '../../projects/einstein.phys.uwm.edu/LATeah0056F_1496.0_372801_0.0_0_1' present - doing nothing
06:16:57 (14042): [debug]: Set up communication with graphics process.
mv: LATeah0056F_1496.0_372801_0.0_0_0.out: No such file or directory
mv: LATeah0056F_1496.0_372801_0.0_0_0.out: No such file or directory
mv: LATeah0056F_1496.0_372801_0.0_0_0.out: No such file or directory
06:20:02 (14078): [normal]: This Einstein@home App was built at: Jul 26 2017 12:06:48

It appears the task is having a problem trying to finish.  See the line with the time stamp of 06:10:51 where the task called boinc_finish. Notice the move commands "mv" where it gets the "No such file or directory" error. The files that the logic is trying to move exist in the slot folder.

The task appears to keep looping through this sequence about every 3 minutes, which matches the fall back of the Elapsed time.  That is, the time cycled from about 09:21:09 back to 09:19:38.

In summary, the logic appears to have lost its way.
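
One way to check the restart cycle without reading the whole file by eye is to pull out the "Start of BOINC application" lines and compare their time stamps. This is only a sketch: the function names are my own, and the pattern assumes the line format seen in the excerpt above.

```python
import re
from datetime import timedelta

# Matches e.g. "06:13:53 (14009): [normal]: Start of BOINC application '...'."
START_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}) \((\d+)\): \[normal\]: Start of BOINC application"
)


def find_restarts(log_text: str) -> list[tuple[timedelta, int]]:
    """Return (time of day, process ID) for every application start in the log."""
    starts = []
    for line in log_text.splitlines():
        match = START_RE.match(line)
        if match:
            hours, minutes, seconds, pid = map(int, match.groups())
            time_of_day = timedelta(hours=hours, minutes=minutes, seconds=seconds)
            starts.append((time_of_day, pid))
    return starts


def restart_intervals(starts: list[tuple[timedelta, int]]) -> list[int]:
    """Seconds between consecutive starts (assumes the log does not cross midnight)."""
    return [
        int((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(starts, starts[1:])
    ]


# The two start lines copied from the excerpt above.
sample = """\
06:13:53 (14009): [normal]: Start of BOINC application 'hsgamma_FGRP5_1.08_x86_64-apple-darwin__FGRPSSE'.
06:16:57 (14042): [normal]: Start of BOINC application 'hsgamma_FGRP5_1.08_x86_64-apple-darwin__FGRPSSE'.
"""
starts = find_restarts(sample)
print([pid for _, pid in starts])  # [14009, 14042]
print(restart_intervals(starts))   # [184]
```

A new process ID on each start, with roughly the same interval between starts, is consistent with the application being launched again and again rather than one process hanging.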

John

John I Prewitt
Joined: 31 May 12
Posts: 4
Credit: 259373174
RAC: 7006

Christian,

Further analysis indicates that the task did not output anything to the LATeah0056F_1496.0_372801_0.0_0_0.out file, which indicates a problem with the task.

John
