Work units never finish

Robert J. O’Hara
Joined: 20 Jun 06
Posts: 5
Credit: 191933
RAC: 310
Topic 219011

My E@Home work units never seem to complete now — a problem that's been happening for several weeks or months. They download fine and run, but rarely make it beyond 10-30% and then seem to stall. The two that are "running" now are at 21% and 26% and BOINC says the deadline was 30 May. I didn't used to have this problem I don't believe, although E@H was always slower than SETI and MilkyWay. I've been an E@H participant off and on for years and have over 13,000 completed credits, so something has changed. I'm currently running BOINC on a MacBook Air (and never use the GPU, which heats the machine up too much).

I have read and followed these instructions:

If you are having trouble completing work by the deadline, the simplest and best thing to do is to decrease the size of your 'work cache'. Go to your Dashboard and edit your general preferences so that 'Connect to network about every N days' is set to 0.1 days. This way, your computer won't download more work than it can do in a single day.

But even with the 0.1 days setting E@H still runs for days without completing, and always runs two units (never one, which might be faster). Work units did complete successfully in the past, but are not doing so now.

Any suggestions on how to get back on track? I'm just a regular user, not a technical specialist. I don't want to have to drop E@H, but I'm not contributing anything useful at present.

Many thanks.

Bob

mikey
Joined: 22 Jan 05
Posts: 11946
Credit: 1832592105
RAC: 218179

Robert J. O’Hara wrote:

My E@Home work units never seem to complete now — a problem that's been happening for several weeks or months. They download fine and run, but rarely make it beyond 10-30% and then seem to stall. The two that are "running" now are at 21% and 26% and BOINC says the deadline was 30 May. I didn't used to have this problem I don't believe, although E@H was always slower than SETI and MilkyWay. I've been an E@H participant off and on for years and have over 13,000 completed credits, so something has changed. I'm currently running BOINC on a MacBook Air (and never use the GPU, which heats the machine up too much).

I have read and followed these instructions:

If you are having trouble completing work by the deadline, the simplest and best thing to do is to decrease the size of your 'work cache'. Go to your Dashboard and edit your general preferences so that 'Connect to network about every N days' is set to 0.1 days. This way, your computer won't download more work than it can do in a single day.

But even with the 0.1 days setting E@H still runs for days without completing, and always runs two units (never one, which might be faster). Work units did complete successfully in the past, but are not doing so now.

Any suggestions on how to get back on track? I'm just a regular user, not a technical specialist. I don't want to have to drop E@H, but I'm not contributing anything useful at present.

Many thanks.

Bob

Simple solution then...click account above, then click preferences then project. Then it can get tricky....if you are using the default generic as the venue for your laptop just scroll down the page and change the last 3 boxes on the page that have 0.5 settings to 1.0. Be sure to click Save Changes at the bottom of the page. Then go back and abort all your Einstein workunits  and you will start running the new ones one at a time. However if you've changed the venue from the default generic then it's easier to scroll to the bottom of that account, preferences, project page and then on the right it says (show comparison view), click on that and it will show all the venues and you can then scroll down and see if any of the others also have a 0.5 in any of the bottom 3 boxes and change them too. You will have to change them one at a time though.

The 0.5 means to run 2 workunits at a time, if you had  0.3 in them you would be running 3 wu's at a time, a 0.25 would be 4 wu's at one time. This is usually used for gpu's not cpu's though so also make sure you have NO selected at the top of that same account, preferences, project page for all the gpu's. Be sure to click Save Changes at the bottom of the page after you've made the changes.
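The arithmetic behind those factors can be sketched in a few lines. This is only an illustration, assuming the client runs as many tasks per GPU as fit within a total of 1.0; the function name `tasks_per_gpu` is made up for this example and is not part of BOINC.

```python
import math


def tasks_per_gpu(utilization_factor: float) -> int:
    """Hypothetical helper: how many tasks share one GPU for a given factor.

    Assumes tasks are started as long as the sum of their factors stays
    at or below 1.0, so the count is floor(1 / factor).
    """
    if not 0 < utilization_factor <= 1:
        raise ValueError("factor must be in the range (0, 1]")
    # The small epsilon guards against floating-point error in the division.
    return math.floor(1.0 / utilization_factor + 1e-9)


for factor in (1.0, 0.5, 0.3, 0.25):
    print(factor, "->", tasks_per_gpu(factor), "task(s) at a time")
```

As noted elsewhere in the thread, these factors apply to GPU tasks only.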

gemini8
Joined: 31 May 11
Posts: 10
Credit: 197624500
RAC: 2320

This year I've had some work-units that ran more than a day instead of about 45 minutes on my AMD GPU.

They finished ok.

Credit was the same as for all others as well.

So, if credit is not crucial for you, you might want to have the work-units run to see if they are finishing.

- - - - - - - - - -

Greetings, Jens

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Robert J. O’Hara wrote:
My E@Home work units never seem to complete now — a problem that's been happening for several weeks or months. They download fine and run, but rarely make it beyond 10-30% and then seem to stall. The two that are "running" now are at 21% and 26% and BOINC says the deadline was 30 May. I didn't used to have this problem I don't believe, although E@H was always slower than SETI and MilkyWay. I've been an E@H participant off and on for years and have over 13,000 completed credits, so something has changed. I'm currently running BOINC on a MacBook Air (and never use the GPU, which heats the machine up too much).

Upon checking what type of work you currently are running I found it to be "Gamma-ray pulsar search #5", these work units all run for about the same time but depending on the search parameters they have a few differing characteristics, one being that they might not checkpoint frequently.

The task you aborted had 79 skypoints to analyze and would thus write 79 checkpoints. The application only reports progress to Boinc when it checkpoints. If the app doesn't reach a checkpoint before it's interrupted, it will restart from the last checkpoint the next time it's run (or from scratch if none has been written) and thus may never finish.
If Boinc doesn't get a progress update from the app within a few minutes of it starting, Boinc will start to use "pseudo progress" and increase the progress percentage based on the estimated remaining time (probably very inaccurate before you manage to complete a task; once tasks start to complete, the estimate will be adjusted to be more accurate). This can give the impression that the task is progressing faster than it actually is. Once the app reports progress, Boinc will adjust to it and the progress will be more accurate.
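To see why a task can run for days and never finish, here is a rough simulation of the effect described above. It is a simplified model, not BOINC's actual logic: the function name and all the numbers are assumptions chosen for illustration, and it treats every interruption as throwing away work done since the last checkpoint unless the task is kept in memory.

```python
from typing import Optional


def sessions_to_finish(total_min: int, checkpoint_min: int, session_min: int,
                       keep_in_memory: bool = False,
                       max_sessions: int = 1000) -> Optional[int]:
    """Count uninterrupted sessions needed to finish a task, or None if it never does.

    total_min      -- CPU minutes the task needs in total (assumed value)
    checkpoint_min -- CPU minutes between checkpoints (assumed constant)
    session_min    -- minutes the task runs before each interruption
    keep_in_memory -- if True, an interruption loses no work
    """
    saved = 0  # minutes of work that survive an interruption
    for session in range(1, max_sessions + 1):
        done = saved + session_min
        if done >= total_min:
            return session
        if keep_in_memory:
            new_saved = done
        else:
            # Roll back to the most recent checkpoint that was written.
            new_saved = (done // checkpoint_min) * checkpoint_min
        if new_saved == saved:
            return None  # no net progress: the task restarts forever
        saved = new_saved
    return None


# A 10-hour task that checkpoints every 45 minutes:
print(sessions_to_finish(600, 45, 30))                       # 30-minute sessions
print(sessions_to_finish(600, 45, 100))                      # 100-minute sessions
print(sessions_to_finish(600, 45, 30, keep_in_memory=True))  # 30-minute sessions, kept in memory
```

In this model, sessions shorter than the checkpoint interval make no net progress at all, which matches the "stalls at 10-30%" symptom once pseudo progress is taken into account.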

It's been a while since I ran any work for FGRP5, so I don't know for sure how long they may take, but I do believe they run for something like 8-12 hours on a fairly fast machine. It can take 30 min - 1 hour between checkpoints on a slower machine.

So what to do?

  • Run the machine for longer periods without interrupting Boinc. If you open Boinc Manager in advanced view, go to the Tasks tab and click on a running task to highlight it, you can then click on Properties on the left. In the window that comes up you can see the "CPU time since checkpoint"; if it is blank, no checkpoint has been written.
  • Check your computing preferences and make sure "Leave non-GPU tasks in memory while suspended" is set to yes. This will ensure that the task is not restarted when Boinc is suspended, whether or not you allow Boinc to run while you use the machine. Boinc will still keep out of the way when you need the processing power for something else.
  • If you want to reduce the CPU tasks that are run then adjust "Use at most: 100% of the processors" to something lower, for a 2 core machine set it to 50% for only 1 task instead of 2. This setting is also found in the computing prefs.
  • If the above doesn't give satisfying results then review the project preferences and deselect "Gamma-ray pulsar search #5", you should then get CPU work from another search, currently "Continuous Gravitational Wave search O2 All-Sky" that might work better.

For the tasks you're currently running I would consider aborting them as they are now 10 days past deadline and you might not get any credits for them. It might be better to start new tasks and try to get them completed.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

mikey wrote:

Simple solution then...click account above, then click preferences then project. Then it can get tricky....if you are using the default generic as the venue for your laptop just scroll down the page and change the last 3 boxes on the page that have 0.5 settings to 1.0. Be sure to click Save Changes at the bottom of the page. Then go back and abort all your Einstein workunits  and you will start running the new ones one at a time. However if you've changed the venue from the default generic then it's easier to scroll to the bottom of that account, preferences, project page and then on the right it says (show comparison view), click on that and it will show all the venues and you can then scroll down and see if any of the others also have a 0.5 in any of the bottom 3 boxes and change them too. You will have to change them one at a time though.

The 0.5 means to run 2 workunits at a time, if you had  0.3 in them you would be running 3 wu's at a time, a 0.25 would be 4 wu's at one time. This is usually used for gpu's not cpu's though so also make sure you have NO selected at the top of that same account, preferences, project page for all the gpu's. Be sure to click Save Changes at the bottom of the page after you've made the changes.

Why do you bring up GPU settings when the OP clearly states he doesn't run GPU apps?
The settings you refer to are purely GPU settings and nothing else!
To control how many CPU tasks are run one adjusts "Use at most XX% of the processors" in the computing settings unless you get into advanced control of Boinc using different .xml files for configuration.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109964441836
RAC: 30768844

Robert J. O’Hara wrote:
My E@Home work units never seem to complete now — a problem that's been happening for several weeks or months. They download fine and run, but rarely make it beyond 10-30% and then seem to stall. The two that are "running" now are at 21% and 26% and BOINC says the deadline was 30 May.


Robert, your computer is a pretty little slip of a thing that probably has heat issues when trying to just run the OS at idle or browse the internet.  Your description of running for a while and then seeming to stall sounds like it is overheating and throttling.  That's hardly surprising since crunching creates a significant extra heat load, even for the best cooling systems.

Please ignore the confusing GPU based comments of other responders and pay close attention to what Holmis has written.  There are a lot of good suggestions there.  You need to really understand all the points he has listed.  If you don't, please ask extra questions.

Your default processor speed of 1.8 GHz is already low by design (to try to limit heat) and the thing looks sleek and pretty because of a lack of a proper cooling system (my guess).  Also, have you ever cleaned out whatever cooling system it does have?  If the cooling system hasn't been maintained and is partially blocked with dust and fluff, the machine will automatically force the processor speed to a very low value.  My guess is your problems lie in that direction.

Something strange is happening with your tasks and their deadlines.  The tasks list for your computer used to show just three tasks a little while ago.  Now it shows five.  At the former point, there were just two in progress tasks that had a 'sent time' of 31 May 2019 7:45:12 UTC.  Now those two tasks show a new 'sent time' of 9 Jun 2019 21:29:26 UTC with a deadline just 5 days from now.

Somehow, your original tasks seem to have become 'lost' (perhaps you reset the project) and the scheduler has sent them back to you with the original deadline - that's my guess.  In addition, you have two further new tasks with the full 14 day deadline.  So you now have 4 tasks in progress, with the 5th being the aborted task.  That aborted task (way past its deadline according to your description) shows a sent time only 5 days before you aborted it??  It must also have been a lost task that was sent back to you, by the look of things.  Please stop losing your tasks :-).  Whatever you're doing to 'lose' them isn't helping :-).

Your first priority is to make sure your machine isn't overheating.  If you now have 4 tasks running on your machine, it surely will be.  Get the cooling system checked and cleaned as soon as possible.  To limit heat in the meantime, change your compute preferences to allow BOINC to use only 1 core (25%) and follow Holmis' advice about making sure tasks are kept in memory so you don't lose progress.  My guess is that if there is no interruption, a task may well take around 24 hours or longer to complete.  If the cooling system checks out as good, you may be able to run tasks on two cores (50%), but I don't particularly like your chances if you want to avoid throttling.

Cheers,
Gary.
