Einstein@Home does not switch every hour like it is supposed to

Eleandrus
Joined: 6 Feb 08
Posts: 1
Credit: 1171361
RAC: 0
Topic 219045

     I have been running Seti@Home for many years and wanted to add another program. Einstein caught my eye and I set the resources to 50/50.  I noticed right away that Seti stopped working and Einstein hogged all the CPU power and only worked on its own work over and over. I now have Seti work that expires on the 29th and looks like it'll be wasted if Einstein doesn't stop hogging all the CPU.  I reduced its share down to 30 Einstein/20 Quake/50 Seti and that did not put a dent in Einstein's CPU use.   I have let this go since the beginning of May and nothing has changed or normalized. I may have to delete all of my Einstein work if this bug or problem isn't fixed, which I don't want to do.

Eleandrus

alanb1951
Joined: 28 Nov 16
Posts: 18
Credit: 651137145
RAC: 346312


When you join a new project, the BOINC client will tend to give precedence to that project to try to head towards an even distribution of work completed.  So, when you signed up to Einstein the BOINC client scheduler will run more Einstein than SETI for a while!

However, the client is supposed to keep an eye on when results are due to be returned, so as your SETI jobs get nearer their deadline it ought to start running them (in what some people refer to as "panic mode"...)   Note that this is a BOINC thing, not an Einstein thing!

In the case of CPU tasks, there is something else you can do if you are prepared to create/edit a text file in the BOINC project directory for Einstein.  I'm a Linux user, so I don't know where that directory might be on your machine(s), but it should be called einstein.phys.uwm.edu - the file you would need to create is called app_config.xml.  The technique to be used is to constrain the number of tasks running at once, rather than depending on BOINC to swap them in and out!

If you already know about that, you probably don't need what follows(!), however...   The file should be created with a text editor (not a word processor!), taking care to not accidentally add a .txt extension to the file name when saving it.

If you have a 6-core (12 thread) CPU and you want to constrain Einstein to only use (say) 4 of those 12 threads,  you would create an app_config.xml in the Einstein project directory containing the following (or add the second line of this to an existing file just before the end...)

<app_config>
  <project_max_concurrent>4</project_max_concurrent>
</app_config>

Once that file is in place, select "Read config files" from the Options menu in Boinc Manager (Advanced View - I don't think it shows up in Simple View!) and the BOINC client scheduler should promptly stop running all but 4 of the Einstein tasks, starting SETI jobs in their place...
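For anyone who would rather script this than use a text editor, here is a minimal Python sketch that writes the same file. The function names and the placeholder directory are assumptions made for illustration and are not part of BOINC; check where the BOINC data directory actually lives on your machine first.

```python
from pathlib import Path


def build_app_config(max_concurrent: int) -> str:
    """Return the text of an app_config.xml limiting concurrent tasks."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    return (
        "<app_config>\n"
        f"  <project_max_concurrent>{max_concurrent}</project_max_concurrent>\n"
        "</app_config>\n"
    )


def write_app_config(project_dir: Path, max_concurrent: int) -> Path:
    """Write app_config.xml into the given project directory (hypothetical helper)."""
    target = project_dir / "app_config.xml"  # no .txt extension
    target.write_text(build_app_config(max_concurrent), encoding="utf-8")
    return target
```

A call might look like write_app_config(Path("path/to/BOINC/projects/einstein.phys.uwm.edu"), 4), where the path is a placeholder. This overwrites any existing app_config.xml, so if you already have one, edit it by hand instead. You still need "Read config files" afterwards.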

That "project_max_concurrent" trick is the simplest way of tuning what will run at any given time (I use it to moderate the number of climateprediction.net tasks I run on the rare occasions there are any Linux work-units!);  hopefully it will be of some assistance to you.

Oh, by the way, if BOINC is switching tasks hourly and you haven't enabled "Leave non-GPU tasks in memory while suspended" in "Computing Preferences | Disk and Memory", tasks that are suspended will restart from the last checkpoint, and that could mean a lot of repeat processing!

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110027846127
RAC: 22471824


Hi Eleandrus, welcome back to Einstein.  I see you originally joined back in 2008.

I had a look at your FX-8100 8 core machine which seems to be the one currently crunching Einstein.  Your tasks list shows 25 tasks in total, 11 of which are completed and validated and a further 14 in progress.  You received the first of your current tasks back on June 6 and the first of those to be returned was sent back on June 11 - 5 days later.

It doesn't really seem possible that "Einstein is hogging all the CPU power" as you put it.  If so, and with 8 cores, why was only one task returned on June 11, and not more of them, and why so long after receiving it?  I guess the reason might be that you don't run your machine very often or that you have lots of other things besides crunching that you are doing and you have your preferences set to not crunch when you are using your machine for your own purposes.  The fact that all tasks returned seem to use significantly more elapsed (run) time than the actual CPU time seems to suggest that your machine was struggling under the load of other competing work it had to do.  It might also be exacerbated by insufficient RAM.  How much memory does the machine have?
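The elapsed-versus-CPU-time comparison can be expressed as a simple ratio. This is only a rough sketch with made-up numbers, not values taken from the actual task list, and the function name is hypothetical.

```python
def cpu_efficiency(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Fraction of a task's wall-clock run time that was spent on the CPU."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


# Hypothetical task: 30,000 s of CPU time over 50,000 s of elapsed time.
ratio = cpu_efficiency(30_000, 50_000)
print(f"CPU efficiency: {ratio:.0%}")
```

A ratio close to 100% means the task had a core mostly to itself while it ran. A much lower figure suggests it was competing with other work or waiting on memory.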

As AlanB1951 mentions, Einstein has absolutely no control over when its tasks are processed.  That's BOINC's job.  You mention that Seti's next deadline is 29th.  Einstein's deadlines have been earlier than that - 20th for those tasks you received on the 6th.  BOINC is doing its job of trying to make sure no task will exceed its deadline whilst trying to honour your resource shares.  It would appear that BOINC thinks Einstein might miss its deadline.  When that deadline pressure has passed, BOINC will stop running Einstein and then run Seti, with its longer deadlines.  At the moment you still have Einstein tasks with earlier deadlines than the 29th so BOINC may well be compelled to keep doing those.
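To illustrate the kind of deadline reasoning described above, here is a deliberately simplified Python sketch. It is not BOINC's actual scheduler: it just simulates running tasks in earliest-deadline order on a fixed number of cores and reports which would finish late. The Task fields, the on_fraction parameter (the share of wall-clock time the machine is available for crunching) and all the numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Task:
    project: str
    remaining_hours: float  # estimated CPU hours still needed
    deadline: datetime


def at_risk(tasks, now, cores, on_fraction):
    """Return tasks that a simulated earliest-deadline-first run finishes late."""
    free_at = [now] * cores  # time at which each core next becomes free
    late = []
    for task in sorted(tasks, key=lambda t: t.deadline):
        core = min(range(cores), key=lambda k: free_at[k])
        # A machine that crunches only part of the time needs more wall-clock time.
        wall_hours = task.remaining_hours / on_fraction
        free_at[core] += timedelta(hours=wall_hours)
        if free_at[core] > task.deadline:
            late.append(task)
    return late


# Hypothetical dates and run times, chosen only to show the effect.
now = datetime(2019, 6, 15)
tasks = [
    Task("Einstein", 100.0, datetime(2019, 6, 20)),
    Task("SETI", 10.0, datetime(2019, 6, 29)),
]
for task in at_risk(tasks, now, cores=2, on_fraction=0.5):
    print(f"{task.project} task is at risk of missing its deadline")
```

In this toy example the Einstein task is flagged when the machine crunches only half the time and is not flagged when it crunches all the time, which is the sort of situation in which a project with near deadlines can end up holding the cores.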

BOINC makes no guarantee that it will precisely honour resource shares in the short term.  It will certainly attempt to do so in the longer term - which is exactly how resource shares are supposed to work.  If you want to make BOINC's job a lot easier, there are two things to do.

  1. Set a relatively small work cache size and allow BOINC to run when you are using the machine yourself if possible.  BOINC will learn over time how much of any 24hr period it will be able to use and will scale back on how much work it downloads accordingly.  At the start, it's very likely to get more work than it should if you start off with a multi-day work cache but only run your machine sporadically.  You can always increase the cache size slowly once things are working well and that could take quite a while to achieve.
  2. Try not to allow BOINC to get into high priority mode.  You can see if tasks are running in "high priority" by looking at the tasks tab in BOINC Manager - advanced view.  If you see tasks in this mode, it most likely means you have too large a work cache size for BOINC to cope with.  If you reduce it down to 0.1 days, BOINC may well decide that high priority is no longer needed.  Let it run like that until the 'at risk' tasks are completed and safely returned.  Then you could increase it (a little at a time) but not back to where it was that caused the problem in the first place.

Please realise that it's very hard to 'diagnose from a distance' so the above comments are conjecture on my part based on what I saw in your list of tasks, just for Einstein.  There may be other factors at play as well so if there are and you tell us about them, it should be possible for you to have a much better experience.

Please let us know how you go.

Cheers,
Gary.
