Missing one of my computers

schildbuerger
Joined: 24 Aug 13
Posts: 13
Credit: 3616767
RAC: 763
Topic 219094

Hello everyone,

For some months, one of my computers has not been listed at BOINC combined, but it is working on Einstein tasks and still solving them. Today I noticed that it is working on a task which timed out on 27.03.2019!!! But I received the task in the last few days. The Einstein website logged the last contact with this computer on 13.03.2019, and the computer's IP isn't my current one.

I opened this web session by clicking the "Your Computer" button in BOINC Manager on the computer in question, so it seems there is something wrong with the association of the machine.

Does anyone have an idea how to fix the mess?

Regards,

  Klaus

Edit: The missing computer is this one: 12245902

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110010173177
RAC: 23874569

schildbuerger wrote:
.. it is working on Einstein tasks and still solving these tasks.


I checked the online database and the computer ID you quote has zero tasks.  Yes, the time of last contact for that computer is 13 Mar 2019 and tasks received at that time would have had a deadline of 27 Mar.  You must have had the task you talk about since 13 Mar.  You can't have received it "in the last few days" because it has not existed in the database for a long time and so couldn't possibly have been sent recently.

You should abort any old tasks that your BOINC client still shows as being on your machine.  Go to the tasks tab in BOINC Manager (advanced view), highlight any task(s) you see there and click the 'abort' button.  Your client should contact the server and return the aborted task(s).  If you need to, click the 'update' button after selecting the Einstein project on the projects tab.  Then go to the event log and if you don't understand the messages you see there, please copy and paste what you do get as a result of the 'update'.  If there are no real messages appearing, please stop and restart the BOINC client and copy all the startup messages you get (perhaps 30-50 lines) into a new reply in this thread.  That will help us to diagnose the problem.

Cheers,
Gary.

schildbuerger
Joined: 24 Aug 13
Posts: 13
Credit: 3616767
RAC: 763

Hi Gary,

 

I've aborted the task and got a new Milkyway task. The message log contains the following lines:

26.06.2019 21:00:07 |  | Starting BOINC client version 7.2.42 for x86_64-pc-linux-gnu
26.06.2019 21:00:07 |  | log flags: file_xfer, sched_ops, task
26.06.2019 21:00:07 |  | Libraries: libcurl/7.36.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 libssh2/1.4.3 librtmp/2.3
26.06.2019 21:00:07 |  | Data directory: /var/lib/boinc-client
26.06.2019 21:00:07 |  | No usable GPUs found
26.06.2019 21:00:07 |  | Host name: video02
26.06.2019 21:00:07 |  | Processor: 2 GenuineIntel Pentium(R) Dual-Core  CPU      E5200  @ 2.50GHz [Family 6 Model 23 Stepping 6]
26.06.2019 21:00:07 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf eagerfpu pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm
26.06.2019 21:00:07 |  | OS: Linux: 3.13.0-170-generic
26.06.2019 21:00:07 |  | Memory: 1.95 GB physical, 2.00 GB virtual
26.06.2019 21:00:07 |  | Disk: 1.79 TB total, 1.37 TB free
26.06.2019 21:00:07 |  | Local time is UTC +2 hours
26.06.2019 21:00:07 |  | Config: GUI RPCs allowed from:
26.06.2019 21:00:07 |  | 192.168.178.20
26.06.2019 21:00:07 |  | 192.168.2.101
26.06.2019 21:00:07 | Asteroids@home | URL http://asteroidsathome.net/boinc/; Computer ID 308524; resource share 500
26.06.2019 21:00:07 | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 12245902; resource share 800
26.06.2019 21:00:07 | Universe@Home | URL http://universeathome.pl/universe/; Computer ID 46805; resource share 450
26.06.2019 21:00:07 | LHC@home | URL https://lhcathome.cern.ch/lhcathome/; Computer ID 10391031; resource share 400
26.06.2019 21:00:07 | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 689393; resource share 450
26.06.2019 21:00:07 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 7975813; resource share 200
26.06.2019 21:00:07 |  | General prefs: from http://qcn.caltech.edu/sensor/ (last modified 09-Apr-2015 16:53:06)
26.06.2019 21:00:07 |  | Host location: none
26.06.2019 21:00:07 |  | General prefs: using your defaults
26.06.2019 21:00:07 |  | Reading preferences override file
26.06.2019 21:00:07 |  | Preferences:
26.06.2019 21:00:07 |  | max memory usage when active: 1000.58MB
26.06.2019 21:00:07 |  | max memory usage when idle: 1801.05MB
26.06.2019 21:00:07 |  | max disk usage: 100.00GB
26.06.2019 21:00:07 |  | max CPUs used: 1
26.06.2019 21:00:07 |  | suspend work if non-BOINC CPU load exceeds 25%
26.06.2019 21:00:07 |  | (to change preferences, visit a project web site or select Preferences in the Manager)
26.06.2019 21:00:07 | Einstein@Home | Task h1_0330.75_O1C02Cl3In0__O1OD1_330.95Hz_307_0 is 91.07 days overdue; you may not get credit for it.  Consider aborting it.
26.06.2019 21:00:07 |  | Not using a proxy
27.06.2019 03:07:52 | Einstein@Home | task h1_0330.75_O1C02Cl3In0__O1OD1_330.95Hz_307_0 aborted by user
27.06.2019 03:07:53 | Milkyway@Home | Sending scheduler request: To fetch work.
27.06.2019 03:07:53 | Milkyway@Home | Requesting new tasks for CPU
27.06.2019 03:07:54 | Einstein@Home | Computation for task h1_0330.75_O1C02Cl3In0__O1OD1_330.95Hz_307_0 finished
27.06.2019 03:07:55 | Milkyway@Home | Scheduler request completed: got 1 new tasks
27.06.2019 03:07:57 | Milkyway@Home | Started download of parameters-83-4s.txt
27.06.2019 03:07:57 | Milkyway@Home | Started download of stars-83-donlon.txt
27.06.2019 03:07:58 | Milkyway@Home | Finished download of parameters-83-4s.txt
27.06.2019 03:07:59 | Milkyway@Home | Finished download of stars-83-donlon.txt
27.06.2019 03:07:59 | Milkyway@Home | Starting task de_modfit_83_bundle4_4s_south4s_1_1561047003_2264129_1

Recently I had received only Einstein tasks. Strange?

 

Thanks for your fast reply,

 

  Klaus

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110010173177
RAC: 23874569

schildbuerger wrote:
... The message log contains following lines:


If you post text like that between [ code ] tags (click the triangle in front of BBCODE HELP to see details) when composing a message it will be a lot easier to read :-).  Also if, before pasting, you click the third last icon at the RH end of the set of controls at the top of the composition window, your paste will happen "as plain text" which may give a more readable result.

I've checked the event log and everything seems normal there.  The task you had belonged to a previous search which is no longer current.  Your computer did contact the server when I checked after your latest post and the result of that contact showed as "not requesting any work" so I'm wondering if you haven't checked your project preferences and selected one of the current CPU searches.  By the look of things, you just need to do that.

If you go to Account -> Preferences -> Project and scroll down until you find the list of Applications, the ones that currently have CPU work are Gamma-ray Pulsar search #5 and Continuous Gravitational Wave search O2 All-Sky.  Select one or both of those, as you prefer, and you should then be able to get some tasks.

schildbuerger wrote:
In the near past I've received only Einstein tasks. Strange?


The last task you received was back in March, the one you just aborted.  If you had received anything recently, it would have been listed in BOINC Manager and also on the website.  Your machine had no contact with the project after March until you sent back the aborted task.  You can't receive tasks if you don't make contact and make a work request :-).

Cheers,
Gary.

schildbuerger
Joined: 24 Aug 13
Posts: 13
Credit: 3616767
RAC: 763

Hi Gary,

Nearly every day I take a look at the computer, and every time over the last months it was working on Einstein - I swear! Perhaps it was working permanently on that one task and didn't finish until I aborted it yesterday. But now everything works fine. The Milkyway task is done and now it's working on SETI.

Yesterday I forgot to update the Einstein project before posting my log. I've done that now and got no task because of "Not requesting tasks: don't need". If it is helpful I can post a new log (this time inserted as plain text in Courier :-), sorry); just ask for it.

Thanks for your help,

  Klaus

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110010173177
RAC: 23874569

schildbuerger wrote:
... Perhaps it's working permanently on that task and doesn't finish till I aborted it yesterday.


I can't think of any other way to explain it.  Did you ever check to see if it was the same task day after day?  Did you ever check the elapsed time and % progress that was showing for the task?  We are talking about a period of three months.  It seems quite far fetched to imagine that the one task was still not finished after all that time.

I believe that tasks that haven't even started by the deadline do get aborted as soon as the deadline passes.  Tasks that have started are left to continue.  You do get warnings in the event log.  In the snip you posted, here is what the BOINC client said about that task during the re-start.

26.06.2019 21:00:07 | Einstein@Home | Task h1_0330.75_O1C02Cl3In0__O1OD1_330.95Hz_307_0 is 91.07 days overdue; you may not get credit for it. Consider aborting it.

Every time you started your machine, a message like that would have been given to alert you to the problem.
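For anyone who would rather not scan the event log by eye, here is a minimal sketch that picks out such warnings from saved log text. It assumes the log lines look exactly like the ones posted in this thread (timestamp, project, message separated by " | "); the function name `find_overdue` and the regular expression are illustrative only and are not part of BOINC.

```python
import re

# Pattern for the client's overdue warning, modelled on the line quoted above.
OVERDUE = re.compile(
    r"^(?P<stamp>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) \| (?P<project>[^|]*) \| "
    r"Task (?P<task>\S+) is (?P<days>\d+(?:\.\d+)?) days overdue"
)


def find_overdue(lines):
    """Return (project, task, days_overdue) for every overdue warning found."""
    hits = []
    for line in lines:
        match = OVERDUE.match(line.strip())
        if match:
            hits.append(
                (
                    match.group("project").strip(),
                    match.group("task"),
                    float(match.group("days")),
                )
            )
    return hits


sample = [
    "26.06.2019 21:00:07 |  | Not using a proxy",
    "26.06.2019 21:00:07 | Einstein@Home | Task "
    "h1_0330.75_O1C02Cl3In0__O1OD1_330.95Hz_307_0 is 91.07 days overdue; "
    "you may not get credit for it. Consider aborting it.",
]
print(find_overdue(sample))
```

Run against a copy of the startup messages, this lists each overdue task once per client start, which also shows how often the warning had already been issued.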

GW tasks (by design) are likely to take around 12 hours to complete on a relatively modern machine.  We could assume the elapsed time to crunch a task on your host (older Pentium dual core 2.5GHz) would be longer - say 18-24 hrs perhaps.  Even if slow, the task should be finished well within the deadline.  So the question I ask myself (and then try to answer) is, "how could such a task still be running and not finished 105 days after it was originally sent?"
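The day counts can be checked with a few lines of date arithmetic. This sketch uses only dates already mentioned in the thread (last contact on 13 Mar 2019, deadline 27 Mar 2019, client restart on 26 Jun 2019); the variable names are just for illustration.

```python
from datetime import date

sent = date(2019, 3, 13)      # last contact with the server, when the task was issued
deadline = date(2019, 3, 27)  # deadline for tasks received at that time
restart = date(2019, 6, 26)   # date of the client restart in the posted log

days_since_sent = (restart - sent).days
days_overdue = (restart - deadline).days

print("days since the task was sent:", days_since_sent)
print("days past the deadline:", days_overdue)
```

The two results are 105 and 91 days, which agrees with the "91.07 days overdue" message in the event log.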

Well, it probably depends largely on how you use your machine and how you have set your preferences.  If you turn the machine on, use it for a session and then turn it off when you finish, that could be a factor.  If you had a preference setting for suspending computation when the user is active set to 'yes', then it would be possible for very little progress to be made during the session.  User active just means the user was using the keyboard/mouse, irrespective of how much CPU activity there was.  So just processing emails or browsing the internet would prevent BOINC from doing work.  I notice your machine is running an old BOINC client and an old Linux kernel. I imagine it's not your 'daily driver' so perhaps it's putting itself into some sort of low power state and not actually doing any crunching until you 'wake it up'?

Another preference setting is for whether or not you allow tasks to remain in memory when suspended.  If that setting is to remove tasks from memory when suspended, then the only way for the task to continue when allowed, is to reload at the point where it was at the time the previous checkpoint was written.  So, with that setting, you always lose any computation after the last checkpoint up until the point where the task was suspended (and therefore removed from memory).
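To make the effect concrete, here is a deliberately simplified model, not BOINC's actual scheduler. It assumes the task is paused after a fixed number of minutes of crunching and that a checkpoint is written after a fixed number of minutes of progress; all the numbers in the example calls (a 20 hour task, a checkpoint every 10 minutes, pauses every 25 or 8 minutes) are made-up illustrations.

```python
def run_minutes_needed(task_minutes, checkpoint_every, pause_every,
                       keep_in_memory, limit=100_000):
    """Minutes of crunching until the task completes, or None if it never does.

    Simplified model: the task is paused after every `pause_every` minutes of
    crunching, and a checkpoint is written every `checkpoint_every` minutes
    of progress. `limit` caps the simulated crunching time.
    """
    if checkpoint_every <= 0 or pause_every <= 0:
        raise ValueError("intervals must be positive")
    progress = 0  # minutes of useful work that currently count
    elapsed = 0   # minutes the task has spent crunching
    while elapsed < limit:
        step = min(pause_every, task_minutes - progress)
        progress += step
        elapsed += step
        if progress >= task_minutes:
            return elapsed
        if not keep_in_memory:
            # Removed from memory on suspend: fall back to the last checkpoint.
            progress -= progress % checkpoint_every
    return None


print(run_minutes_needed(1200, 10, 25, keep_in_memory=True))
print(run_minutes_needed(1200, 10, 25, keep_in_memory=False))
print(run_minutes_needed(1200, 10, 8, keep_in_memory=False))
```

In this model the task needs 1200 minutes when kept in memory and 1495 minutes when it is not, and it never finishes at all if the pauses come more often than the checkpoints (the last call returns None).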

schildbuerger wrote:
Yesterday I forgot to update the Einstein project before posting my log. I've done that now and got no task because of "Not requesting tasks: don't need". If it is helpful I can post a new log.

There's no need to click 'update' unless you have a specific reason to force contact with the project.  The client will take care of that automatically when needed.  The event log you posted was quite sufficient to show the situation.  I'd be interested in knowing how you managed to keep a task crunching for that long without it ever completing.  Information about your usage patterns and preference settings would be useful for diagnosing why the task wasn't completed within the original deadline.

From what you have said, you seem to be crunching only one task at a time, even though you have a dual core CPU.  Do you allow BOINC to use only 50% of the cores?  What size (in days) do you use for your work cache?

Cheers,
Gary.

schildbuerger
Joined: 24 Aug 13
Posts: 13
Credit: 3616767
RAC: 763

Hi, I'm back again. Sorry for the late reply!

Normally I don't look at the message log until there seems to be a problem. I also don't look at the tasks in detail; BOINC Manager is just running to give me an overview of what is going on. So I can state that the computer only worked on Einstein over the last weeks, nothing else that I remember. I took a closer look at the task list because the task showed a due date on the next day and the remaining time was too many hours for the task to finish in time with normal usage of the computer. A closer look at the due date showed me that it was in March, and so I contacted the forum in this thread.

I want to tell you something about the computer. Its main job is to act as a DVB-S receiver/recorder (yaVDR, based on Ubuntu), so it runs for a few hours nearly every day. For this job it needs one core exclusively, so I restricted BOINC's usage to 50%. I set up the system some years ago and installed BOINC at the same time. Since then I have run apt-get dist-upgrade every few weeks to update the system. The BOINC preferences haven't changed since the initial setup and look like this (a German-language version is installed, so I have to translate the prompts into English):

  • use cpu time: 100%
  • interrupt on battery usage: unchecked (it is a desktop computer without a battery)
  • interrupt on user activity: unchecked
  • pause on Processor usage: checked and 25%
  • keep non GPU-tasks in memory when paused: unchecked
  • store workdays: 0
  • store additional workdays: 0.25
  • switch between tasks: 60 minutes
  • store tasks at most: 60 seconds

After I aborted the Einstein task it did only Milkyway and SETI tasks. But now I've got the first Einstein task; it came in during the last half hour. The project settings look like this:

  • einstein (800)
  • asteroids (500, no new tasks)
  • universe (450 no new tasks)
  • milkyway (450)
  • lhc (400, no new tasks)
  • seti (200)

BOINC Manager is running on another machine and connects remotely. When I look at the task list, the one active task normally is never interrupted until the client automatically switches to another task.

Hopefully my information helps to find out what's going on. For me the computer and BOINC have worked fine since I aborted the Einstein task. But if you want more information to find out what happened, I'll gladly provide it if I can.

Regards,

  Klaus

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110010173177
RAC: 23874569

Hi Klaus,
Thanks for supplying the extra information.  I think it's now possible to explain the behaviour you have described.

Firstly, I believe your machine had been working on that one aborted Einstein task for three months and was never able to complete it - strange as that may seem.  This would explain why your client favoured the other projects and prevented a new Einstein task until just now. The client knew that it had spent many hours trying to complete just that one task so as soon as it was aborted and returned, the client had to favour other projects to give them their rightful share - as dictated by your resource share settings.  It took quite a while to 'catch up' the deficiency.

The real problem is caused by some settings that aren't appropriate for your main use of that machine.  You describe this as "it needs one core exclusive" for the "main job".  So by all means restrict BOINC to 50% of the cores if you wish.  However DON'T double restrict by setting "Pause on processor usage of 25%".

I've never tested this out but here is what I believe may very well be happening.  You need to understand that BOINC is designed to run at the lowest priority - it's designed to use 'spare' CPU cycles.  My experience is that it's quite good at this.  Of course that very much does depend on the characteristics of the main job.

A lot of 'heavy use' apps will tend to be 'bursty' - ie peaks of very high CPU use and troughs of very low use.  Unless the app is multi-threaded and extremely heavy in its CPU use, the 50% cores setting should give it all that it needs.  When you also set "Pause at 25%", I'm wondering if that is being interpreted as "any core using >25% for a non-BOINC app will cause a pause".  You really need to test this for yourself because I think that your main job probably has regular bursts above 25% and so BOINC dutifully pauses crunching when that happens, despite the main job having its own exclusive core to use.

Also, you don't have the setting checked for keeping the task in memory when suspended.  So that means if BOINC is being paused very regularly by >25% bursts on the other core, any progress is being thrown away and the task is forced to restart from the last saved checkpoint.  If your main job runs all the time the machine is on, I could then understand why that one task was never able to finish.

Another thought has occurred to me.  When you first posted about this issue, I seem to recall seeing a very old version of BOINC - version 5 or version 6 comes to mind (I could easily be mistaken).  Today I see version 7.2.42.  Have you upgraded BOINC recently?  If you have, I vaguely remember some sort of problem with very old BOINC versions to do with that "using >25%" setting.  I have a nagging feeling that when that was in use, old BOINC versions would pause crunching far more frequently than should have been the case.  I've never used that setting so I don't remember any details.  Whatever the issue was, I think it was fixed before 7.2.42 came along.  So if you have upgraded BOINC, you should be fine.

Your new Einstein task is not a GW task like the previous one.  It's a Gamma-ray pulsar task (FGRP5) and it will be able to finish a bit faster than a GW task (if you allow it to).  However it will checkpoint less frequently than the GW tasks so the potential to lose progress is higher if you don't change your settings as suggested.

As I write this, you've had the task for several hours.  I believe it should checkpoint on your machine, perhaps every 10-15 mins, or something like that.  It would be interesting to know how long the machine has been running since the task was received and what progress is showing for it (both elapsed time and % complete).  I believe it will write a checkpoint every 1.13924% of progress - ie. at values showing in BOINC Manager (advanced view) of 1.139%, 2.278%, 3.418%, 4.557%, ....  If you care to give the 3 figures at any point (machine on time, task elapsed time, and task % completed) it should be easy to see if significant crunching time and progress is being lost.  If you've already shut the machine off, just record the 3 values you see immediately after a restart and then again when you are about to shut it down.  The differences between the two sets will tell the story.
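Here is a small sketch of that bookkeeping. The checkpoint step of 1.13924% is the figure quoted above; the two readings passed to `compare` are invented purely to show the calculation, and both function names are illustrative.

```python
import math

CHECKPOINT_STEP = 1.13924  # percent of progress between checkpoints (figure quoted above)


def last_checkpoint(percent_done):
    """Largest checkpoint value (in %) that does not exceed percent_done."""
    return math.floor(percent_done / CHECKPOINT_STEP) * CHECKPOINT_STEP


def compare(start, end):
    """Difference between two readings.

    Each reading is (machine_on_minutes, task_elapsed_minutes, percent_done).
    """
    on, elapsed, percent = (e - s for s, e in zip(start, end))
    return {
        "machine_on": on,
        "task_elapsed": elapsed,
        "progress": percent,
        "not_crunching": on - elapsed,  # minutes the machine was on but the task was not running
    }


first_four = [round(k * CHECKPOINT_STEP, 3) for k in range(1, 5)]
print(first_four)
print(round(last_checkpoint(35.54), 3))

# Hypothetical readings just after a restart and just before shutdown.
print(compare((5, 120, 10.2), (245, 350, 28.9)))
```

If "not_crunching" is large, or if the progress gained is much less than the elapsed time would suggest, the task is being paused or rolled back to earlier checkpoints.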

Cheers,
Gary.

schildbuerger
Joined: 24 Aug 13
Posts: 13
Credit: 3616767
RAC: 763

Hi Gary,

it's late night in Germany and some drinks later ;-)

So I've read a lot of foreign-language text and haven't understood much. But the computer is still running, and at this time the values for that Einstein task are as follows: done = 4:33h, remaining = 5:24h, completed = 35.5400%. The computer will keep working for the next few hours, because it's currently recording.

I will take a look at your post tomorrow with a fresh brain.

Thanks,

  Klaus

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110010173177
RAC: 23874569

schildbuerger wrote:
it's late night in Germany and some drinks later ;-)

Yes, I did know what time it was in Germany so wasn't expecting an immediate reply.  As soon as I saw one, I guessed about the drinks as well :-) ;-).

Assuming the drinks were not excessive and your numbers are believable, :-)  I would say that everything is pretty much as it should be.  You received the task at 20:30 UTC 6 Jul and you posted at 01:10 UTC 7 Jul.  So, at the time you posted, the machine had been running the task for 4:40h which agrees very closely with the run time of 4:33h.  My conclusion is that you obviously need a few more drinks since you are not sufficiently inebriated - go look that one up :-).

Seriously, at the current rate of progress, the task will get to ~90% (the end of the primary calculations) at around an elapsed time of just over 11:30h.  The rate of progress for these tasks is fairly uniform so this number should be fairly close (unless you suddenly start losing progress due to restarting from an earlier checkpoint).  At ~90%, progress will seem to stop for perhaps another 30-50 minutes as the followup stage doesn't include an indication of continuing progress.  When the followup stage completes, the progress will jump immediately to 100% and the result will be uploaded.  Total time for the task should be between 12 and 13 hours.  This is very much what I would expect for your machine.
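The arithmetic behind these estimates can be reproduced as follows. This sketch assumes progress is linear up to 90%, as described above, and uses the figures from the thread (received 20:30 UTC 6 Jul, posted 01:10 UTC 7 Jul, 4:33h elapsed at 35.54%); the helper `hm` is only for display.

```python
from datetime import datetime, timedelta

received = datetime(2019, 7, 6, 20, 30)  # task received (UTC)
posted = datetime(2019, 7, 7, 1, 10)     # time of the post with the readings (UTC)
wall = posted - received                 # time the machine has had the task

elapsed = timedelta(hours=4, minutes=33)  # run time reported for the task
percent = 35.54                           # progress reported for the task


def hm(delta):
    """Format a timedelta as h:mm, dropping the seconds."""
    minutes = int(delta.total_seconds() // 60)
    return f"{minutes // 60}:{minutes % 60:02d}"


# Linear extrapolation to the end of the primary calculations at 90%.
to_90 = elapsed * (90 / percent)

# The followup stage adds roughly 30 to 50 minutes.
total_low = to_90 + timedelta(minutes=30)
total_high = to_90 + timedelta(minutes=50)

print("had the task for:", hm(wall))
print("elapsed time at 90%:", hm(to_90))
print("estimated total:", hm(total_low), "to", hm(total_high))
```

This gives 4:40 on the clock against 4:33 of run time, about 11:31 to reach 90%, and a total of roughly 12:01 to 12:21.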

schildbuerger wrote:
The Computer will work for the next hours, because it's recording actually.

I don't know anything about recording/editing/post processing of video since I've never felt a need to do any of that.  I wouldn't have a clue as to what parts of the overall process consume a lot of CPU.  On the assumption that recording is just laying down the bytes on a disk, maybe the CPU isn't too stressed with this activity so the Einstein task isn't getting interrupted.

When you get your brain back :-) please see if you can work out what I've written and if it doesn't make any sense, please ask.  It will be interesting to know if this task fully completes in the projected time, or if something intervenes and it doesn't.  By the time you wake up - maybe afternoon local time :-) - it should be well and truly completed and hopefully replaced with something else.

Cheers,
Gary.

schildbuerger
Joined: 24 Aug 13
Posts: 13
Credit: 3616767
RAC: 763

Hi Gary,

thanks for your reply.

"It took quite a while to 'catch up' the deficiency." (How do I get the box that you use for quotes?)

I would agree with that.

The machine shuts down automatically shortly after finishing a recording if it is not being used as a receiver. When I finish watching TV I shut the machine down manually. It starts automatically when a recording job is due and it is not running, so normally it runs only a few hours a day.

The main job needs one CPU core, some bus bandwidth to store the TV stream to hard disk, and a bit of memory. The requirements are quite low.

So we can configure BOINC in whatever way suits it best, as long as it uses only one core. What settings would you suggest concretely?

How do I update to a new BOINC version? My Linux knowledge is quite limited, so I would need a concrete command. In the SETI forum I found these commands:

sudo add-apt-repository ppa:costamagnagianfranco/boinc

sudo apt-get update

sudo apt-get upgrade

 

And on the BOINC forum there is a completely different process for that.

The current Einstein task is the only one in the task list; it shows 64.672% progress, 9:03h done and 2:21h remaining.

I have to call it a day. Have I missed something?

Regards,

  Klaus

 
