Recent Work Not Identified in Account

Einsteinpi

Joined: 10 Dec 14

Posts: 32

Credit: 835000

RAC: 216

19 Apr 2018 16:31:32 UTC

Topic 214604

I'm running BRP Search (Arecibo), version 1.06, on a Raspberry Pi B+ (first generation). I think this is identified in my account as Computer 12117052. A work unit was completed on this computer yesterday, but there is no indication of the completion in the account, other than the date of last contact with the server. Clicking on that link displays a page that includes the following line:

p2030.20161219.G202.29-01.06.N.b4s0g0.00000_3584_1 refused: result already reported as success

I don't know if that is the work unit that completed yesterday, but it appears that no credit was given for its completion. If the result was reported earlier, it doesn't show in the account. Further, I don't see the work unit currently being processed by the computer or the work unit that was downloaded and waiting for processing to begin.

What needs to be done to get this computer's work (pending, in progress, and completed) identified in my account?

Many thanks for your help!

Einsteinpi

Joined: 10 Dec 14

Posts: 32

Credit: 835000

RAC: 216

20 Apr 2018 23:12:01 UTC

Message 165093

Another work unit was completed today, but it, too, doesn't appear in my Account. The Raspberry Pi B+ is currently working on another unit, and that isn't shown in the Account either.

The "Computers Active in past 30 days" window shows Computer 12117052 last contacted Einstein@Home today (20 Apr). Clicking on that link produces the following:

2018-04-20 18:18:38.3570 [PID=25324]   Request: [USER#xxxxx] [HOST#12117052] [IP xxx.xxx.xxx.215] client 7.4.23
2018-04-20 18:18:38.4354 [PID=25324] [debug]   [HOST#12117052] Resetting nresults_today
2018-04-20 18:18:38.4354 [PID=25324] [debug]   have_master:1 have_working: 1 have_db: 1
2018-04-20 18:18:38.4354 [PID=25324] [debug]   using working prefs
2018-04-20 18:18:38.4354 [PID=25324] [debug]   have db 1; dbmod 1441645448.000000; global mod 1441645448.000000
2018-04-20 18:18:38.4362 [PID=25324] [CRITICAL]   [HOST#12117052] [RESULT#? p2030.20161219.G202.29-01.06.N.b4s0g0.00000_3587_0] can't find result
2018-04-20 18:18:38.4362 [PID=25324]    [send] effective_ncpus 1 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2018-04-20 18:18:38.4363 [PID=25324]    [send] effective_ngpus 0 max_jobs_on_host_gpu 999999
2018-04-20 18:18:38.4363 [PID=25324]    [send] Not using matchmaker scheduling; Not using EDF sim
2018-04-20 18:18:38.4363 [PID=25324]    [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2018-04-20 18:18:38.4363 [PID=25324]    [send] work_req_seconds: 0.00 secs
2018-04-20 18:18:38.4363 [PID=25324]    [send] available disk 73.24 GB, work_buf_min 0
2018-04-20 18:18:38.4363 [PID=25324]    [send] active_frac 0.999311 on_frac 0.980619 DCF 3.506727
2018-04-20 18:18:38.4407 [PID=25324]    Sending reply to [HOST#12117052]: 0 results, delay req 60.00
2018-04-20 18:18:38.4408 [PID=25324]    Scheduler ran 0.088 seconds

Please note the line marked [CRITICAL] above that says, "can't find result."

I hope this additional information can help someone determine the cause of work not showing in my Account and tell me what I need to do to fix the problem. Help please!

Holmis

Joined: 4 Jan 05

Posts: 1118

Credit: 1055935564

RAC: 0

20 Apr 2018 23:20:19 UTC

Message 165094

If you restart BOINC on your Raspberry Pi and check the "Event log" (BOINC Manager - Tools menu -> Event log) you should see a line like "17-Apr-2018 19:04:32 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 11468519; resource share 400".

What does it show as the Computer ID?

Einsteinpi

Joined: 10 Dec 14

Posts: 32

Credit: 835000

RAC: 216

21 Apr 2018 2:50:38 UTC

Message 165095

Thank you for your reply.

I suspended and then resumed the project and then checked the "Event Log." I don't see a Computer ID listed. The latest message just shows that the project was resumed. The only information listed in the log is project, date/time and message. No URL, computer ID, or resource share are shown.

FYI: I'm running the Raspberry Pi headlessly and use VNC viewer to get a graphical interface so I can access BOINC Manager. The operating system on this Pi is "Jessie." It's not the latest available for the Pi, but I can't update it because Einstein@Home won't run on this model of Pi with the latest operating system.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119694083848

RAC: 25384156

21 Apr 2018 3:58:07 UTC

Message 165096 in response to message 165093

Einsteinpi wrote:

Another work unit was completed today, but it, too, doesn't appear in my Account. The Raspberry Pi B+ is currently working on another unit, and that isn't shown in the Account either.

The "Computers Active in past 30 days" window shows Computer 12117052 last contacted Einstein@Home today (20 Apr). Clicking on that link produces the following:

2018-04-20 18:18:38.3570 [PID=25324] Request: [USER#xxxxx] [HOST#12117052] [IP xxx.xxx.xxx.215] client 7.4.23

....

There are some strange things you need to investigate.

If you are sure a task was completed and returned very recently, then hostid 12117052 wasn't the ID that did it. The last task returned from that ID was a compute error on 13 April - more than a week ago. 12117052 shows as having BOINC client 7.4.23 now on 20 April, so it's still communicating, but it's not asking for work and has none on board.

The last error task (a week ago) shows BOINC client 7.6.33 in the error output. Have you perhaps downgraded BOINC between 13th and 20th April? Have you changed the OS in any way? Is it possible you have more than one account that these devices might be registered under? Without knowing what you might have changed along the way, it's hard to reconcile the conflicting indications.

I think the two different message lines you highlighted might well just be 'red herrings' and not really related to the true problem. I have seen the 'result already reported as success' message quite a few times before. Perhaps because the Einstein servers are often under pressure, there can be situations where a result being reported can be received and acted upon by the server but the acknowledgement might get lost somehow. The client only deletes the need to report the result after receiving an acknowledgement from the server. If it doesn't get one it will keep resending the report until it does. The server will eventually acknowledge and will log the situation as 'already reported'. If you see this sort of message, it means that the result has already been credited so your client can now safely delete the entry. There is no ongoing problem.
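The report/acknowledge handshake described above can be sketched as a toy model. This is not BOINC code: `ToyServer`, `client_report`, and the `"ack"` reply are hypothetical names made up for illustration, and only the "already reported" wording is taken from the log line quoted earlier in the thread.

```python
# Toy model of a client that keeps reporting a result until it gets a reply.
# ToyServer and client_report are illustrative names, not part of BOINC.

class ToyServer:
    def __init__(self):
        self.reported = set()  # results the server has already accepted

    def report(self, result_name, ack_lost=False):
        """Accept a report and return the reply the client sees.

        Returns None when the acknowledgement is lost on the way back.
        """
        if result_name in self.reported:
            reply = "refused: result already reported as success"
        else:
            self.reported.add(result_name)
            reply = "ack"
        return None if ack_lost else reply


def client_report(server, result_name, lost_acks):
    """Report until any reply arrives; return every reply seen on the way."""
    replies = []
    attempt = 0
    while True:
        reply = server.report(result_name, ack_lost=attempt < lost_acks)
        attempt += 1
        replies.append(reply)
        if reply is not None:  # any reply lets the client drop the entry
            return replies


server = ToyServer()
replies = client_report(server, "task_1", lost_acks=1)
print(replies)
```

In this model the first report is accepted but its acknowledgement is lost, so the retry is answered with the "already reported" message even though the result was stored the first time.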

I vaguely remember seeing the second message as well. I think it might go something like this. If a host has a completed task that is unreported and the host has been off or out of contact for long enough for the task to have expired, a replacement will have been sent to a third host, which could be returned quickly and the quorum completed while your host has been out of contact. Once completed, work units don't remain in the online database for very long so the results that are part of the quorum quickly disappear. Your host, when contact is resumed, will try to report the task and will get the 'can't find result' type of response.

It also could just be an extension of the first situation. Your host might have reported a result (perhaps by clicking update) and then the machine got switched off without waiting for the acknowledgement, particularly if the server was being very slow, as it sometimes is. If the result had been accepted (but your client doesn't know this) it will try again when it is next started. In the meantime, because the quorum was actually completed anyway, the record of it is removed from the database fairly quickly. I'm not sure what the interval is these days - quite short I suspect, to keep the online database within reasonable bounds. If the boinc client that owned that task gets started up again and tries to make a report after the quorum has already been deleted, you can imagine what the server response might be :-).
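To make the sequence concrete, here is a minimal sketch of how a late report could run into a purged record. It is a guess at the general shape only: `ToyDatabase`, its method names, and the purge rule are assumptions for illustration, not the actual server logic.

```python
# Toy model only: the purge rule and the "ack" reply are illustrative assumptions.

class ToyDatabase:
    def __init__(self):
        self.results = {}  # result name -> "in_progress" or "reported"

    def send(self, name):
        """Record that a task has been sent out to a host."""
        self.results[name] = "in_progress"

    def report(self, name):
        """Return the reply a host would get when reporting this result."""
        state = self.results.get(name)
        if state is None:
            return "can't find result"
        if state == "reported":
            return "refused: result already reported as success"
        self.results[name] = "reported"
        return "ack"

    def purge_completed(self):
        """Drop already-reported results (their quorum is assumed complete)."""
        self.results = {n: s for n, s in self.results.items() if s != "reported"}


db = ToyDatabase()
db.send("wu_0")
first = db.report("wu_0")    # accepted; assume the host never sees this reply
second = db.report("wu_0")   # a retry while the record still exists
db.purge_completed()
third = db.report("wu_0")    # a retry after the record has been removed
print(first, second, third, sep="\n")
```

The same unreported task produces a different reply depending on whether the retry arrives before or after the record is purged, which would explain seeing both messages on one host.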

Apart from trying to explain the log messages, I'm also wondering if your hardware might have some sort of dual identity and might have been switched from one to the other and perhaps back again. The fact that there are two different BOINC versions being reported shows that changes have been made. I think only you will really be able to sort this out but documenting all software changes certainly would help. I'm reasonably confident that it's not a server side problem or bug so you really have to look closer to home.

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119694083848

RAC: 25384156

21 Apr 2018 4:45:50 UTC

Message 165097 in response to message 165095

Einsteinpi wrote:

I suspended and then resumed the project and then checked the "Event Log." I don't see a Computer ID listed. The latest message just shows that the project was resumed. The only information listed in the log is project, date/time and message. No URL, computer ID, or resource share are shown.

This information is only logged in the event log as startup messages when the client actually starts. Suspending/resuming doesn't stop and start the client. To get the information, you don't really need to restart the client. What you see in the event log is also recorded on disk in the file 'stdoutdae.txt' in the BOINC directory. There is also the previous version of this file under a different extension. Between the two, you will have a lot of past history.

Depending on how often you do actually stop and restart the client, there could be several rounds of startup messages stored there. If you can browse the current file (and the previous file if needed) and search for a line that reads, "Starting BOINC client version x.yy.zz ...." you will be in the right place. Over the lines that follow, you will find the requested information. This will also tell you exactly when you stopped and restarted the client because the dates and times for each of these are recorded as well.

You should also be able to find the messages for when particular tasks of interest were started, when they finished, and when they were uploaded. If you then browse back to the client startup messages, you will know the exact hostID they were reported under to the website. From this you should be able to work out what is going on.
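The search described above could be scripted along these lines. This is a rough sketch: the "Computer ID" line format is taken from the example Holmis posted earlier, while the first line in `sample` (its timestamp and version) is invented for illustration, and `startup_host_ids` is a hypothetical helper name.

```python
import re

# Matches the startup banner, e.g. "Starting BOINC client version 7.6.33 ..."
START = re.compile(r"Starting BOINC client version (\S+)")
# Matches "[Project] URL <url>; Computer ID <n>; ..." as in the example above.
HOST = re.compile(r"\[(?P<project>[^\]]+)\] URL \S+; Computer ID (?P<id>\d+)")


def startup_host_ids(lines):
    """Return one dict per client start: version, banner line, project -> ID."""
    starts = []
    for line in lines:
        m = START.search(line)
        if m:
            starts.append({"version": m.group(1), "line": line.strip(), "hosts": {}})
            continue
        m = HOST.search(line)
        if m and starts:
            # Attach the ID to the most recent client start seen so far.
            starts[-1]["hosts"][m.group("project")] = int(m.group("id"))
    return starts


# Sample input; the first line is made up, the second is quoted from this thread.
sample = [
    "17-Apr-2018 19:04:30 Starting BOINC client version 7.6.33",
    "17-Apr-2018 19:04:32 [Einstein@Home] URL http://einstein.phys.uwm.edu/; "
    "Computer ID 11468519; resource share 400",
]
for start in startup_host_ids(sample):
    print(start["version"], start["hosts"])
```

To run it against a real log, pass an open file instead of `sample`, for example `startup_host_ids(open("stdoutdae.txt", errors="replace"))` from inside the BOINC data directory.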

Cheers,
Gary.

Einsteinpi

Joined: 10 Dec 14

Posts: 32

Credit: 835000

RAC: 216

21 Apr 2018 18:26:56 UTC

Message 165103 in response to message 165096

Gary Roberts wrote:

Have you perhaps downgraded BOINC between 13th and 20th April? Have you changed the OS in any way? Is it possible you have more than one account that these devices might be registered under? Without knowing what you might have changed along the way, it's hard to reconcile the conflicting indications.

I have two Raspberry Pis (a first-generation 2B and a first-generation B+) and changed the operating systems on both of them beginning on the 13th of April. It's likely this is the cause of the problem. I don't remember the exact sequence of events, but maybe the pertinent detail is that I moved the operating system that had been running on the 2B to the B+ and attached the Einstein@Home project. There must have been a downgrade of BOINC/Einstein app on the B+ because only the 1.06 app will run on that computer. The 2B had been running version 1.42 and is now running 1.47 beta with "Stretch." The "stdoutdae.txt" file for the B+ gives the Computer ID as 12117052 after the "Starting BOINC client version..." line for the 13th of April. I don't see the "stdoutdae.txt" file in the 2B directory, so I can't see what its Computer ID number was prior to the 13th.

What is the simplest way for me to deal with this mess? Detach the B+ and start over? Will accumulated credit be lost if I do that?

Thank you for taking the time to try to help me sort out this self-inflicted problem.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119694083848

RAC: 25384156

23 Apr 2018 4:37:23 UTC

Message 165127 in response to message 165103

Einsteinpi wrote:

I have two Raspberry Pis (a first-generation 2B and a first-generation B+) and changed the operating systems on both of them beginning on the 13th of April.

Your computers list shows three computers with one of them having had no contact since the early hours of 14th April. I guess that would be the ID of the B+ before you changed the OS. The thing that puzzles me is that the ID (12639727) is actually newer than the ID of 12117052 which you say is the same machine (B+) after the OS change. After a change like this, I would expect a change of host ID to result in an even newer one again and not an older one. So I'm guessing this really means that you not only swapped the OS on that machine but you also swapped the complete BOINC tree. Because of the big differences between a B+ and a 2B, this is bound to cause problems for the Einstein app.

I'm really not the person to advise you on this. I know nothing of any detail about this type of hardware or of what type/version of OS and BOINC/Project software you need to use to get things running properly. There is a long running sticky thread in Crunchers Corner that deals with these sorts of devices. If I were you, I'd be perusing all the more recent stuff in that thread because I'm pretty sure I remember comments about various Pi versions and what will or will not run on them. I believe the B+ is pretty limited and maybe you'll need to be very specific for the OS/app versions you can actually use.

If it was running satisfactorily previously, before you swapped the OS, your easiest and least frustrating option is to install a fresh copy of that and start again - with a fresh host ID. If you can get BOINC to start up, you could just reset the project to start afresh (unless you already have a full copy of a system that works). If you have that, you could just remove the hostid in the state file to force a new one so that there's no possibility of having one that's already in use.
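Before editing anything, it may help to see which host ID each project is recorded under. The sketch below assumes the state file is XML with one `<project>` block per attached project, each holding a `<master_url>` and a `<hostid>` element; that layout, the `sample` text, and the helper name `project_host_ids` are assumptions for illustration, so check them against your own file.

```python
import xml.etree.ElementTree as ET


def project_host_ids(state_xml):
    """Map each project's master URL to the host ID found in the XML text."""
    root = ET.fromstring(state_xml)
    ids = {}
    for project in root.iter("project"):
        url = project.findtext("master_url", default="?").strip()
        hostid = project.findtext("hostid")
        # A missing or empty <hostid> is reported as None.
        ids[url] = int(hostid) if hostid and hostid.strip() else None
    return ids


# Hypothetical, heavily trimmed stand-in for a real state file.
sample = """<client_state>
  <project>
    <master_url>http://einstein.phys.uwm.edu/</master_url>
    <hostid>12117052</hostid>
  </project>
</client_state>"""

print(project_host_ids(sample))
```

The sketch only reads the value. A real state file may not parse cleanly with a strict XML parser, and any edit is best made with the client stopped and a backup copy of the file kept.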

Quote:

What is the simplest way for me to deal with this mess? Detach the B+ and start over?

Probably.

Quote:

Will accumulated credit be lost if I do that?

No - as long as the host IDs you end up creating all belong to the one account, each individual ID will retain the credit it had accumulated and the account total will continue to reflect the sum of all the individual IDs. Just don't make the mistake of creating a second (or higher) account for yourself.

Quote:

Thank you for taking the time to try to help me sort out this self-inflicted problem.

You're welcome. I'm sorry that I can't really be of much assistance as I'm unfamiliar with both the hardware itself and the particular types of software that will run on it. Good luck with getting it going again.

Cheers,
Gary.

Einsteinpi

Joined: 10 Dec 14

Posts: 32

Credit: 835000

RAC: 216

23 Apr 2018 18:59:25 UTC

Message 165141 in response to message 165127

Just to update with the latest development:

I detached the Raspberry Pi B+ from Einstein@Home, then reattached that computer to the project. It now appears that new work units for the B+ have been assigned to Computer 12641167, which previously had the name "Raspberry Pi 2B." Work that was underway on the 2B isn't showing anywhere in my web page account, but I can see that four work units are being processed when I remotely access (with ssh) the 2B. Should I have detached/reattached both the B+ and 2B? What does "Reset" accomplish?

It seems that I somehow need to establish separate host IDs for each computer, but don't know how to do that. Ideally, I'd like to be able to have the account identify the B+ and 2B separately (as it once was set up before I messed things up with my OS upgrades).

Is it possible/advisable to delete the present account and start afresh with a new one? I just want both Pis to continue supporting the Einstein Project to the full extent that they can.

Thank you again, Gary, for taking the time to try to help me sort this out. I recognize that your time could be better spent on issues involving more productive computers, so I'm grateful for your efforts.

Holmis

Joined: 4 Jan 05

Posts: 1118

Credit: 1055935564

RAC: 0

23 Apr 2018 22:29:48 UTC

Message 165144

When one does a detach/reattach sequence I believe the project servers will try to match the host against an existing record and with the confusion going on it will probably get it wrong.

At this point it might be worth trying to delete the BOINC data directory to eradicate all traces of Einstein@Home on the host and then attach it again. Whether you should do this on the B+ or the 2B I can't say, but if you care about credits then do it on the one least likely to earn credit for the work in progress; that would be the one not showing on the website as having tasks in progress.

The location of the BOINC data directory is shown in the startup messages in BOINC's Event log.

Einsteinpi

Joined: 10 Dec 14

Posts: 32

Credit: 835000

RAC: 216

24 Apr 2018 3:04:24 UTC

Message 165146

I've dropped the B+ from the account and reset the 2B. The account now shows work in progress for the 2B, so I guess the account configuration is working as it should.

Thanks again to Holmis and Gary for your kind assistance. It's much appreciated!
