I'm confused ...

GreyCruncher
Joined: 2 Sep 06
Posts: 22
Credit: 28664453
RAC: 0
Topic 194357

http://einsteinathome.org/host/1938556/tasks
and
http://einsteinathome.org/host/1938556/tasks&offset=20
show some entries in red. Okay, I checked this on my cruncher ... and found nothing related.

zaphod:~# grep 10-May /mnt/home/seti2/stdoutdae.old |grep -i einstei
10-May-2009 03:20:41 [Einstein@Home] Starting p2030_53834_38094_0063_G47.74+00.45.C_6.dm_529_0
10-May-2009 03:20:43 [Einstein@Home] Starting task p2030_53834_38094_0063_G47.74+00.45.C_6.dm_529_0 using einsteinbinary_ABP1 version 104
10-May-2009 07:13:19 [Einstein@Home] Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
10-May-2009 07:13:24 [Einstein@Home] Scheduler request succeeded: got 0 new tasks
10-May-2009 11:21:36 [Einstein@Home] Restarting task p2030_53834_38094_0063_G47.74+00.45.C_6.dm_529_0 using einsteinbinary_ABP1 version 104
10-May-2009 15:11:44 [Einstein@Home] Sending scheduler request: To fetch work. Requesting 40513 seconds of work, reporting 0 completed tasks
10-May-2009 15:11:55 [Einstein@Home] Scheduler request succeeded: got 2 new tasks
10-May-2009 15:11:55 [Einstein@Home] Message from server: (Project has no jobs available)
10-May-2009 15:11:56 [Einstein@Home] Started download of skygrid_0820Hz_S5R5.dat
10-May-2009 15:11:58 [Einstein@Home] Finished download of skygrid_0820Hz_S5R5.dat
10-May-2009 19:21:53 [Einstein@Home] Restarting task p2030_53834_38094_0063_G47.74+00.45.C_6.dm_529_0 using einsteinbinary_ABP1 version 104
10-May-2009 21:25:41 [Einstein@Home] Computation for task p2030_53834_38094_0063_G47.74+00.45.C_6.dm_529_0 finished
10-May-2009 21:25:41 [Einstein@Home] Starting h1_0817.25_S5R4__1109_S5R5a_1
10-May-2009 21:25:41 [Einstein@Home] Starting task h1_0817.25_S5R4__1109_S5R5a_1 using einstein_S5R5 version 101
10-May-2009 21:25:43 [Einstein@Home] Started upload of p2030_53834_38094_0063_G47.74+00.45.C_6.dm_529_0_0
10-May-2009 21:25:46 [Einstein@Home] Finished upload of p2030_53834_38094_0063_G47.74+00.45.C_6.dm_529_0_0
10-May-2009 22:01:33 [Einstein@Home] Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
10-May-2009 22:01:38 [Einstein@Home] Scheduler request succeeded: got 0 new tasks

Today I found:

25-May-2009 11:25:36 [Einstein@Home] New host venue: home
25-May-2009 11:25:36 [Einstein@Home] Generated new host CPID: c85ac7124343388e94a47abd8f341e12

Right now another CPID ...
25-May-2009 12:50:41 [Einstein@Home] Generated new host CPID: ce3c7ca5b1xxxxxxfdc7ef86aea7abf6
Strange.

And now I'm confused. It looks to me as if something (the Einstein database?) mixed up some host IDs ...
Does anyone have an idea what happened?
Thanks

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

OK, couple of things.

First off, when you are looking for specific tasks in the CC logs, you have to search for the filename of the task to see if there are any entries for the candidate tasks you are interested in.

If there aren't any at all, then that task is most likely a ghost.
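
The search described above can also be scripted. What follows is a minimal sketch in Python, assuming the client log is a plain-text file such as stdoutdae.txt with one entry per line. The function name `find_task_entries` and the second task name are made up for illustration; the sample lines are copied from the log excerpt earlier in this thread.

```python
from typing import Iterable, List


def find_task_entries(log_lines: Iterable[str], task_name: str) -> List[str]:
    """Return every log line that mentions the given task name."""
    return [line.rstrip("\n") for line in log_lines if task_name in line]


# Two sample lines taken from the 10-May excerpt posted above.
sample_log = [
    "10-May-2009 21:25:41 [Einstein@Home] Computation for task "
    "p2030_53834_38094_0063_G47.74+00.45.C_6.dm_529_0 finished",
    "10-May-2009 21:25:41 [Einstein@Home] Starting h1_0817.25_S5R4__1109_S5R5a_1",
]

# A task the client actually ran: at least one entry is found.
hits = find_task_entries(sample_log, "h1_0817.25_S5R4__1109_S5R5a_1")

# A hypothetical task name that never appears in the log: a ghost candidate.
misses = find_task_entries(sample_log, "hypothetical_task_name_0")

print(len(hits), len(misses))
```

To run it against a real log, pass an open file object instead of the sample list, for example `find_task_entries(open("stdoutdae.txt"), name)`. An empty result only suggests a ghost if the log actually covers the period when the task was sent.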

The strange part here is that ordinarily EAH is configured to resend 'lost' work to the host if possible and necessary. Therefore, if the 'red' tasks were in fact ghosts, why didn't the project resend them before they expired on deadline? It's had plenty of opportunities to do so since May 12th (that's the first showing contact time after the send date of the expired tasks).

The next question is, did you detach the project? If not, then this looks like a spontaneous detach.

The only thing I know of which causes one of those is if the CC sends the project a scheduler request whose RPC Sequence Number is less than or equal to the last one the project has on record as coming from the host. Typically, that comes from things like trying to restore the BOINC folders from a backup set, a really ugly CC crash which requires it to roll back to the backup client_state file upon restart of the CC, and/or a user error in editing the client_state.
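
To make that rule concrete, here is a small sketch of the comparison as described in this post. It is only an illustration of the stated condition, not the actual BOINC scheduler code; the function name `request_looks_stale` and its parameter names are hypothetical.

```python
def request_looks_stale(request_seqno: int, last_seqno_on_record: int) -> bool:
    """True when the request's sequence number does not advance past the server's record."""
    return request_seqno <= last_seqno_on_record


# Normal case: each new request carries a higher number than the last one seen.
normal = request_looks_stale(101, 100)

# Repeated number, e.g. after rolling back to a backup client_state file.
repeated = request_looks_stale(100, 100)

# Much older number, e.g. after restoring the BOINC folders from an old backup set.
restored = request_looks_stale(57, 100)

print(normal, repeated, restored)  # False True True
```

Under this reading, only the first request would be treated as coming from the same, up-to-date client installation.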

HTH,

Alinator

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

You can also get detaches from account managers if the project is not properly listed in the AM. I have had BAM do that to me a couple of times ...

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Message 92942 in response to message 92941

Quote:
You can also get detaches from account managers if the project is not properly listed in the AM. I have had BAM do that to me a couple of times ...

Hmmm...

Don't know, never used an AM.

However looking this over again, that might be a good candidate cause here if the OP uses an AM.

Alinator

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

Message 92943 in response to message 92942

Quote:
Quote:
You can also get detaches from account managers if the project is not properly listed in the AM. I have had BAM do that to me a couple of times ...

Hmmm...

Don't know, never used an AM.

However looking this over again, that might be a good candidate cause here if the OP uses an AM.


Just a thought. I usually use them only when setting up a new computer, to save typing. Connect to the AM and it downloads all the 50-some projects to attach ... saves looking up URLs and passwords.

Then I detach...

GreyCruncher
Joined: 2 Sep 06
Posts: 22
Credit: 28664453
RAC: 0

Message 92944 in response to message 92940

Quote:

OK, couple of things.

First off, when you are looking for specific tasks in the CC logs, you have to search for the filename of the task to see if there are any entries for the candidate tasks you are interested in.

If there aren't any at all, then that task is most likely a ghost.

The strange part here is that ordinarily EAH is configured to resend 'lost' work to the host if possible and necessary. Therefore, if the 'red' tasks were in fact ghosts, why didn't the project resend them before they expired on deadline? It's had plenty of opportunities to do so since May 12th (that's the first showing contact time after the send date of the expired tasks).

The next question is, did you detach the project? If not, then this looks like a spontaneous detach.

The only thing I know of which causes one of those is if the CC sends the project a scheduler request whose RPC Sequence Number is less than or equal to the last one the project has on record as coming from the host. Typically, that comes from things like trying to restore the BOINC folders from a backup set, a really ugly CC crash which requires it to roll back to the backup client_state file upon restart of the CC, and/or a user error in editing the client_state.

HTH,

Alinator

This computer has not been detached from the project; it still receives workunits. And I have never used an account manager. I also have not seen any "strange" workunit entries in the boincmgr view; all workunits shown there finished fine. So I assume my computer has never seen these workunits ... and why should it? The computer neither requested nor received work at that time.
NTP is set up and in sync, the time zone differs by 1 hour, and the log is complete for May 10th.
And there are no entries in the log for "lost results".

The funny thing is:

10-May-2009 15:11:44 [Einstein@Home] Sending scheduler request: To fetch work. Requesting 40513 seconds of work, reporting 0 completed tasks
10-May-2009 15:11:55 [Einstein@Home] Scheduler request succeeded: got 2 new tasks

And my computer received two workunits ... and not the red workunits.
...
And even stranger: see http://einsteinathome.org/host/1938556/tasks

The website says my computer is detached ... but that's absolutely wrong. http://einsteinathome.org/host/1938556 is my computer ... and boincmgr shows 3 active workunits.

Slowly it's getting weird.
Stephan

GreyCruncher
Joined: 2 Sep 06
Posts: 22
Credit: 28664453
RAC: 0

26-May-2009 02:04:31 [Einstein@Home] Computation for task h1_0817.35_S5R4__1075_S5R5a_0 finished
26-May-2009 02:04:31 [Einstein@Home] Starting h1_0817.30_S5R4__666_S5R5a_1
Archive: ../../projects/einstein.phys.uwm.edu/skygrid_0820Hz_S5R5.dat
inflating: ./skygrid_0820Hz_S5R5.dat
26-May-2009 02:04:31 [Einstein@Home] Starting task h1_0817.30_S5R4__666_S5R5a_1 using einstein_S5R5 version 101

26-May-2009 02:04:34 [Einstein@Home] Started upload of h1_0817.35_S5R4__1075_S5R5a_0_0
26-May-2009 02:04:42 [Einstein@Home] Finished upload of h1_0817.35_S5R4__1075_S5R5a_0_0
26-May-2009 07:28:02 [Einstein@Home] Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
26-May-2009 07:28:07 [Einstein@Home] Scheduler request succeeded: got 0 new tasks

How can my computer finish a workunit when it is supposedly detached?

---> h1_0817.35_S5R4__1075_S5R5a,
http://einsteinathome.org/workunit/53078752 says the client has detached.
127813948 1938556 24 May 2009 9:03:19 UTC 25 May 2009 11:50:38 UTC Over Client detached New 0.00

And of course I'd like to get the credits.
Stephan

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 715018597
RAC: 934897

Strange indeed, I forwarded this to the admins.

One question: The logs mention something about a changed venue (which probably prompted the generation of a new CPID). Anyway, did you change the venue or something else on the morning of May 25?

Do you still have the logs available for 25 May 2009 11:50:38 UTC, the time when the client detach was received by the server (and a few minutes before that time)? That might be helpful when trying to nail down the problem.
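
For anyone wanting to pull out that slice of the log, here is a rough Python sketch. It assumes the client log timestamps are in local time and that local time is one hour ahead of UTC (the OP mentioned a one-hour difference; the direction is an assumption). The function name `lines_near_utc`, the default offset, and the ten-minute window are illustrative choices, not anything prescribed by BOINC. The sample lines are the ones posted at the start of this thread.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

# Timestamp layout used in the pasted logs, e.g. "25-May-2009 12:50:41".
# Note: %b is locale-dependent; this assumes English month abbreviations.
LOG_TIME_FORMAT = "%d-%b-%Y %H:%M:%S"


def lines_near_utc(
    log_lines: Iterable[str],
    utc_time: datetime,
    local_offset_hours: int = 1,   # assumed offset of log time from UTC
    window_minutes: int = 10,      # assumed search window on either side
) -> List[str]:
    """Return log lines whose local timestamp lies within the window around utc_time."""
    local_target = utc_time + timedelta(hours=local_offset_hours)
    window = timedelta(minutes=window_minutes)
    selected = []
    for line in log_lines:
        try:
            stamp = datetime.strptime(line[:20], LOG_TIME_FORMAT)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if abs(stamp - local_target) <= window:
            selected.append(line.rstrip("\n"))
    return selected


sample_log = [
    "25-May-2009 11:25:36 [Einstein@Home] New host venue: home",
    "25-May-2009 12:50:41 [Einstein@Home] Generated new host CPID: ce3c7ca5b1xxxxxxfdc7ef86aea7abf6",
]

detach_reported = datetime(2009, 5, 25, 11, 50, 38)  # server-side time, UTC
nearby = lines_near_utc(sample_log, detach_reported)
print(nearby)
```

With the assumed offset, only the 12:50:41 entry falls inside the window; if the log were actually in UTC, setting `local_offset_hours=0` would shift the window accordingly.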

Thanks
Bikeman

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Message 92947 in response to message 92946

The messages from the Messages tab are also stored in the file stdoutdae.txt (and kept there for a longer time).

Computers aren't everything in life. (Just kidding)

GreyCruncher
Joined: 2 Sep 06
Posts: 22
Credit: 28664453
RAC: 0

Message 92948 in response to message 92946

Quote:

Strange indeed, I forwarded this to the admins.

One question: The logs mention something about a changed venue (which probably prompted the generation of a new CPID). Anyway, did you change the venue or something else on the morning of May 25?

Do you still have the logs available for 25 May 2009 11:50:38 UTC, the time when the client detach was received by the server (and a few minutes before that time)? That might be helpful when trying to nail down the problem.

Thanks
Bikeman

I did not change anything, not even the venue (at least not knowingly). Additionally, I'm using a global_prefs_override.xml file, which was unchanged ...
I can provide any logs that you, Bernd, or any admin would like to have, going back to Feb 2009. The current log is around 250k and can be mailed.
Stephan

grep 25-May /mnt/home/seti2/stdoutdae.txt |grep -i einstei
25-May-2009 04:53:33 [Einstein@Home] Starting h1_0817.30_S5R4__667_S5R5a_1
25-May-2009 04:53:34 [Einstein@Home] Starting task h1_0817.30_S5R4__667_S5R5a_1 using einstein_S5R5 version 101
25-May-2009 05:18:42 [Einstein@Home] Sending scheduler request: To fetch work. Requesting 15782 seconds of work, reporting 0 completed tasks
25-May-2009 05:18:47 [Einstein@Home] Scheduler request succeeded: got 1 new tasks
25-May-2009 05:18:47 [Einstein@Home] Message from server: (Project has no jobs available)
25-May-2009 10:03:31 [Einstein@Home] Restarting task h1_0817.30_S5R4__667_S5R5a_1 using einstein_S5R5 version 101
25-May-2009 11:25:31 [Einstein@Home] Sending scheduler request: To fetch work. Requesting 13834 seconds of work, reporting 0 completed tasks
25-May-2009 11:25:36 [Einstein@Home] Scheduler request succeeded: got 1 new tasks
25-May-2009 11:25:36 [Einstein@Home] New host venue: home
25-May-2009 11:25:36 [Einstein@Home] Generated new host CPID: c85ac7124343388e94a47abd8f341e12
25-May-2009 11:25:38 [Einstein@Home] Started download of p2030_54006_00031_0000_G62.73+02.84.C_0_78.binary
25-May-2009 11:25:45 [Einstein@Home] Finished download of p2030_54006_00031_0000_G62.73+02.84.C_0_78.binary
25-May-2009 12:38:16 [Einstein@Home] Computation for task h1_0817.30_S5R4__667_S5R5a_1 finished
25-May-2009 12:38:19 [Einstein@Home] Started upload of h1_0817.30_S5R4__667_S5R5a_1_0
25-May-2009 12:38:28 [Einstein@Home] Finished upload of h1_0817.30_S5R4__667_S5R5a_1_0
25-May-2009 12:50:35 [Einstein@Home] Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
25-May-2009 12:50:41 [Einstein@Home] Scheduler request succeeded: got 0 new tasks
25-May-2009 12:50:41 [Einstein@Home] Generated new host CPID: ce3c7ca5b1cdc750fdc7ef86aea7abf6
25-May-2009 16:38:28 [Einstein@Home] Starting h1_0817.35_S5R4__1075_S5R5a_0
25-May-2009 16:38:28 [Einstein@Home] Starting task h1_0817.35_S5R4__1075_S5R5a_0 using einstein_S5R5 version 101
25-May-2009 20:28:22 [Einstein@Home] Sending scheduler request: To fetch work. Requesting 6952 seconds of work, reporting 0 completed tasks
25-May-2009 20:28:27 [Einstein@Home] Scheduler request succeeded: got 1 new tasks
25-May-2009 20:28:27 [Einstein@Home] Message from server: (Project has no jobs available)
25-May-2009 22:33:59 [Einstein@Home] Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
25-May-2009 22:34:04 [Einstein@Home] Scheduler request succeeded: got 0 new tasks
25-May-2009 22:42:36 [Einstein@Home] Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
25-May-2009 22:42:47 [Einstein@Home] Scheduler request succeeded: got 0 new tasks
25-May-2009 22:49:31 [Einstein@Home] Sending scheduler request: To fetch work. Requesting 13656 seconds of work, reporting 0 completed tasks
25-May-2009 22:49:42 [Einstein@Home] Scheduler request succeeded: got 1 new tasks
25-May-2009 22:49:42 [Einstein@Home] Message from server: (Project has no jobs available)
25-May-2009 22:52:19 [Einstein@Home] Restarting task h1_0817.35_S5R4__1075_S5R5a_0 using einstein_S5R5 version 101
25-May-2009 23:44:04 [Einstein@Home] Starting p2030_53995_06094_0051_G69.44-02.70.C_2.dm_371_0
25-May-2009 23:44:06 [Einstein@Home] Starting task p2030_53995_06094_0051_G69.44-02.70.C_2.dm_371_0 using einsteinbinary_ABP1 version 104

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 715018597
RAC: 934897

This is really puzzling.

Just a wild shot in the dark: Around the time when the problem started, you attached a new host: http://einsteinathome.org/host/1955506

Is it possible that somehow (e.g. by initially cloning the BOINC directory) the new host assumed the identity of the other hosts and threw the client-server comms out of sync? It seems to be quite a coincidence that the problem started very shortly after the new host was attached.

CU
Bikeman
