Trouble with Gamma-ray pulsar search #2 v0.01

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,055,126
RAC: 33,162


Quote:

In all GW runs, there have been 'sun*' and 'earth*' ephemeris files which are required by all tasks and so are marked with the tag to prevent them from ever being deleted.

I'm guessing that the 'EPH' is short for ephemeris and that 'JPLEPH.405' should therefore also be a file.

All this is correct. The tag is there in the FGRP workunit definitions, which should be visible in the sched_reply files which contain FGRP2 WUs.

There seems to be a problem with parsing it in certain client versions.

Any reference on what versions work and which don't?

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,927,925,218
RAC: 763,836


Quote:
Quote:

In all GW runs, there have been 'sun*' and 'earth*' ephemeris files which are required by all tasks and so are marked with the tag to prevent them from ever being deleted.

I'm guessing that the 'EPH' is short for ephemeris and that 'JPLEPH.405' should therefore also be a file.


All this is correct. The tag is there in the FGRP workunit definitions, which should be visible in the sched_reply files which contain FGRP2 WUs.

There seems to be a problem with parsing it in certain client versions.

Any reference on what versions work and which don't?

BM


OK, two cases, both Windows.

v6.12.34 - file JPLEPH.405 is present on the machine and is referenced with the tag in client_state, but there are currently no FGRP#2 tasks on the machine. Correct behaviour.

v7.0.44 (current alpha) - file JPLEPH.405 was downloaded afresh with new work, and when it's present, the reference in client_state also contains the tag. But it gets deleted when the last cached task on the machine is completed. This is in the v7.0.44 log:

Quote:
17/01/2013 12:06:41 | Einstein@Home | [sched_op] Starting scheduler request
17/01/2013 12:06:41 | Einstein@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (7295.86 sec, 0.00 inst)
17/01/2013 12:06:41 | Einstein@Home | Sending scheduler request: To fetch work.
17/01/2013 12:06:41 | Einstein@Home | Requesting new tasks for NVIDIA
17/01/2013 12:06:41 | Einstein@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
17/01/2013 12:06:41 | Einstein@Home | [sched_op] NVIDIA work request: 7295.86 seconds; 0.00 devices
17/01/2013 12:06:43 | Einstein@Home | Scheduler request completed: got 3 new tasks
17/01/2013 12:06:43 | Einstein@Home | [sched_op] Server version 611
17/01/2013 12:06:43 | Einstein@Home | BOINC will delete file JPLEPH.405 (no longer needed)
17/01/2013 12:06:43 | Einstein@Home | Project requested delay of 60 seconds
17/01/2013 12:06:43 | Einstein@Home | [sched_op] estimated total CPU task duration: 0 seconds
17/01/2013 12:06:43 | Einstein@Home | [sched_op] estimated total NVIDIA task duration: 8797 seconds
17/01/2013 12:06:43 | Einstein@Home | [sched_op] Deferring communication for 1 min 0 sec
17/01/2013 12:06:43 | Einstein@Home | [sched_op] Reason: requested by project
17/01/2013 12:06:43 | | [work_fetch] Request work fetch: RPC complete

There's no reference to the server requesting a delete in http://einstein.phys.uwm.edu/host_sched_logs/5744/5744895

Edit - the JPLEPH.405 file on my v6.12.34 machine is datestamped 02 August 2011, and hasn't been mentioned in BOINC message logs going back to 13 March 2012.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,055,126
RAC: 33,162


Thanks, that's interesting. HB reported that with 7.0.29 (Mac OS X) the tag isn't even in the client_state.xml file.

However it is there for the GW search's sticky files (h1_*, l1_*, earth*, sun*). Does 7.0.44 behave correctly wrt. these "sticky" files?

BM

Edit: Oliver reported that the "XML" parser in the client was changed for 6.13.x clients. The only difference in the XML to be parsed seems to be that there are TABs in the FGRP WUs, while the GW search uses only SPACEs.

BM

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 982
Credit: 25,170,813
RAC: 2


For the record, the BML (BOINC XML) parser got changed in 7ef34c2 (r23984). This commit got integrated in client_release_6.13.2. It could be a regression in parsing whitespace used for tag indentation...
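To make the suspected failure mode concrete, here is a minimal Python sketch of how an indentation-skipping step that recognises only spaces would miss a tag on a TAB-indented line. It is not BOINC's parser; the helper names and the placeholder tag name `flag` are hypothetical and merely stand in for the tag discussed in this thread.

```python
# Hypothetical illustration only, not BOINC's actual BML parser.

def skip_spaces_only(line: str) -> str:
    """Strip leading indentation, but (buggily) only ASCII spaces."""
    return line.lstrip(" ")


def skip_all_whitespace(line: str) -> str:
    """Strip any leading whitespace, including TABs."""
    return line.lstrip()


def has_flag(xml_text: str, tag: str, skip) -> bool:
    """True if some line, after skipping indentation, starts with <tag/>."""
    marker = f"<{tag}/>"
    return any(skip(line).startswith(marker) for line in xml_text.splitlines())


# "flag" is a placeholder tag name, not the real BOINC tag.
space_indented = "<file_info>\n    <name>JPLEPH.405</name>\n    <flag/>\n</file_info>"
tab_indented = "<file_info>\n\t<name>JPLEPH.405</name>\n\t<flag/>\n</file_info>"

for label, text in [("spaces", space_indented), ("tabs", tab_indented)]:
    print(label,
          has_flag(text, "flag", skip_spaces_only),
          has_flag(text, "flag", skip_all_whitespace))
```

With the spaces-only variant, the TAB-indented file_info silently loses its tag, which is the kind of regression suspected here.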

HTH,
Oliver

Einstein@Home Project

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,927,925,218
RAC: 763,836


Quote:

Thanks, that's interesting. HB reported that with 7.0.29 (Mac OS X) the tag isn't even in the client_state.xml file.

However it is there for the GW search's sticky files (h1_*, l1_*, earth*, sun*). Does 7.0.44 behave correctly wrt. these "sticky" files?

BM

Edit: Oliver reported that the "XML" parser in the client was changed for 6.13.x clients. The only difference in the XML to be parsed seems to be that there are TABs in the FGRP WUs, while the GW search uses only SPACEs.

BM


This doesn't feel like a parsing problem here, more of a logic problem.

I've still got the single task downloaded this morning along with the freshly issued JPLEPH.405, but the tag (which was present after download) is no longer there. My working hypothesis is that it was removed when that "will delete ... no longer needed" message appeared in the log. I further suspect that the message was generated by the client itself rather than sent in the sched_reply, and that it appeared because a scheduler RPC allocated new non-FGRP work, so that no FGRP work would be left on the machine once the currently active task completes. However, it looks as if the actual sched_reply file was overwritten by a second RPC one minute later, so I'll have to re-check that one.
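The working hypothesis above can be modelled as a small state machine. This Python sketch assumes, without confirmation from BOINC's source, that a delete request only clears the file's "keep" flag, and that the client removes an unflagged file once no remaining task references it. The class, field and task names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileInfo:
    name: str
    keep: bool = False  # stands in for the "never delete" tag discussed above


@dataclass
class ClientModel:
    files: dict = field(default_factory=dict)  # file name -> FileInfo
    tasks: dict = field(default_factory=dict)  # task name -> set of input file names

    def handle_delete_request(self, name: str) -> None:
        # Assumption: the request does not delete the file at once,
        # it only clears the "keep" flag.
        if name in self.files:
            self.files[name].keep = False

    def finish_task(self, task: str) -> list:
        """Drop a finished task, then delete and return files nobody needs."""
        self.tasks.pop(task, None)
        in_use = set().union(*self.tasks.values())
        doomed = sorted(n for n, f in self.files.items()
                        if not f.keep and n not in in_use)
        for n in doomed:
            del self.files[n]
        return doomed


client = ClientModel(
    files={"JPLEPH.405": FileInfo("JPLEPH.405", keep=True)},
    tasks={"fgrp_task": {"JPLEPH.405"}, "gpu_task": set()},
)
client.handle_delete_request("JPLEPH.405")
print(client.finish_task("gpu_task"))   # [] (fgrp_task still needs the file)
print(client.finish_task("fgrp_task"))  # ['JPLEPH.405']
```

Under these assumptions the file survives until the last task that uses it finishes, which would match the behaviour seen on the v7.0.44 host.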

The machine does have h1 / l1 / earth / sun files from previous GW runs, marked in client_state, although the machine has no GW tasks at present. I'll try and see if there's any visible difference in the tagging.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,055,126
RAC: 33,162


I started producing FGRP2 work on Albert that should not have any TABs in the file_info. Maybe it works better there.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,927,925,218
RAC: 763,836


Quote:

I started producing FGRP2 work on Albert that should not have any TABs in the file_info. Maybe it works better there.

BM


I think it must be a server configuration issue. In a sched_reply from Albert timed 13:27 UTC today, I'm seeing:

BOINC will delete file JPLEPH.405 (no longer needed)
JPLEPH.405

That sounds as if it may be new functionality built into the server when the sticky file mechanism was reviewed for BOINC v7 last year; maybe BOINC v6 clients can't handle it.

For the record, I'm testing on the BOINC v7.0.44 machine (host 5744895) which was only built last August, and has never run BOINC v6, ever. But it has truly sticky GW files, from both Einstein and Albert, so there's a definite difference in behaviour between the FGRP and GW file-sets.

Edit - the matching Albert server log contains:

2013-01-17 13:27:51.2393 [PID=24524] [user_messages] [HOST#5367] MSG(low) BOINC will delete file JPLEPH.405 (no longer needed)
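The check made here, looking for a matching deletion notice on the server side, can be automated. This Python sketch assumes only the two log line formats quoted in this thread (the client message log and the per-host scheduler log); the function and pattern names are hypothetical.

```python
import re

# Deletion notice text, as it appears in both logs quoted in this thread.
DELETE_RE = re.compile(r"BOINC will delete file (\S+) \(no longer needed\)")
# Host ID tag, present only in the server-side scheduler log.
HOST_RE = re.compile(r"\[HOST#(\d+)\]")


def deletion_notices(lines):
    """Return (host_id or None, file name) for every deletion notice found."""
    found = []
    for line in lines:
        m = DELETE_RE.search(line)
        if m:
            h = HOST_RE.search(line)
            found.append((int(h.group(1)) if h else None, m.group(1)))
    return found


client_log = [
    "17/01/2013 12:06:43 | Einstein@Home | [sched_op] Server version 611",
    "17/01/2013 12:06:43 | Einstein@Home | BOINC will delete file JPLEPH.405 (no longer needed)",
]
server_log = [
    "2013-01-17 13:27:51.2393 [PID=24524] [user_messages] [HOST#5367] "
    "MSG(low) BOINC will delete file JPLEPH.405 (no longer needed)",
]

print(deletion_notices(client_log))  # [(None, 'JPLEPH.405')]
print(deletion_notices(server_log))  # [(5367, 'JPLEPH.405')]
```

A notice that shows up in the server log for the same host points to the server, not the client, as its origin.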

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,055,126
RAC: 33,162


Quote:
I think it must be a server configuration issue.

Correct.

What happened is that the "locality scheduler" didn't know about the JPLEPH.405 file (of the non-locality FGRP application) and sent a 'delete' request.
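A rough Python model of the failure described here: the scheduler emits a delete notice for every host-reported file it does not recognise. The pattern list, file names and function names are assumptions for illustration only; the real locality scheduler's configuration and code are not shown in this thread.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns the scheduler recognises (the GW sticky files named
# in this thread); not the real server configuration.
KNOWN_PATTERNS = ["h1_*", "l1_*", "earth*", "sun*"]


def is_known(name: str, patterns: list) -> bool:
    return any(fnmatchcase(name, p) for p in patterns)


def delete_messages(host_files: list, patterns: list) -> list:
    """Return a deletion notice for each reported file matching no known
    pattern, using the message text quoted in the logs above."""
    return [
        f"BOINC will delete file {name} (no longer needed)"
        for name in sorted(host_files)
        if not is_known(name, patterns)
    ]


# Hypothetical host file list: two GW files and the FGRP ephemeris.
host_files = ["earth_example", "h1_example", "JPLEPH.405"]

# Ephemeris unknown to the scheduler: a delete notice is generated.
print(delete_messages(host_files, KNOWN_PATTERNS))
# Ephemeris recognised: nothing is deleted.
print(delete_messages(host_files, KNOWN_PATTERNS + ["JPLEPH.405"]))
```

In this model, making the scheduler aware of the file suppresses the notice; how the actual fix was implemented is not stated in the thread.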

This should be fixed now on both Albert and Einstein.

Thank you very much!

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,055,126
RAC: 33,162


Wait - the JPLEPH.405 file shouldn't have a tag, right?

Why then does the server (here: the locality scheduler) know about the file at all, so that it can send a delete request?

BM

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 982
Credit: 25,170,813
RAC: 2


Watch out: 4e603ea (r23431) became effective as of client_release_6.13.0.

Oliver

