Questions, comments and problems on new Fermi LAT gamma-ray pulsar search

Bruce Allen

Moderator

Joined: 15 Oct 04

Posts: 1120

Credit: 172127663

RAC: 0

6 Jul 2011 7:35:04 UTC

Message 105792 in response to message 105765

Hi Frank,

Quote:

i don't seem to be able to opt out on S6Bucket, but i want to do so.

This is a deliberate policy decision on my part; let me explain why.

The search for gravitational waves is the fundamental reason for Einstein@Home, and I want to keep that at the core of our activities. The scientific impact of gravitational wave detections and observations is hard to overstate.

Detecting gravitational waves is hard -- it's not a walk in the park -- and I want to ensure that at least half of our computational resources go in that direction. I hope you understand!

Cheers,
Bruce

Director, Einstein@Home

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4349

Credit: 253667984

RAC: 35093

6 Jul 2011 7:37:05 UTC

Message 105793

Apparently the current app does checkpoint correctly, but it doesn't call boinc_checkpoint_completed(), which signals this to the Core Client.

The next app version will have this fixed, and more frequent progress updates, too. I already fixed this in the source code.

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 3044637617

RAC: 2023864

6 Jul 2011 8:16:05 UTC

Message 105794

Task 236578787 completed and reported. Out of curiosity, I looked through the result lists for my wingmate host 4092109. The Tesla card is returning results just fine, but I couldn't find any CPU results - just some post-deadline 'aborted by [user]', as we've discussed before.

On the checkpoint issue: it won't be visible in the limited amount of debug data returned by the client, but I found these lines in stderr.txt on my P4:

19:29:15 (2376): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRP1_0.16_windows_intelx86.exe'.
% checkpoint read: skypoint 5
% Starting barycentering for sky point 6 / 50

and it seems to be carrying on OK from there.

archae86

Joined: 6 Dec 05

Posts: 3165

Credit: 7394481687

RAC: 1980886

6 Jul 2011 13:35:21 UTC

Message 105795

My task from wu 100294686 completed and reported. My quorum partner runs an i7 and has little work in queue, but reported several errors on GW work on July 4 and almost nothing on July 5. So confirmation might take a while.

Reported CPU time on my host was 28,896.21 seconds. This host currently takes somewhat over 19,000 seconds on GW work.

The stderr text visible in the task page is quite extensive.

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 3044637617

RAC: 2023864

6 Jul 2011 14:01:42 UTC

Message 105796 in response to message 105795

Quote:

The stderr text visible in the task page is quite extensive.

It's detailed, but unfortunately not comprehensive. We only see the last (I think) 64 KiB of a file which will end up at over 750 KiB for a completed run. We join your result, for example, in the middle of the 46th of 50 skypoints. My benchmark/checkpoint restart happened at skypoint 5, well before the segment we might expect to see displayed.
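
As a rough cross-check of those figures, here is a small Python sketch. It assumes (and this is only an assumption) that each skypoint writes about the same amount to stderr.txt, and it uses the approximate sizes quoted above; the function name is purely illustrative.

```python
# Rough estimate of where a 64 KiB tail begins in a ~750 KiB stderr file,
# assuming every skypoint writes about the same amount of output.
TAIL_KIB = 64      # portion shown on the task page (figure quoted above)
TOTAL_KIB = 750    # approximate full size of stderr.txt (figure quoted above)
SKYPOINTS = 50     # skypoints per task


def first_visible_skypoint(tail_kib, total_kib, skypoints):
    """Return (skypoints covered by the tail, 1-based skypoint where the tail begins)."""
    covered = tail_kib / total_kib * skypoints
    completed_before_tail = skypoints - covered
    return covered, int(completed_before_tail) + 1


covered, first = first_visible_skypoint(TAIL_KIB, TOTAL_KIB, SKYPOINTS)
print(f"tail covers about {covered:.1f} skypoints, starting inside skypoint {first}")
```

Under that assumption the visible tail spans a little over four skypoints and begins inside skypoint 46, which fits what the task page shows.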

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4349

Credit: 253667984

RAC: 35093

6 Jul 2011 14:04:58 UTC

Message 105797 in response to message 105796

We'll certainly reduce the verbosity in the future.

mickydl*

Joined: 7 Oct 08

Posts: 39

Credit: 200374822

RAC: 0

6 Jul 2011 19:21:18 UTC

Message 105798

I also got a WU here, finished, reported and validated. The runtime of roughly 4h on my machine could be about right but the 920s for my wingman seem odd.

mickydl*

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 3044637617

RAC: 2023864

6 Jul 2011 19:43:48 UTC

Message 105799 in response to message 105798

Quote:

I also got a WU here, finished, reported and validated. The runtime of roughly 4h on my machine could be about right but the 920s for my wingman seem odd.

mickydl*

Your wingmate has

% checkpoint read: skypoint 48

so it re-started with just two skypoints to go. The app is failing to tell BOINC about checkpoints, and - perhaps as a consequence? - BOINC isn't remembering time already spent before a break in computing. The 920s will be the time taken for those last two skypoints only.
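
To illustrate that hypothesis, here is a toy Python model. It is not BOINC code: the class, the function, and the 300 s per skypoint are all made up for illustration, and the rule that time accounting resumes from the last reported checkpoint is an assumption about the client's bookkeeping.

```python
class ToyClient:
    """Toy stand-in for the client's time bookkeeping; not real BOINC code."""

    def __init__(self):
        self.current_cpu = 0.0     # CPU time the client currently reports
        self.checkpoint_cpu = 0.0  # CPU time at the last *reported* checkpoint

    def run(self, seconds):
        self.current_cpu += seconds

    def checkpoint_completed(self):
        # What the app's notification would let the client record.
        self.checkpoint_cpu = self.current_cpu

    def restart(self):
        # Assumption: after a break, accounting resumes from the
        # last checkpoint the client was told about.
        self.current_cpu = self.checkpoint_cpu


def reported_time(per_skypoint, skypoints, restart_after, notify):
    """CPU time reported at the end of a task that is interrupted once."""
    client = ToyClient()
    for done in range(1, skypoints + 1):
        client.run(per_skypoint)
        # The app writes its own checkpoint file either way;
        # 'notify' only controls whether the client hears about it.
        if notify:
            client.checkpoint_completed()
        if done == restart_after:
            client.restart()
    return client.current_cpu


print(reported_time(300, 50, 48, notify=True))   # app reports its checkpoints
print(reported_time(300, 50, 48, notify=False))  # app never reports them
```

In this model, a task interrupted after skypoint 48 reports the full run time when checkpoints are signalled, but only the time for the last two skypoints when they are not.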

Holmis

Joined: 4 Jan 05

Posts: 1118

Credit: 1055935564

RAC: 0

6 Jul 2011 19:50:26 UTC

Message 105800 in response to message 105799

Actually, it seems to have restarted after skypoint #49 too:

Quote:

% checkpoint read: skypoint 49
...
% Search time spent: 513 s
% * Time spent on coherent follow-ups: 409 s

513 s + 409 s = 922 s
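
The same sum can be pulled out of a stderr excerpt with a few lines of Python. This is only a sketch: the regular expressions are written against the three lines quoted above, and treating the two timing figures as additive follows the sum in this post rather than any documented meaning of the log fields.

```python
import re

# Excerpt as quoted above; the "..." stands for lines omitted in the quote.
LOG = """\
% checkpoint read: skypoint 49
...
% Search time spent: 513 s
% * Time spent on coherent follow-ups: 409 s
"""


def summarize(log):
    """Extract the restart skypoint and the two timing figures from an excerpt."""
    restart = re.search(r"checkpoint read: skypoint (\d+)", log)
    search = re.search(r"Search time spent: (\d+) s", log)
    follow = re.search(r"coherent follow-ups: (\d+) s", log)
    return {
        "restart_skypoint": int(restart.group(1)) if restart else None,
        "search_s": int(search.group(1)) if search else 0,
        "followup_s": int(follow.group(1)) if follow else 0,
    }


info = summarize(LOG)
total_s = info["search_s"] + info["followup_s"]
print(info["restart_skypoint"], total_s)
```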

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 3044637617

RAC: 2023864

7 Jul 2011 17:39:20 UTC

Message 105801

Task 236580621 is back from the P4, though again suffering (at least temporarily) from MIA wingman syndrome.

I'm still wondering why the ATLAS node that I'm waiting on for my first validation doesn't seem to be using any of its four CPUs for crunching - I see plenty of valid BRP3cuda32fullCPU results, but all the CPU tasks (which are still being allocated) end up past deadline and self-aborted. Seems a waste of good bandwidth, somehow.
