Questions, comments and problems on new Fermi LAT gamma-ray pulsar search

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1,119
Credit: 172,127,663
RAC: 0

Hi Frank,

Quote:
i don't seem to be able to opt out on S6Bucket, but i want to do so.

This is a deliberate policy decision on my part; let me explain why.

The search for gravitational waves is the fundamental reason for Einstein@Home, and I want to keep that at the core of our activities. The scientific impact of gravitational wave detections and observations is hard to overstate.

Detecting gravitational waves is hard -- it's not a walk in the park -- and I want to ensure that at least half of our computational resources go into that direction. I hope you understand!

Cheers,
Bruce

Director, Einstein@Home

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,059,472
RAC: 33,410

Apparently the current app does checkpoint correctly, but it doesn't call boinc_checkpoint_completed(), which signals this to the Core Client.

The next app version will have this fixed, and more frequent progress updates, too. I already fixed this in the source code.
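For readers unfamiliar with the handshake described here: in the BOINC API (C/C++), an application asks `boinc_time_to_checkpoint()` whether it should write a checkpoint, and afterwards calls `boinc_checkpoint_completed()` so the client knows one was written. What follows is only a toy Python model of that flow, not the Einstein@Home source. The `FakeClient` class and the `run_search` function are hypothetical stand-ins invented for illustration.

```python
class FakeClient:
    """Hypothetical stand-in for the BOINC core client (not the real API)."""

    def __init__(self):
        self.last_known_checkpoint = None  # what the client believes
        self.fraction_done = 0.0

    def time_to_checkpoint(self):
        # The real client decides this from user preferences; here: always yes.
        return True

    def checkpoint_completed(self, skypoint):
        self.last_known_checkpoint = skypoint

    def report_fraction_done(self, fraction):
        self.fraction_done = fraction


def run_search(client, n_skypoints, notify_client):
    """Process sky points, writing a checkpoint after each one."""
    on_disk = None
    for skypoint in range(1, n_skypoints + 1):
        # ... the actual search for this sky point would go here ...
        if client.time_to_checkpoint():
            on_disk = skypoint  # the app writes its own checkpoint file
            if notify_client:
                client.checkpoint_completed(skypoint)
        client.report_fraction_done(skypoint / n_skypoints)
    return on_disk


buggy = FakeClient()
fixed = FakeClient()
print(run_search(buggy, 50, notify_client=False), buggy.last_known_checkpoint)
print(run_search(fixed, 50, notify_client=True), fixed.last_known_checkpoint)
```

In the `notify_client=False` case the checkpoint exists on disk, but the client's record of it stays empty, which is the situation described above.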

BM


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,927,955,216
RAC: 762,051

Task 236578787 completed and reported. Out of curiosity, I looked through the result lists for my wingmate host 4092109. The Tesla card is returning results just fine, but I couldn't find any CPU results - just some post-deadline 'aborted by [user]', as we've discussed before.

On the checkpoint issue: it won't be visible in the limited amount of debug data returned by the client, but I found these lines in stderr.txt on my P4:

19:29:15 (2376): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRP1_0.16_windows_intelx86.exe'.
% checkpoint read: skypoint 5
% Starting barycentering for sky point 6 / 50

and it seems to be carrying on OK from there.

archae86
Joined: 6 Dec 05
Posts: 3,157
Credit: 7,183,444,931
RAC: 768,000

My task from wu 100294686 completed and reported. My quorum partner runs an i7 and has little work in queue, but reported several errors on GW work on July 4 and almost nothing on July 5. So confirmation might take a while.

Reported CPU time on my host was 28,896.21 seconds. This host currently takes somewhat over 19,000 seconds on typical GW work.

The stderr text visible in the task page is quite extensive.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,927,955,216
RAC: 762,051

Quote:
The stderr text visible in the task page is quite extensive.


It's detailed, but unfortunately not comprehensive. We only see the last (I think) 64 KiB of a file which will end up at over 750 KiB for a completed run. We join your result, for example, in the middle of the 46th out of 50 skypoints. My benchmark/checkpoint restart happened at skypoint 5, well before the segment we might expect to see displayed.
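As a rough sanity check on those figures, here is a back-of-the-envelope sketch in Python. It assumes (and this is only an assumption) that every skypoint writes about the same amount of stderr output, and it takes 750 KiB as the total even though the real file is "over" that.

```python
TOTAL_KIB = 750      # "over 750 KiB" for a completed run (from the post)
VISIBLE_KIB = 64     # tail shown on the task page ("I think" 64 KiB)
N_SKYPOINTS = 50

# Assumption: every sky point writes the same amount of stderr output.
kib_per_skypoint = TOTAL_KIB / N_SKYPOINTS
visible_skypoints = VISIBLE_KIB / kib_per_skypoint
hidden_skypoints = N_SKYPOINTS - visible_skypoints

# The visible tail starts partway through this sky point (1-based).
first_visible = int(hidden_skypoints) + 1

print(f"{kib_per_skypoint:.1f} KiB per sky point")
print(f"tail covers the last {visible_skypoints:.2f} sky points")
print(f"display starts inside sky point {first_visible} of {N_SKYPOINTS}")
```

Under that uniform-output assumption the visible tail begins inside skypoint 46, which agrees with what the task page shows.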

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,059,472
RAC: 33,410

We'll certainly reduce the verbosity in the future.

BM


mickydl*
Joined: 7 Oct 08
Posts: 39
Credit: 200,374,822
RAC: 0

I also got a WU here, finished, reported and validated. The runtime of roughly 4h on my machine could be about right but the 920s for my wingman seem odd.

mickydl*

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,927,955,216
RAC: 762,051

Quote:

I also got a WU here, finished, reported and validated. The runtime of roughly 4h on my machine could be about right but the 920s for my wingman seem odd.

mickydl*


Your wingmate has

% checkpoint read: skypoint 48

so it re-started with just two skypoints to go. The app is failing to tell BOINC about checkpoints, and - perhaps as a consequence? - BOINC isn't remembering time already spent before a break in computing. The 920s will be the time taken for those last two skypoints only.

Holmis
Joined: 4 Jan 05
Posts: 1,118
Credit: 1,055,935,564
RAC: 0

Actually it seems to have restarted after skypoint #49:

Quote:

% checkpoint read: skypoint 49
...
% Search time spent: 513 s
% * Time spent on coherent follow-ups: 409 s

513 s + 409 s = 922 s
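The same sum can be pulled out of a stderr excerpt mechanically. Here is a minimal Python sketch that works only on the lines quoted above; the `summarize` helper is a made-up name, and adding the two timing lines together simply follows the reading given in this post rather than anything documented about the app's output.

```python
import re

# The lines quoted in the post above (the "..." elision is kept as-is).
excerpt = """\
% checkpoint read: skypoint 49
...
% Search time spent: 513 s
% * Time spent on coherent follow-ups: 409 s
"""


def summarize(text):
    """Return (resume_skypoint, total_seconds) from a stderr excerpt."""
    resume = re.search(r"checkpoint read: skypoint (\d+)", text)
    # Pick up every "<label>: <N> s" timing line, whatever the label.
    seconds = [int(s) for s in re.findall(r":\s*(\d+) s\s*$", text, re.MULTILINE)]
    return (int(resume.group(1)) if resume else None, sum(seconds))


print(summarize(excerpt))
```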

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,927,955,216
RAC: 762,051

Task 236580621 is back from the P4, though again suffering (at least temporarily) from MIA wingman syndrome.

I'm still wondering why the ATLAS node that I'm waiting on for my first validation doesn't seem to be using any of its four CPUs for crunching - I see plenty of valid BRP3cuda32fullCPU results, but all the CPU tasks (which are still being allocated) end up past deadline and self-aborted. Seems a waste of good bandwidth, somehow.
