Visualization of S5R2 / S5R3 difference

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 695,103,089
RAC: 133,081
Topic 193259

Hi!

I was playing around with a tool called "Topcat", visualizing Einstein@Home result files (the stuff that gets sent back to the server from your PC).

The first picture shows a visualization of an actual S5R2 result file (h1_0281.20_S5R2__65_S5R2c_4_0 to be exact).

The meaning is the following:

The sphere you see is the starsphere as seen in the "Screensaver" view. Think of it as Space with Earth in the center.

Every point in the sphere is a measurement point for a candidate pulsar. There are 10000 in each result (the 10000 most interesting ones), and the color of each dot shows how interesting the candidate is: red for the most interesting, violet for the least. Because there can be multiple candidates at the same sky coordinate (for slightly different frequencies and other search parameters), the Z-axis is also used to move the more interesting candidates toward the outside of the sphere and the least interesting ones toward the inside. So the few red dots close to the sphere's surface are the most interesting.
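A minimal sketch of the kind of mapping described above: a sky position plus an "interest" score becomes a 3D point whose distance from the center grows with the score. The function name, the linear scaling and the inner radius of 0.5 are assumptions for illustration only; the plotting tool may scale its radial axis differently.

```python
import math

def to_cartesian(lon, lat, score, score_min, score_max,
                 r_inner=0.5, r_outer=1.0):
    """Map a sky position (radians) and a score to an (x, y, z) point.

    Higher scores land closer to the outer surface of the sphere.
    r_inner and r_outer are illustrative values, not taken from any tool.
    """
    span = score_max - score_min
    # Fraction of the way from the least to the most interesting candidate.
    frac = (score - score_min) / span if span > 0 else 1.0
    r = r_inner + frac * (r_outer - r_inner)
    x = r * math.cos(lat) * math.cos(lon)
    y = r * math.cos(lat) * math.sin(lon)
    z = r * math.sin(lat)
    return x, y, z
```

With this scaling, two candidates at the same sky coordinate but with different scores end up at different radii, so they no longer hide each other.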

The following image is the same diagram for an actual S5R3 result (h1_0372.90_S5R2__83_S5R3a_1_0).

As you can see, only a segment of the sky is searched in this workunit, but in greater detail (the file still contains 10000 candidates = dots). Other workunits from this frequency series will cover the rest of the sky.

This might help to demonstrate why we see more variation in runtime in S5R3. The algorithm seems to choose certain parameters that are important for performance differently in different regions of the sky, which didn't matter in S5R2 as this averaged out over the complete sky. In S5R3, however, some WU are now "faster" than others.

The "Topcat" software is free. If anybody is interested, I can post a how-to later on visualizing your own results (not that it serves any real purpose, it's just for fun).

I think this might be a cool addition to the current screensaver, if it's not too much of a memory and performance killer.

CU
Bikeman

Reinhard Prix
Joined: 15 Oct 04
Posts: 6
Credit: 1,197,631
RAC: 0


Wow, very impressive!

You are absolutely right that in S5R3 we've started splitting the sky and each WU doing only one sky-patch, while in S5R2 each WU was "all-sky". We also think this is what caused the run-time variations as you suspected, and we're currently looking into how to best deal with that: either by adjusting the credits correctly to the actual runtime for a given skypatch, or by eliminating the run-time fluctuations.

Your visualization tool definitely looks very nice, and I'm happy to see it's been released under the GPL. A little how-to on how you produced those plots would be interesting indeed.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 695,103,089
RAC: 133,081


Hi!

The TopCat tool can be found here.

It's written in Java and should run on all E@H supported platforms. The tool was developed with astronomy in mind, so it's probably also possible to import the coordinates of some known pulsars from scientific catalog data into the diagrams, just as in the screensaver view.

Next challenge is to intercept an example result before it's uploaded to the central server.

1) Make sure you have one more task in the queue so you don't waste CPU cycles.

2) Disable network communication in BOINC Manager.

3) Once the result is finished, you'll find a file named h1_FFFF.FF_S5R2_SEQ_S5R3a_N_M (where FFFF.FF, SEQ, N and M are numbers) in the projects/einstein.phys.uwm.edu subfolder of your BOINC installation. Copy this to a different location, and rename it so it has a "zip" extension (it's really a zip file).

4) Re-enable network access so the result gets delivered.

5) Unzip the file. You should now see an ASCII file with 10001 lines.

6) Remove the last line (something like "%DONE") from this file and save it. You now have a file containing 10000 lines of numbers arranged in 5 columns.

7) start TOPCAT

8) Use the LOAD button to import the file changed in 6), choosing AUTO or ASCII as format

9) Push the "Spherical Plot" button

10) in the dialog, choose:

Longitude Axis: col2
Latitude Axis: col3

Make sure to select "radians" as the unit for both axes.

Push this button to get a z-axis. Select col5.

You should now see lots of dots :-). To add some color, push this button and select col5 as well for this "auxiliary axis".

Voila.

You can rotate the sphere by dragging it around with the mouse.
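For those who would rather script steps 5) and 6), here is a rough sketch in Python. It assumes what the steps above describe: the copied result file is a zip archive containing one ASCII file whose last line is a marker starting with "%". The function names and the output file name are made up for illustration. Python's zipfile module does not look at the file extension, so the rename is not needed here.

```python
import zipfile
from pathlib import Path

def strip_done_marker(text):
    """Drop trailing blank lines and a final marker line such as '%DONE'."""
    lines = text.splitlines()
    while lines and not lines[-1].strip():
        lines.pop()
    # Assumption: the end marker is the only line that starts with '%'.
    if lines and lines[-1].lstrip().startswith("%"):
        lines.pop()
    return "\n".join(lines) + "\n"

def prepare_result(result_path, out_path):
    """Unpack a copied result file and write the cleaned table to out_path."""
    with zipfile.ZipFile(result_path) as archive:
        # Assumption: the archive holds exactly one member, the ASCII table.
        member = archive.namelist()[0]
        text = archive.read(member).decode("ascii", errors="replace")
    Path(out_path).write_text(strip_done_marker(text))
```

Usage would be something like prepare_result("copy_of_result_file", "candidates.txt"), after which candidates.txt can be loaded into TOPCAT as in step 8).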

Would be cooler if there was an easier way to retain the result-files before they are sent to the server and then deleted automatically.

CU

H-BE

Jean Jeener
Joined: 3 Jun 05
Posts: 37
Credit: 5,291,636
RAC: 0


Congratulations, and thank you very much: this gives a vivid illustration of the continuous improvement in the search strategy used by Einstein@Home. More posts of this type and quality may help recruit new crunchers and keep them longer. It also gives anyone teaching a course on data processing a real-life example showing that the field is very much alive. Instructions about using TopCat would be welcome.

With best regards to Bikeman, Jean Jeener.

P.S. While I was writing this post, your directions for use of TopCat appeared in your recent post. Should one not copy the relevant file to a different place and rename it, before re-enabling network communication?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,927,308,604
RAC: 769,136


Message 74757 in response to message 74755

Quote:
Would be cooler if there was an easier way to retain the result-files before they are sent to the server and then deleted automatically.


I think BoincLogX may do this. I'll play around with it when I get home tonight.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 695,103,089
RAC: 133,081


Message 74758 in response to message 74756

Quote:

P.S. While I was writing this post, your directions for use of TopCat appeared in your recent post. Should one not copy the relevant file to a different place and rename it, before re-enabling network communication?

Oops, I inserted the instruction to re-enable network access in the wrong line, thanks for pointing this out. It's corrected now.

CU
Bikeman

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,927,308,604
RAC: 769,136


Confirmed, BoincLogX does - or can be configured to - keep a copy of the result files that are uploaded to BOINC servers, including Einstein. Following Bikeman's instructions, I was able to get a TopCat display out of a LogX file that had happened to save itself alongside the SETI files I was interested in.

However, the capture of result files seems to be pretty weak. I have logging set for once every 30 seconds: it's only caught seven S5R3 results since the beginning of the month, on a box which on average churns one out every 15 hours or so (say 3 every 2 days - I would expect to have seen 30 or more in there).

You could probably catch the files more reliably with a shorter monitoring interval, but that would cost you more processing cycles to run the logger. Alternatively, running with networking disabled should do it - at least, BoincLogX would then save you the manual copying step: you could even use BOINC's scheduler to enable networking once a day to upload, report and recache, and you'd have the result files available for future display. You would, of course, still need to rename, unzip, and edit the files before loading them into TopCat.
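To illustrate the monitoring-interval trade-off, here is a minimal polling sketch in Python. It is not how BoincLogX works internally (that is not described here); it is just a plain "look every N seconds and copy anything new" loop. The function names, the "h1_*" file pattern and the 30-second default are assumptions, and the pattern may well match other project files besides finished results.

```python
import shutil
import time
from pathlib import Path

def archive_new_results(project_dir, archive_dir, pattern="h1_*"):
    """Copy matching files that are not yet archived; return their names."""
    src = Path(project_dir)
    dst = Path(archive_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.glob(pattern)):
        target = dst / path.name
        if path.is_file() and not target.exists():
            # Note: a file still being written would be copied incomplete.
            shutil.copy2(path, target)
            copied.append(path.name)
    return copied

def watch(project_dir, archive_dir, interval_seconds=30, rounds=None):
    """Poll repeatedly; rounds=None means run until interrupted."""
    done = 0
    while rounds is None or done < rounds:
        archive_new_results(project_dir, archive_dir)
        done += 1
        time.sleep(interval_seconds)
```

A file that appears and is uploaded and deleted between two polls is never seen, which is the weakness described above. A shorter interval narrows that window at the cost of more frequent directory scans.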

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,033,415
RAC: 33,904


Message 74760 in response to message 74755

Quote:
Would be cooler if there was an easier way to retain the result-files before they are sent to the server and then deleted automatically.


Apps which use the "old checkpointing" (all official Apps except Windows) keep an intermediate version of the (uncompressed) ASCII file in their slot directory while computing. Should be easy to copy it on-the-fly.

For Apps with the new checkpointing this information is stored in the (binary format) checkpoint file. I'll contact you privately, you're probably able to write a conversion tool faster than me (at least in Java).

BM


archae86
Joined: 6 Dec 05
Posts: 3,157
Credit: 7,182,514,931
RAC: 756,479


Message 74761 in response to message 74759

Quote:
You could probably catch the files more reliably with a shorter monitoring interval, but that would cost you more processing cycles to run the logger. Alternatively, running with networking disabled should do it - at least, BoincLogX would then save you the manual copying step: you could even use BOINC's scheduler to enable networking once a day to upload.


It is an odd thing that for all the coaching to users that we should not use third-party means to return results immediately for fear of overloading the servers, the project standard tool, when things are in equilibrium, seems routinely to upload them so quickly that logging tools are apt to miss them.

One other thing, from experience: If you are running more than one project, and you turn off networking for a day, when you turn it back on the low resource share project can dramatically overfetch. It seems that the threshold for "should I prefetch" correctly took resource share into account, but that the calculation for "how much am I below desired, and thus how much should I fetch?" did not.

If you only run Einstein on a host, this is of no concern. If, as I do, you run 5% SETI share on a 95% Einstein host, this effect can be rather dramatic. I've seen it in the past for server outages, but believe the same would apply to self-imposed network suspension. Fortunately, the most recent BOINC versions have fixed this (5.10.20 is OK, maybe one or two releases before that).

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,927,308,604
RAC: 769,136


Message 74762 in response to message 74761

Quote:
Quote:
You could probably catch the files more reliably with a shorter monitoring interval, but that would cost you more processing cycles to run the logger. Alternatively, running with networking disabled should do it - at least, BoincLogX would then save you the manual copying step: you could even use BOINC's scheduler to enable networking once a day to upload.

It is an odd thing that for all the coaching to users that we should not use third-party means to return results immediately for fear of overloading the servers, the project standard tool, when things are in equilibrium, seems routinely to upload them so quickly that logging tools are apt to miss them.


Slightly different issue.

The standard imprecation might be written "Don't use Report Results Immediately". I've not seen anyone asking for any sort of delay in uploading result files - as in, a straight dump of data onto a project server disk.

What they wanted us to avoid is the update/return/report stage (choose any of BOINC's multifarious terminologies), because that involves opening a connection to the database, updating several table rows, etc. The idea is that we batch together several WU reports, and all the table updates can be done within a single database-open session.

Mind you, the prospect of batching results together doesn't seem very promising when my Celeron 400 takes about 5 days of runtime for each WU....

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,927,308,604
RAC: 769,136


Message 74763 in response to message 74761

Quote:
.... Fortunately, the most recent BOINC versions have fixed this (5.10.20 is OK, maybe one or two releases before that).


I remember the problem only too well, but I thought they fixed it somewhere in the middle of the BOINC v5.8.xx range - v5.8.16 seemed to be all right, from what I remember.
