Postprocessing of E@H data... but how?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 705393697
RAC: 588892
Topic 193383

Just curious: What happens to the returned workunits once they are validated?

E@H awards about 7 million credits each day, which should mean roughly 7e06 credits / 240 (credits per result) / 2 ~ 15,000 validated results per day.

At an average (compressed) size of more than 75 kB, this would mean over 1 GB of zip-compressed ASCII data each day (15,000 × 75 kB ≈ 1.1 GB). I guess storing the data in a DB would require roughly the same amount of space, or only slightly less.

For the whole S5R3 run this means a couple of hundred GB of raw data contributed by clients.
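The estimate above can be checked with a few lines of Python. This is only a back-of-envelope sketch using the figures from the post; the "/ 2" factor is assumed here to reflect a validation quorum of two results per workunit, and the constant names are made up for illustration.

```python
# Back-of-envelope check of the daily upload volume, using the figures from the post.
CREDITS_PER_DAY = 7e6      # total credits awarded per day (from the post)
CREDITS_PER_RESULT = 240   # credits per result (from the post)
QUORUM_FACTOR = 2          # the post's "/ 2" factor; ASSUMED to be a quorum of two
RESULT_SIZE_KB = 75        # average compressed result size in kB (from the post)


def daily_volume_gb():
    """Return (validated results per day, GB per day) in decimal units."""
    results_per_day = CREDITS_PER_DAY / CREDITS_PER_RESULT / QUORUM_FACTOR
    volume_kb = results_per_day * RESULT_SIZE_KB
    return results_per_day, volume_kb / 1e6  # 1 GB = 1e6 kB


results, gb = daily_volume_gb()
print(f"{results:,.0f} validated results/day, ~{gb:.2f} GB/day")
```

This prints about 14,583 results and 1.09 GB per day. Multiplying by 365 (an assumed run length of one year) gives roughly 400 GB, i.e. hundreds of gigabytes rather than hundreds of terabytes.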

Do these results pile up in a giant storage array until the run is completed or will postprocessing begin immediately after validation? Or after all results for a certain frequency band are in?? As I said, I'm just curious.

CU
Bikeman

JLDun
Joined: 22 Apr 06
Posts: 10
Credit: 274103
RAC: 437

"I have this question too."

peanut
Joined: 4 May 07
Posts: 162
Credit: 9644812
RAC: 0

I'm curious too. The volume of data for this project is truly mind-boggling. I wonder how the servers keep up with it all.

If you think of the universe as a computer cluster of untold numbers of particle "processors", each with a program to follow the rules of physics, then you really have a super duper powerful computer. How many terabytes of storage does the universe have? I'm just rambling now.

Odysseus
Joined: 17 Dec 05
Posts: 372
Credit: 20464361
RAC: 5353

RE: If you think of the

Message 76477 in response to message 76476

Quote:
If you think of the universe as a computer cluster of untold numbers of particle "processors", each with a program to follow the rules of physics, then you really have a super duper powerful computer. How many terabytes of storage does the universe have? I'm just rambling now.


ISTR reading (in Asimov?) many years ago an estimate of around 10^70 for the number of particles in the observable universe. Sum over the number of bits required to specify the quantum state of each one …

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: Just curious: What

Quote:

Just curious: What happens to the returned workunits once they are validated?

E@H awards about 7 million credits each day, which should mean roughly 7e06 credits / 240 (credits per result) / 2 ~ 15,000 validated results per day.

At an average (compressed) size of more than 75 kB, this would mean over 1 GB of zip-compressed ASCII data each day (15,000 × 75 kB ≈ 1.1 GB). I guess storing the data in a DB would require roughly the same amount of space, or only slightly less.

For the whole S5R3 run this means a couple of hundred GB of raw data contributed by clients.

Do these results pile up in a giant storage array until the run is completed or will postprocessing begin immediately after validation? Or after all results for a certain frequency band are in?? As I said, I'm just curious.

CU
Bikeman

Here's another question along the same lines...

Now that the S5R3 apps include some of what used to be post-processing functions, can we now extract any meaningful data from the S5R3 units that have been completed, or do we still have to wait until all work units have completed?

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

RE: RE: If you think of

Message 76479 in response to message 76477

Quote:
Quote:
If you think of the universe as a computer cluster of untold numbers of particle "processors", each with a program to follow the rules of physics, then you really have a super duper powerful computer. How many terabytes of storage does the universe have? I'm just rambling now.

ISTR reading (in Asimov?) many years ago an estimate of around 10^70 for the number of particles in the observable universe. Sum over the number of bits required to specify the quantum state of each one …


Yes, but some of them may be "entangled", so that a measurement on the state of one particle also gives the state of the other particle. This is the idea on which quantum computing is based (see www.qubit.org).
Tullio

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6585
Credit: 310935802
RAC: 78054

Well, I've been doing a bit of digging at the LIGO Document Control Center using the phrase "Data Analysis" entered into the Keyword field ( no entries in other fields ). I recovered many hits, in particular G060539-00.pdf, titled LIGO Data Analysis Systems (Data Management and Analysis) - Annual NSF Review ( LDAS ), presented on 23/10/2006. [ NB. LDAS took 22 man-years of software construction, and undergoes upgrades, i.e. versioning ]. Pages 3 through 12 outline the sorts of numbers emitted by the detectors, and the hardware systems that handle them. The remainder deals with 'in house' analysis, both realtime and offline, not particularly involving E@H, but also see here and here. Heavy metal is probably an understatement.

However, page 7 indicates ~470 terabytes generated per year. This is about 3 MB/second per interferometer, with the differential arm ( 'gravity' ) signal, sampled @ 16384 Hz, being only ~2% of that, because the readout also includes a raft of time-aligned 'state of the IFO' data channels like servo settings, seismometers etc... This is all continuous & untriggered readout with GPS timestamps. So much for data production!!!
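The ~2% figure can be sanity-checked with a quick calculation. This sketch assumes each strain sample is stored as a 4-byte single-precision value; that sample size is an assumption for illustration, not something taken from the LDAS review.

```python
# Rough check: what fraction of a ~3 MB/s interferometer readout is the strain channel?
SAMPLE_RATE_HZ = 16384      # strain channel sampling rate (from the post)
BYTES_PER_SAMPLE = 4        # ASSUMPTION: 32-bit single-precision samples
IFO_RATE_BYTES_PER_S = 3e6  # ~3 MB/s total readout per interferometer (from the post)

strain_rate = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE  # bytes per second
fraction = strain_rate / IFO_RATE_BYTES_PER_S
print(f"strain channel: {strain_rate / 1024:.0f} KiB/s, {fraction:.1%} of the IFO readout")
```

Under that assumption the strain channel comes to 64 KiB/s, about 2.2% of the total, which agrees with the ~2% quoted above.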

There are four major data analysis working groups: inspiral, burst, continuous wave and stochastic. These have different requirements for their searches/algorithms/waveforms because of the different astrophysical origins and types of signal, though a typical approach uses matched filtering. It looks like those with suitable permissions access and share the data via the LSC DataGrid, again not directly involving E@H.

Essentially, E@H is simply looking for excess power ( significantly above noise ) in the data. Einstein@Home is characterised as 'Off-line large scale computing power' and a 'distributed data analysis system': we're in the data analysis pipeline for LIGO, not the only ( but a major ) player. LDAS can perform 'data pre-processing, conditioning, reduction' and stores the outputs of the analyses in a relational database system.
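As a toy illustration of "looking for excess power significantly above noise", here is a minimal Python sketch. It computes a power spectrum with a naive DFT, estimates the noise floor as the median bin power, and flags bins more than a chosen factor above that floor. This is not the Einstein@Home search algorithm, which is far more elaborate (among other things, a continuous-wave signal is Doppler-modulated by the Earth's motion); the threshold, signal amplitude and bin number are arbitrary values chosen for the example.

```python
import cmath
import math
import random
import statistics


def power_spectrum(samples):
    """Naive O(N^2) DFT power for bins 0..N/2 (fine for a small toy example)."""
    n = len(samples)
    powers = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        powers.append(abs(acc) ** 2 / n)
    return powers


def excess_power_bins(samples, threshold=20.0):
    """Return bins whose power exceeds `threshold` times the median bin power.

    The median is used as a robust noise-floor estimate: a single strong
    line barely moves it.
    """
    powers = power_spectrum(samples)
    noise_floor = statistics.median(powers)
    return [k for k, p in enumerate(powers) if p > threshold * noise_floor]


# Toy data (made up): unit-variance Gaussian noise plus a sinusoid in bin 40.
random.seed(1)
N = 256
data = [random.gauss(0.0, 1.0) + math.sin(2 * math.pi * 40 * t / N) for t in range(N)]
print(excess_power_bins(data))
```

With these settings the sinusoid's bin carries roughly 64 times the average noise power, so bin 40 should stand out clearly, while a pure-noise bin exceeding the threshold is very unlikely.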

So I'd say the results of our processing at E@H will find their way to that repository, and a further guess would be that the timing of post-processing depends on the particular search strategy. I would expect that stepping through the phase space of whichever search/problem is in question lends itself to the pipelined structure of this enterprise. There are some early conceptual design/requirements documents here, here, here, here, here, and here, which don't quite directly answer the questions asked in this thread but give a background flavour of the task at hand.
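One possible strategy for the "after all results for a certain frequency band are in" case from the original question can be sketched as follows. Everything here is hypothetical (the class name, the band labels, the callback); it only shows the idea of buffering validated results per band and starting postprocessing once a band is complete, not how the E@H servers actually work.

```python
from collections import defaultdict


class BandCollector:
    """Hypothetical sketch: buffer validated results per frequency band and
    hand a band to postprocessing only once all of its workunits are in."""

    def __init__(self, expected_per_band, postprocess):
        self.expected = expected_per_band  # band label -> number of workunits
        self.postprocess = postprocess     # callback(band, results)
        self.pending = defaultdict(list)   # band label -> results received so far

    def on_validated(self, band, result):
        self.pending[band].append(result)
        if len(self.pending[band]) == self.expected[band]:
            # Band complete: remove it from the buffer and postprocess it.
            self.postprocess(band, self.pending.pop(band))


done = []
collector = BandCollector({"50-51 Hz": 3}, lambda band, rs: done.append((band, len(rs))))
for wu in ("wu1", "wu2", "wu3"):
    collector.on_validated("50-51 Hz", wu)
print(done)  # [('50-51 Hz', 3)]
```

The alternative, postprocessing each result immediately after validation, would simply call the callback inside `on_validated` without buffering.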

Cheers, Mike.

NB. Oh, and from memory, the cosmologists who do inflation/big-bang modelling ( from the 'slow roll' of an 'inflaton' field ) routinely talk of state/entropy generation of the order of a googol ( 10^100 ) or so. If you like, this is roughly the number of states that the universe starts with - the initial 'clock winding up' - and it's downhill from there. But heck, there must be a bucket or three of assumptions in such guesstimates. :-)

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

jowr
Joined: 19 Feb 05
Posts: 55
Credit: 1947636
RAC: 0

That's interesting.

Would it be reasonable to assume that E@H will find everything there is to find since we are processing _all_ of the data on _all_ of the sky while looking for anything poking out of the noise?
