Stimulated by breaking the 2M barrier, I looked at the top participants here a few days ago and did a survey of the general relationship between credit/RAC vs 'population' at E@H. So given our penchant for graphics, below is a screen shot of an Excel sheet :
Total credit top row, RAC on the bottom. The first two columns of figures pair each credit/RAC threshold with the number of users who exceed it. For instance there are 24 users who exceed 10M in total credit, and 104 users that exceed 10K in RAC. Third and fourth columns are the logs of the first two respectively because .....
The first column of plots shows that the curves hug the axes closely, meaning that there are ( very ) inverse relationships going on. More helpful, then, are the log/log plots in the rightmost column, which I will call linear on a good day with a following wind.
Skip to the punchline :
numberOfUsers(over total credit x) ~ x^(-1.25)
numberOfUsers(over RAC x) ~ x^(-1.05)
That is, both are about the same and decrease only slightly faster than 1/x.
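For anyone wanting to reproduce this kind of fit, here is a minimal sketch: a straight-line least-squares fit through the (log x, log N) points, whose slope is the power-law exponent. The function name `fit_power_law` and the sample numbers are made up for illustration (the data is synthetic, generated from an exact x^(-1.25) law); they are not the figures from the spreadsheet.

```python
import math

def fit_power_law(thresholds, counts):
    """Fit counts ~ prefactor * thresholds**exponent via a line in log/log space.

    Returns (exponent, prefactor). Plain least squares on (log10 x, log10 N).
    """
    xs = [math.log10(x) for x in thresholds]
    ys = [math.log10(n) for n in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # the power-law exponent
    intercept = mean_y - slope * mean_x    # log10 of the prefactor
    return slope, 10.0 ** intercept

# Synthetic illustration only: counts generated from an exact x^(-1.25) law.
thresholds = [1e4, 1e5, 1e6, 1e7]
counts = [1.0e9 * x ** -1.25 for x in thresholds]
exponent, prefactor = fit_power_law(thresholds, counts)
print(f"exponent = {exponent:.3f}")
```

With real counts the points will scatter around the line, so the slope is only as good as the "linear on a good day" eyeball test suggests.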
Conclusion? E@H, just like nature, abounds in power laws. You'd get much the same behaviour with earthquake strength vs frequency ( at a given location ), or number of lightning strikes per year per square mile ( most never get hit, but some quite a lot ), bit transfer error string length vs frequency of said error on a DSL line etc..... :-)
Cheers, Mike.
( edit ) [ And, of course, a glance at the detailed stats ( say, first 100 of either ) will show the usual suspects recur. That is, those with higher RAC tend to have higher total credit ..... but that's hardly surprising! ]
( edit ) Oh, and it shows that at most ~2500 ( 1765 + 617 ) E@H participants have total credit > 1M and/or RAC > 1K. Since there are rather more than that 'active', by a factor of ~ 40x, more than about 97% are under both 1M & 1K. E@H truly belongs to the masses. :-)
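The arithmetic behind that estimate, as a small sketch. The variable names are invented here, and the "~40x" factor is taken at face value from the post rather than from any measured count of active users.

```python
# Upper bound on users over either threshold: simply add the two counts,
# which overcounts anyone who exceeds both 1M total credit and 1K RAC.
over_either_max = 1765 + 617

# Assumption from the post: 'active' users outnumber that group ~40x.
active_factor = 40
active_estimate = over_either_max * active_factor

# Fraction of active users below both thresholds (a lower bound, since
# over_either_max is itself an upper bound).
fraction_under_both = 1 - over_either_max / active_estimate
print(over_either_max, active_estimate, f"{fraction_under_both:.1%}")
```

Because of the overlap, the true share under both thresholds can only be higher than this figure.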
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Here's a casual observation
Very cool indeed!
Would it be worthwhile to exclude the users that represent large clusters? They play in a different league anyway, don't they?
CU
Bikeman
RE: Very cool
Yeah, the one user with a RAC of 2500+ thousand and the one user with a total credit of 200+ million are the AEI and Merlin grouping(s) swapping around ..... :-)
As you asked .... I've knocked off the top 3 bands of each .....
which is not radically different, but we've lost a couple of 'knees'. It's easier to see the inverse on the plain plots ( hyperbolae, guys & gals ), and the log/log plots become more linear ( a truer power law ) - did you expect that? It's a good guess because these leviathans/monuments do, as you say, come from a totally different stable/stud. They are dedicated and publicly funded, not stochastically drawn from the rabble/hoi-polloi .... :-) :-)
I love E@H .... cheeeesy grin .... there are so many numbers, numbers, numbers & numbers
Cheers, Mike.
( edit ) That is, AEI has less total credit but more RAC than Merlin & vice versa.
( edit ) I've just noticed that dear Prof Bruce is fading out with a RAC of 0.24 after a 'career' of 158,541,694 ....
( edit - oh, shaddup Mike ....) Knees, knees, knees ..... reminds me of the wide spectrum log/log plot of cosmic ray energy vs. intensity which has a similar 'kink' behaviour to the first set of rightmost plots below. The change in the 'line' also heralds ( last I heard ) a different mechanism of production at source for the ultra-high energy end. Amazing how far phenomenology can get you.
Interesting. At the risk
Interesting.
At the risk of thread drift, does anyone have an idea for the source of the divergence in the "new host" count from the "new user" count shown here?
It seems that new users/day has been pretty stable near 200 for the last year, but that new hosts per day took a stride up in mid-March, and, with glitches, has stayed much higher than before right up through now.
Even though with late May we are entering what seemed a clear summer doldrum period last year, the aggregate trends look different this year.
Just to tack back toward Mike's thread--perhaps our newer signup population is much plumper at the high-RAC end than historically, so that it is even more helpful to total throughput than it looks. I know, I know, new machines are fast, but I'm wondering if these may also have higher participation and retention rates.
RE: Interesting. At the
Drift away, it's a legitimate strategy .... :-)
I had glanced at that and I don't have any strong idea, yet.
As I mentioned in Milestones III, entry into the top 'x' participants takes ~ thrice more RAC than about 12 months ago. So certainly the speed racers are going faster. IF you let that rock down the line via some semi-stable ( over time ) power-law-ish linkage AND IF one assumes project cross-parity is ( like monetary inflation models ) a good breadbasket type 'true' standard ( because of the E@H credit/hour adjustments to it ) THEN we are all lifting our game as a population ( doh ..... see the TFlops ). This logic isn't bulletproof of course but demonstrates gross herd dynamics - some law of large numbers which leads to 'normal' type distributions or mildly skewed ones. It doesn't label the trends in the 'movers' though .... but it has self-consistency. :-)
NOW as to whether new vs old, participate/retain, higher/lower, you won't get that out without deeper analysis. You'd want a 'cohort' approach - see who signs up when, and then 'lifetime' them. If newer cohorts fade/decline slower than the older ones ( retention ) AND their contemporaneous ( on the day ) RAC rankings are also higher ( RAC richness ), as each cohort respectively 'ages', then your trends are confirmed.
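A rough sketch of what such a cohort tally might look like, assuming one could get a per-user record of signup date, date of last granted credit, and current RAC. That record layout, the function name `cohort_summary`, the 30-day "still active" window and the sample rows are all hypothetical; nothing here is drawn from actual E@H data.

```python
from collections import defaultdict
from datetime import date
from statistics import median

def cohort_summary(users, as_of, active_within_days=30):
    """Group users by signup month and summarise each cohort.

    users: iterable of (signup_date, last_credit_date, rac) tuples (assumed layout).
    Returns {(year, month): (n_users, retained_fraction, median_rac)}, where
    'retained' means credit was granted within active_within_days of as_of.
    """
    cohorts = defaultdict(list)
    for signup, last_credit, rac in users:
        cohorts[(signup.year, signup.month)].append((last_credit, rac))

    summary = {}
    for key, members in sorted(cohorts.items()):
        retained = sum(
            1 for last, _ in members
            if (as_of - last).days <= active_within_days
        )
        summary[key] = (
            len(members),
            retained / len(members),
            median(rac for _, rac in members),
        )
    return summary

# Made-up rows purely to show the shape of the output.
users = [
    (date(2007, 1, 5), date(2007, 2, 1), 0.0),
    (date(2007, 1, 20), date(2008, 5, 20), 150.0),
    (date(2008, 3, 2), date(2008, 5, 25), 900.0),
    (date(2008, 3, 15), date(2008, 5, 28), 400.0),
]
summary = cohort_summary(users, as_of=date(2008, 6, 1))
for cohort, stats in summary.items():
    print(cohort, stats)
```

This is a single snapshot; to 'lifetime' the cohorts as described, the same tally would be repeated at successive `as_of` dates and the retained fraction and median RAC compared at equal cohort ages.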
Surely the base data for this must be somewhere? Hmmmmmmm ...... :-)
Cheers, Mike.
RE: Interesting. At the
I've speculated before that this might be due to the way science grid hosts execute E@H. I would think that they run science tasks in a batch-like fashion, and that maybe E@H is not so much an ever-present background task but gets submitted to the grid hosts as "filler" batch jobs. So when no other work is scheduled, the hosts get E@H jobs.
This is speculation, but if this works this way, maybe the grid host will execute a wrapper that attaches the host, executes the E@H job, and then detaches the host again.
Detaching and reattaching the same hosts over and over seems to be the only reasonable explanation I can think of (short of an error in the statistics gathering code) why there is such a high ratio of new host/new user lately.
Any other idea?
CU
Bikeman
RE: ... which is not
Sure I did ;-)
Exactly! I've intercepted most of the results sent back by my "Little Farm" at home to E@H and now have approx. 3400 result files or 34 million data points (10k measurements per result file) to play with, visualize, filter, sort, .... big fun. Even with very crude statistical queries it's easy to see that the 60Hz harmonics (EM interference) are special, as well as some violin modes of the suspension systems in the interferometers as documented in the S4 results.
CU
Bikeman
Thanks to that complexity
Thanks to that complexity course I was required to take, when I see a power law relationship, I immediately think "fractal." What's harder is deducing the underlying system property that is causing that relationship. Money to buy computers? Willingness to put those computers to work? Both?
I think you should integrate
I think you should integrate these graphs (Am I right?) to see how much work these first hosts do. The work they do is the area under the graph. So, then we can talk about the distribution of work and the value that the top 2 users give us compared with the others.
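One way to sketch that integration, under the strong assumption that the fitted law N(>x) = C * x^(-a) holds exactly across the band of interest (the thread itself notes it is only roughly linear, and that the top clusters sit off the line). The density of users at RAC x is then a * C * x^(-a-1), and the total RAC contributed by users between x_lo and x_hi is the integral of x times that density. The function name `work_in_band` is made up for illustration; the exponent 1.05 is the RAC fit quoted earlier in the thread.

```python
import math

def work_in_band(x_lo, x_hi, exponent, prefactor=1.0):
    """Total RAC from users with x_lo <= RAC <= x_hi, assuming
    N(>x) = prefactor * x**(-exponent) holds exactly (an idealisation).

    Integrates x * a*C*x**(-a-1) dx = a*C * x**(-a) dx over the band.
    """
    a = exponent
    if abs(a - 1.0) < 1e-12:
        # a = 1: every decade of RAC contributes the same total work.
        return prefactor * math.log(x_hi / x_lo)
    return prefactor * a * (x_hi ** (1 - a) - x_lo ** (1 - a)) / (1 - a)

# With the RAC exponent of ~1.05, compare two neighbouring decades.
low_decade = work_in_band(10, 100, 1.05)
high_decade = work_in_band(100, 1000, 1.05)
ratio = high_decade / low_decade
print(f"each decade up contributes about {ratio:.2f}x the one below")
```

On this idealised reading, an exponent so close to 1 means each decade of RAC carries nearly the same total work as the decade below it, so the few users at the top matter roughly as much per decade as the many lower down. The real top-2 contribution would have to be read off the actual data, since those entries are the ones that break the power law.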