This post tries to answer some questions regarding the O1AS20-100 F and I searches. Feel free to post more questions.
What are the criteria for deciding which search a host gets?
One result of the tuning run was that the average runtimes per host clustered. 16% of hosts finished a tuning task in 8 to 10 hours on average, another 11% took 24 to 28 hours, and at the far end 6% took more than 38 hours per task. This caused several problems with runtime estimation and work fetching for Einstein@Home in general.
I analyzed the hosts participating in the tuning run to find out what might cause this. It became clear that cache size was a main factor, but so was whether the CPU used Hyper-Threading (which limits the available cache per thread). The latter is not detected by the BOINC Client. Since we can only decide what work to send based on what the host tells us, we had to find a common denominator for each class of hosts. Applying some statistics to the data, I came up with two classes of hosts: fast hosts, which showed an average runtime of less than 14 hours in the tuning run, and "not so fast" hosts, which took longer than that. It turned out that 14 hours was also the median of the host population at that time. With this 50:50 separation I looked at the CPU models in each category and selected the ones that were listed in only one of the categories. I then came up with a formula to decide whether a CPU model that had hosts in both categories should be assigned to the fast category or not. I also tried to build categories based on cache size alone, but this was not suitable, since a lot of hosts that should have been fast were in fact very slow (based on the tuning data).
This resulted in the criteria currently in use, which are based on the CPU model reported by the BOINC Client. The O1AS20-100F search contains all the data from the O1 run and is assigned to the fast hosts. The O1AS20-100I search contains only a subset of the O1 data and is assigned to all other hosts.
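To make the classification described above more concrete, here is a minimal Python sketch. It is not the code used on the project server. The 14 hour threshold comes from the post, but the function name, the input format and especially the rule for CPU models with hosts in both categories (a simple majority of fast hosts) are assumptions for illustration only, since the actual formula is not given here.

```python
from collections import defaultdict

FAST_THRESHOLD_HOURS = 14.0  # from the post: fast hosts averaged under 14 hours


def classify_cpu_models(host_runtimes, min_fast_fraction=0.5):
    """Map each CPU model to "F" (fast) or "I" (all other hosts).

    host_runtimes: iterable of (cpu_model, average_runtime_hours), one per host.
    min_fast_fraction: hypothetical cut-off for models seen in both categories.
    """
    counts = defaultdict(lambda: [0, 0])  # model -> [fast hosts, all hosts]
    for model, hours in host_runtimes:
        counts[model][1] += 1
        if hours < FAST_THRESHOLD_HOURS:
            counts[model][0] += 1

    assignment = {}
    for model, (fast, total) in counts.items():
        # Models seen in only one category have a fast fraction of 1.0 or 0.0.
        # The majority rule for mixed models is an assumption, not the real formula.
        assignment[model] = "F" if fast / total > min_fast_fraction else "I"
    return assignment
```

With this rule, a model whose hosts are split exactly evenly ends up in the "I" category, which errs on the side of sending the smaller data set.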
Does the O1AS20-100F version use AVX?
Yes, the applications are the same for both searches. In fact they are the same as in the tuning run.
Why are there two separate user preference items for O1AS20-100F and O1AS20-100I?
This is because each search is a separate BOINC application and thus automatically gets a preference item. Those are still useful if you don't want to have Gravitational Wave tasks (I can't imagine why), but they don't change which app is assigned to a specific host. That is decided by the reported CPU model.
Will these criteria change over time?
Certainly. We're going to monitor the average runtimes per host for each search. If there is a specific CPU model that should be moved to the other category, we will do that. But we will need enough tasks returned for each search to make an informed decision.
Edit 2016/06/13: We removed the CPU model criteria today, see: O1AS20-100 search now open for all CPU models.
Is there a GPU version in the works to speed up things?
The Binary Radio Pulsar search code is by far the most optimized for GPUs; there we get a speed-up (GPU compared to CPU only) of well over 10, depending on the individual GPU and CPU of course. For the GW search, the FFT part of the computation takes only roughly half the computing time on CPUs, so offloading it to the GPU can speed up the computation by a factor of 2 at most. We are quite sure that the other parts of the computation (besides the FFT) can also be ported to GPUs, but we have no plans to do that in the near future. We may change this decision later depending on science priorities, though.
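The factor of 2 follows from Amdahl's law: if only a fraction of the runtime is accelerated, the remaining part limits the overall gain. A small Python sketch of that arithmetic, where the function name and parameters are illustrative and the 0.5 fraction is the rough figure from the answer above:

```python
def max_overall_speedup(accelerated_fraction, part_speedup=float("inf")):
    """Amdahl's law: overall speed-up when only part of the runtime gets faster.

    accelerated_fraction: share of the original runtime that is offloaded (0..1).
    part_speedup: how much faster that share runs; infinity gives the upper bound.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / part_speedup)


# FFT is roughly half the runtime: even an infinitely fast GPU FFT gives 2x.
print(max_overall_speedup(0.5))      # 2.0
# With a 10x faster FFT the whole task is only about 1.8x faster.
print(max_overall_speedup(0.5, 10))
```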
Gravitational Wave search O1AS20-100 F and I FAQ
Thanks Christian !
Bill
Thanks!
Is a GPU version in the works to speed things up?
RE: It became clear that
Thanks Christian for posting this. Does HT halve the cache per thread (2 threads per core), or does it impose some other limit?
I guess fixing this would require a BOINC enhancement to report the cache size?
Since the host specs are displayed for each client on the Einstein@Home website, it should be possible to read the CPU model from that data and derive the CPU's cache size from a simple specs table, shouldn't it?
Michael.
RNA World - A Distributed Supercomputer to Advance RNA Research
RE: Since the host specs
That would be a really HUGE table and it would need to be updated constantly. Moreover, as already mentioned, not all CPUs report the final CPU model string, and some models also have bugs in the string.
RE: RE: It became clear
I guess it depends on the chip architecture and how the L2/L3 cache is implemented (Intel sometimes calls it SmartCache). I'm currently gathering more data to see how the number of cores affects the runtime. We already get the cache size reported. Or at least we get a number from the BOINC client, which it obtains from the operating system, that is supposed to be the cache size. So far the values seem reasonable, but they also vary within a specific CPU model, so I'm not sure we can trust this value.
I also updated the first post with an answer to the GPU question.
Christian Beer wrote:Why are
I have an i5-2500K host which received F work until I turned off the preferences to receive CPU, and also the specific work preference for F and I.
About five days ago I turned the CPU work option back on, but (partly out of curiosity) enabled I without enabling F.
Ever since the host has been steadily requesting CPU work, and consistently has not gotten any. Usually both the message log and the most recent work request log have contained a message something like "No work is available for Gravitational Wave search O1 all-sky I". This has persisted long after the server problems, though I thought perhaps it might be reluctant to send me work because of the locality system.
So perhaps the behavior is that one's selection of "Gravitational Wave search O1 all-sky F" vs "Gravitational Wave search O1 all-sky I" on the Einstein preferences change will not influence which application type work a given host receives, but can preclude a host getting O1AS20-100 work at all if the user enables the "wrong" type for the host capabilities? If that is the case, this specific "no work is available" message is unhelpful, as apparently the real meaning is "no work of this type is permitted to be sent to this host".
In the broader scope of things, this is a minor matter, and might be more trouble to tidy up than it is worth.
RE: In the broader scope of
Exactly. We need to build in more custom code that is probably only useful for this special case to "combine" the preference selection.
Reading the explanations I'm guessing that my Phenom(tm) II X4 965 is too puny to be granted "F" work because I get the "no work is available for that application" message instead of an actually useful message like "your puny CPU hah hah hah" {yes, I get that BOINC controls the messages}
So my question is really about why the project feels the need for such task segregation. I personally don't have any issues with runtime estimation or work allocation so long as I can actually get work.
If I select "F" and deselect "I" in preferences, why can't you send me "F" tasks? By setting the preferences, I agree and accept any and all risks with regard to runtime estimation and task allocation.
I sometimes run certain PrimeGrid tasks that take weeks to complete. I'm OK with a task taking 3 days or more. Why should my puny CPU be punished like this?
:)