Quorum replication

They are beta testing single replication at SETI. Could it be possible at Einstein? It could effectively double our processing power.
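Rough arithmetic behind that claim (the daily result count below is invented, just for illustration): with a quorum of 2, every workunit is crunched at least twice, so single replication would roughly double the number of distinct workunits processed by the same fleet.

# Illustrative arithmetic only -- the daily result count is made up.
results_per_day = 100_000          # hypothetical number of results returned per day
quorum = 2                         # each workunit currently crunched at least twice

workunits_done_now = results_per_day / quorum        # 50,000 workunits of science
workunits_done_single = results_per_day / 1          # 100,000 with single replication
print(workunits_done_now, workunits_done_single)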
Just my personal opinion, but given that E@H (well, like any BOINC project, I guess) is often used for overclocking calibration, I don't see this happening.
Apart from overclocked or failing hardware, we've seen computational problems caused by bugs in compilers and operating systems (e.g. the Linux kernel bug that causes "signal 8" errors if you are lucky, and possibly just randomly wrong results if you are unlucky) that IMHO discourage single replication.
E@H is a "needle in a haystack" type of search, and if you happen not to find any needles, you at least want to be able to make a statement like "we can be 95% sure that any needle in that haystack must be below size X, because otherwise we would have found it". If you are not sufficiently confident in the validity of the returned results (either because of computational errors or because someone manipulated the app to run faster but possibly return bad results), you can't make that sort of statement.
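To put that in concrete terms, here is a toy sketch (my own illustration with invented numbers, not the project's actual upper-limit statistics): if some fraction of returned results are silently wrong and would miss a real signal, the confidence of any "no needle above size X" statement drops accordingly.

# Toy model only -- not Einstein@Home's real upper-limit statistics.
def exclusion_confidence(detection_prob, bad_fraction):
    # detection_prob: chance a *correct* computation finds a signal of size >= X
    # bad_fraction:   fraction of results that are silently wrong and would miss it
    return detection_prob * (1.0 - bad_fraction)

print(exclusion_confidence(0.95, 0.00))   # trustworthy results -> 95% exclusion
print(exclusion_confidence(0.95, 0.05))   # 5% silently bad results -> only ~90%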
I wonder how this risk will be handled at S@H? But I guess they care less about "upper limits" (like 'if an alien civilization were using this and that technology to transmit a signal omnidirectionally, then they are at least X light-years away').
You could probably make certain OS / build / application version combinations trigger a second issue of the WU. I think some project is already doing that, or has done it, but I'm not sure which. The problem with that is that you probably wouldn't notice in time if a certain OS / build / application combination is causing problems. Also, wouldn't it theoretically be possible that if two people use the same problem-causing setup, their results would still validate against each other? If that's true, then there will always be a risk of the needle being overlooked.
@ Bikeman: Agreed, this is a much more rigorous project than S@H in terms of evaluating the data returned from the hosts.
I don't care how long it takes to get through a science run (within reason), as long as we don't miss a wave or 'find' one that's hokey. ;-)
@ BR: That's the whole point of the multiple replication. It gives 'natural' warning indicators that are relatively easy to spot in the returning data stream and can be addressed before they turn the whole run into 'junk'.
Given how long science runs take even under the best conditions, you want to minimize the number of times you have to start over six months in. ;-)

Alinator
CU
Bikeman
Analysis of the tasks at SETI shows that very few completed tasks are corrupt.
It is being trialled on SETI Beta, with each host earning a "reliability" score. The most reliable hosts will have 90% of their work go unvalidated, with the other 10% of tasks chosen at random for validation.
The tasks for a host are hidden until completed, so a user will not know which tasks are to be validated. The replication suffix _N is also randomised.
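As I understand the idea (the sketch below is my own guess at the general mechanism, not the actual BOINC/SETI Beta scheduler code), it amounts to something like this:

import random

# Sketch of the idea as described above -- not the real scheduler logic.
SPOT_CHECK_RATE = 0.10           # ~10% of a reliable host's tasks get a second copy
RELIABILITY_THRESHOLD = 0.95     # hosts below this always get validated

def needs_wingman(host_reliability):
    if host_reliability < RELIABILITY_THRESHOLD:
        return True                                # unproven host: always validate
    return random.random() < SPOT_CHECK_RATE       # trusted host: random spot check

print(needs_wingman(0.99), needs_wingman(0.50))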
Also, they are now using the ALFA (Arecibo L-band Feed Array) receiver, which has seven beams, each with horizontal and vertical polarization, so there are now 14 times as many workunits as with the old line feed receiver. As the beams have some overlap, and any received signal will probably not be purely horizontal or vertical, any signal from ET will probably appear in more than one task. So presumably some cross-checking can be done if a significant signal is reported in one workunit.
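Purely to illustrate that cross-checking idea (the candidate format, tolerances and numbers below are all made up), one could flag candidates that show up in more than one beam at roughly the same frequency and time:

# Toy cross-beam coincidence check -- candidate format and tolerances are invented.
candidates = [                    # (beam, frequency in Hz, time in s)
    (0, 1420.40e6, 1000.0),
    (1, 1420.40e6, 1000.2),       # near-identical signal in an overlapping beam
    (3, 1418.00e6, 5000.0),
]
FREQ_TOL, TIME_TOL = 1.0e3, 1.0   # Hz, seconds

def coincident(a, b):
    return a[0] != b[0] and abs(a[1] - b[1]) <= FREQ_TOL and abs(a[2] - b[2]) <= TIME_TOL

confirmed = [c for c in candidates if any(coincident(c, other) for other in candidates)]
print(confirmed)                  # only the 1420.40 MHz candidate is seen in two beams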
WCG is already using "single replication" for some of their applications (sub-projects, so to speak). They keep a "reliability score" for each host, which basically determines the ratio in which that host gets sent tasks for "single" or "validated" workunits (both are available). Even a highly "reliable" host is sent a task that's compared against another task of the same workunit from time to time, so the reliability records are occasionally re-verified.
Yes, there is still a chance that if two people use the same "broken" application on Einstein@Home, the results of these two are compared against each other and wrong results end up as canonical. However, with the increased number of compute-cluster machines that only run "official" apps, this case has become rather unlikely.
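To put rough numbers on that risk (the rates below are invented, not project statistics): with a quorum of 2, a wrong result only becomes canonical if two hosts independently produce matching wrong answers, which is far less likely than a single host being wrong.

# Back-of-the-envelope comparison with invented rates -- not project statistics.
p_bad_host   = 0.01    # assumed fraction of hosts returning silently wrong results
p_same_error = 0.1     # assumed chance two bad results agree closely enough to validate

print(f"quorum 1: ~{p_bad_host:.2%} of workunits could end up wrong")
print(f"quorum 2: ~{p_bad_host**2 * p_same_error:.4%} of workunits could end up wrong")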
The techs (like me) discussed "single replication" with the scientists behind Einstein@home when we switched to "server assigned credit", which allowed us to reduce the initial replication from 3 to 2 (I think S4 was the first run to feature this). The scientists insisted on at least a quorum of 2 for verification. I don't think they feel differently about that nowadays.
BM
I am running QMC@home, where the quorum is 1. I run stock apps on both QMC and Einstein, and optimized apps on SETI MB and Astropulse, because the stock apps are very slow on my Opteron 1210. I am also running CPDN and CPDN Beta on the same CPU, always under Linux, and LHC when work is available. Its FP unit seems very reliable; I never get a "compute error".
Tullio
Quote:
I wonder how this risk will be handled at S@H? But I guess they care less about "upper limits" ...
Quote:
Analysis of the tasks at SETI shows that very few completed tasks are corrupt.
A small percentage it is, but a few bad hosts could still cause a lot of possibly localised pollution.
Quote:
It is being trialled on SETI Beta, with each host earning a "reliability" score. The most reliable hosts will have 90% of their work go unvalidated, with the other 10% of tasks chosen at random for validation.
The tasks for a host are hidden until completed, so a user will not know which tasks are to be validated. The replication suffix _N is also randomised.
Good stuff. That eliminates one of my 'cheats danger' arguments.
Quote:
... then any signal from ET will probably appear in more than one task. So presumably some cross-checking can be done if a significant signal is reported in one workunit.
Matt on S@H also mentioned that 'sanity checks' were to be included server-side for each workunit, to catch cheater hosts that might try to quickly return 'incompletely' processed units.
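I don't know what those checks actually look like; as a purely hypothetical example (thresholds and fields invented), a server-side validator could reject results that come back implausibly fast or report an impossible number of signals:

# Hypothetical server-side sanity check -- thresholds and fields are invented.
MIN_CPU_SECONDS = 600          # faster than this is suspicious for a full workunit
MAX_REPORTED_SIGNALS = 30      # more than this suggests a noisy or manipulated result

def passes_sanity_check(cpu_seconds, num_signals):
    if cpu_seconds < MIN_CPU_SECONDS:
        return False           # returned too quickly to have done the full search
    if num_signals > MAX_REPORTED_SIGNALS:
        return False           # implausibly many candidate signals
    return True

print(passes_sanity_check(4800, 7))    # plausible result
print(passes_sanity_check(45, 0))      # "instant" return -> send for validation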
All good for improved performance, provided that cheaters or dubious hosts cannot lever the system for their own abominable or ignorant gains.

Happy crunching,
Martin