That's all well and good, sir, but this project awards far more credit for the same CPU *or* GPU time than SETI and Rosetta, in both of which I also actively participate.
I'm not being critical of this, but I do wonder if the competitive spirit draws some people here instead of to those other projects because they perceive they are accomplishing more. I'll admit I check how I'm doing in comparison to others on Boincstats, but I would be doing this anyway, and the numbers do help me benchmark my machines and allocate resources to these projects more efficiently.
One thing you are forgetting is that more credits often means more users, and that means a much bigger strain on the project's hardware, which then drives their costs up, often dramatically! Seti is already down 2 to 3 days per week now because they can't afford to keep running 24/7/365 anymore; how badly would a hardware failure affect them? Catastrophically, I would imagine.
A while back the Boinc programmers tried to implement a standard credit-awarding system for all projects; the problem is it didn't fit every project's needs. SOME projects sell the results of our crunching, and getting lots of users can increase that cash flow, so they wanted to give out lots of credits to attract lots of users. Other projects do not sell our results; they just publish them every once in a while, and they don't much care about the number of users, just that the units get crunched.
Even Seti at one point didn't have the resources to even look at the results we were sending back to them, so they were storing them on a shelf for 'some day' when they could review them. It got so bad for them that they were even resending out the SAME units that had already been crunched just so people wouldn't leave the project. The 'new' results were just discarded as they already had results from those workunits sitting on the shelf.
In short, the awarding of credits can be a VERY VERY touchy subject and 'fighting words' for some people. What it has come down to is each project doing its own thing for its own reasons, and it seems to be working just fine, for the most part. Given that, credits should not be compared between this or that project, but only within a single project, and then only to see how your PCs are doing today compared to last week or last month. They can also be used to compare how my PC is doing against your PC, but there are a TON of variables that make any direct comparison nearly impossible without some kind of context. Are mine or yours running 24/7/365 and at 100% load 100% of the time? Are either of us running 2, 3 or more GPU tasks at the same time? Are we using 100% of the CPU cores in our machines, or are one or both of us using fewer? Are either of us overclocking, or underclocking?
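For comparing your own machines within a single project, one rough way to put the numbers on a common footing is credits per core-hour, adjusted for how much of the box is actually crunching. A minimal Python sketch, with host names and figures invented purely for illustration:

# Rough, illustrative normalization of per-host credit within ONE project.
# Host names, credit totals and utilization figures are invented examples.
hosts = {
    "i7_desktop":    {"credit_per_day": 12000, "cores_used": 8, "uptime_frac": 1.0},
    "raspberry_pi2": {"credit_per_day": 150,   "cores_used": 4, "uptime_frac": 0.5},
}
for name, h in hosts.items():
    core_hours = h["cores_used"] * 24 * h["uptime_frac"]
    print(f"{name}: {h['credit_per_day'] / core_hours:.1f} credits per core-hour")

It doesn't remove the GPU, overclocking, or application-version variables, but it at least tells you whether a box is pulling its weight relative to last week.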
Your comment is kind of getting at something that got me wondering about this. I recently started running an array of Raspberry Pi 2s, wondering if I could match the compute power of a PC more economically (or at least scale up my processing power under the *ahem* household budget--not get caught :-) ).
I quickly realized, however, that even if I could, there was no real comparison at the same nominal MIPS/FLOPS because of optimizations like SSE2, and that the different processors were performing in completely different classes with entirely different credit structures--making comparisons meaningless, or at least really tough to sort out.
I also started using a SETI client built from source for the Raspberry Pi 2, which did perform in the same class as all of the other processors--but SETI for Mac (for example) has clients optimized for SSE4.1, SSSE3, and AVX (if your processor supports them). It's been really fascinating comparing the amounts of credit granted versus processing time for each of those and trying to guess how big a role those optimizations play.
It's also fascinating to see how different processors--even within the same operating system--compare in and across these projects. Part of the reason I care is that I was trying to project what components would help the most when I recently built my i7 system. I was trying to get an accurate estimate of how it would perform with this or that processor and graphics card in each of the different projects.
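Part of why the raw numbers diverge so much is SIMD width. A crude peak-throughput estimate is cores x clock x SIMD lanes x operations per cycle; the clock speeds and per-cycle figures below are assumptions for illustration, not measured values:

# Very rough peak single-precision estimate: cores * clock * SIMD lanes * ops/cycle.
# All figures are assumptions for illustration only.
def peak_gflops(cores, clock_ghz, simd_lanes, ops_per_cycle):
    return cores * clock_ghz * simd_lanes * ops_per_cycle

pi2 = peak_gflops(cores=4, clock_ghz=0.9, simd_lanes=4, ops_per_cycle=1)   # Cortex-A7 + NEON, assumed
i7  = peak_gflops(cores=4, clock_ghz=3.5, simd_lanes=8, ops_per_cycle=2)   # quad-core i7 with AVX/FMA, assumed
print(f"Pi 2 ~{pi2:.0f} GFLOPS peak, i7 ~{i7:.0f} GFLOPS peak, ratio ~{i7 / pi2:.0f}x")

Real applications land well below peak, of course, which is exactly why credit-per-time on actual workunits tells you more than any spec sheet.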
Sure, I get what you are saying, but you also seem to be suggesting again that they could modulate the credit awards to keep participation levels where they want them (supply and demand). I'm not saying that's what's happening, but it could be a lever for attracting users on a lot of these projects.
As for what I am doing with them, look at my other post today in this thread. One of the things I was doing was trying to anticipate how different components would perform in these projects, to help guide me in building a machine--which my wife thinks is for video editing. (Just kidding.)
Even Seti at one point didn't have the resources to even look at the results we were sending back to them, so they were storing them on a shelf for 'some day' when they could review them. It got so bad for them that they were even resending out the SAME units that had already been crunched just so people wouldn't leave the project. The 'new' results were just discarded as they already had results from those workunits sitting on the shelf.
Are you absolutely certain of the exact truth of the specifics of this?
I'm not asserting that I am any more correct, or even that you are wrong, but... you level an UGLY accusation in SETI's direction. Why is it so ugly? It implies that the lab would knowingly let tens of thousands of people waste millions and millions of dollars on electricity and throw-away results in order to keep participants.
What I DO know happened was that a refinement was made to the programs (adding auto-correlation) and a lot of work was re-run.
As far as I know your first observation is still true: results are being shelved. The "Nitpicker" machine (donated largely by one member of the GPU Users Group) was not up to the task once it was tried. The computer that would be capable of looking at the data on any sort of reasonable schedule has not been asked for; so the data we send back for one of the two programs IS sitting on the shelf waiting for analysis. Either that, or there is a secret, stealth computer somewhere running a program nobody seems to know exists.
But... as I said to begin with, I'm not claiming information that contradicts you. What you are reporting could be "in addition to" what little I know.
I just want to be *sure* that you have your facts precisely correct before I send a letter to Berkeley, the California legislature, and the EPA about the money and the gigawatts of power being wasted by SETI for results that are immediately discarded.
I left Seti in March of 2007 and haven't been back since. What I said was true then, but it may or may not be true now; that's why I said "at one point". Perhaps I should have been clearer about the time frame. I'm sorry if that caused any confusion.
Even back in the pre-Boinc days Seti would reissue work just to keep the users' caches full, with no intention whatsoever of even looking at the reissued workunits. What would be the point of re-examining results you have already looked at once, twice or even 20 times? To be honest, though, back in those days there were not a lot of other options if one wanted to crunch.
Is this different from the validation method of having two or more machines work the same unit and compare the results? It seems like this would be a way to add more rigor to the process.
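For what it's worth, the redundancy idea works roughly like the sketch below: send the same workunit to several hosts and accept a result only once enough independent copies agree. The quorum size, tolerance, and data layout here are my own illustrative assumptions, not any project's actual validator:

# Toy sketch of quorum-style validation: accept a workunit's result once
# enough independent host results agree within a tolerance. Figures are invented.
def validate(results, quorum=2, tol=1e-5):
    """results: numeric outputs for one workunit, one entry per host."""
    for candidate in results:
        agreeing = [r for r in results if abs(r - candidate) <= tol]
        if len(agreeing) >= quorum:
            return sum(agreeing) / len(agreeing)   # canonical result
    return None   # no quorum yet; the server would issue the unit to another host

print(validate([3.141591, 3.141593]))   # two hosts agree -> accepted
print(validate([3.141591, 2.718281]))   # disagreement -> needs another copy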
Also, why would they need in-house processing power to do what Nitpicker did, or the analysis of the results? You have a huge pool of people dying to contribute their CPU & GPU cycles. If the process is amenable to distributed computing, they could write another app, akin to how they made Astropulse, or how this project has numerous applications to analyze different data types. Even if this requires a live, synchronized data connection, I'll bet there are users with low-latency, high-bandwidth connections who would almost literally kill for the opportunity to run a special project like this. Or if that wouldn't work, then there are plenty of people willing and capable of absorbing very, very large data sets and processing them. Heck, I'd even chip in on a fundraising campaign to rent them some Amazon cloud computing time.
I don't mean to be trite about this either, because I suspect another major constraint is skilled programmer time to make any of this happen, along with ongoing efforts to optimize the existing database and current suite of applications. But the whole point of these projects was to minimize the need for expensive hardware and use contributions from all of us to help do the heavy lifting.
If the process is amenable to distributed computing
Aye, there's the rub. Distributed computing as practiced in the BOINC empire requires that the problem of interest be divided into discrete chunks, with each chunk having an extraordinarily high ratio of internal computation to external communication.
Lots (most) of interesting problems don't naturally partition that way.
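To put a rough number on that ratio (all figures invented, just to show the scale): a typical volunteer workunit is a few megabytes that keeps a core busy for hours, whereas a tightly coupled simulation has to exchange data every step.

# Back-of-the-envelope compute-to-communication ratio for a BOINC-style workunit.
# All figures are invented for illustration.
input_mb         = 8      # downloaded once
output_kb        = 50     # uploaded once
runtime_hours    = 4      # CPU time per workunit
sustained_gflops = 2      # assumed sustained throughput on one core

flop = sustained_gflops * 1e9 * runtime_hours * 3600
bytes_moved = input_mb * 1e6 + output_kb * 1e3
print(f"~{flop / bytes_moved:,.0f} floating-point ops per byte communicated")

A problem that needs its neighbors' data every iteration might manage only a handful of operations per byte moved, which is exactly the kind of problem that doesn't fit this model.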
That's why I included the proviso about a select few who are willing to process very, very large chunks in one go. There are something like over 1.5M (active?) users on SETI, and some of the top players have some really awesome hardware, and it's only getting better as new stuff comes out (like the NVIDIA Pascal architecture, allegedly 10x better than Maxwell).
Hmm. I also wonder if SETI (and this project) could automate even more of this processing if it had access to something like Google's DeepMind. Teach it what a few signal candidates would look like, along with all of the possible permutations (Doppler shifts, gravitational effects, interference, harmonics, etc.). Maybe it would at least help refine the way we are using all of this power, so we search more efficiently and only turn in more manageable results.
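To make that concrete, here is a toy illustration (every parameter invented) of the kind of pattern such a system would have to learn: a weak tone that drifts in frequency--a crude stand-in for Doppler drift--invisible sample-by-sample but visible as a moving peak across short FFTs:

# Toy "signal candidate": a drifting narrowband tone buried in noise.
# Sample rate, drift rate, and amplitudes are invented for illustration.
import numpy as np

fs = 1000.0                       # sample rate, Hz
t = np.arange(0, 8.0, 1.0 / fs)   # 8 seconds of data
f0, drift = 150.0, 2.0            # start frequency (Hz) and drift rate (Hz/s)

tone = 0.5 * np.sin(2 * np.pi * (f0 + 0.5 * drift * t) * t)
data = tone + np.random.normal(0.0, 1.0, t.size)   # per-sample SNR well below 1

block = int(fs)                   # one-second FFT blocks
freqs = np.fft.rfftfreq(block, 1.0 / fs)
for i in range(0, t.size, block):
    spectrum = np.abs(np.fft.rfft(data[i:i + block]))
    print(f"t={i / fs:.0f}s  peak near {freqs[np.argmax(spectrum[1:]) + 1]:.0f} Hz")

A real classifier would of course have to separate patterns like this from terrestrial interference, which is where the hard part lives.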
Again, and as usual and always, I could be wrong, but I think the scientific rigor of things sort of demands that the "final analysis" be performed in-house on the raw data.
Bernd or someone could tell us for sure, but I believe that we're finding signals in the data and we're really finding them; yet, I think that these are re-run on the Atlas supercluster before anything is published.
One could hardly blame them for that caution.
We're the sand-sifters who identify that there is "something" here; then the archaeologist looks at it to determine what it is and what dynasty it is from.
Imagine the scientific rigor that would have to be applied to something as completely outlandish (no pun intended) as discovering an "intelligent alien signal" down in the noise.
In any case, I would find it disturbing to discover that any project would be so... unethical as to run data for fun to "keep" their volunteers knowing the whole time that there is zero scientific benefit to the computing they are asking others to do.
SETI is an incredible long-shot, anyway. Maybe there isn't that much difference, practically speaking, to doing it "for fun" and doing it "for science," but you would HOPE that your electric bill wasn't being justified by some project trying to keep you on the hook. Just re-crunching data and throwing the results away would be a criminal waste of resources and time.
I am a global warming denier, and not ashamed of it. But the idea of burning fuel to watch it burn is waste and I really despise waste. On the scale we're talking about... WOW, would that ever be a sad, sad state of affairs.
I don't remember exactly what I read, or where, or who wrote it, but I know I have heard Dr. Ransom say it at the end of one of his pulsar talks, and I think it was Seth Shostak who said something about it in one of the ATA talks: the push is for real-time analysis because the scale and nature of the data defy recording and distributing it.
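To give a feel for the scale in question (the bandwidth, bit depth, and antenna count below are invented figures, not any telescope's actual specs), even a modest radio survey produces data volumes that are awkward to store, let alone ship out to volunteers:

# Rough data-rate arithmetic for a multi-antenna radio survey.
# All figures are invented for illustration.
bandwidth_hz    = 100e6    # 100 MHz of observed bandwidth
bits_per_sample = 2 * 8    # complex (I/Q) samples, 8 bits per component
antennas        = 40

samples_per_sec = bandwidth_hz   # complex sampling: sample rate ~= bandwidth
bytes_per_sec = samples_per_sec * bits_per_sample / 8 * antennas
print(f"~{bytes_per_sec / 1e9:.1f} GB/s, ~{bytes_per_sec * 86400 / 1e12:.0f} TB/day")

At those rates it's easy to see why the instinct is to reduce the data at the telescope rather than record and distribute it.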
What I think does not matter. What I vaguely remember may be worse than remembering nothing.
Therefore, I don't assert that anything I've said is correct.
And it is 5:08am New Year's Day morning and I've been up 20 hours AND I've been celebrating, so... does that make me legally incompetent to think about things like this?
Probably. G'nite everybody!
In any case, I would find it disturbing to discover that any project would be so... unethical as to run data for fun to "keep" their volunteers knowing the whole time that there is zero scientific benefit to the computing they are asking others to do.
SETI is an incredible long-shot, anyway. Maybe there isn't that much difference, practically speaking, to doing it "for fun" and doing it "for science," but you would HOPE that your electric bill wasn't being justified by some project trying to keep you on the hook. Just re-crunching data and throwing the results away would be a criminal waste of resources and time.
Seti didn't admit it the first time until they were already sending out new data; the second time, some cruncher who just happened to be logging unit names discovered it, and they were forced to admit they were doing it again. The second time they said it was because 'the data was delayed coming from the receiver and they wanted to ensure people could keep crunching'. Then, when it came out that they were ALSO just putting it on the shelf for 'future' analysis, lots of people left, and I for one never went back. Way too many other projects in the sea to only fish for one thing!