On CUDAs and Statistical Confidence

Gerry Rough
Joined: 1 Mar 05
Posts: 102
Credit: 1847066
RAC: 0
Topic 194232

I was wondering what will happen to the stats folks when the CUDA apps start rolling out. As those who create WUs know, BOINC projects do not exhaustively run every conceivable variation of what needs to get done with a given science app: they run enough WUs to be pretty certain that their results are accurate. This is known in the statistical universe as the confidence level: at 95% confidence, for example, we are 95% certain that our statistical result is in fact accurate, and not an outlier.

Hence the question that comes with at least a fairly dramatic increase in E@H processing power via CUDA: will we see shorter runs of WUs for a given science application (such as S5R4, etc.), or will the E@H statisticians decide to simply roll out more WUs to crunch, thereby increasing the confidence level of the results of the current E@H application? Let's face it, raising the confidence level can dramatically increase the number of WUs on a given science app just to gain a few more percentage points of confidence. Any thoughts?
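
Just to put a rough number on that last point, here is a minimal back-of-the-envelope sketch (the per-WU detection probability is a made-up toy figure, nothing to do with the real E@H analysis): if each WU independently had some small chance p of catching a signal, the number of WUs needed to reach an overall confidence C grows like log(1 - C), so the last few percentage points are by far the most expensive.

```python
import math

# Toy model: each WU independently has a small chance p of catching the signal.
# p is invented for illustration; the real E@H analysis does not work this way.
def wus_needed(p, confidence):
    """Smallest n with 1 - (1 - p)**n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

p = 0.01
for c in (0.90, 0.95, 0.99, 0.999):
    print(f"{c:7.1%} confidence -> about {wus_needed(p, c):4d} WUs")
```

With p = 1%, 95% confidence takes about 299 WUs while 99% takes about 459, so each extra percentage point costs more than the one before it.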



Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 714998597
RAC: 934292

On CUDAs and Statistical Confidence

The physicists can correct me if I'm wrong, but I think the nature of an E@H result is a bit more complex than that. AFAIK, it's something like this:

If there is a continuous gravitational wave with

a) amplitude greater than some measure h
b) frequency in the range [f_min, f_max]
c) spin-down rate (first frequency derivative over time) in the range [fdot_min, fdot_max]

then the confidence level of finding the signal is greater than C%. (An additional search parameter is the location of the source in the sky, but since we are already performing an all-sky search here, it's not a variable to worry about atm).

Now, instead of trying to increase the confidence level C to levels far higher than (say) 95%, you might want to

a) lower the amplitude for which you have a (say) 95% confidence level. That's usually called the sensitivity of the search. Needless to say, you can't increase sensitivity infinitely just by throwing more PCs into the search effort; it's bounded by the sensitivity of the detectors.

b) increase the frequency range of the search. The higher the frequency, the more expensive the search is. That's the reason S5R5 was limited to signals at less than 1kHz.

c) increase the range of spin-down values that are considered.

b) and c) are bounded by what physicists consider realistic for a source, e.g. certain frequencies and spin-down (spin-up) rates are just "unphysical" because nobody can imagine a pulsar (our targeted GW source) spinning that way.

AFAIK, this "realistic" segment of parameter space hasn't been exhausted yet for S5 data at the highest possible sensitivity, so there is still room to expand the search to yet uncovered, but reasonable areas of "parameter space".

If, after that, you still have lots of CPU cycles at your disposal, you can always look for different sorts of signals. The type of signal covered in E@H now is a signal with a frequency that is decaying (or increasing) at a constant rate, so it can be described by a base frequency and its first derivative over time. Maybe it makes sense to also take the second derivative into account. This would add a new dimension to the parameter space and increase the complexity of the search rather heavily.
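
To make that scaling concrete, here is a minimal sketch with completely invented ranges and grid spacings (the real E@H template placement comes from a proper parameter-space metric, and the spacing actually shrinks as the frequency goes up, which is part of why higher frequencies cost more): simply counting grid points shows that widening a range grows the cost linearly, while adding a second-derivative dimension multiplies the whole count by yet another factor.

```python
# Toy template count for a brute-force grid over (f, fdot) and optionally fddot.
# All ranges and spacings below are invented for illustration only.
def template_count(f_range, fdot_range, fddot_range=None,
                   df=1e-4, dfdot=1e-10, dfddot=1e-18):
    n = (f_range / df) * (fdot_range / dfdot)
    if fddot_range is not None:                  # extra dimension: second derivative
        n *= fddot_range / dfddot
    return n

base   = template_count(f_range=1000.0, fdot_range=1e-8)
wider  = template_count(f_range=1500.0, fdot_range=1e-8)    # option b): wider frequency range
deeper = template_count(f_range=1000.0, fdot_range=1e-8,
                        fddot_range=1e-17)                   # add the fddot dimension

print(f"baseline grid        : {base:.2e} templates")
print(f"wider frequency range: {wider:.2e} templates ({wider / base:.1f}x)")
print(f"with fddot dimension : {deeper:.2e} templates ({deeper / base:.1f}x)")
```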

Bikeman

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

Using history as a guide,

Using history as a guide, albeit for a different project, as processing speed hits a new plateau we do see increases in the sensitivity and depth of searching. When I first did SaH a typical task took about 30 hours. Now we are doing a search that is at least 16 times (if I remember the doublings correctly) more sensitive and the tasks complete in far less time on typical systems exclusive of CUDA.

Perhaps we shall see the same thing here?

Maybe even two levels of search? On the CPUs do a screening looking for the -9 class tasks or ones that are not "interesting" and then sample the remainder for increased processing using the "faster" and more productive searching.
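
Something like the toy two-stage filter below is what I have in mind; it is purely a sketch with invented scores and thresholds, not how either project actually schedules work. A cheap screening pass discards the clearly uninteresting chunks, and only the survivors get the expensive treatment.

```python
import random

# Toy two-stage search: a cheap screening pass, then an expensive follow-up
# on the few candidates that survive. Scores and thresholds are invented;
# they only stand in for "fast but rough" versus "slow but thorough".
def cheap_score(chunk):
    return sum(chunk) / len(chunk)          # quick statistic for the screening pass

def expensive_score(chunk):
    return sum(x * x for x in chunk)        # pretend this costs far more to compute

def two_stage_search(chunks, screen_threshold=0.55, final_threshold=36.0):
    survivors = [c for c in chunks if cheap_score(c) > screen_threshold]
    hits = [c for c in survivors if expensive_score(c) > final_threshold]
    return survivors, hits

random.seed(1)
data = [[random.random() for _ in range(100)] for _ in range(1000)]
survivors, hits = two_stage_search(data)
print(f"{len(data)} chunks -> {len(survivors)} screened -> {len(hits)} final candidates")
```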

mikey
Joined: 22 Jan 05
Posts: 12663
Credit: 1839061411
RAC: 4247

RE: Using history as a

Message 90865 in response to message 90864

Quote:

Using history as a guide, albeit for a different project, as processing speed hits a new plateau we do see increases in the sensitivity and depth of searching. When I first did SaH a typical task took about 30 hours. Now we are doing a search that is at least 16 times (if I remember the doublings correctly) more sensitive and the tasks complete in far less time on typical systems exclusive of CUDA.

Perhaps we shall see the same thing here?

Maybe even two levels of search? On the CPUs do a screening looking for the -9 class tasks or ones that are not "interesting" and then sample the remainder for increased processing using the "faster" and more productive searching.

Or even vice versa: since GPUs have issues with fine, detailed crunching, maybe the GPUs will crunch the units first and then the CPUs will crunch the units that need the fine, detailed checking. Who knows? GPUs have come a long way since the early days, and with better and faster ones coming out almost daily, the sooner some of these other projects can get on board the better. I believe eventually all projects will have at least some level of GPU crunching!
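
For what it's worth, a big part of the fine-detail issue is floating-point precision: most current GPUs are only fast in single precision, and single precision loses accuracy in exactly the kind of delicate arithmetic a double-precision CPU pass handles comfortably. A tiny, generic illustration (nothing E@H-specific, just NumPy in single versus double precision):

```python
import numpy as np

# Same calculation in single vs double precision. The naive one-pass variance
# formula subtracts two large, nearly equal numbers, which is exactly where
# single precision falls apart.
rng = np.random.default_rng(0)
x64 = 1.0e4 + rng.standard_normal(1_000_000)    # values around 10000 +/- 1
x32 = x64.astype(np.float32)

def naive_var(x):
    # E[x^2] - E[x]^2, computed entirely in the array's own precision
    return (x * x).mean(dtype=x.dtype) - x.mean(dtype=x.dtype) ** 2

print("float32 naive variance:", naive_var(x32))   # badly wrong, can even go negative
print("float64 naive variance:", naive_var(x64))   # close to the true value (about 1.0)
print("two-pass reference    :", x64.var())
```

Doing the delicate step in double precision (or restructuring the arithmetic) fixes it, which is roughly the kind of CPU follow-up role being suggested here.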

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 714998597
RAC: 934292

Makes me wonder if it wouldn't

Makes me wonder if it wouldn't be better to split projects into CPU and GPU subprojects anyway, so that a) workunits could be optimized for the respective platform and b) credit-aware participants without supported GPUs would not have the feeling of competing with others from a different league.

CU
Bikeman

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

RE: Makes me wonder if it

Message 90867 in response to message 90866

Quote:
Makes me wonder if it wouldn't be better to split projects into CPU and GPU subprojects anyway, so that a) workunits could be optimized for the respective platform and b) credit-aware participants without supported GPUs would not have the feeling of competing with others from a different league.


This has always been a problem in that the guy that owns a business and can and does put all his business PCs to work on his behalf will always outclass those of us that have to pay for this out of our own pockets ...

That said, as one of the "elite" types that is fanatical about doing work ... well ... how can I complain.

But this has been an issue in that the whole credit thing was never really thought out well ... and the developers have spent the last umpteen years studiously avoiding thinking about credit and the problems therein as it is not important ...

And maybe it is not, but it is also probably the biggest hot button issue ... and the source of a huge number of questions from new participants ... like "why can't the system get the numbers right?" or "Why did I get awarded less than I claimed?" But, maybe if we ignore it long enough, the problem will go away ...

mikey
Joined: 22 Jan 05
Posts: 12663
Credit: 1839061411
RAC: 4247

RE: RE: Makes me wonder

Message 90868 in response to message 90867

Quote:
Quote:
Makes me wonder if it wouldn't be better to split projects into CPU and GPU subprojects anyway, so that a) workunits could be optimized for the respective platform and b) credit-aware participants without supported GPUs would not have the feeling of competing with others from a different league.

This has always been a problem in that the guy that owns a business and can and does put all his business PCs to work on his behalf will always outclass those of us that have to pay for this out of our own pockets ...

That said, as one of the "elite" types that is fanatical about doing work ... well ... how can I complain.

But this has been an issue in that the whole credit thing was never really thought out well ... and the developers have spent the last umpteen years studiously avoiding thinking about credit and the problems therein as it is not important ...

And maybe it is not, but it is also probably the biggest hot button issue ... and the source of a huge number of questions from new participants ... like "why can't the system get the numbers right?" or "Why did I get awarded less than I claimed?" But, maybe if we ignore it long enough, the problem will go away ...

You know, as I was reading this my mind went to a tiered setup where someone with x to y machines is in the same group as all other people with the same numbers of computers. But then I went to the problem of how far the spread between x and y is. For example, if x is one computer and y is 10, the person with x will never be able to compete with the people with y no matter how much they try. The same thing then extrapolates into the upper tiers, or those with many more computers. Or even the person with only one PC but dual quad cores in it! I am not sure there is an easy answer!

Paul D. Buck
Joined: 17 Jan 05
Posts: 754
Credit: 5385205
RAC: 0

RE: You know as I was

Message 90869 in response to message 90868

Quote:
You know, as I was reading this my mind went to a tiered setup where someone with x to y machines is in the same group as all other people with the same numbers of computers. But then I went to the problem of how far the spread between x and y is. For example, if x is one computer and y is 10, the person with x will never be able to compete with the people with y no matter how much they try. The same thing then extrapolates into the upper tiers, or those with many more computers. Or even the person with only one PC but dual quad cores in it! I am not sure there is an easy answer!


At one point there was a comparison where you were ranked against people with the same number of computers. But even that has weaknesses in that, for example, I have mid- to high-end GPUs in all my systems now (except the lowest-end one), and that makes those systems rather higher end than many people's ...

Another attempt, though I think it will falter, is the Formula BOINC contest, where teams are supposed to compete from an "even" start. But here again, unlike a car race, the teams with the largest number of participants are always going to finish at the top. If the team numbers are close, there may be some switching of places ... but ... long term, the team with more participants/computers is going to win each time.

Mostly this is about personal bests and seeing how much work one has done. Sadly though, many have this idea that because they have done more than most, they are somehow "better" or should be listened to more ... sadly, being able to rack up more counts than the next participant only means that I can afford to dump more money into the pot than those others ...

mikey
Joined: 22 Jan 05
Posts: 12663
Credit: 1839061411
RAC: 4247

RE: RE: You know as I was

Message 90870 in response to message 90869

Quote:
Quote:
You know, as I was reading this my mind went to a tiered setup where someone with x to y machines is in the same group as all other people with the same numbers of computers. But then I went to the problem of how far the spread between x and y is. For example, if x is one computer and y is 10, the person with x will never be able to compete with the people with y no matter how much they try. The same thing then extrapolates into the upper tiers, or those with many more computers. Or even the person with only one PC but dual quad cores in it! I am not sure there is an easy answer!

At one point there was a comparison where you were ranked against people with the same number of computers. But even that has weaknesses in that, for example, I have mid- to high-end GPUs in all my systems now (except the lowest-end one), and that makes those systems rather higher end than many people's ...

Another attempt, though I think it will falter, is the Formula BOINC contest, where teams are supposed to compete from an "even" start. But here again, unlike a car race, the teams with the largest number of participants are always going to finish at the top. If the team numbers are close, there may be some switching of places ... but ... long term, the team with more participants/computers is going to win each time.

Mostly this is about personal bests and seeing how much work one has done. Sadly though, many have this idea that because they have done more than most, they are somehow "better" or should be listened to more ... sadly, being able to rack up more counts than the next participant only means that I can afford to dump more money into the pot than those others ...

I totally agree. You, and me to some extent, seem to have more resources than some others, which just means we can have more and better toys than some! It also means we are closer to the cutting edge when it comes to computers. This means we often find problems, and possibly even solutions to those problems, before others even know there are problems! You, for instance, are doing SLI setups with some very high-end video cards and then using that setup for GPU crunching!! Very cutting edge and also very helpful to the developers who care to look and see what is coming in the future for their own projects!

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6588
Credit: 312616773
RAC: 170033

The distributed computing

The distributed computing landscape just keeps broadening, doesn't it? Very healthy overall, even after applying a discount for some of the unhelpful competitive behaviours. It is still true that the vast bulk of DC work is done by 'common or garden' machines - no slight intended - and that must remain as the target audience of note. Exceptional performance is obviously welcomed, including clusters and whatnot, not simply for plain contribution but, as you say, for lessons learnt which nudge the boundaries toward larger/new host sets.

Reminds me of Australia's premier road race (Bathurst) that for a while had become a trophy you could buy - mainly by the very expensive method of testing to destruction, and thus establishing component lifetimes. Then strict technical regulation of specifics returned, plus true homologation, and the race returned to the merits of good driving, good crews and good luck. Still quite an expensive business but in a 'flat' sense. It is a truism that the starting grid does not predict the final order some six hours later.

I think we joked a while ago about toasters doing BOINC in the near future. The bread may become pure buckyballs if we wait for a confirmed GW detection before it pops up though ..... :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

mikey
Joined: 22 Jan 05
Posts: 12663
Credit: 1839061411
RAC: 4247

RE: Reminds me of

Message 90872 in response to message 90871

Quote:
Reminds me of Australia's premier road race (Bathurst) that for a while had become a trophy you could buy - mainly by the very expensive method of testing to destruction, and thus establishing component lifetimes. Then strict technical regulation of specifics returned, plus true homologation, and the race returned to the merits of good driving, good crews and good luck. Still quite an expensive business but in a 'flat' sense. It is a truism that the starting grid does not predict the final order some six hours later.
Cheers, Mike.

F1 is coming to your neighborhood, Australia, in a couple of weeks. They have all that stuff, but still, if you spend gobs of money, you can win! Or sometimes cheating can make you win races but still lose in the end!

And yes, crunching in the future will be done on much different machines than it is now, but I agree with you... each project must figure out how to cater to its largest group of crunchers, and then also slightly to its high and low end, in order to stay alive and be a viable alternative! Either that or it too will go by the wayside.
