"Completed, marked as invalid". Bad luck or bad science?

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1,714,373,961
RAC: 0
Topic 197390

From time to time I see some "Invalid" tasks in my completed tasks list. That my GPU sometimes produces bad results is natural, and I expect a few of my results to be genuinely "bad". But most (just a feeling, not statistics) of my invalid results are of the type where my AMD GPU has been voted down by two Nvidia GPUs. Here are two examples.

I'm aware that floating-point math on a computer is an approximation, and comparing floating-point results from different hardware and software must be a true nightmare. It seems to me that tasks produce results that vary quite a lot from the "mathematically correct" answer, and the final result depends on which hardware/software combo happened to crunch it.
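
A minimal C sketch of that point (illustrative only, not Einstein@Home code): floating-point addition is not associative, so the grouping of operations, which differs between GPU architectures, compilers and math libraries, can change the low bits of an otherwise correct result.

    /* Same three numbers, two groupings, two different answers. */
    #include <stdio.h>

    int main(void) {
        float big = 1.0e8f;   /* adjacent floats are 8 apart at this magnitude */
        float small = 1.0f;

        float r1 = (small + big) - big;   /* small is rounded away: prints 0.0 */
        float r2 = small + (big - big);   /* exact: prints 1.0 */

        printf("(small + big) - big = %.1f\n", r1);
        printf("small + (big - big) = %.1f\n", r2);
        return 0;
    }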

Which finally leads to my question:
Given that I'm correct in the above, how inaccurate is using BOINC compared to running the whole project on a supercomputer? Will a supercomputer create better science, i.e. find more pulsars and gravitational waves?

mikey
Joined: 22 Jan 05
Posts: 12,561
Credit: 1,838,894,370
RAC: 20,743

"Completed, marked as invalid". Bad luck or bad science?

Quote:

From time to time I see some "Invalid" tasks in my completed tasks list. That my GPU sometimes produces bad results is natural, and I expect a few of my results to be genuinely "bad". But most (just a feeling, not statistics) of my invalid results are of the type where my AMD GPU has been voted down by two Nvidia GPUs. Here are two examples.

I'm aware that floating-point math on a computer is an approximation, and comparing floating-point results from different hardware and software must be a true nightmare. It seems to me that tasks produce results that vary quite a lot from the "mathematically correct" answer, and the final result depends on which hardware/software combo happened to crunch it.

Which finally leads to my question:
Given that I'm correct in the above, how inaccurate is using BOINC compared to running the whole project on a supercomputer? Will a supercomputer create better science, i.e. find more pulsars and gravitational waves?

This has LONG been a problem for most projects, at least those that let people use both kinds of GPUs. BUT it ALSO comes into play with Macs and with Intel and AMD CPUs, as they are also not identical in how they do things. That is why the projects allow a 'fudge factor' that gives the wiggle room necessary for all the varied components to come up with a 'close enough' yet scientifically valid result.
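
In code terms, that 'fudge factor' is roughly a tolerance check like the sketch below (the real Einstein@Home validators are more elaborate; the helper name and numbers here are made up for illustration):

    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical helper, not a BOINC API call: two results count as
     * equivalent if they agree within a relative or absolute tolerance. */
    static bool close_enough(double a, double b, double rel_tol, double abs_tol) {
        return fabs(a - b) <= fmax(abs_tol, rel_tol * fmax(fabs(a), fabs(b)));
    }

    int main(void) {
        double amd_result    = 1.00000012;   /* made-up numbers */
        double nvidia_result = 1.00000010;

        printf("agree: %s\n",
               close_enough(amd_result, nvidia_result, 1e-5, 1e-12) ? "yes" : "no");
        return 0;
    }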

As for your real question about supercomputers as opposed to personal computers... probably yes, a supercomputer would certainly do it MUCH faster and more accurately. After all, each result would be verified against a nearly identical component. But speed is not always the concern: after we users crunch the units, the results must be compiled and the next set of runs designed, often based on the first run's results and complications. Meaning a supercomputer, while being super fast, could be too fast for people with other lines of work that they must do to pay the bills.

I was a forum moderator at a project a long time ago; they made a whole bunch of us moderators because the project admins didn't have the time anymore to watch over the project on a day-to-day basis. When we moderators had trouble we often asked the admins for help, but they would say 'we brought you on board to handle it, so handle it' and left us on our own. In short, power struggles took over because the project admins had other things to do. So while using a supercomputer may sound like a great idea, it requires a bunch of people in the background working nearly full time to make sure it is kept fully busy and used to its best advantage. That is not a likely scenario for most BOINC projects.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,026,371
RAC: 34,093

A numerical answer to a

A numerical answer to a problem is (almost) never right or wrong; it's rather a question of being precise enough or not. Our "validators" (which compare the results) are carefully set up to make sure that the accepted results are on the "precise enough" side, even if this means rejecting some results that would also have been scientifically valid.
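
As a rough illustration of how such a comparison plays out with a quorum of three returned results (a sketch in the spirit of BOINC validation only, not the actual validator code; the tolerance and numbers are invented):

    #include <math.h>
    #include <stdio.h>

    #define TOL 1e-5   /* assumed relative tolerance, purely for illustration */

    static int agrees(double a, double b) {
        return fabs(a - b) <= TOL * fmax(fabs(a), fabs(b));
    }

    int main(void) {
        /* invented numbers: two hosts agree, the third falls outside the tolerance */
        double result[3]    = { 0.73120011, 0.73120012, 0.73121900 };
        const char *host[3] = { "nvidia-1", "nvidia-2", "amd-1" };

        for (int i = 0; i < 3; i++) {
            int matches = 0;
            for (int j = 0; j < 3; j++)
                if (i != j && agrees(result[i], result[j]))
                    matches++;
            printf("%s: %s\n", host[i], matches > 0 ? "valid" : "marked invalid");
        }
        return 0;
    }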

With our radio-pulsar search, AFAIK we do find the known pulsars (see the re-detections); I don't know of any that we missed in data we searched, so I doubt that more precision would let us find more pulsars. Given the quality of the data, I think our (accepted) precision is good enough.

The search for gravitational waves that we are performing on Einstein@Home is also limited more by computing power (the number of floating-point operations) than by precision. As much as we can, we do in single-precision floating-point math, because we gain more sensitivity from more operations (allowing longer integration times) than from more precision.
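
A small sketch of that trade-off (illustrative only): single precision drifts more than double precision when many terms are summed, but each single-precision operation is cheaper, and the extra operations buy more sensitivity than the extra bits would.

    #include <stdio.h>

    int main(void) {
        const int n = 10000000;   /* ten million terms, nominally summing to 1,000,000 */
        float  sum_f = 0.0f;
        double sum_d = 0.0;

        for (int i = 0; i < n; i++) {
            sum_f += 0.1f;        /* rounding error accumulates in single precision */
            sum_d += 0.1;
        }

        printf("float  sum: %.4f\n", sum_f);   /* noticeably off from 1000000 */
        printf("double sum: %.4f\n", sum_d);   /* very close to 1000000 */
        return 0;
    }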

BM

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,581
Credit: 308,273,232
RAC: 205,342

I think if we asked Bruce

I think if we asked Bruce Allen, he would say that E@H is a supercomputer! Its physical components just aren't on the same piece of real estate. :-)

But I understand the sense in which you mean that. The heterogeneity of the computing units - contributors' hardware - is legion. I think it is amazing (and this is the struggle for the developers of the science apps) that such disparate platforms line up well enough that one can even approximately have a fiducial expectation. Here's a recipe that could be describing Einstein@Home in its totality:

Quote:

.... a wide range of applications. Making sweeping generalizations about these applications is difficult. In every case, however, an application for a heterogeneous platform must carry out the following steps:

1. Discover the components that make up the heterogeneous system.

2. Probe the characteristics of these components so that the software can adapt to the specific features of different hardware elements.

3. Create the blocks of instructions (kernels) that will run on the platform.

4. Set up and manipulate memory objects involved in the computation.

5. Execute the kernels in the right order and on the right components of the system.

6. Collect the final results.


.... but is quoted from OpenCL Programming Guide, by Aaftab Munshi, Benedict R. Gaster, Timothy G. Mattson, James Fung, and Dan Ginsburg (Addison-Wesley Professional, July 13, 2011), Print ISBN-13: 978-0-321-74964-2. This OpenCL/E@H correspondence struck me when reading the book (apropos Parallella) over the weekend; a minimal sketch of the first two steps follows.
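
A minimal OpenCL sketch of steps 1 and 2 of that recipe, discovering the devices and probing their names (error handling omitted for brevity; illustrative only, not Einstein@Home code):

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platforms[8];
        cl_uint num_platforms = 0;
        clGetPlatformIDs(8, platforms, &num_platforms);            /* step 1: discover */

        for (cl_uint p = 0; p < num_platforms; p++) {
            cl_device_id devices[8];
            cl_uint num_devices = 0;
            clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devices, &num_devices);

            for (cl_uint d = 0; d < num_devices; d++) {
                char name[256];
                clGetDeviceInfo(devices[d], CL_DEVICE_NAME,        /* step 2: probe */
                                sizeof(name), name, NULL);
                printf("platform %u, device %u: %s\n", p, d, name);
            }
        }
        return 0;
    }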

[ However, at a different level a supercomputer wouldn't be doing as good a job as E@H, for the simple reason that it couldn't be afforded and so wouldn't be used. The core value of E@H (and of BOINC projects generally) is that its compute units are provided to the scientific investigators without asset or running costs. ]

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 695,102,964
RAC: 135,793

Just to stress the point,

Just to stress the point, heterogeneity is not a totally bad thing at all.

If you run some science code on a barn full of identical machines (that's what most supercomputers are like today) and get consistent results... well... big deal!! If you run the same code using different hardware, different FFT libraries, different OSes etc. and get consistent results in 99.something % of the cases (and almost identical results in the majority of the remaining cases), that gives you some additional confidence in your code... and some doubts if it doesn't work like this. In fact we found several bugs over the years when cross-validation did not work as expected.

Cheers
HB
