RAM errors caused by cosmic background radiation

user32094
Joined: 22 Feb 05
Posts: 22
Credit: 74,648
RAC: 0
Topic 191004

Some 5 years ago there was a study, maybe by IBM, about the effects of background radiation on PC RAM. They concluded that it causes 1 bit error/256 MB/month at sea level which is 9.1e-4 errors/MB/week.

The status page says 77000 hosts active during the last week and 380 results invalid. Albert uses about 10 MB RAM. 77000*10*9.1e-4=700 errors. This estimate is too high because some hosts aren't calculating all the time and some have ECC RAM. Maybe all the invalid results are caused by cosmic background radiation.
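The arithmetic in this post can be reproduced with a short sketch. It assumes a 30-day month for converting the quoted monthly rate to a weekly one (the post does not say which conversion was used), and the function names are made up for illustration. It also ignores idle hosts and ECC RAM, so, as the post says, it gives an upper estimate.

```python
# Rough expected-error estimate from the figures quoted in the post.
# Assumption: a month is taken as 30 days when converting to a weekly rate.

def errors_per_mb_per_week(errors=1.0, megabytes=256.0, days_per_month=30.0):
    """Convert 'errors per `megabytes` MB per month' to errors/MB/week."""
    return errors / megabytes * 7.0 / days_per_month


def expected_errors_per_week(hosts, mb_per_host, rate_per_mb_week):
    """Expected bit errors per week across all hosts (no ECC, always running)."""
    return hosts * mb_per_host * rate_per_mb_week


rate = errors_per_mb_per_week()                      # roughly 9.1e-4
expected = expected_errors_per_week(77000, 10, rate)

print(f"rate     = {rate:.2e} errors/MB/week")
print(f"expected = {expected:.0f} errors/week (380 invalid results reported)")
```

Changing `mb_per_host` or `days_per_month` shows how sensitive the estimate is to those guesses.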

Wurgl (speak^Wcrunching for Special: Off-Topic)
Joined: 11 Feb 05
Posts: 321
Credit: 140,550,008
RAC: 0

RAM errors caused by cosmic background radiation

Quote:
Some 5 years ago there was a study, maybe by IBM, about the effects of background radiation on PC RAM. They concluded that it causes 1 bit error/256 MB/month at sea level which is 9.1e-4 errors/MB/week.

Aren't the cells on the chips much smaller now? If that radiation is a particle, then the chance of a hit is smaller; if it is a wave, then the chance is maybe the same. However, smaller cells mean less energy is needed to change their charge, so ... Hmm?

But one thing is for sure! Windows is more likely to get hit than Linux! Or are those neat blue screens caused by something else?

ghstwolf
Joined: 9 Feb 05
Posts: 24
Credit: 59,103
RAC: 0

RE: Some 5 years ago there

Quote:

Some 5 years ago there was a study, maybe by IBM, about the effects of background radiation on PC RAM. They concluded that it causes 1 bit error/256 MB/month at sea level which is 9.1e-4 errors/MB/week.

The status page says 77000 hosts active during the last week and 380 results invalid. Albert uses about 10 MB RAM. 77000*10*9.1e-4=700 errors. This estimate is too high because some hosts aren't calculating all the time and some have ECC RAM. Maybe all the invalid results are caused by cosmic background radiation.

There is an error in your calculation. The 1-bit error rate is figured per 256 MB, and we're using about 4% of that. With that adjustment, we would see 28 errors/month. Of course, we could consider it another way as well: that errors creep in systematically, so all the memory in the computer has to be considered. Even there, and using a 4 GB average (well above what I'd expect to see), you would be below the 1600 errors a month average (380/wk ≈ 1645/month).

While a small percentage of the errors may indeed be caused that way, it isn't a major contributor. I'd love to see a breakdown, but I suspect you'd find that between 5 and 10% of the machines contribute about 80% of the invalid results. As long as that is true, it is safe to assume that most of those machines are overclocked beyond what is stable for crunching (they likely run fine for everyday tasks). The rest are hardware related: bad bits of memory or hard disk space.
