Invalid, but can't figure out why

Joined: 28 Mar 06
Posts: 21
Credit: 395978
RAC: 0
Topic 191825

Please take a look at this result:
http://einsteinathome.org/task/43271462

Any idea what happened there and why it's invalid?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245269696
RAC: 12212

Invalid, but can't figure out why

Quote:
Any idea what happened there and why it's invalid?

The Task was "successful" (no "client error") and the result looks syntactically correct (else it would have been marked "validate error"). However, the numbers in it are not "close enough" to the results of the other hosts that did this Task.

The valid results come from a Linux host running 4.17 and another Windows box running 4.24 like yours, so the difference isn't in the code that was used for the calculations.
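A rough Python sketch of the "close enough" idea described above. This is not Einstein@Home's actual validator: it assumes each result is simply a list of floats, uses a made-up relative tolerance (`REL_TOL`), and the names `results_agree` and `check_quorum` are hypothetical. A result that agrees with some other host's result becomes the reference, and every result is then checked against it, so one outlier among three hosts ends up invalid.

```python
import math

# Hypothetical tolerance; the project's real thresholds are not given in this thread.
REL_TOL = 1e-5


def results_agree(a: list[float], b: list[float], rel_tol: float = REL_TOL) -> bool:
    """True if both results have the same length and every value matches within rel_tol."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


def check_quorum(results: list[list[float]], rel_tol: float = REL_TOL) -> list[bool]:
    """Mark each result valid if it agrees with a reference result confirmed by another host."""
    reference = None
    for i, r in enumerate(results):
        if any(results_agree(r, s, rel_tol) for j, s in enumerate(results) if j != i):
            reference = r
            break
    if reference is None:
        # No two results agree, so none can be marked valid in this sketch.
        return [False] * len(results)
    return [results_agree(r, reference, rel_tol) for r in results]


# Made-up numbers for illustration only (not taken from the task linked above).
host_a = [0.1234567, 42.0001]
host_b = [0.1234568, 42.0001]
outlier = [0.1234567, 41.3]
print(check_quorum([host_a, host_b, outlier]))  # [True, True, False]
```

The tolerance has to be loose enough to absorb the tiny rounding differences between platforms and compilers, yet tight enough that a result corrupted by a hardware fault falls outside it.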

I know that some AMD64 machines tend to simply give wrong floating-point results when the FPU gets too hot (instead of throwing exceptions or giving NANs). Watch your CPU temperature.

BM

Mats Petersson
Joined: 26 Aug 05
Posts: 2
Credit: 432715
RAC: 0

RE: I know that some AMD64

Quote:
I know that some AMD64 machines tend to simply give wrong floating-point results when the FPU gets too hot (instead of throwing exceptions or giving NANs). Watch your CPU temperature.

Assuming that by "too hot" you actually mean outside the specified temperature range for the CPU, I agree that this is "expected" behaviour.

Unfortunately, it's very hard for the FPU to know whether a few transistors in some corner (or in the middle, for that matter) are "misbehaving". There is a thermal sensor in the CPU, but it sits at some arbitrarily chosen place, and if that sensor isn't triggered, the CPU keeps working even if some other portion of it is above the limit. [Also, the CPU's overheating limit isn't there to stop it from calculating incorrectly but to stop it from destroying itself, so this limit is probably above the "normal operating temperature".]

I'm not aware of any FPU that "understands" that it's too hot and therefore generates anything other than "incorrect results". Generating a NaN or an exception is only possible if there is some logic to detect the scenario, and a transistor delivering its result late isn't something that is easy to detect....

Of course, a single incorrect bit in a 32- or 64-bit result is enough to make the entire result completely wrong... :-(
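To illustrate the point, here is a small Python sketch (not part of the original post) that flips a single bit of a 64-bit IEEE 754 double using the standard `struct` module; `flip_bit` is a hypothetical helper name. Which bit goes wrong matters: the sign bit or an exponent bit changes the value drastically, while the lowest mantissa bit only shifts it by about 2e-16 relative, which a tolerant comparison might still accept.

```python
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit (0 = least significant) of its 64-bit double encoding inverted."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))  # reinterpret the double as a uint64
    bits ^= 1 << bit                                     # invert exactly one bit
    (y,) = struct.unpack("<d", struct.pack("<Q", bits))  # reinterpret back as a double
    return y


x = 1.0
print(flip_bit(x, 63))  # sign bit: -1.0
print(flip_bit(x, 52))  # lowest exponent bit: 0.5
print(flip_bit(x, 0))   # lowest mantissa bit: 1.0000000000000002
```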

--
Mats

Joined: 28 Mar 06
Posts: 21
Credit: 395978
RAC: 0

Thanks for the replies so

Thanks for the replies so far. Heat seems to be always an issue. I assume that my CPU got a little hot, simply because I have it overclocked by ~10%. Overclocking always dooms everything. ;-)
Usually, the Arctic Freezer has no problem keeping the temperature under 50°C (according to SpeedFan), and the room temperature is stable at ~22°C all year round. From what I've heard and read, 50°C shouldn't be an issue for AMD, but I'm trying to be careful anyhow.

Mats Petersson
Joined: 26 Aug 05
Posts: 2
Credit: 432715
RAC: 0

RE: Thanks for the replies

Message 45895 in response to message 45894

Quote:
Thanks for the replies so far. Heat seems to be always an issue. I assume that my CPU got a little hot, simply because I have it overclocked by ~10%. Overclocking always dooms everything. ;-)
Usually, the Arctic Freezer has no problem keeping the temperature under 50°C (according to SpeedFan) and the room temperature is also stable at ~22°C all time of the year. 50°C shouldn't be an issue for AMD from what I've heard and read but I'm trying to be careful anyhow.

Miscalculation can also be caused by speed, or by a combination of speed and heat. 50°C doesn't sound excessive, but who knows what temperature limits the CPU has at a specific frequency outside its normal frequency range? Your CPU has only been tested at its stock speed.

I'm also a bit curious as to why your machine shows so abysmally low benchmark scores.

This machine: http://einsteinathome.org/host/396706
is a 3400+, which runs at the same speed (2.2 GHz) as yours:
http://einsteinathome.org/host/581852
But yours has benchmark scores an order of magnitude lower than mine...

This may indicate that your machine is suffering from some sort of problem...

--
Mats

Joined: 28 Mar 06
Posts: 21
Credit: 395978
RAC: 0

RE: Miscalculation can be

Message 45896 in response to message 45895

Quote:


Miscalculation can be caused by speed also - or a combination of speed and heat. 50'C doesn't sound excessive, but who knows what temp-limits the CPU has for a specific frequency (outside the normal frequency-range). Your CPU has been tested for the speed that it's run at at stock speed.

I'm also a bit curious as to why your machine shows so abysmally low benchmark scores.

This machine: http://einsteinathome.org/host/396706
is a 3400+, which is the same speed (2.2GHz) as yours:
http://einsteinathome.org/host/581852
But yours have a benchmark score that is an order of magnitude different from mine...

This may indicate that your machine is suffering from some sort of problem...

--
Mats

I've checked and was curious too. I re-ran the benchmark, and now the numbers look more in line with your machine. I've overclocked from 2.2 GHz to 2.42 GHz. The benchmark, though, was taken on an XP installation that has been in use for a year now and probably carries a lot of junk, since I use it for programming, gaming, editing and so on. There are also lots of applications running all the time (FTP, a proxy for another machine, Winamp, etc.), which lowers my benchmarks.
In the end, though, I'm taking less time per WU: ~21100 vs. 24800. :-)
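The figures in this post can be put side by side with a little arithmetic. A minimal Python sketch, assuming ~21100 and ~24800 are per-WU run times in the same (unstated) unit, presumably this host versus the stock-clocked 3400+ it was compared with:

```python
stock_ghz = 2.2          # stock clock, as stated in the post
overclocked_ghz = 2.42   # overclocked clock, as stated in the post
wu_time_slow = 24800     # the slower of the two quoted WU times (unit not given)
wu_time_fast = 21100     # the faster of the two quoted WU times

clock_gain = overclocked_ghz / stock_ghz - 1        # fractional clock increase
time_saved = 1 - wu_time_fast / wu_time_slow        # fractional reduction in time per WU
throughput_gain = wu_time_slow / wu_time_fast - 1   # extra WUs completed per unit of time

print(f"clock: +{clock_gain:.1%}")            # clock: +10.0%
print(f"time per WU: -{time_saved:.1%}")      # time per WU: -14.9%
print(f"throughput: +{throughput_gain:.1%}")  # throughput: +17.5%
```

On these figures the WU time gap is somewhat larger than the ~10% clock difference, so the clock alone presumably doesn't account for all of it.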

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

This sure looks realistic

This sure looks realistic now, your benchmarks are just a little bit higher than that of my own 3500+ (which is to be expected since I didn't overclock). Have you stopped getting invalid results, too?

Joined: 28 Mar 06
Posts: 21
Credit: 395978
RAC: 0

RE: This sure looks

Message 45898 in response to message 45897

Quote:
This sure looks realistic now, your benchmarks are just a little bit higher than that of my own 3500+ (which is to be expected since I didn't overclock). Have you stopped getting invalid results, too?

It's been just one invalid result so far, and I can live with that; it can happen from time to time.
I think my benchmark would be higher if I got rid of most of the junk lying around, but I don't bother, since it's the outcome of the work that matters.
Overclocking is definitely worth trying if you can spare a processor and don't mind voiding the warranty. Especially in winter, an overclocked CPU gives your house the needed heat. :-))

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

No, thanks a lot... as a poor

No, thanks a lot... as a poor student I'm not crazy about playing with my CPU's life ;-) As for the invalid result, you are of course right, it happens. When this PC was new and I had just attached to Einstein it crunched the first WU just fine, the second didn't validate and number 3 crashed with a computation error. Had me a bit worried ^^ but that same comp has been crunching fine ever since. I haven't changed anything about the OS, BOINC client, hardware or whatever and it certainly wasn't heat. I guess I'll never find out where those errors came from, they just seemed to appear out of nowhere...

So, keep on crunching, everyone!

Greetz, Annika

Metod, S56RKO
Joined: 11 Feb 05
Posts: 135
Credit: 810108508
RAC: 61795

RE: As for the invalid

Message 45900 in response to message 45899

Quote:
As for the invalid result, you are of course right, it happens. When this PC was new and I had just attached to Einstein it crunched the first WU just fine, the second didn't validate and number 3 crashed with a computation error. Had me a bit worried ^^ but that same comp has been crunching fine ever since. I haven't changed anything about the OS, BOINC client, hardware or whatever and it certainly wasn't heat. I guess I'll never find out where those errors came from, they just seemed to appear out of nowhere...

I've seen such weird problems with brand-new computers quite often. They would happen in the first few days (perhaps weeks) and then disappear.

Often enough to develop a theory about it: new computer randomly misbehaves until all the electrons inside the computer get the correct spin.

Metod ...

Joined: 28 Mar 06
Posts: 21
Credit: 395978
RAC: 0

RE: No, thanks a lot... as

Message 45901 in response to message 45899

Quote:
No, thanks a lot... as a poor student I'm not crazy about playing with my CPU's life ;-)

Hehe, that made me laugh. I'm still at university too. :-D
Currently the prices of Socket 939 CPUs are dropping, so I might go with an X2 in about half a year, when they're really cheap.

Hopefully it won't act up on me then. Besides, thinking back to when I started my machine in early August of last year, I've hardly had any crashed WUs: one because of an optimized client with a missing app_info.xml, and another because I overdid the overclocking and the results became inaccurate.
