Results with Linux clients are marked invalid

josep

Joined: 9 Mar 05

Posts: 63

Credit: 1156542

RAC: 0

12 Mar 2005 20:30:24 UTC

Message 7838

Hi, I'm new here, but I have noticed that your observations are true, Peter.

On my Windows 2000 box, results in the "Fstats.Ha" file seem to have less precision than on a second PC that I also run, with SuSE Linux 9.2 (Windows results always end with two zeros).

Through a Google search I found a post on the gcc mailing list that explains this issue. I paste the text here in case it helps.

----(Text copied from http://gcc.gnu.org/ml/gcc/1999-03n/msg00643.html)-------

Re: Rounding errors using doubles?

* To:
* Subject: Re: Rounding errors using doubles?
* From: "Ross Smith"
* Date: Fri, 19 Mar 1999 14:45:38 +1300

From: Sam Lantinga
>
>I have to thank everyone here on this list for your responsiveness.
>The problem was caused by unexpected values in code using doubles
>that was ported from Windows. Apparently, by default, Windows uses
>53 bits of precision for its double operations. Linux uses 64 bits.
>
>Since the code relies on the exact behavior of doubles quite extensively,
>we are setting the precision of the FPU to 53 bits using the fldcw
>instruction. :)
>
>BTW, does anyone know what precision is used on the PPC and Alpha
>architectures?

It's not really a difference between operating systems or (to a first
approximation) CPUs.

There's a standard called IEC 559 (formerly IEEE 754), which specifies
three standard floating-point arithmetic formats. Two of them are
32-bit and 64-bit formats (with 24-bit and 53-bit precision,
respectively); modern C/C++ compilers almost universally equate these
to float and double. The third is an 80-bit (64-bit precision) format
used for intermediate results in multi-step calculations.

The Intel X86 processors can perform arithmetic in all three
modes, although it always works internally in 80-bit mode and
inserts automatic conversions for the other two. I don't know
anything about PPCs or Alphas, but the IEC standard is pretty much
universal now, and I'd be mildly amazed if either of them differed
in any important ways.

Exactly how floating-point arithmetic is done is a function of the
compiler, not the operating system. You didn't say which compiler you
were using on Windows, but Microsoft Visual C++ is the most likely.
Both MSVC and EGCS perform intermediate calculations in 80-bit internal
registers wherever possible, but allow values to spill into 64-bit
memory when the compiler can't manage to fit the entire calculation
into registers (a common problem on the register-poor X86
architecture). (The details of exactly how EGCS should handle this
were the subject of heated debate on this list not so long ago.)

It looks like what happened was that, at some critical point in your
program, MSVC allowed a value to spill to 64 bits (thus truncating it
to 53-bit precision) while EGCS was able to keep it in an 80-bit
register (retaining 64-bit precision). (Incidentally, because people
tend to attach far too much importance to this sort of observation, I
hasten to add that this implies nothing about the relative performance
of the two compilers in general; perhaps they were using different
optimisation settings, or perhaps they simply made different decisions
about which intermediate results should be kept in registers and which
sacrificed.)

Both compilers have options to force pure 64-bit arithmetic (53-bit
precision) throughout (at some cost in speed): -ffloat-store on EGCS,
/Op on MSVC.

--Ross Smith ................................... mailto:ross.s@ihug.co.nz
.............. The Internet Group, Auckland, New Zealand ..............
"The award for the Most Effective Promotion of Linux
goes to Microsoft." -- Nicholas Petreley

----------------end of copied text----------
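The effect described in the copied post can be imitated without any particular compiler. What follows is only a sketch in Python using exact fractions: `round_to_significand` and `add_chain` are made-up helper names, exponent range limits are ignored, and nothing here is taken from the Einstein@Home code, EGCS or MSVC. It sums the same three numbers twice, once rounding every intermediate to 53 bits (as if each value were spilled to a 64-bit double) and once keeping 64 bits until the final store.

```python
from fractions import Fraction


def round_to_significand(x: Fraction, bits: int) -> Fraction:
    """Round x to the nearest value with a `bits`-bit significand.

    Ties go to the even significand. Exponent limits are ignored,
    so this is an idealised model, not a full IEEE emulation.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    x = abs(x)
    # First guess for the exponent, then normalise so that
    # 2**(bits-1) <= scaled < 2**bits.
    e = x.numerator.bit_length() - x.denominator.bit_length() - bits
    scaled = x / Fraction(2) ** e
    while scaled >= 2 ** bits:
        e += 1
        scaled /= 2
    while scaled < 2 ** (bits - 1):
        e -= 1
        scaled *= 2
    n = round(scaled)  # round() on a Fraction rounds half to even
    return sign * n * Fraction(2) ** e


def add_chain(terms, bits):
    """Sum left to right, rounding every intermediate to `bits` bits."""
    total = Fraction(0)
    for t in terms:
        total = round_to_significand(total + Fraction(t), bits)
    return total


terms = [1.0, 2.0 ** -53, 2.0 ** -53]

# Every intermediate stored as a 53-bit double.
spilled = add_chain(terms, 53)

# Intermediates kept at 64 bits, rounded to 53 bits only at the end.
kept = round_to_significand(add_chain(terms, 64), 53)

print("53-bit intermediates:", float(spilled))
print("64-bit intermediates:", float(kept))
print("difference in units of 2**-52:", (kept - spilled) * 2 ** 52)
```

The two sums differ by one unit in the last place of a double, which is the kind of last-digit disagreement that a bit-exact comparison between machines would flag.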

guenterhb

Joined: 5 Mar 05

Posts: 1

Credit: 965879

RAC: 0

13 Mar 2005 9:02:42 UTC

Message 7839 in response to message 7834

Well, I've just stopped Einstein@Home because 2 of 4 WUs were marked as invalid. I'm waiting until this issue is fixed. (Sorry, I'm not interested in fiddling with Wine).

Greetings, Guenter

Jordan Wilberding

Joined: 19 Feb 05

Posts: 162

Credit: 715454

RAC: 0

14 Mar 2005 11:04:43 UTC

Message 7840

*Bump

I'm hoping an admin or someone higher up will read this and answer the question once and for all...

such things just should not be writ so please destroy this if you wish to live 'tis better in ignorance to dwell than to go screaming into the abyss worse than hell

Bruce Allen

Moderator

Joined: 15 Oct 04

Posts: 1119

Credit: 172127663

RAC: 0

14 Mar 2005 18:32:48 UTC

Message 7841

Teviet Creighton is currently doing some work on the validator and studying some of these results. He'll respond to this thread when he's learned a bit more.

Bruce

Director, Einstein@Home

josep

Joined: 9 Mar 05

Posts: 63

Credit: 1156542

RAC: 0

16 Mar 2005 17:52:42 UTC

Message 7842

Well, I suppose it may be helpful for the developers if we also report positive results, so here is my experience:

I joined Einstein@Home just a week ago with 2 Athlon machines, one of them running Windows 2000 and the other running SuSE Linux 9.2.

During this time each machine has completed over twelve WUs, and both have gotten valid results, with credit granted.

Some completed WUs are still pending credit (especially on the Linux machine, which seems to have to wait longer for credit to be granted), but none of the completed WUs has been rejected as "invalid".

As has been said in this thread, I observed that results on my Windows machine have less precision (results in the "Fstats.Ha" file always end with two zeros) than on my Linux box.

But results of both machines are accepted as valid.

So, as far as I can see, in my particular case, it seems that the validator is now running OK.

Here are my results:

http://einsteinathome.org/account/tasks
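For readers wondering how outputs with different trailing digits could both be accepted, one possibility is a comparison within a relative tolerance instead of a bit-exact match. The sketch below is only an illustration of that idea: `results_agree`, the tolerance value and the sample numbers are all invented here, and it is not the project's validator.

```python
import math


def results_agree(a, b, rel_tol=1e-5):
    """True when two result lists match pairwise within rel_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


# Invented numbers: the second list mimics output written with fewer
# significant digits, so each value ends in two zeros.
linux_values = [12.345678, 0.98765432]
windows_values = [12.345700, 0.98765400]

print("bit-exact match :", linux_values == windows_values)
print("within 1e-5     :", results_agree(linux_values, windows_values))
print("within 1e-9     :", results_agree(linux_values, windows_values, 1e-9))
```

How loose the tolerance may be without accepting wrong results depends on the science code, which is something only the developers can decide.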

rklein

Joined: 24 Feb 05

Posts: 4

Credit: 146362

RAC: 0

17 Mar 2005 10:22:06 UTC

Message 7843 in response to message 7842

> So, as far as I can see, in my particular case, it seems that the validator is
> running now OK.
>
> Here are my results:
>
> http://einsteinathome.org/account/tasks

I am afraid that you will see "0 credit granted" very soon.

josep

Joined: 9 Mar 05

Posts: 63

Credit: 1156542

RAC: 0

29 Mar 2005 19:38:53 UTC

Message 7844

Excuse me for reviving this thread to ask what is probably a very basic question, written in my very poor English (lysdexia, come on, you have a lot of work here...)

Prof. Bruce Allen has said that this problem is already being studied by the developers.

But today I have found in gcc's online manual (http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options)
the following optimization option:

--------------------------------------
-mfpmath=unit
Generate floating point arithmetics for selected unit unit. The choices for unit are:

`387'
Use the standard 387 floating point coprocessor present majority of chips and emulated otherwise. Code compiled with this option will run almost everywhere. The temporary results are computed in 80bit precision instead of precision specified by the type resulting in slightly different results compared to most of other chips. See -ffloat-store for more detailed description.
This is the default choice for i386 compiler.

`sse'
Use scalar floating point instructions present in the SSE instruction set. This instruction set is supported by Pentium3 and newer chips, in the AMD line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of SSE instruction set supports only single precision arithmetics, thus the double and extended precision arithmetics is still done using 387. Later version, present only in Pentium4 and the future AMD x86-64 chips supports double precision arithmetics too.
For the i386 compiler, you need to use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default.
The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80bit.
This is the default choice for the x86-64 compiler.

`sse,387'
Attempt to utilize both instruction sets at once. This effectively double the amount of available registers and on chips with separate execution units for 387 and SSE the execution resources too. Use this option with care, as it is still experimental, because the GCC register allocator does not model separate functional units well resulting in instable performance.

----------------------------------------------

Well, I have not developed or even compiled scientific software since I was a physics student, more than 13 years ago. And then I used only Borland Turbo C 2.0, in an MS-DOS environment, on my old 386SX (with a 387SX, of course). I now have no idea how to use gcc in a modern Linux environment.

But it seems to me that this could be the reason for the differences in speed and numerical precision between Einstein's Linux client and Windows client (and the resulting validation problems for Linux clients). If this is not the case, I apologise for wasting your valuable time.

I suppose that, for a project like Einstein@Home, using the floating point unit at maximum precision (80-bit data length) is preferable. And this is gcc's default. I don't know whether the Einstein@Home developers changed this default at compile time.

Perhaps they have not. And perhaps Microsoft's compiler is using SSE instructions. This should produce faster code (especially on Pentium 4 processors) but with less numerical precision. And that is just what Peter Koek has observed here in the "Fstats.Ha" file.

The option in Visual C++ to control this stuff is:

(from http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore/html/vclrfArchMinimumCPUArchitecture.asp)

-------------------------------------

/arch:[SSE|SSE2]

The compiler supports generation of code using the Streaming SIMD Extensions (SSE) and Streaming SIMD Extensions 2 (SSE2) instructions. The SSE instructions exist in various Pentium processors as well as in AMD Athlon processors. The SSE2 instructions currently only exists on the Pentium 4 processor.

For example, /arch:SSE allows the compiler to use the SSE instructions, and /arch:SSE2 allows the compiler to use the SSE2 instructions.

The optimizer will choose when and how to make use of the SSE and SSE2 instructions when /arch is specified. Currently SSE and SSE2 instructions will be used for some scalar floating-point computations, when it is determined that it is faster to use the SSE/SSE2 instructions and registers rather than the x87 floating-point register stack. As a result your code will actually use a mixture of both x87 and SSE/SSE2 for floating-point computations. Additionally, with /arch:SSE2, SSE2 instructions may be used for some 64-bit integer operations.

In addition to making use of the SSE and SSE2 instructions, the compiler will also make use of other instructions that are present on the processor revisions that support SSE and SSE2. An example of this is the CMOV instruction that first appeared in the PentiumPro revision of the Intel processors.

Specifying /arch with one of the /G options that specifies an older processor will be accepted without warning, but /G option will be silently ignored in favor of optimizing for the chip revision that corresponds to /arch. So, if /arch:SSE2 is specified with /G6, the compiler will optimize as if /G7 was specified. Similarly, if /arch:SSE is specified with /G5, the compiler optimize as if /G6 was specified.

When compiling with /clr, /arch will have no effect on code generation for managed functions; /arch only affects code generation for native functions.

/arch and /QIfist can not be used on the same compiland.

/Op in combination with /arch may in some cases provide different results than /Op without /arch. This is because with /Op alone individual expressions are evaluated on the x87 stack which can potentially mean a larger significand & exponent will be used than what is available in the SSE/SSE2 registers.

In particular if the user doesn't use _controlfp to modify the FP control word, the runtime startup code will set the x87 FPU control word precision-control field to 53-bits, so all float and double operations within an expression will occur with 53-bit significand and 15-bit exponent. All SSE single-precision operations will however use a 24-bit significand/8-bit exponent, and SSE2 double-precision operations will use a 53-bit significand/11-bit exponent.

---------------------------------------------
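The 24-bit and 53-bit significand widths mentioned in the quoted text can be seen directly. This is a small sketch in Python, whose float is the platform's double; `to_single` is a helper name made up here, and the script only shows the two storage formats, not the x87 precision-control behaviour that the quote also describes.

```python
import struct
import sys


def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Significand width of the native double (53 on IEEE 754 platforms).
print("double significand bits:", sys.float_info.mant_dig)

tiny = 2.0 ** -30      # far below single-precision resolution near 1.0
x = 1.0 + tiny         # still exactly representable with 53 bits

print("survives in double:", x != 1.0)
print("survives in single:", to_single(x) != 1.0)
```

A term that survives in a double but vanishes in a single is the same kind of loss, at a different scale, as a term that survives in an 80-bit register but vanishes when stored to a 64-bit double.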

My Linux machine continues processing WUs, slowly but steadily, and all WUs are still being granted credit, with not a single invalid result so far. But I suppose a lot of Linux users here would very much appreciate a solution to this problem...

Robert Somerville

Joined: 11 Nov 04

Posts: 27

Credit: 21819

RAC: 0

29 Mar 2005 21:51:15 UTC

Message 7845 in response to message 7841

> Teviet Creighton is currently doing some work on the validator and studying
> some of these results. He'll respond to this thread when he's learned a bit
> more.
>
> Bruce
>
>
Any word on the validation problem...?

Robert Somerville
