I asked Akos to write an efficient implementation of the hot-loop in x87 assembler to be independent of the compiler; he agreed to do this, but I haven't received any code yet.
Oops... I will make up for it as fast as possible.
(I thought you wanted to optimize a bigger part of the code.)
No problem. I actually wasn't sure, but a simple "hot-loop" will probably do for a start. We are still experimenting with how to get the compilers to do the float->int conversion in the sin/cos routine efficiently, but if we find we need assembler here, too, we could probably add it later anyway.
BM
The ultimate goal is still SSE and not x87, right? x87 is an intermediate step?
Sure. In any case, a non-SSE, x87-only version is required for those hosts where SSE cannot be assumed, so the effort will not be wasted. An AMD Athlon can use every bit of optimization to make the deadline :-)
Bikeman
Yeah, I know it needs to be done anyway... Hopefully it will help even out the differences in compilers... AMD processors have had penalties here since S5R2 started. First it was AMD in general, but now it appears to be specifically AMD/Windows.
Which brings up a question that I'll ask over in the other thread...
I published the 4.31. Let's see how things go. I definitely need to fix the signal 11 problem "officially".
BM
I get FPEs on Pentium III (Coppermine) [Family 6 Model 8 Stepping 10][fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse].
I am new, and the first version I got for this host was 4.31. I don't know whether comparing to older versions would be a good thing, because trapping FPEs is usually a good thing to do, and they shouldn't be ignored (at least not in scientific calculations; Lorenz comes to mind).
See: host 1117633
Results on other projects:
Rosetta: 1 Segfault, 1 negative seed (48568
LHC: no problems 9667879
SETI: no results (yet) 4221327
Btw, would an x86_64 version improve things (since sse/sse2 and mmx would be on by default with current gcc versions)?
I am trying 4.35 SSE now and will see whether this produces useable results and no FPEs.
Okay, have read the forum - the 64bit "power-user" port isn't that optimized.
Hi!
This particular fault (FPE) is really hard to produce with software bugs (other than compiler bugs); the most likely explanation is failing hardware. After all, a PIII Coppermine must be how old by now? 6 years? 7 years?
Most of the time it's not the CPU itself but things like failing fans, swollen capacitors on the motherboard, glitches in the power supply... Gary will be able to expand on this better than me. The E@H app has now reached a significant level of optimization and squeezes quite a bit of performance out of the FPU, so it's not surprising that E@H is the first app to show symptoms of hardware failure.
CU
Bikeman
One should also consider that in previous versions certain (if not all?) FPEs appear to have been ignored -- thus we might be seeing design/programming flaws today (unless those traps fire only on faulty hardware and never on faulty software design (which is okay, and human, and must happen)). But then, if it were flawed design, it shouldn't appear only here.
From the release info:
throws floating-point exception on NaNs and FPU stack errors
Of course I could verify that by running an older version without those new traps, but this might mean incorrect/drifting data, so my decision would then rather be "this old, P3-driven host cannot participate in Einstein". (That would be okay. It would be interesting to know whether other users with the same CPU get the same FPEs, but I think those old P3 Coppermine users haven't picked up the latest version yet and still run old versions.)
Update: the 4.35 SSE version seems also to produce errors:
2008-02-23 10:15:34 [Einstein@Home] Resuming task h1_0851.85_S5R3__372_S5R3b_1 using einstein_S5R3 version 435
2008-02-23 10:15:46 [Einstein@Home] Deferring communication for 1 min 0 sec
2008-02-23 10:15:46 [Einstein@Home] Reason: Unrecoverable error for result h1_0851.85_S5R3__372_S5R3b_1 (process exited with code 99 (0x63))