Windows S5R2 App 4.38 available for Beta Test

anders n

Joined: 29 Aug 05

Posts: 123

Credit: 1656300

RAC: 0

On this P4 2.8 the slowdown is

27 Aug 2007 3:53:20 UTC

Message 72069


On this P4 2.8 the slowdown is about 20%.

Validated ok.
Anders n

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250548555

RAC: 34396

RE: Interesting: The

27 Aug 2007 11:24:18 UTC

Message 72070 in response to message 72068


Quote:

Interesting: The initial Wingman crashed as well, but in a different phase of the computation. Maybe there's something wrong with the input data. Bernd will be interested in this one, I guess.

I am. Actually that output is one of the reasons why I put in some checks that apparently make the App a little slower. Maybe they can be taken out before the next "official" App.

RandyC

Joined: 18 Jan 05

Posts: 6608

Credit: 111139797

RAC: 0

First 'pure' 4.38 result

27 Aug 2007 11:57:32 UTC

Message 72071


First 'pure' 4.38 result returned and validated. Looks like about 9.4% slowdown (glad it's not more) on that XP4600 AM2 system.

Seti Classic Final Total: 11446 WU.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250548555

RAC: 34396

RE: Interesting: The

27 Aug 2007 12:41:41 UTC

Message 72072 in response to message 72068


Quote:

Interesting: The initial Wingman crashed as well, but in a different phase of the computation. Maybe there's something wrong with the input data.

I don't think so.

The other Task ran out of memory (probably was too fragmented). The parameters of result #86540103 (what's following "non-finite Dphi_alpha:"), however, are quite confusing, there's something really weird going on on this computer. There's no way I could see this happening from the source code. Either there's a serious bug in what the compiler makes of it, or some hardware problem on the machine (e.g. bad memory or overheated CPU).
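For readers wondering what a check of this kind looks like in principle, here is a minimal sketch in Python. It is not the App's code: the function name and the sample value are hypothetical, and only the "non-finite Dphi_alpha:" wording is taken from the error output mentioned above. It simply shows the idea of trapping a NaN or infinite value as soon as it appears, at the cost of one extra test per call.

```python
import math


def check_finite(name, value):
    """Return value unchanged, or raise ValueError if it is NaN or infinite.

    Hypothetical helper for illustration only; the real App's checks
    are not shown in this thread.
    """
    if not math.isfinite(value):
        raise ValueError(f"non-finite {name}: {value!r}")
    return value


# A normal value passes straight through.
check_finite("Dphi_alpha", 0.25)

# A NaN is reported immediately instead of propagating silently.
try:
    check_finite("Dphi_alpha", float("nan"))
except ValueError as err:
    print(err)
```

Running a test like this inside an inner loop is the sort of thing that could plausibly account for a small slowdown, though the thread does not say where the App's checks actually sit.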

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 728482662

RAC: 1165932

RE: RE: Interesting: The

27 Aug 2007 17:38:00 UTC

Message 72073 in response to message 72072


Quote:

Quote:
Interesting: The initial Wingman crashed as well, but in a different phase of the computation. Maybe there's something wrong with the input data.

I don't think so.

The other Task ran out of memory (probably was too fragmented). The parameters of result #86540103 (what's following "non-finite Dphi_alpha:"), however, are quite confusing, there's something really weird going on on this computer. There's no way I could see this happening from the source code. Either there's a serious bug in what the compiler makes of it, or some hardware problem on the machine (e.g. bad memory or overheated CPU).

BM

Hmmm, but same pattern here:

One wingman crashes on setting up the stacks, one while computing Fstats (both Windows); everything fine on Mac.

wuid=34525532.

This is weird, right?

CU
H-BE

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250548555

RAC: 34396

RE: One wingman crashes on

27 Aug 2007 18:51:54 UTC

Message 72074 in response to message 72073


Quote:

One wingman crashes on setting up the stacks, one while computing Fstats (both Windows); everything fine on Mac.

wuid=34525532.

This is weird, right?

It's not different from what I would expect.
The machine that errored out "setting up stacks" clearly has a broken data file (and a Client < 5.6 that doesn't check the files before starting the App).
I wish the other machine had run the 4.38 App, so I could get a better impression of what's wrong.
It's not an error in the data or the basic algorithm; this '[-8,8]' problem is specific to Windows, probably due to the VC compiler, and to certain machines (and maybe even the current state of the memory there).

I bet you'll find more of the same errors in the result history of both machines.

Ziran

Joined: 26 Nov 04

Posts: 194

Credit: 611790

RAC: 1114

RE: RE: Interesting: The

27 Aug 2007 19:20:29 UTC

Message 72075 in response to message 72072


Quote:

Quote:
Interesting: The initial Wingman crashed as well, but in a different phase of the computation. Maybe there's something wrong with the input data.

I don't think so.

The other Task ran out of memory (probably was too fragmented). The parameters of result #86540103 (what's following "non-finite Dphi_alpha:"), however, are quite confusing, there's something really weird going on on this computer. There's no way I could see this happening from the source code. Either there's a serious bug in what the compiler makes of it, or some hardware problem on the machine (e.g. bad memory or overheated CPU).

BM

The error occurred, as you can see, at 16.45.04 CET. When I posted about the problem at 00.04.53 UTC, the host was still running. After the error, the host had downloaded and completed a 4-hour Rosetta result before downloading and starting a new Einstein result. This host has run like clockwork for the last year. I haven't had a problem in the 2 days since the error. So if it's something with my hardware, it has to be some temporary problem or something that happens very rarely.

As I said, the only problem I can think of is the possibility that C: ran out of space when I was unpacking things with WinRAR. So, to see if I could replicate this error, I suspended the current result and downloaded a new one.
http://einsteinathome.org/task/86613787

After nearly filling C:, I started unpacking things, but even when I completely filled the disk, I couldn't get it to error out. BTW, when I aborted the result, the Einstein application asked for internet access.

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

Stick

Joined: 24 Feb 05

Posts: 790

Credit: 33131015

RAC: 1144

RE: RE: My early

28 Aug 2007 1:19:08 UTC

Message 72076 in response to message 72055


Quote:

Quote:
My early indications say that v4.38 will be significantly slower than v4.33. I have it running on 2 hosts. Both started with "To completion" times similar to their last v4.33 results, but after a couple of hours of processing, those "To completion" times have actually increased a little and "Progress" is barely at 2%. If my extrapolation is accurate, that means my v4.33 times of 57 and 67 hours will go up to around 80 and 90 hours with v4.38.

It's still early, but things may not be as bad (slow) as my first estimate (above). After a couple more hours of processing, it looks like my WUs are likely to be only 10 to 12 hours slower than the v4.33 times would have been. That is, v4.38 may only be 15% (+/-) slower (vs. the 25% (+) I reported earlier).

My Intel(R) Pentium(R) 4 CPU 2.40GHz just finished http://einsteinathome.org/task/86542040 in about 77 hours. It's a "pure v4.38" result. As I indicated above, other "monster" units on this host had been taking about 67 hours, but direct comparisons are no longer possible since its previous results have already been deleted from the DB.
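As a rough cross-check of those figures, the relative slowdown can be worked out as below. This is only a sketch: the function name is made up for illustration, and the inputs are the approximate 67-hour and 77-hour run times quoted above, not exact measurements.

```python
def slowdown_percent(old_hours, new_hours):
    """Relative increase in run time, as a percentage of the old run time."""
    return (new_hours - old_hours) / old_hours * 100.0


# Approximate run times from this host: about 67 h (v4.33) vs. about 77 h (v4.38).
print(round(slowdown_percent(67, 77), 1))  # 14.9
```

That is in line with the "15% (+/-)" estimate above.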

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7224834931

RAC: 1039063

RE: Bernd, is there

30 Aug 2007 0:32:10 UTC

Message 72077 in response to message 72064


Quote:

Bernd, is there anything specific we can do to provide you useful feedback on this beta? With the exception of errors I generated in switching from beta to production 4.33, my machines have not been generating errors for weeks, since I got the CPU voltages up high enough to fully handle my overclock (and since your code releases removed some of the more common code-related problems).

If you'd like to see the error-trap code operate, and possibly are interested in what may be a typical signature for a Core 2 part running this application just slightly faster than its capability, I could slightly drop the CPU voltage on my machines to a level that will probably error within a few hours, but not right away.

Bernd did not indicate any interest in the deliberate speed/voltage error test, but I've tried to do it anyway, as I've long suspected that some modest part of the troubling result errors might come from overclocked Core 2 machines which were running just a bit too fast for the Einstein app.

After running for over two days, I've failed to generate a speed/voltage Einstein error on either my Core 2 Duo E6600 or my Core 2 Quad Q6600. I dropped the voltage an increment per hour until I was stopped by being unable to boot (the Duo), or had a system freeze less than two minutes after getting into Windows (the Quad), then raised the voltage one minimum increment, and have run there since. The Duo actually generated one SETI error, but otherwise both systems seem to have run stably.

In the case of the Quad, I had been running at 1.35 V and 3.006 GHz, and these last two days it has been at 1.31875 V (as requested in the BIOS). The Duo was running at 1.40 V and has been running at 1.35625 V.

I'm bothering to report this because I think it means something has changed in the Einstein app, certainly since the version current about April 2007 (when I did my Duo setup, and found Einstein required several more increments of CPU voltage than booting, or running SETI), and probably since early July 2007 (when I tuned up the Quad, and believe I recall seeing at least one of the same Einstein error code syndrome).

If this is really true, it is good news, as the risk to the project of seeing these errors from Core 2 overclocked hosts usually running SETI and visiting here during SETI outages is less, and the risk that users will be annoyed that Einstein won't run at conditions which work fine for SETI is also down.

Please note that even if true, likely these conclusions only apply to Core 2, possibly only on Windows XP, and quite likely only to B step, not the newer G step.

I've reverted to production 4.33, and plan to continue running at the same reduced voltage margins for up to a week. If that runs clean, most likely the accidental fix happened before 4.33, but if it bombs in the first few hours, it may be that somehow 4.38 changes happened to alter this behavior.

RandyC

Joined: 18 Jan 05

Posts: 6608

Credit: 111139797

RAC: 0

RE: I've reverted to

30 Aug 2007 1:49:46 UTC

Message 72078 in response to message 72077


Quote:

I've reverted to production 4.33, and plan to continue running at the same reduced voltage margins for up to a week. If that runs clean, most likely the accidental fix happened before 4.33, but if it bombs in the first few hours, it may be that somehow 4.38 changes happened to alter this behavior.

I think 4.38 was compiled with a different compiler than was used previously.

See this thread.

Seti Classic Final Total: 11446 WU.
