Windows S5R2 App 4.38 available for Beta Test

anders n
Joined: 29 Aug 05
Posts: 123
Credit: 1656300
RAC: 0

On this P4 2.8 the slowdown is

On this P4 2.8 the slowdown is about 20%.

Validated ok.
Anders n

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250554161
RAC: 34528

RE: Interesting: The

Message 72070 in response to message 72068

Quote:
Interesting: The initial Wingman crashed as well, but in a different phase of the computation. Maybe there's something wrong with the input data. Bernd will be interested in this one, I guess.


I am. Actually that output is one of the reasons why I put in some checks that apparently make the App a little slower. Maybe they can be taken out before the next "official" App.
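To illustrate why checks like this cost run time, here is a minimal sketch in Python of a sanity check for non-finite values. It is not the App's source code: the helper name `check_finite` and the sample values are made up for illustration, and the assumption is that such a check runs on every evaluation of a quantity inside a hot loop, which is where the extra cost would come from.

```python
import math


def check_finite(name, value):
    """Return value unchanged, or raise if it is NaN or infinite.

    Hypothetical helper for illustration only; not taken from the App.
    """
    if not math.isfinite(value):
        raise ValueError(f"non-finite {name}: {value}")
    return value


# A finite value passes through untouched.
check_finite("Dphi_alpha", 0.25)

# A NaN is caught and reported by name.
try:
    check_finite("Dphi_alpha", float("nan"))
except ValueError as err:
    print(err)  # non-finite Dphi_alpha: nan
```

Each call is cheap on its own, but repeated for every value in a long computation it could plausibly add up to a slowdown of the size reported in this thread.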

BM


RandyC
Joined: 18 Jan 05
Posts: 6609
Credit: 111139797
RAC: 0

First 'pure' 4.38 result

First 'pure' 4.38 result returned and validated. Looks like about 9.4% slowdown (glad it's not more) on that XP4600 AM2 system.

Seti Classic Final Total: 11446 WU.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250554161
RAC: 34528

RE: Interesting: The

Message 72072 in response to message 72068

Quote:
Interesting: The initial Wingman crashed as well, but in a different phase of the computation. Maybe there's something wrong with the input data.


I don't think so.

The other Task ran out of memory (probably was too fragmented). The parameters of result #86540103 (what's following "non-finite Dphi_alpha:"), however, are quite confusing, there's something really weird going on on this computer. There's no way I could see this happening from the source code. Either there's a serious bug in what the compiler makes of it, or some hardware problem on the machine (e.g. bad memory or overheated CPU).

BM


Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 728842918
RAC: 1188668

RE: RE: Interesting: The

Message 72073 in response to message 72072

Quote:
Quote:
Interesting: The initial Wingman crashed as well, but in a different phase of the computation. Maybe there's something wrong with the input data.

I don't think so.

The other Task ran out of memory (probably was too fragmented). The parameters of result #86540103 (what's following "non-finite Dphi_alpha:"), however, are quite confusing, there's something really weird going on on this computer. There's no way I could see this happening from the source code. Either there's a serious bug in what the compiler makes of it, or some hardware problem on the machine (e.g. bad memory or overheated CPU).

BM

Hmmm, but same pattern here:

One wingman crashes on setting up the stacks, one while computing Fstats (both Windows); everything is fine on the Mac.

wuid=34525532.

This is weird, right?

CU
H-BE

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250554161
RAC: 34528

RE: One wingman crashes on

Message 72074 in response to message 72073

Quote:

One wingman crashes on setting up the stacks, one while computing Fstats (both Windows); everything is fine on the Mac.

wuid=34525532.

This is weird, right?


It's not different from what I would expect.
The machine that errored out "setting up stacks" clearly has a broken data file (and a Client < 5.6 that doesn't check the files before starting the App).
I wish the other machine had run the 4.38 App, so I could get a better impression of what's wrong.
It's not an error in the data or the basic algorithm. This '[-8,8]' problem is specific to Windows, probably due to the VC compiler, and to certain machines (and maybe even the current state of the memory there).

I bet you'll find more of the same errors in the result history of both machines.

BM


Ziran
Joined: 26 Nov 04
Posts: 194
Credit: 611790
RAC: 1114

RE: RE: Interesting: The

Message 72075 in response to message 72072

Quote:
Quote:
Interesting: The initial Wingman crashed as well, but in a different phase of the computation. Maybe there's something wrong with the input data.

I don't think so.

The other Task ran out of memory (probably was too fragmented). The parameters of result #86540103 (what's following "non-finite Dphi_alpha:"), however, are quite confusing, there's something really weird going on on this computer. There's no way I could see this happening from the source code. Either there's a serious bug in what the compiler makes of it, or some hardware problem on the machine (e.g. bad memory or overheated CPU).

BM


The error occurred, as you can see, at 16.45.04 CET. When I posted about the problem at 00.04.53 UTC, the host was still running. After the error, the host had downloaded and completed a 4-hour Rosetta result before downloading and starting a new Einstein result. This host has run like clockwork for the last year. I haven't had a problem in the 2 days since the error. So if it's something with my hardware, it has to be some temporary problem or something that happens very rarely.

As I said, the only problem I can think of is the possibility that C: ran out of space when I was unpacking things with WinRAR. So, to see if I could replicate this error, I suspended the current result and downloaded a new one.
http://einsteinathome.org/task/86613787

After nearly filling C: I started unpacking things, but even when I completely filled the disk, I couldn't get it to error out. BTW, when I aborted the result, the Einstein application asked for internet access.

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 33131015
RAC: 1144

RE: RE: My early

Message 72076 in response to message 72055

Quote:
Quote:
My early indications say that v4.38 will be significantly slower than v4.33. I have it running on 2 hosts. Both started with "To completion" times similar to their last v4.33 results, but after a couple of hours of processing, those "To completion" times have actually increased a little and "Progress" is barely at 2%. If my extrapolation is accurate, that means my v4.33 times of 57 and 67 hours will go up to around 80 and 90 hours with v4.38.

It's still early, but things may not be as bad (slow) as my first estimate (above). After a couple more hours of processing, it looks like my WUs are likely to be only 10 to 12 hours slower than the v4.33 times would have been. That is, v4.38 may only be 15% (+/-) slower (vs. the 25% (+) I reported earlier).

My Intel(R) Pentium(R) 4 CPU 2.40GHz just finished http://einsteinathome.org/task/86542040 in about 77 hours. It's a "pure v4.38" result. As I indicated above, other "monster" units on this host had been taking about 67 hours, but direct comparisons are no longer possible since its previous results have already been deleted from the DB.
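
For anyone who wants to put a number on that, here is a minimal sketch of the slowdown arithmetic in Python, using the 67 and 77 hour figures from this post. The function name `slowdown_percent` is just for illustration, and the 67 hours is the approximate figure quoted above, not a measured run time for an identical work unit.

```python
def slowdown_percent(old_hours, new_hours):
    """Relative increase in run time, as a percentage of the old run time."""
    return (new_hours - old_hours) / old_hours * 100.0


# About 67 hours under v4.33 versus about 77 hours under v4.38.
print(round(slowdown_percent(67, 77), 1))  # 14.9
```

That lands close to the 15% (+/-) estimated earlier in the thread.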

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225024931
RAC: 1045765

RE: Bernd, is there

Message 72077 in response to message 72064

Quote:


Bernd, is there anything specific we can do to provide you useful feedback on this beta? With the exception of errors I generated in switching from beta to production 4.33, my machines have not been generating errors for weeks--since I got the CPU voltages up high enough to fully handle my overclock (and since your code releases removed some of the more common code-related problems)

If you'd like to see the error-trap code operate, and possibly are interested in what may be a typical signature for a Core 2 part running this application just slightly faster than its capability, I could slightly drop the CPU voltage on my machines to a level that will probably error within a few hours, but not right away.


Bernd did not indicate any interest in the deliberate speed/voltage error test, but I've tried to do it anyway, as I've long suspected that some modest part of the troubling result errors might come from overclocked Core 2 machines which were running just a bit too fast for the Einstein app.

After running for over two days, I've failed to generate a speed/voltage Einstein error on either my Core 2 Duo E6600 or my Core 2 Quad Q6600. I dropped the voltage one increment per hour until I was stopped by being unable to boot (the Duo), or by a system freeze less than two minutes after getting into Windows (the Quad), then raised the voltage one minimum increment, and have run there since. The Duo actually generated one SETI error, but otherwise both systems seem to have run stably.

In the case of the Quad, I had been running at 1.35 V and 3.006 GHz, and these last two days have been at 1.31875 V (as requested in the BIOS). The Duo was running at 1.40 V and has been running at 1.35625 V.

I'm bothering to report this because I think it means something has changed in the Einstein app, certainly since the version current about April 2007 (when I did my Duo setup and found Einstein required several more increments of CPU voltage than booting or running SETI), and probably since early July 2007 (when I tuned up the Quad, and believe I recall seeing at least one instance of the same Einstein error code syndrome).

If this is really true, it is good news, as the risk to the project of seeing these errors from Core 2 overclocked hosts usually running SETI and visiting here during SETI outages is less, and the risk that users will be annoyed that Einstein won't run at conditions which work fine for SETI is also down.

Please note that even if true, likely these conclusions only apply to Core 2, possibly only on Windows XP, and quite likely only to B step, not the newer G step.

I've reverted to production 4.33, and plan to continue running at the same reduced voltage margins for up to a week. If that runs clean, most likely the accidental fix happened before 4.33, but if it bombs in the first few hours, it may be that somehow 4.38 changes happened to alter this behavior.

RandyC
Joined: 18 Jan 05
Posts: 6609
Credit: 111139797
RAC: 0

RE: I've reverted to

Message 72078 in response to message 72077

Quote:

I've reverted to production 4.33, and plan to continue running at the same reduced voltage margins for up to a week. If that runs clean, most likely the accidental fix happened before 4.33, but if it bombs in the first few hours, it may be that somehow 4.38 changes happened to alter this behavior.

I think 4.38 was compiled with a different compiler than was used previously.

See this thread.

Seti Classic Final Total: 11446 WU.
