Windows S5R3 App 4.25 available for Beta Test

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,860,751
RAC: 127,614

Just like the Linux version, the inclusion of the faster sin/cos code seems to cause the MS compiler to make some misjudgements when trying to optimize the rest of the code.

Well, this is a temporary problem. Once Bernd has finished porting the hand-crafted SSE hot loop to the Microsoft compiler, the linear sin/cos function will definitely show its full performance, because the compiler will no longer be able to mess up that code.
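For readers unfamiliar with the idea, here is a minimal sketch of a table-based sin/cos with linear interpolation. It is not the project's code: the table size, the function name `sin_cos_lut_sketch`, and the convention that the argument is given in cycles (so the result is sin and cos of 2*pi*x) are all assumptions made for illustration.

```python
import math

LUT_SIZE = 64  # hypothetical table resolution, chosen only for illustration

# One extra entry so that index i + 1 is always valid during interpolation.
SIN_LUT = [math.sin(2.0 * math.pi * i / LUT_SIZE) for i in range(LUT_SIZE + 1)]
COS_LUT = [math.cos(2.0 * math.pi * i / LUT_SIZE) for i in range(LUT_SIZE + 1)]


def sin_cos_lut_sketch(x):
    """Approximate (sin(2*pi*x), cos(2*pi*x)) by linear interpolation in a table."""
    frac = x - math.floor(x)            # reduce the argument to one cycle
    pos = frac * LUT_SIZE
    i = min(int(pos), LUT_SIZE - 1)     # guard against frac rounding up to 1.0
    t = pos - i                         # position between table entries i and i + 1
    s = SIN_LUT[i] + t * (SIN_LUT[i + 1] - SIN_LUT[i])
    c = COS_LUT[i] + t * (COS_LUT[i + 1] - COS_LUT[i])
    return s, c
```

The trade-off is accuracy for speed: a table lookup plus one multiply-add per value replaces a library call, and the error shrinks as the table grows.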

Bikeman

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 248,985,085
RAC: 34,001

Ok, it's roughly what I expected - basically the same phenomenon that I saw with gcc on the 4.24 Linux App. Too bad.

Well, I'm still following the agenda I posted here. So the 4.24 and 4.25 releases are mainly to improve reliability of the Apps. I guess that I'll soon make them official to reduce the overall error rates of the project, probably together with the 4.22 Intel Mac App.

I still have a somewhat strange problem on MacOS PPC on my list to look into, which currently prevents me from building any new Apps for this platform, but apart from that I'm now putting more effort into optimization.

Akos, it looks like I could use a hand-crafted version of the "hot loop" for x87 FPU assembler, too. Can you make one and send it to me? I'm not sure whether this should include the code for the second sin_cos_lut, too, to make sure the compiler doesn't insert a costly function call there (we should discuss further details by eMail).

My next step will probably be a pair of Linux APIv6 Apps (x87 and SSE versions) to bring the CPU-feature dependent delivery of Apps to work on the server side, so ordinary participants can benefit from SSE-optimizations, too (*). Then I'll do the same for Windows.

BM

(*) the old-style CPU-feature switching in the App itself which we had in S4 Apps doesn't work very well with the new code
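
To illustrate what server-side, CPU-feature-dependent delivery could look like, here is a rough sketch. It is not the BOINC scheduler: the function name, the variant list, and the feature strings are hypothetical. The idea is only that the server picks the most optimized build whose required CPU features the host reports.

```python
def pick_app_variant(host_cpu_features, variants):
    """Return the first variant whose required features the host supports.

    `variants` is ordered from most to least optimized; every name here is
    made up for illustration.
    """
    for name, required in variants:
        if required <= host_cpu_features:   # subset test on feature sets
            return name
    raise LookupError("no compatible app variant for this host")


# Hypothetical pair of builds: an SSE version and a plain x87 fallback.
VARIANTS = [
    ("sse", {"sse"}),
    ("x87", set()),
]

print(pick_app_variant({"mmx", "sse", "sse2"}, VARIANTS))  # sse
print(pick_app_variant({"mmx"}, VARIANTS))                 # x87
```

Doing the choice on the server means each App binary contains only one code path, in contrast to the in-App switching mentioned in the footnote.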


Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282,700
RAC: 0

Message 77351 in response to message 77350

Quote:

My next step will probably be a pair of Linux APIv6 Apps (x87 and SSE versions) to bring the CPU-feature dependent delivery of Apps to work on the server side, so ordinary participants can benefit from SSE-optimizations, too (*). Then I'll do the same for Windows.

Who are "ordinary participants"? Do you mean "ordinary Linux participants", i.e. those people who decide not to use the Linux Power Users app, either because they don't know about them or are not willing to try?

Anyway, I think you know my take on this, that 84% of your user base is Windows, and we wouldn't mind a little help. For 10 months, both during the S5R2 and S5R3 runs, the only problems I had were with two tasks. Both of those were due to power outages, which clearly your science application can't prevent from happening. If it can, then you might want to patent that and retire... ;-)
That said, I do realize that there were problems encountered; I just never encountered them myself...

Brian

Bernd Machenschalk

Message 77352 in response to message 77351

Quote:
Quote:

My next step will probably be a pair of Linux APIv6 Apps (x87 and SSE versions) to bring the CPU-feature dependent delivery of Apps to work on the server side, so ordinary participants can benefit from SSE-optimizations, too (*). Then I'll do the same for Windows.

Who are "ordinary participants"? Do you mean "ordinary Linux participants", i.e. those people who decide not to use the Linux Power Users app, either because they don't know about them or are not willing to try?


I simply need something to start with. This requires modifications to the scheduler that, if they go wrong, will affect 84% of all participants if I try this for Windows, and 12% if I try this with Linux (and many of these are in the LSC, which helps not getting too angry on this project). In addition, the build and code for SSE Apps is already working on Linux, but not on Windows. So Linux looks the better thing to start with.

BM


Brian Silvers

Message 77353 in response to message 77352

Quote:

I simply need something to start with. This requires modifications to the scheduler that, if they go wrong, will affect 84% of all participants if I try this for Windows, and 12% if I try this with Linux (and many of these are in the LSC, which helps not getting too angry on this project).

Yes, I notice I get paired up with Steffen quite often... Since I assume there is no "sandbox" that you can play in with the scheduler, has any thought been given to something that I mentioned WAY back when, which was to set up your own separate beta project? The separate project would have its own scheduler, and thus you could feel free to let loose and not worry, as all participants would fully know that they were involved with a beta and that "stuff" happens from time to time...

Quote:
In addition, the build and code for SSE Apps is already working on Linux, but not on Windows. So Linux looks the better thing to start with.

Perhaps, but the longer no apparent activity happens with the Windows app, the more attentive people tend to start feeling neglected. It is likely that the only way the general Windows user populace is going to know there was a change in this app is noticing a drop in their RAC, as most of the people who were having problems with the graphics crashing would either have been visiting these fora or would've stopped running the project... This is not to say that your efforts are not appreciated. If I didn't care to help test it, I wouldn't have stopped what I was doing with Cosmology and SETI and picked up the beta app. I'm going to run some results with just your app running, then bring Cosmology into the mix, and finally have it switching between all 3, just to say I tested it good... :-)

Edit: The only other main benefit of fixing the graphics problem right now and making it the standard official app is for new users attaching; however, they won't see a drop in their RAC. It is a tough call... 7% is steep though. It also increases the amount of time needed to process a task, and there have been continual mutterings about how "unfair" the deadlines are as it is...

Let me ask again: Can you extend deadlines out to 18 days temporarily until you can get SSE over into the Windows application? My gut feeling is that this could help reduce the occurrences of some reissues and lost credit due to turning a task in after the newly assigned host has already reported back, as well as increase your "plays well with others" rating between other projects being shared on a single host...

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4,527,270
RAC: 0

Message 77354 in response to message 77350

Quote:
Akos, it looks like I could use a hand-crafted version of the "hot loop" for x87 FPU assembler, too. Can you make one and send it to me?

Yes, of course.

Brian Silvers

Message 77355 in response to message 77354

Quote:
Quote:
Akos, it looks like I could use a hand-crafted version of the "hot loop" for x87 FPU assembler, too. Can you make one and send it to me?

Yes, of course.

FYI Akos, I'm now running h1_0712.40_S5R2__0_S5R3a, which according to the graph that archae86 posted should be either maximum runtime or close to it. The current progress is definitely slower than the past two that I've run, with my crude estimation technique saying that it should take about 51,000 seconds, which would put it at a guess of about 12% slower than 4.15 on my system...
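
The post does not say how the estimate was made. One simple way to get such a number, assuming progress grows roughly linearly with time, is to extrapolate from elapsed time and fraction done. The function names and the sample elapsed/progress values below are made up for illustration; only the 51,000 seconds and 12% figures come from the post.

```python
def estimate_total_runtime(elapsed_seconds, fraction_done):
    """Linear extrapolation: total = elapsed / fraction of the task finished."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds / fraction_done


def slowdown_percent(new_runtime, old_runtime):
    """How much longer the new runtime is than the old one, in percent."""
    return (new_runtime / old_runtime - 1.0) * 100.0


# Illustrative inputs: 12,750 s elapsed at 25% progress.
total = estimate_total_runtime(12_750, 0.25)

# A 12% slowdown at 51,000 s implies a 4.15 baseline of 51,000 / 1.12 seconds.
implied_baseline = 51_000 / 1.12
print(total, round(implied_baseline), round(slowdown_percent(total, implied_baseline), 1))
```

Runtime in this search is reported to vary from task to task, so a comparison like this is only meaningful between tasks expected to take similar time.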

Astro
Joined: 18 Jan 05
Posts: 257
Credit: 1,000,560
RAC: 0

I'm not sure whether this comes into play or not. Maybe I just wanna post something??

In the 5 days I've been back attached to Einstein, I've completed and returned 30 wus. 21 of the 30 were done with 4.15 for Windows, the other 9 were done with 4.24 for Linux.

I have 17 wus which are "pending" (56.67%). 5 of the 17 pending wus (29.4%) are pending because of computation errors on the part of my wingman.

Out of all wus returned, in 9 of the 30 (30%), I had at least one "wingman" return a "computation error". I've not had a computation error.
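
The percentages above can be checked with a few lines. The counts are the ones from the post; the variable names are just labels chosen here.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


returned = 30                     # wus completed and returned
pending = 17                      # of those, still pending
pending_wingman_error = 5         # pending because the wingman errored
any_wingman_error = 9             # returned wus with at least one wingman error

print(f"pending:                    {percent(pending, returned):.2f}%")               # 56.67%
print(f"pending due to wingman:     {percent(pending_wingman_error, pending):.2f}%")  # 29.41%
print(f"wus with a wingman error:   {percent(any_wingman_error, returned):.2f}%")     # 30.00%
```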

I wonder what Bernd sees as the % of wus returned with computation errors, or even if it's these errors he's trying to eradicate? If so, then that would lower the quantity of pending and thereby decrease the delay in granting of credits.

While I was bored, I created a chart of "credit" as I see it for the "big three" (Seti, Einstein, and Rosetta). From my perspective and machines, Rosetta, Einstein, and Seti (stock) pay about the same to the "non optimized" masses. I acknowledge there'll be fluctuations at Einstein depending on wu and that my sample size is low. I'm sure I'll take another look at this later.

Brian Silvers

Message 77357 in response to message 77356

Quote:

I wonder what Bernd sees as the % of wus returned with computation errors, or even if it's these errors he's trying to eradicate? If so, then that would lower the quantity of pending and thereby decrease the delay in granting of credits.

That's pretty much the point of 4.24 and 4.25. 4.24 addresses the signal 11 problem for Linux, while 4.25 is supposed to address the graphics problems in Windows. A 7-12% drop in performance may be a bit too much of a hit for the Windows app though, seeing as how people were already upset about deadlines being tight for slower and/or not-always-on hosts...

Quote:
I acknowledge there'll be fluctuations at Einstein depending on wu and that my sample size is low. I'm sure I'll take another look at this later.

I can already tell that your Linux OS (is it the same machine, just dual booted?) looks to have probably pulled the high end of the runtime. I don't know about the Windows tasks for sure, but it looks like from archae86's chart the tasks at 376-380 wouldn't be at max runtime, so the cr/hr may be actually either the same or in favor of Linux... Additionally, since the 6000+ Linux setup is running 4.24, while the Windows setup is still running 4.15, and since we have confirmed slowdowns for Linux 4.24, your data is skewed in favor of Windows, since the Linux app experienced a slowdown...

It is my current personal belief that the current official Linux app (4.20) is faster than the current official Windows app (4.15) when run on the same processor, at least for AMDs. It would be interesting to see a continuance of 4.24 vs. 4.25 on the 6000+, assuming that the Linux side is actual hardware and not a VM.

Astro

Brian, I PMed Richard 3 days ago asking what the numbers mean. Perhaps you can tell me? He pointed to two sets of numbers within the wu name: one which has a decimal (the first numbers, which appear to be some sort of frequency), and a second set which I have NO idea about. I've seen the first number referred to as a "template" by an Archae86 chart. Basically, I don't know the terminology. I see the nifty inverted "full wave" DC-chart-like graphs, but don't know what the bottom line (x axis) represents. Is the x axis the second number? If I were to create a column for each in my script, what titles should I give them??

Yes, dual boot for all my systems (except the wife's laptop).
The Linux wus had 101 and 102 from 0715.25 as the second set of numbers. Windows wus were 375, 376, 378, 379, and 380 from 0794.15. Hmm, is there a range for these numbers??
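
For a script that needs one column per number, here is a small sketch that splits a wu name into its parts. The pattern is inferred from the one full name quoted in this thread (h1_0712.40_S5R2__0_S5R3a) and may not cover every name the project uses. The column titles are placeholders: "frequency" follows the guess above, and "second_number" is deliberately neutral since its meaning isn't settled here.

```python
import re

# Pattern inferred from a single example name; treat it as an assumption.
WU_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z]\d)_"        # e.g. h1
    r"(?P<frequency>\d+\.\d+)_"        # the number with a decimal
    r"(?P<data_label>[A-Za-z0-9]+)__"  # e.g. S5R2
    r"(?P<second_number>\d+)_"         # the second set of numbers
    r"(?P<run_label>[A-Za-z0-9]+)"     # e.g. S5R3a
    r"(?:_\d+)?$"                      # optional trailing result index
)


def parse_wu_name(name):
    """Split a wu name into named fields, or return None if it doesn't match."""
    m = WU_NAME.match(name)
    if m is None:
        return None
    fields = m.groupdict()
    fields["frequency"] = float(fields["frequency"])
    fields["second_number"] = int(fields["second_number"])
    return fields


print(parse_wu_name("h1_0712.40_S5R2__0_S5R3a"))
```

Returning None for names that don't match keeps the script from silently filling columns with wrong values.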
