Although I don't notice much speed increase, if any, yet.
Although it was not originally planned for release, I have now published 4.26 as the first "official" "separate graphics" App. I hope we'll soon see the benefits of this approach in this App, i.e. graphics working on Vista and fewer computation errors, since graphics problems shouldn't affect the computation any more.
BM
RE: Well, this seems to now
The announcement was made here, in this same thread.
It should be there. You probably are seeing the cyclic nature of the runtime. If you look at this thread there are pointers in there for figuring out the runtimes... Now, since you're running Core2 / Xeon processors, it would be interesting to see if the gains are less as you move into the newer processor architectures...
RE: Now, since you're
Most definitely not less. Much of the speed increase comes from going back to an older version of the Microsoft compiler, which avoids generating certain instruction patterns that are extremely costly on at least the P6, K8 and even recent Core(2) microarchitecture CPUs: a so-called store-forwarding stall.
For the curious:
Simplified, that code pattern would look like this:
...
- write a double word (32 bits) to address x
- write another double word to address x+4
- maybe do something else, but at most a few instructions
- read a quad word (64 bits) from address x
...
(so basically: reading 64 bits from a memory location that was written to shortly before, but in two separate chunks of 32 bits each).
This is deadly for performance. When reading, the CPU's internal memory I/O logic is clever enough to notice if that address was written to only "recently" (as frequently happens for memory holding local variables in higher-level programming languages); it can then take a shortcut and get the data from a buffer where pending store operations are kept.
So the CPU doesn't have to wait until the data is actually written to memory (which is still comparatively slow).
However, the CPU is not clever enough to find this shortcut if the memory location in question was written to in several partial chunks. In that case, it has to wait until the preceding store operations are finished, somewhat stalling the whole flow of processing instructions.
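For the curious, the forwarding rule described above can be modelled in a few lines of C. This is a deliberately simplified sketch, not how any particular CPU implements it: the type and function names are hypothetical, and the single rule used here ("the youngest overlapping pending store must cover the whole load, otherwise the load waits") glosses over the alignment and size restrictions that differ between P6, K8 and Core 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of a load in this toy model. */
typedef enum {
    LOAD_FROM_CACHE,   /* no pending store touches these bytes: normal load */
    LOAD_FORWARDED,    /* one pending store covers the load: fast shortcut  */
    LOAD_STALLS        /* partial overlap: wait until the stores complete   */
} load_outcome;

/* A store that has been issued but not yet written to memory. */
typedef struct {
    uintptr_t addr;    /* first byte written */
    size_t    size;    /* width in bytes, e.g. 4 (dword) or 8 (qword) */
} pending_store;

/* Classify a load of `size` bytes at `addr` against the pending stores,
   which are ordered oldest (index 0) to youngest (index n - 1). */
load_outcome classify_load(const pending_store *stores, size_t n,
                           uintptr_t addr, size_t size)
{
    for (size_t i = n; i-- > 0; ) {            /* youngest store first */
        const pending_store *s = &stores[i];
        int overlaps = addr < s->addr + s->size && s->addr < addr + size;
        if (!overlaps)
            continue;
        /* Forward only if this one store contains every byte of the load. */
        if (s->addr <= addr && addr + size <= s->addr + s->size)
            return LOAD_FORWARDED;
        return LOAD_STALLS;                    /* written in partial chunks */
    }
    return LOAD_FROM_CACHE;
}
```

With two pending dword stores at x and x+4, a qword load from x comes out as `LOAD_STALLS`; with a single pending qword store at x, the same load is `LOAD_FORWARDED`.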
Maybe the latest Penryns can deal with this better? But at least the pre-Penryn Core 2s should see a significant speed increase from avoiding this particular code pattern, which the newer MS compiler generates (a compiler bug).
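In C source terms, the problematic shape might look like the first function below. Whether a compiler really emits two 32-bit stores followed by a 64-bit load here depends on the compiler version and flags, so treat this as an illustration, not a reproduction of the MS compiler output. The second function shows one common way to sidestep the pattern (assembling the value in registers); that is an assumption for illustration, not necessarily what the older compiler does. The names are hypothetical, and the two functions returning the same value assumes a little-endian CPU such as x86.

```c
#include <stdint.h>

/* View the same 8 bytes either as two dwords or as one qword. */
typedef union {
    uint32_t dword[2];
    uint64_t qword;
} qword_view;

/* Stall-prone shape: dword store to x, dword store to x + 4,
   then a qword load from x shortly afterwards. */
uint64_t split_store_wide_load(uint32_t lo, uint32_t hi)
{
    qword_view v;
    v.dword[0] = lo;   /* write a double word to address x     */
    v.dword[1] = hi;   /* write a double word to address x + 4 */
    return v.qword;    /* read a quad word from address x      */
}

/* Same value on little-endian x86, but built in registers, so no
   64-bit load has to wait for two separate 32-bit stores. */
uint64_t register_combine(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

For example, both `split_store_wide_load(0x89abcdef, 0x01234567)` and `register_combine(0x89abcdef, 0x01234567)` yield `0x0123456789abcdef` on x86; only the memory traffic differs.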
CU
Bikeman
Xeon 2,4G HT Result 91784162,
Xeon 2,4G HT
Result 91784162, time 35056.19, credit 237.83 - App 4.26 doesn't show Freq?
Result 91608265, time 34890.67, credit 237.83, Freq=791.284527395 - App 4.15
RE: ... App 4.26 doesn't
Yes it does - the frequency is the same - 791.10
However, that's not particularly important. The important bit is the sequence number which tells you what part of the cycle you are in. You really need to do a bit of recent reading :).
The cycle for 791.10 has a period of 128.9
Peaks in runtime will therefore be at seq#s of 0, 129, 258, 387, ...
Troughs in runtime will therefore be at seq#s of 64, 193, 322, 451, ...
Your 4.15 crunched result had a seq# of 316 which is near the trough at 322.
Your 4.26 crunched result had a seq# of 246 which is near the peak at 258.
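The peak/trough arithmetic above can be reproduced with a few lines of C. This is a sketch under stated assumptions: the function names are hypothetical, the period of 128.9 is taken from the post rather than derived from the frequency (that is what the ReadyReckoner is for), and seq# is assumed to be non-negative.

```c
/* Position within the runtime cycle: 0.0 = peak, 0.5 = trough. */
double cycle_phase(double seq, double period)
{
    double cycles = seq / period;
    return cycles - (double)(long)cycles;   /* fractional part (seq >= 0) */
}

/* Peaks sit at whole multiples of the period: round to the closest one. */
double nearest_peak(double seq, double period)
{
    return (double)(long)(seq / period + 0.5) * period;
}

/* Troughs sit half a period after each peak, so the closest trough is
   always the one inside seq's own cycle. */
double nearest_trough(double seq, double period)
{
    return ((double)(long)(seq / period) + 0.5) * period;
}
```

With a period of 128.9, seq# 316 has a phase of about 0.45 and its nearest trough is 322.25 (322 when rounded), while seq# 246 has a phase of about 0.91 and its nearest peak is 257.8 (258 when rounded), matching the numbers above.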
As you accumulate more data, you will find that Bernd has done you a big favour by making the 4.26 app official. You should go seek out Mike Hewson's ReadyReckoner as documented in this thread.
Cheers,
Gary.
Since this is now the main
Since this is now the main App, should this be removed from sticky?
RE: Since this is now the
My idea was to leave this thread at the top for a few more days for the people who have just received the new 4.26 and want to comment on it.
However, I'm not sure they'll find the thread at all in that case. Any better ideas for where to announce and discuss new Apps, especially "public" ones?
BM
RE: Any better ideas for
It might be helpful if you could clean up the Applications page - sometime in the future, not an urgent priority. I personally find it confusing that there are four sets of applications showing, all version 4.xx (with overlapping subversion numbers), and with two pairs of identical descriptions.
The clue is in the datestamp, of course, and the cognoscenti will know that only the bottom block is relevant; I hope it's not off the bottom of the screen for too many users. Perhaps the truly 'current' block could be moved to the top (reversing the present order), and the apps for S4/S5R2 etc. could be flagged as superseded?
RE: It might be helpful if
Oh that page...
This is automatically generated from the DB, and with that DB it's so difficult to get certain things in that no one even dares to think about how to get things out once they're obsolete.
But a reversal of the display order sounds like a good idea and should easily be possible.
BM
RE: Originally not planned
So, I should remove the app_info.xml file?
RE: RE: Originally not
You could, but you don't need to. At least one more "power app" is in the pipeline and will hopefully come out next week.
BM