I was looking at some of my results and was surprised by the example I quote above. The faster i7 took more than twice as long as my Core 2 Quad. The i7's hyperthreading certainly accounts for some of that, but the result was still a real surprise.
Intel i7 performance surprise.
Hmmm...maybe this particular i7 has slow memory or something like that, because there are other i7s here that are much faster, e.g. http://einsteinathome.org/host/1104966/tasks. It's best to compare runtimes for the ABP1 jobs (those that get 250 credits), because their runtime does not depend on the individual workunit. The two i7s have rather different runtimes for those units (ca. 30k sec vs. 20k sec). This can hardly be explained by overclocking alone, can it?
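For a rough comparison, here is a quick sketch of the credit rates implied by those runtimes, assuming the 250 credits per ABP1 task mentioned above:

# Rough credits-per-hour comparison for the two i7s, assuming
# 250 credits per ABP1 task and the runtimes quoted above.
CREDITS_PER_TASK = 250
for name, runtime_s in [("slow i7", 30000), ("fast i7", 20000)]:
    cph = CREDITS_PER_TASK / (runtime_s / 3600)
    print(f"{name}: {cph:.1f} credits/h per core")
# slow i7: 30.0 credits/h, fast i7: 45.0 credits/h - a 1.5x gap,
# more than any plausible overclock of the same chip would explain.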
Bikeman
RE: Hmmm...maybe this
The problem is probably Vista. That piece of software is known for causing performance hits, among many other things.
Another point: in single-core performance, a Core i7 beats a C2Q by only about 5-10% on average at the same clock frequency. But the throughput of all 8 logical cores is much higher than that of the 4 cores of a C2Q.
A Core i7 with HT enabled should need about 1.5x the time per unit that the same host would need without HT, but with HT you are crunching 8 units in parallel, so summed over all units you complete more work in a given timeframe.
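To put numbers on that, here is a little sketch of the arithmetic, taking the 1.5x slowdown above as an assumed figure:

# Throughput with and without HT on a 4-core/8-thread i7, assuming
# each task takes 1.5x as long with HT on (the figure quoted above).
base_time = 1.0                    # normalized runtime of one task, HT off
tasks_no_ht = 4 / base_time        # 4 physical cores -> 4 tasks per time unit
tasks_ht = 8 / (1.5 * base_time)   # 8 logical cores, each task 1.5x slower
print(tasks_no_ht, tasks_ht)       # 4.0 vs ~5.33 tasks per time unit
print(f"throughput gain: {tasks_ht / tasks_no_ht - 1:.0%}")   # ~33%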
RE: Example I was looking
Another contributing factor is what the task was "paired" with at run time. There is a fairly old Trac ticket out there asking that UCB put some effort into making the resource scheduler a little smarter, so that the most efficient pairings are used: for example, running PrimeGrid tasks alongside EaH tasks, so that the FP-heavy EaH tasks would not block the PG tasks as much, and vice versa.
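Something like the following toy heuristic is what I have in mind. This is purely illustrative, not how the BOINC client actually works; the task list and the fp_heavy flag are invented for the sketch:

# Toy pairing heuristic: run one FP-heavy task alongside one
# integer-heavy task so they contend less for the same execution units.
# Not real BOINC code; the names and the fp_heavy flag are made up.
tasks = [
    {"name": "EaH_GW",   "fp_heavy": True},
    {"name": "EaH_ABP1", "fp_heavy": True},
    {"name": "PG_sieve", "fp_heavy": False},
    {"name": "PG_llr",   "fp_heavy": False},
]

def pair_tasks(task_list):
    # Greedily pair each FP-heavy task with an integer-heavy one.
    fp_tasks = [t for t in task_list if t["fp_heavy"]]
    int_tasks = [t for t in task_list if not t["fp_heavy"]]
    return list(zip(fp_tasks, int_tasks))

for fp_task, int_task in pair_tasks(tasks):
    print(fp_task["name"], "runs alongside", int_task["name"])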
Sadly, UCB has taken the stance that this is not important and in fact has been drifting towards making the "mix" less and less "interesting" which means that you are going to see more and more contention and less and less efficiency as versions increase.
This is especially sad as it looks like HT is going to be standard now across almost all of the Intel product line from the i5 to the coming i9 ...
RE: ... Sadly, UCB has
That indeed might sometimes be bad for the HT CPUs, but smarter pairing should give a performance boost on true physical multi-core systems by reducing memory bandwidth contention.
Out of curiosity, any comment as to why UCB has gone that route?
Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
RE: That indeed might
That is a separate and equally compelling issue, and it too is not addressed. Or to put it another way: whether it is pairing up memory-intensive tasks, memory-bandwidth-bound tasks, or the previously mentioned FP vs. INT tasks, you would think that some attention would be paid to these issues.
As to the reason?
My simplest guess is that, based on the evidence, few if any of the developers, or of the people the developers listen to, are running quad-core or better systems. I saw behaviors years ago with my first quad (a dual Xeon with HT) that to some extent still happen today. I avoid most of it by setting the task switch interval (TSI) to 12 hours (720 minutes), but other instabilities still exist.
Again, partly it is also a case of "if I squint hard enough I cannot see it, so it does not exist ...", because seeing the issues would mean having to address them. So the mental model in the design is still a single processor working on a single stream of tasks. The limitations of this model are visible on a 4-core machine, but you have to look closely and watch the behavior over time. The artifacts are far more noticeable on 8 or more processing elements.
Sadly, the voices of some are far louder than others, and years pass ...
The instance here is the chaotic behavior that Richard Haselgrove noted, and that I had noted before him ... and part of the problem is that internal routines (Schedule and Enforce) are run far more often than they need to be, the main justification for this practice being a real-time project that has been defunct for a long time ... most interesting to me about this justification is that BOINC was purpose-built as a batch-oriented system, making it completely unsuitable for real-time processing ...
Oh, and the second justification was that repetitive running of routines will not result in chaotic behavior ... because the rules are the same, don't you see ... sadly this ignores decades of research in fractals and chaos, where it has been demonstrated over and over that even simple systems can easily act in chaotic ways ...
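For anyone who doubts that last point, the textbook logistic map makes it concrete: the same simple, fixed rule applied repeatedly drives two almost identical starting points completely apart (nothing BOINC-specific here, just the standard demonstration):

# Logistic map x -> r*x*(1-x): one simple rule applied over and over,
# yet at r = 4 two nearly identical starting points diverge completely.
r = 4.0
x, y = 0.3, 0.3000001   # starting points differ by one part in ten million
for _ in range(40):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
print(abs(x - y))   # of order 1: the tiny initial difference has exploded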
Anyway, a failure of imagination ...
RE: Hmmm...maybe this
Not sure what you people are talking about. My i7 can crunch the S6 WUs at about 13k sec each. BTW, I'm running on Linux x86_64.
RE: RE: Hmmm...maybe this
Your computers are hidden, so nobody can verify what you are writing about. ;-)
And there is an influence of about 5%, depending on the mixture of E@H apps running on an i7.
RE: Your computers are
Here's an example task that looks typical:
http://einsteinathome.org/task/147343691
I run 4 other projects, so the mix is quite random. RAM is DDR3 1600MHz @ CAS 7. No OC either.
How much does the hyperthreading affect RAC?
RE: RE: Your computers
Past experiments show anywhere from a 10% to a 40% improvement in throughput over the same CPU with HT off. You will, however, see longer individual run times for the tasks running on the system.
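The two numbers are tied together. A quick sketch of the arithmetic for a 4-core/8-thread chip:

# On a 4-core/8-thread CPU, a given throughput gain from HT implies a
# matching per-task slowdown: runtime ratio = 8 / (4 * (1 + gain)).
for gain in (0.10, 0.25, 0.40):
    runtime_ratio = 8 / (4 * (1 + gain))
    print(f"+{gain:.0%} throughput -> each task takes {runtime_ratio:.2f}x as long")
# +10% -> 1.82x, +25% -> 1.60x, +40% -> 1.43x per task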
RE: RE: Your computers
OK, I see it's an i7 920. My root server is an i7 920 running 64-bit Linux too.
You can NOT compare GW (S5) tasks with each other, because some tasks take up to 50% longer for the same credits. ABP1 tasks have pretty constant runtimes, so you can do comparisons with them. If my i7 runs a mix of GW and ABP1 tasks, the ABP1 tasks generate about 43.x credits/h. When 8 ABP1 tasks are running, c/h goes down to about 42 - no big difference.

AFAIK the GW tasks make intensive use of SSE2, while ABP1 tasks do a bit of SSE and some FPU work. This means they run together pretty well. The arithmetic units in an HT CPU are not duplicated, but the register set is. If one task is using a specific arithmetic unit, another task can't, but this will probably only matter with extreme applications (if ever).
I see no alternative to keeping HT enabled. I can't change it on my server anyway.
HT will probably always raise your RAC.
cu,
Michael