I was looking at some of my results and was surprised by the example I quote above. The faster i7 took more than twice as long as my Core 2 Quad. The i7's hyperthreading certainly accounts for some of that, but the result was still a real surprise.
Intel i7 performance surprise.
Hmmm...maybe this particular i7 has slow memory or something like that, because there are other i7s here that are much faster, e.g. http://einsteinathome.org/host/1104966/tasks. It's best to compare runtimes for the ABP1 jobs (those that get 250 credits), because their runtime does not depend on the individual workunit. The two i7s have rather different runtimes for those units (ca. 30k sec vs. 20k sec). This can hardly be explained by overclocking alone, can it?
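For a rough comparison, here is a quick sketch of the credit rates implied by those runtimes, assuming the 250 credits per ABP1 task mentioned above:

# Rough credits-per-hour comparison for the two i7s, assuming
# 250 credits per ABP1 task and the runtimes quoted above.
CREDITS_PER_TASK = 250
for name, runtime_s in [("slow i7", 30000), ("fast i7", 20000)]:
    cph = CREDITS_PER_TASK / (runtime_s / 3600)
    print(f"{name}: {cph:.1f} credits/h per core")
# slow i7: 30.0 credits/h, fast i7: 45.0 credits/h - a 1.5x gap,
# more than any plausible overclock of the same chip would explain.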
Bikeman
RE: Hmmm...maybe this
The problem is probably Vista. That piece of software is known for causing performance hits, among many other things.
Another point: in single-core performance, a Core i7 beats a C2Q by only about 5-10% on average at the same clock frequency. But the throughput of all 8 logical cores is much higher than that of the 4 cores of a C2Q.
A Core i7 with HT enabled should need about 1.5x the time per unit that the same host would need without HT, but with HT you are crunching 8 units in parallel, so summed over all units you complete more work in a given timeframe.
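To put numbers on that, here is a little sketch of the arithmetic, taking the 1.5x slowdown above as an assumed figure:

# Throughput with and without HT on a 4-core/8-thread i7, assuming
# each task takes 1.5x as long with HT on (the figure quoted above).
base_time = 1.0                    # normalized runtime of one task, HT off
tasks_no_ht = 4 / base_time        # 4 physical cores -> 4 tasks per time unit
tasks_ht = 8 / (1.5 * base_time)   # 8 logical cores, each task 1.5x slower
print(tasks_no_ht, tasks_ht)       # 4.0 vs ~5.33 tasks per time unit
print(f"throughput gain: {tasks_ht / tasks_no_ht - 1:.0%}")   # ~33%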
RE: Example I was looking
Another contributing factor is what the task was "paired" with at run time. There is a fairly old Trac ticket out there asking that UCB put some effort into making the resource scheduler a little smarter, so that the most efficient pairings are used: for example, running PrimeGrid tasks alongside EaH tasks, so that the FP-heavy EaH tasks would not block the PG tasks as much, and vice versa.
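Something like the following toy heuristic is what I have in mind. This is purely illustrative, not how the BOINC client actually works; the task list and the fp_heavy flag are invented for the sketch:

# Toy pairing heuristic: run one FP-heavy task alongside one
# integer-heavy task so they contend less for the same execution units.
# Not real BOINC code; the names and the fp_heavy flag are made up.
tasks = [
    {"name": "EaH_GW",   "fp_heavy": True},
    {"name": "EaH_ABP1", "fp_heavy": True},
    {"name": "PG_sieve", "fp_heavy": False},
    {"name": "PG_llr",   "fp_heavy": False},
]

def pair_tasks(task_list):
    # Greedily pair each FP-heavy task with an integer-heavy one.
    fp_tasks = [t for t in task_list if t["fp_heavy"]]
    int_tasks = [t for t in task_list if not t["fp_heavy"]]
    return list(zip(fp_tasks, int_tasks))

for fp_task, int_task in pair_tasks(tasks):
    print(fp_task["name"], "runs alongside", int_task["name"])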
Sadly, UCB has taken the stance that this is not important and in fact has been drifting towards making the "mix" less and less "interesting" which means that you are going to see more and more contention and less and less efficiency as versions increase.
This is especially sad as it looks like HT is going to be standard now across almost all of the Intel product line from the i5 to the coming i9 ...
RE: ... Sadly, UCB has
That indeed might sometimes be bad for the HT CPUs, but smarter pairing should give a performance boost on true physical multi-core systems by reducing memory bandwidth contention.
Out of curiosity, any comment as to why UCB has gone that route?
Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
RE: That indeed might
That is a separate and equally compelling issue, and it too is not addressed. Or to put it another way: whether it is pairing up memory-intensive tasks, memory-bandwidth-bound tasks, or the previously mentioned FP vs. INT tasks, you would think that some attention would be paid to these issues.
As to the reason?
My simplest guess is that, based on the evidence, few if any of the developers, or of the people the developers listen to, are running quad-core or better systems. I saw behaviors years ago with my first quad (a dual Xeon with HT) that to some extent still happen today. I avoid most of it by setting the task switch interval (TSI) to 12 hours (720 minutes), but other instabilities still exist.
Again, partly it is also a case of "if I squint hard enough I cannot see it, so it does not exist ...", because seeing the issues would mean having to address them. So the mental model in the design is still a single processor working on a single stream of tasks. The limitations of this model are visible on a 4-core machine, but you have to look closely and watch the behavior over time. The artifacts are far more noticeable on 8 or more processing elements.
Sadly, the voices of some are far louder than others, and years pass ...
The instance here is the chaotic behavior that Richard Haselgrove noted, and that I had noted before him ... and part of the problem is that internal routines (Schedule and Enforce) are run far more often than they need to be, the main justification for this practice being a real-time project that has been defunct for a long time ... most interesting to me about this justification is that BOINC was purpose-built as a batch-oriented system, making it completely unsuitable for real-time processing ...
Oh, and the second justification was that repetitive running of routines will not result in chaotic behavior ... because the rules are the same, don't you see ... sadly this ignores decades of research in fractals and chaos, where it has been demonstrated over and over that even simple systems can easily act in chaotic ways ...
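For anyone who doubts that last point, the textbook logistic map makes it concrete: the same simple, fixed rule applied repeatedly drives two almost identical starting points completely apart (nothing BOINC-specific here, just the standard demonstration):

# Logistic map x -> r*x*(1-x): one simple rule applied over and over,
# yet at r = 4 two nearly identical starting points diverge completely.
r = 4.0
x, y = 0.3, 0.3000001   # starting points differ by one part in ten million
for _ in range(40):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
print(abs(x - y))   # of order 1: the tiny initial difference has exploded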
Anyway, a failure of imagination ...
RE: Hmmm...maybe this
Not sure what you people are talking about. My i7 can crunch the S6 WUs at about 13k sec each. BTW, I'm running on Linux x86_64.
RE: RE: Hmmm...maybe this
Your computers are hidden, so nobody can verify what you are writing about. ;-)
And there is an influence of about 5%, depending on the mixture of E@H apps running on an i7.
RE: Your computers are
Here's an example task that looks typical:
http://einsteinathome.org/task/147343691
I run 4 other projects, so the mix is quite random. RAM is DDR3 1600MHz @ CAS 7. No OC either.
How much does the hyperthreading affect RAC?
RE: RE: Your computers
Past experiments show anywhere from a 10% to a 40% improvement in throughput over the same CPU with HT off. You will, however, see longer individual run times for the tasks running on the system.
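The two numbers are tied together. A quick sketch of the arithmetic for a 4-core/8-thread chip:

# On a 4-core/8-thread CPU, a given throughput gain from HT implies a
# matching per-task slowdown: runtime ratio = 8 / (4 * (1 + gain)).
for gain in (0.10, 0.25, 0.40):
    runtime_ratio = 8 / (4 * (1 + gain))
    print(f"+{gain:.0%} throughput -> each task takes {runtime_ratio:.2f}x as long")
# +10% -> 1.82x, +25% -> 1.60x, +40% -> 1.43x per task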
RE: RE: Your computers
OK, I see it's an i7 920. My root server is an i7 920 running 64-bit Linux too.
You can NOT compare GW (S5) tasks with each other, because some tasks take up to 50% longer for the same credits. ABP1 tasks have pretty constant runtimes, so you can do comparisons with them. If my i7 runs a mix of GW and ABP1 tasks, the ABP1 tasks generate about 43.x credits/h. When 8 ABP1 tasks are running, c/h goes down to about 42 - no big difference.

AFAIK the GW tasks make intensive use of SSE2, while ABP1 tasks do a bit of SSE and some FPU work. This means they run together pretty well. The arithmetic units in an HT CPU are not duplicated, but the register set is. If one task is using a specific arithmetic unit, another task can't, but this will probably only matter with extreme applications (if ever).
I see no alternative to keeping HT enabled. I can't change it on my server anyway.
HT will probably always raise your RAC.
cu,
Michael