P4: to hyperthread or not to hyperthread

d4rkm47r
Joined: 8 Oct 06
Posts: 2
Credit: 22758242
RAC: 0
Topic 191952

what's the common wisdom on P4 hyperthreading? i seem
to be getting mixed results!

i enabled HT on a dual socket/single core P4 xeon 2.8GHz/1MB/800MHz FSB
and saw no throughput improvement running 4 WUs
with HT vs. 2 WU with HT off:

http://einsteinathome.org/host/760773

WUs ran just a hair over 2x as long, and 2x as many
were produced... basically, no performance impact.
i'm disabling HT on this system!

i ran the same experiment on a dual socket/single core
P4 xeon 2.4GHz/512KB/400MHz FSB and saw something like a 25 - 30% boost in throughput.

http://einsteinathome.org/host/761261

WUs take ~ 60% longer but 2x as many get done in that
time.
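
The arithmetic behind those two results can be checked with a minimal sketch. It assumes each physical CPU runs one WU with HT off and two with HT on, and that the only other input is how much longer each WU takes; the function name `ht_throughput_ratio` is made up for illustration.

```python
def ht_throughput_ratio(slowdown: float, tasks_ht: int = 2, tasks_no_ht: int = 1) -> float:
    """Return throughput with HT on divided by throughput with HT off.

    slowdown    -- how many times longer one WU takes with HT on (e.g. 1.6)
    tasks_ht    -- WUs running at once per physical CPU with HT on
    tasks_no_ht -- WUs running at once per physical CPU with HT off
    """
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    return (tasks_ht / tasks_no_ht) / slowdown


# 2.8GHz system: WUs take about 2x as long, 2x as many run at once.
print(f"2.8GHz: {ht_throughput_ratio(2.0):.2f}x")
# 2.4GHz system: WUs take about 60% longer, 2x as many run at once.
print(f"2.4GHz: {ht_throughput_ratio(1.6):.2f}x")
```

A ratio of 1.00 means HT makes no difference; 1.25 corresponds to the low end of the 25 - 30% boost mentioned above.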

with HT off, relative throughput is pretty much
proportional to the clock speed ratio of the two
systems - that's the expected result. however, note that with HT enabled, the 2.4GHz system mops up the 2.8GHz system! HUH???

it seems i'm better off enabling HT on the 2.4GHz
system and disabling it on the 2.8GHz system... and
this way, the 2.4GHz system has better throughput!
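
To put rough numbers on that comparison, here is a small sketch. It assumes HT-off throughput scales linearly with clock speed (as observed above), that the 2.8GHz system gains nothing from HT, and that the 2.4GHz system gains 25 to 30%; `relative_throughput` is a hypothetical helper, not anything from BOINC.

```python
def relative_throughput(clock_ghz: float, ht_gain: float = 1.0) -> float:
    """Throughput in arbitrary units: clock speed scaled by an HT gain factor."""
    return clock_ghz * ht_gain


# 2.8GHz system: HT gave no improvement, so the gain factor stays at 1.0.
baseline = relative_throughput(2.8)

# 2.4GHz system with HT on, at both ends of the observed 25 - 30% boost.
for gain in (1.25, 1.30):
    ratio = relative_throughput(2.4, gain) / baseline
    print(f"2.4GHz with HT (+{gain - 1:.0%}) vs 2.8GHz: {ratio:.2f}x")
```

Under these assumptions the 2.4GHz system with HT on comes out roughly 7 to 11% ahead of the 2.8GHz system.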

can anyone make sense of this???

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7063454931
RAC: 1224806

Quote:
what's the common wisdom on P4 hyperthreading?


You may wish to review this thread, in which I posted my hyperthreading observations for reasonably current software. Others added some comments. For other (mostly older) results you can search the forum archive.

previous hyperthreading thread

At a fairly late stage in akosf's S4 application development, one key change meant that subsequent versions actually ran slower when paired on HT machines than the same app did on the same machines run non-HT. However, we are now on S5, and those versions are part of history.

Other than that case, we've not had consistent reports of failure to improve in HT, so far as I recall. I think reported throughput improvement for Einstein has clustered pretty near 20%.

Certainly not all HT machines are created equal, but your 2.8 GHz Xeon would be expected to improve. Possibly you had a non-matched sample of WUs, or possibly there was non-comparable, non-normal system activity from other sources in the two cases. Reported times under HT are particularly easy to influence (i.e. falsify) by the behavior of other apps. I've seen both cases: the other app can make the reported time either higher or lower than the norm.

If Einstein productivity is your key issue, I'd turn on HT and leave it on. I suspect a more careful or better controlled experiment will confirm this on your own rig if you feel the need.
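
One way to frame a better controlled comparison is to collect run times for several comparable WUs in each configuration and compare throughput rather than single results. The sketch below is only an illustration: the helper names are made up, the run times are invented placeholder numbers (not measurements from any host in this thread), and it assumes 2 concurrent WUs with HT off and 4 with HT on, as on a dual-socket single-core machine.

```python
from statistics import mean, stdev


def wu_per_hour(run_times_s: list[float], concurrent: int) -> float:
    """WUs completed per hour when `concurrent` WUs run side by side."""
    return concurrent * 3600.0 / mean(run_times_s)


def ht_gain(times_off: list[float], times_on: list[float],
            concurrent_off: int = 2, concurrent_on: int = 4) -> float:
    """Throughput with HT on divided by throughput with HT off."""
    return wu_per_hour(times_on, concurrent_on) / wu_per_hour(times_off, concurrent_off)


# Hypothetical run times in seconds, for illustration only.
times_ht_off = [10000.0, 10200.0, 9800.0]
times_ht_on = [16000.0, 16400.0, 15600.0]

print(f"HT off: mean {mean(times_ht_off):.0f}s, spread {stdev(times_ht_off):.0f}s")
print(f"HT on:  mean {mean(times_ht_on):.0f}s, spread {stdev(times_ht_on):.0f}s")
print(f"HT gain: {ht_gain(times_ht_off, times_ht_on):.2f}x")
```

If the spread within one configuration is as large as the difference between the two, the sample is probably not well matched or something else was loading the machine.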

Semmel
Joined: 16 Oct 06
Posts: 4
Credit: 124159
RAC: 0

Quote:

with HT off, relative throughput is pretty much
proportional to the clock speed ratio of the two
systems - that's the expected result. however, note that with HT enabled, the 2.4GHz system mops up the 2.8GHz system! HUH???

it seems i'm better off enabling HT on the 2.4GHz
system and disabling it on the 2.8GHz system... and
this way, the 2.4GHz system has better throughput!

can anyone make sense of this???

Not every P4 is the same.

There are different cores.
Your 2.4 GHz P4 is a Northwood and the 2.8 GHz P4 is a Prescott core!

In spite of the fact that the Prescott is newer, the Northwood is more efficient. That's why the Northwood is faster.

keyboards
Joined: 2 Mar 06
Posts: 3
Credit: 80519
RAC: 0

Message 48446 in response to message 48445

Quote:
Quote:

with HT off, relative throughput is pretty much
proportional to the clock speed ratio of the two
systems - that's the expected result. however, note that with HT enabled, the 2.4GHz system mops up the 2.8GHz system! HUH???

it seems i'm better off enabling HT on the 2.4GHz
system and disabling it on the 2.8GHz system... and
this way, the 2.4GHz system has better throughput!

can anyone make sense of this???

Not every P4 is the same.

There are different cores.
Your 2.4 GHz P4 is a Northwood and the 2.8 GHz P4 is a Prescott core!

In spite of the fact that the Prescott is newer, the Northwood is more efficient. That's why the Northwood is faster.

I'm running a P4 Prescott 2.8GHz and found that the WUs take about 60% longer, but twice as many get done in that time. Net result is an improvement with HT on.

!!Stupidity should be PAINFUL!!

mray
Joined: 23 Dec 05
Posts: 5
Credit: 312412712
RAC: 28376

There is one other issue with hyperthreading and BOINC that can be quite annoying. The OS seems to schedule priorities by CPU. So even if BOINC has a low priority, another single-thread app with normal priority will not cause BOINC to release both CPUs. BOINC keeps one and lets the other app get the other one. This can significantly impact the performance of your applications, since the other processor is not completely independent of the first one.

Hope that makes sense, it's what I appear to be seeing on my HT machines.


archae86

Message 48448 in response to message 48447

Quote:
The OS seems to schedule priorities by CPU. So even if BOINC has a low priority, another single-thread app with normal priority will not cause BOINC to release both CPUs.


Yes, that is an OS issue, not a BOINC issue. It is the obvious consequence of dealing with HT by having the OS treat the chip as two independent CPUs. It allowed reuse of lots of existing code, but slips where it fails to represent reality.

Your priority application gets _all_ of the one (of two virtual) CPUs it is running on. If that is not enough for you, don't run HT.

ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86316101
RAC: 316

Message 48449 in response to message 48448

Quote:
Quote:
The OS seems to schedule priorities by CPU. So even if BOINC has a low priority, another single-thread app with normal priority will not cause BOINC to release both CPUs.

Yes, that is an OS issue, not a BOINC issue. It is the obvious consequence of dealing with HT by having the OS treat the chip as two independent CPUs. It allowed reuse of lots of existing code, but slips where it fails to represent reality.

Your priority application gets _all_ of the one (of two virtual) CPUs it is running on. If that is not enough for you, don't run HT.


The Linux OS scheduler has special programming to deal sensibly with the Intel HT shenanigans.

Give one of the recent Linux distros a try to see what performance you get?

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

Metod, S56RKO
Joined: 11 Feb 05
Posts: 135
Credit: 810280326
RAC: 62564

Message 48450 in response to message 48449

Quote:
Give one of the recent Linux distros a try to see what performance you get?


I'm running BOINC on one dual-Xeon (Prestonia) on my Debian Sarge.
What I observe is when there's a normal-priority memory-intensive app running on one half of a processor, the other half of that physical processor is mostly idle (like 95% idle) even though there are a couple of BOINC science apps waiting for CPU attention.

Which, IMHO, shows that the Linux CPU scheduler works like a charm.
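
For anyone who wants to see which logical CPUs are the two "halves" of one physical processor on Linux, here is a minimal sketch. It assumes the x86 `/proc/cpuinfo` layout with `processor`, `physical id` and `core id` fields; the helper name `ht_siblings` and the sample text are made up for illustration and are not output from any machine in this thread.

```python
from collections import defaultdict


def ht_siblings(cpuinfo_text: str) -> dict[tuple[str, str], list[int]]:
    """Group logical CPU numbers by (physical id, core id).

    Logical CPUs that share a key are HT siblings on the same core.
    """
    groups: dict[tuple[str, str], list[int]] = defaultdict(list)
    # /proc/cpuinfo separates logical CPUs with a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            core = (fields.get("physical id", "?"), fields.get("core id", "?"))
            groups[core].append(int(fields["processor"]))
    return dict(groups)


# Hypothetical excerpt for a dual-socket, single-core machine with HT on.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 1
core id : 0

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 1
core id : 0
"""

for core, cpus in ht_siblings(SAMPLE).items():
    print(f"package {core[0]}, core {core[1]}: logical CPUs {cpus}")
```

On a real Linux box the same function could be fed the contents of `/proc/cpuinfo` instead of `SAMPLE`.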

Metod ...

ML1

Message 48451 in response to message 48450

Quote:
Quote:
Give one of the recent Linux distros a try to see what performance you get?

I'm running BOINC on one dual-Xeon (Prestonia) on my Debian Sarge.
What I observe is when there's a normal-priority memory-intensive app running on one half of a processor, the other half of that physical processor is mostly idle (like 95% idle) even though there are a couple of BOINC science apps waiting for CPU attention.

Which, IMHO, shows that the Linux CPU scheduler works like a charm.


And that also shows that your system is memory bandwidth limited! :-(

But yes, good to see that the priorities work as they should.

Happy crunchin',
Martin


Metod, S56RKO

Message 48452 in response to message 48451

Quote:
Quote:
Quote:
Give one of the recent Linux distros a try to see what performance you get?

I'm running BOINC on one dual-Xeon (Prestonia) on my Debian Sarge.
What I observe is when there's a normal-priority memory-intensive app running on one half of a processor, the other half of that physical processor is mostly idle (like 95% idle) even though there are a couple of BOINC science apps waiting for CPU attention.

Which, IMHO, shows that the Linux CPU scheduler works like a charm.


And that also shows that your system is memory bandwidth limited! :-(

But yes, good to see that the priorities work as they should.

I'm painfully aware of this, and for new systems I currently try to buy AMD. These don't show the same phenomenon. A quick test of an Intel Core 2 Duo system also showed much improvement over the Xeons in this area.

Metod ...
