To HT or not to HT. That is the question

Winterknight
Joined: 4 Jun 05
Posts: 1221
Credit: 312196270
RAC: 653849

RE: RE: RE: The gain

Message 27287 in response to message 27282

Quote:
Quote:
Quote:
The gain will, however, only show up SIGNIFICANTLY when using it for something like this (and other) projects, meaning that some of the code is re-used when running HT and the two cores are running the same app.

I beg to differ. It is oft reported that SETI runs much faster when paired with Einstein in a hyperthreaded machine than when paired with another SETI. On my Gallatin the difference is that a typical SETI unit consumes 45 CPU minutes when paired with an Einstein run, compared to about 75 CPU minutes when paired with another SETI.

On the other hand, the Einstein half of the pair seems to take the same time in these two cases.

Cache problem. SETI needs the whole cache, but not Einstein.


But it works with two separate CPUs, like my dual P3, so it's not solely a cache-defined thing.

Andy

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

RE: RE: RE: RE: The

Message 27288 in response to message 27287

Quote:
Quote:
Quote:
Quote:
The gain will, however, only show up SIGNIFICANTLY when using it for something like this (and other) projects, meaning that some of the code is re-used when running HT and the two cores are running the same app.

I beg to differ. It is oft reported that SETI runs much faster when paired with Einstein in a hyperthreaded machine than when paired with another SETI. On my Gallatin the difference is that a typical SETI unit consumes 45 CPU minutes when paired with an Einstein run, compared to about 75 CPU minutes when paired with another SETI.

On the other hand, the Einstein half of the pair seems to take the same time in these two cases.

Cache problem. SETI needs the whole cache, but not Einstein.

But it works with two separate CPUs, like my dual P3, so it's not solely a cache-defined thing.

Andy

Why?
Two separate CPUs have two separate caches.

Reuben Gathright
Joined: 15 Feb 06
Posts: 23
Credit: 6851215
RAC: 0

I have two Xeon 3.0

I have two Xeon 3.0 Irwindales on Asus NCCH-DL boards running this app. Only one chip per board. Same 220 FSB for each board, running the S-39L app.

Xeon with Hyper-Threading on (2 threads working):
6240 seconds average for 10 workunits

Xeon with Hyper-Threading off (only 1 thread available to Windows, working):
3960 seconds average for 10 workunits

Note: If you look at my results... they may change from the time of this post, because I am adding more CPUs.
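Taken at face value, and assuming the HT box really has two workunits in flight at once, those averages work out to roughly a 27% throughput gain from HT on this Xeon. A minimal sketch of that arithmetic (the two-at-a-time assumption is not stated in the post, it is an assumption here):

# Back-of-envelope check of the numbers above. Assumes that with HT on,
# the two logical CPUs each crunch one workunit at the same time.

ht_on_secs = 6240.0    # average CPU seconds per workunit, HT on (2 threads busy)
ht_off_secs = 3960.0   # average CPU seconds per workunit, HT off (1 thread busy)

wu_per_hour_ht_on = 2 * 3600.0 / ht_on_secs    # two results finish per 6240 s
wu_per_hour_ht_off = 1 * 3600.0 / ht_off_secs  # one result finishes per 3960 s

gain = wu_per_hour_ht_on / wu_per_hour_ht_off - 1.0
print(f"Estimated HT throughput gain: {gain:.0%}")  # about 27% under these assumptions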

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023934931
RAC: 1805324

RE: Xeon with Hyper(2

Message 27290 in response to message 27289

Quote:

Xeon with Hyper-Threading on (2 threads working):
6240 seconds average for 10 workunits

Xeon with Hyper-Threading off (only 1 thread available to Windows, working):
3960 seconds average for 10 workunits

Good to see fresh data on this topic.

Here is a little from the last two days on my one hyperthreaded machine.

It is a Gallatin 3.2 GHz P4 EE.
As such it has a 3-level cache:
L1: 8 KB data + 12 Kµop trace (code) cache
L2: 512 KB
L3: 2048 KB

My Einstein science app is akosf's S-39L, and my SETI science app is crunch3r's current version:
$Build: Windows SSE2 Intel Pentium 4 V2.10 by Crunch3r $
$Rev: 166.10 Windows SSE2 Intel Pentium 4 V2.10 $

My recent Einstein results have been from z1_1228.5
with most reported CPU times in HT mode around 125 minutes.

For this observation, I turned off HT and ran two pure results, logging 72.1 and 72.2 minutes. The first two results after resuming HT, also done pure (no SETI, no mixing with the non-HT state) logged 125.3 and 125.6 minutes.

So my observed production rate gain from using HT for Einstein on this machine is 15%. As always, possible inconsistencies in the WU's and possible variations in the machine conditions put some error haze around this observation.
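For anyone who wants to check that figure, here is a minimal sketch of the production-rate arithmetic (using the CPU times quoted above; the only assumption is that HT mode turns in two results per run):

# Sketch of the production-rate comparison described above, using the
# measured CPU times from this post (minutes per Einstein result).

non_ht_minutes = (72.1 + 72.2) / 2     # one result at a time, HT off
ht_minutes = (125.3 + 125.6) / 2       # two results at a time, HT on

rate_non_ht = 1.0 / non_ht_minutes     # results per minute, HT off
rate_ht = 2.0 / ht_minutes             # results per minute, HT on

gain = rate_ht / rate_non_ht - 1.0
print(f"Einstein HT production-rate gain: {gain:.1%}")  # about 15%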

Unfortunately my current stock of SETI results is unusually inconsistent in required CPU time from result to result. I will report, however, that for this particular machine the SETI HT gain is far higher than that I am reporting for Einstein, with an especially high benefit when SETI is one thread and Einstein is the other. I'll look for a run of results with consistent CPU times and rerun that part of the observation and report here. The results I observed were that "SETI/Einstein" HT gave SETI CPU times indistinguishable from non-HT SETI times, with a very modest (less than 3%) degradation in the Einstein time. Even I think this sounds too good to be true.

There is quite a variety of HT-supporting machines out there, differing in cache size, levels, and performance, not to mention the more basic Willamette vs. Prescott parentage differences. It would be great to see recent measured data from more machines.

networkman
Joined: 22 Jan 05
Posts: 98
Credit: 7140649
RAC: 0

At the moment I do not have

At the moment I do not have the time to be switching HT on/off to check the benchmark times of workunits. Aside from the sheer science of what E@H is, another distinct advantage that I see for this project is that the client(s)/project is just so darn stable - such that I can leave machines unattended for weeks with no worries about their productivity. :)

I hope I'm not jinxing anything by saying that.. even though I'm sure "Murphy" is lurking somewhere nearby!

"Chance is irrelevant. We will succeed."
- Seven of Nine

Winterknight
Joined: 4 Jun 05
Posts: 1221
Credit: 312196270
RAC: 653849

RE: RE: RE: RE: Quote

Message 27292 in response to message 27288

Quote:
Quote:
Quote:
Quote:
Quote:
The gain will, however, only show up SIGNIFICANTLY when using it for something like this (and other) projects, meaning that some of the code is re-used when running HT and the two cores are running the same app.

I beg to differ. It is oft reported that SETI runs much faster when paired with Einstein in a hyperthreaded machine than when paired with another SETI. On my Gallatin the difference is that a typical SETI unit consumes 45 CPU minutes when paired with an Einstein run, compared to about 75 CPU minutes when paired with another SETI.

On the other hand, the Einstein half of the pair seems to take the same time in these two cases.

Cache problem. SETI needs the whole cache, but not Einstein.

But it works with two separate CPUs, like my dual P3, so it's not solely a cache-defined thing.

Andy

Why?
Two separate CPUs have two separate caches.


You are the computer expert, you tell me. 11k seconds for a SETI unit when running SETI and Einstein together, 15k seconds when running two SETIs.
Andy

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

RE: RE: RE: But it

Message 27293 in response to message 27292

Quote:
Quote:
Quote:
But it works with two separate CPUs, like my dual P3, so it's not solely a cache-defined thing.
Why?
Two separate CPUs have two separate caches.
You are the computer expert, you tell me. 11k seconds for a SETI unit when running SETI and Einstein together, 15k seconds when running two SETIs.

OK, I understand you. As far as I know, SETI uses 1-2 MB (or bigger) memory blocks at a time, while Einstein needs only 20-30 kB. If you run two SETI applications at once, your memory probably cannot serve both processors as fast as they need the data. Crunch3r's SSE application is very hungry. :-)

The same problem also appears on dual-core CPUs, but there it seems to be a cache problem, because the cache sits between the CPU and the main memory. So, if you want faster SETI processing, you need more memory bandwidth.

edit: If you run Einstein and SETI together, the required memory bandwidth is lower.
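A rough way to picture the working-set argument (a hypothetical sketch only; the 1-2 MB and 20-30 kB figures are the estimates above, and the 2048 KB cache is the Gallatin L3 archae86 listed earlier in the thread):

# Hypothetical illustration of the working-set argument. Task working-set
# sizes are the estimates from this post; the cache size is the Gallatin
# P4 EE L3 mentioned earlier in the thread.

CACHE_KB = 2048        # Gallatin L3
SETI_KB = 1536         # "1-2 MB (or bigger)" -> midpoint estimate
EINSTEIN_KB = 25       # "20-30 kB" -> midpoint estimate

def fits_in_cache(*working_sets_kb, cache_kb=CACHE_KB):
    """True if the combined working sets of concurrent tasks fit in the cache."""
    return sum(working_sets_kb) <= cache_kb

# Two SETI tasks overflow the shared cache, so both lean on main-memory
# bandwidth; a SETI + Einstein pair leaves SETI nearly the whole cache.
print("SETI + SETI fits in cache:    ", fits_in_cache(SETI_KB, SETI_KB))      # False
print("SETI + Einstein fits in cache:", fits_in_cache(SETI_KB, EINSTEIN_KB))  # True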

Winterknight
Joined: 4 Jun 05
Posts: 1221
Credit: 312196270
RAC: 653849

Akosf, Thanks for the

Akosf,
Thanks for the explanation; I've wondered for some time why it happens.

Andy
