A performance question

B52
Joined: 19 Feb 05
Posts: 45
Credit: 273,899
RAC: 0
Topic 188413

Have been trying to raise this at s@h, but got no answers that I could use.

Presently I have 2 PCs running this project:

1. My new Intel Prescott 3GHz running XP Pro with HT enabled
2. My old and trustworthy 1.2GHz AMD T-bird running 98 SE

The strange thing is that the 1.2 T-bird is much faster on this project than the P4 is.

This is a typical result from the Prescott with HT on:

http://einsteinathome.org/workunit/433146

Result # 2

This is a typical result from the good old T-bird:

http://einsteinathome.org/workunit/433103

Result # 3

How come the difference is that big?

Going by the benchmarks, the Prescott should prevail, but nope.

Any explanation, m8s?

Cheers

Jordan Wilberding
Joined: 19 Feb 05
Posts: 162
Credit: 715,454
RAC: 0

Easy: your P4 has HT technology, so it is running as two logical processors.

So your P4 is doing two WUs in 47,238.27, whereas your AMD machine is doing one WU in 39,411.86.

So with your P4 you really have about 47,200/2 = 23,600 per WU, which is a little under twice the speed of your AMD.
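The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes the two figures are per-result CPU times in seconds, and that both logical CPUs are busy with a WU for the whole run. The function name `time_per_wu` is made up for this example.

```python
def time_per_wu(reported_time, concurrent_wus):
    """Effective time per work unit when several WUs run side by side."""
    return reported_time / concurrent_wus

# Figures quoted in the thread; the unit (CPU seconds) is an assumption.
p4_per_wu = time_per_wu(47238.27, concurrent_wus=2)   # HT: two WUs at once
amd_per_wu = time_per_wu(39411.86, concurrent_wus=1)  # one WU at a time

speedup = amd_per_wu / p4_per_wu
print(f"P4 effective time per WU:  {p4_per_wu:,.2f}")
print(f"AMD effective time per WU: {amd_per_wu:,.2f}")
print(f"P4 throughput relative to AMD: {speedup:.2f}x")
```

On these numbers the P4 comes out at roughly 1.67 times the throughput of the AMD box, which is where "a little under twice" comes from.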

such things just should not be writ so please destroy this if you wish to live 'tis better in ignorance to dwell than to go screaming into the abyss worse than hell

John Hunt
Joined: 4 Mar 05
Posts: 1,227
Credit: 501,906
RAC: 0

Strange results indeed!

I have 2 PCs here -

No. 1 is a Pentium 4, 3GHz with 512MB RAM
No. 2 is a Celeron (?), 2.4GHz with 128MB RAM

WU turnover times (approx) for Einstein are (No. 1) 8 hrs, (No. 2) 10.5 hrs
Seti (No. 1) 2.5 hrs, (No. 2) 9.5 hrs

Both PCs are running Windows XP Home.........

Ageless
Joined: 26 Jan 05
Posts: 2,949
Credit: 5,374,792
RAC: 0

Message 7755 in response to message 7754

John, your Seti results are so slow because the Celeron is short on Level 2 cache on the CPU. It's got only 128 kilobytes, whereas the P4 has a full 512 kilobytes to 1 megabyte. Seti thrives on a large L2 cache.

John Hunt
Joined: 4 Mar 05
Posts: 1,227
Credit: 501,906
RAC: 0

Message 7756 in response to message 7755

> John, your Seti results are so slow because the Celeron is short on Level 2
> cache on the CPU. It's got only 128 kilobytes, whereas the P4 has a full 512
> kilobytes to 1 megabyte. Seti thrives on a large L2 cache.
>

Thanks Ageless! By the way, the Celeron PC was a cheapo job I purchased about a year ago as my first 'proper' PC. My only computing experience before this time last year was on a state-of-the-art Digital Rainbow (10MB hard disk, 128k RAM, running MS-DOS v.2.02). That was almost 30 years ago.....

B52
Joined: 19 Feb 05
Posts: 45
Credit: 273,899
RAC: 0

Hi all

I'm indeed quite clear about what goes on on my PCs. The question was more like: why does one project favor a certain type of CPU over another?

Let me give you more stats, perhaps more helpful.

The stats for Einstein have all been given, but the same PCs show the reverse on s@h.

The bottom line is:

The Prescott does MORE AND FASTER work on s@h
The 1.2 T-bird does MORE AND FASTER work on Einstein

These 2 are compared per WU.

Still an explanation needed, and that's why I called it a performance question.

senator2
Joined: 11 Nov 04
Posts: 19
Credit: 41,547
RAC: 0

Message 7758 in response to message 7755

> John, your Seti results are so slow because the Celeron is short on Level 2
> cache on the CPU. It's got only 128 kilobytes, whereas the P4 has a full 512
> kilobytes to 1 megabyte. Seti thrives on a large L2 cache.

Which "core" the processor uses is also important. AFAIK all of the Celerons use the older, shorter pipeline core (20 stages) while the newer P4s use the Prescott (31-stage) core. The number of stages affects the penalty for branch mispredictions and cache misses. I'd bet if you compare a Prescott core 3GHz with a Northwood 3GHz core you'll find a noticeable difference even with the same size cache.

Ageless
Joined: 26 Jan 05
Posts: 2,949
Credit: 5,374,792
RAC: 0

Message 7759 in response to message 7758

> Which "core" the processor uses is also important. AFAIK all of the Celerons
> use the older, shorter pipeline core (20 stages) while the newer P4s use the
> Prescott (31-stage) core. The number of stages affects the penalty for
> branch mispredictions and cache misses. I'd bet if you compare a Prescott core
> 3GHz with a Northwood 3GHz core you'll find a noticeable difference even with
> the same size cache.
>
True that, plus I neglected to say that older Celerons use 256KB of L2 cache, so they may even outpace you (errm, and me; I've got a Celeron 2.3GHz based on the P4 Northwood). Even worse is that the new P4 Celeron range is back to using 256KB of L2 cache, so it would seem we bought the "cheapo" at the wrong time, John. ;)

senator2
Joined: 11 Nov 04
Posts: 19
Credit: 41,547
RAC: 0

Message 7760 in response to message 7757

> The Prescott does MORE AND FASTER work on s@h
> The 1.2 tbird does MORE AND FASTER work on Einstein
>
> These 2 are compared per WU.
>
> Still an explanation needed, and that's why I called it a performance question.

Simple analogy: The P4 is a dragster, the Athlon is a sports car. On a quarter mile track the P4 rules, on a rally course the Athlon wins.

The Prescott can perform more operations per second than the Athlon under ideal conditions. The real world is a different story. The P4 has a slow floating point unit (because the designers counted on SSE/2/3 speeding up most tasks... E@H is not one of those), about half the speed of the Athlon's, so even under ideal conditions a 3GHz P4 is only a match for a 1.5GHz Athlon.

To further complicate matters, the information that the processor needs must be available but often is not. That can be caused by a cache miss, which requires pulling the information from either L2 cache or main memory (tens or hundreds of times slower). Due to its higher clock speed and longer pipeline (more operations being processed at the same time), the P4 suffers a higher penalty for misses.

Branches are also a problem, since the processor does not know in advance which way the program will go. Both processors use "branch prediction" to make an educated guess as to which way the program will go and have those instructions ready to go (incorrect guesses often suffer a cache miss and a flush of the "speculative" operations from the incorrectly assumed path). The two processors have different branch predictors, so they will get different "hit" rates for the same program, and again the Athlon suffers a smaller penalty for misses.
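To put rough numbers on the pipeline-length point, here is a toy model in Python. It is a sketch under loud assumptions, not a simulation of either chip: every wrong guess is charged one full pipeline refill, the 31 stages are the Prescott figure mentioned in this thread, and the 10 stages, the branch fraction and the miss rate are round illustrative values picked for the example. It does not try to reproduce the actual E@H timings.

```python
def mispredict_cycles_per_instruction(branch_fraction, mispredict_rate, pipeline_stages):
    """Toy model: average cycles lost per instruction to wrong branch guesses,
    charging one full pipeline refill (pipeline_stages cycles) per miss."""
    return branch_fraction * mispredict_rate * pipeline_stages

# All values below are illustrative assumptions, not measurements.
BRANCH_FRACTION = 0.20   # assume 1 in 5 instructions is a branch
MISPREDICT_RATE = 0.05   # assume 5% of branches are guessed wrong

deep = mispredict_cycles_per_instruction(BRANCH_FRACTION, MISPREDICT_RATE, 31)
short = mispredict_cycles_per_instruction(BRANCH_FRACTION, MISPREDICT_RATE, 10)

print(f"31-stage pipeline: {deep:.2f} cycles lost per instruction")
print(f"10-stage pipeline: {short:.2f} cycles lost per instruction")
```

With the same miss rate, the longer pipeline throws away about three times as many cycles per instruction, so part of the clock speed advantage goes to refilling the pipeline rather than doing work.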

B52
Joined: 19 Feb 05
Posts: 45
Credit: 273,899
RAC: 0

Well thx m8

That was kinda the explanation I was looking for.

A well founded tech explanation

Owe you 1 m8

John Hunt
Joined: 4 Mar 05
Posts: 1,227
Credit: 501,906
RAC: 0

Thanks from me too, Senator!
I always thought a processor was a processor! Nice way you explain things too; great for non-techies like me!
