A performance question

B52

Joined: 19 Feb 05

Posts: 45

Credit: 273899

RAC: 0

9 Mar 2005 18:16:22 UTC

Topic 188413

I have been trying to raise this at s@h, but got no answers that I could use.

At present I have 2 PCs running this project:

1. My new Intel Prescott 3 GHz running XP Pro with HT
2. My old and trustworthy 1.2 GHz AMD tbird running 98 SE

The strange thing is that the 1.2 GHz tbird is much faster on this project than the P4 is.

This is a typical result from the Prescott with HT on:

http://einsteinathome.org/workunit/433146

Result # 2

This is a typical result from the good old tbird:

http://einsteinathome.org/workunit/433103

Result # 3

How come the difference is that big?

Going by the benchmarks, the Prescott should prevail, but nope.

Any explanation, m8s?

Cheers

Jordan Wilberding

Joined: 19 Feb 05

Posts: 162

Credit: 715454

RAC: 0

9 Mar 2005 18:21:25 UTC

Message 7753

Easy: your P4 has HT technology, so it is running as two processors.

So you are doing two WUs in 47,238.27, whereas your AMD machine is doing one WU in 39,411.86.

So with your P4 you really have about 47,200/2 = 23,600 per WU, which is roughly 1.7 times the speed of your AMD.
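The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the reasoning in this post: it assumes the two figures are in the same unit (taken here to be seconds), that the HT machine really works on two WUs at once for the whole run, and the function name is made up for the example.

```python
def effective_time_per_wu(reported_time: float, concurrent_wus: int) -> float:
    """Time spent per WU when several WUs are crunched side by side."""
    return reported_time / concurrent_wus


# Figures quoted in this thread; the unit is assumed to be seconds.
p4_reported = 47238.27      # Prescott, HT on, assumed to run 2 WUs at once
athlon_reported = 39411.86  # tbird, 1 WU at a time

p4_per_wu = effective_time_per_wu(p4_reported, 2)
athlon_per_wu = effective_time_per_wu(athlon_reported, 1)

# How many WUs the P4 finishes in the time the Athlon finishes one.
throughput_ratio = athlon_per_wu / p4_per_wu

print(f"P4 per WU:     {p4_per_wu:.2f}")
print(f"Athlon per WU: {athlon_per_wu:.2f}")
print(f"P4 throughput relative to Athlon: {throughput_ratio:.2f}x")
```

Note that this measures throughput (WUs finished per day), not how long any single WU takes; each individual WU on the P4 still takes the full reported time.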

such things just should not be writ so please destroy this if you wish to live 'tis better in ignorance to dwell than to go screaming into the abyss worse than hell

John Hunt

Joined: 4 Mar 05

Posts: 1227

Credit: 501906

RAC: 0

9 Mar 2005 18:27:06 UTC

Message 7754

Strange results indeed!

I have 2 PCs here -

No. 1 is a Pentium 4, 3GHz with 512MB RAM
No. 2 is a Celeron (?), 2.4GHz with 128MB RAM

Approximate WU turnaround times:
Einstein: (No. 1) 8 hrs, (No. 2) 10.5 hrs
Seti: (No. 1) 2.5 hrs, (No. 2) 9.5 hrs

Both PCs are running Windows XP Home.........

Jord

Joined: 26 Jan 05

Posts: 2952

Credit: 5893653

RAC: 248

9 Mar 2005 18:30:46 UTC

Message 7755 in response to message 7754

John, your Seti results are so slow because the Celeron is short on Level 2 cache. It has only 128 kilobytes, whereas the P4 has a full 512 kilobytes to 1 megabyte. Seti thrives on a large L2 cache.

John Hunt

Joined: 4 Mar 05

Posts: 1227

Credit: 501906

RAC: 0

9 Mar 2005 18:41:56 UTC

Message 7756 in response to message 7755

> John, your Seti results are so slow because the Celeron lacks Level 2 cache on
> the CPU. It's got only 128 kilobytes, whereas the P4 has a full 512 kilobytes
> to 1 megabyte. Seti thrives on high L2 cache.
>

Thanks Ageless! By the way, the Celeron PC was a cheapo job I purchased about a year ago as my first 'proper' PC. My only computing experience before this time last year was on a state-of-the-art Digital Rainbow (10 MB hard disk, 128k RAM, running MS-DOS v.2.02). That was almost 30 years ago.....

B52

Joined: 19 Feb 05

Posts: 45

Credit: 273899

RAC: 0

9 Mar 2005 18:58:04 UTC

Message 7757

Hi all

I am indeed quite clear about what goes on on my PCs. The question was more: why does one project favor a certain type of CPU over another?

Let me give you more stats, perhaps more helpful.

The stats for Einstein have all been given, but on s@h the same PCs show the reverse.

The bottom line is:

The Prescott does MORE AND FASTER work on s@h
The 1.2 tbird does MORE AND FASTER work on Einstein

That is with the two compared per WU.

An explanation is still needed, and that's why I called it a performance question.

senator2

Joined: 11 Nov 04

Posts: 19

Credit: 41547

RAC: 0

9 Mar 2005 19:46:53 UTC

Message 7758 in response to message 7755

Which "core" the processor uses is also important. AFAIK all of the Celerons use the older, shorter pipeline core (20 stages), while the newer P4s use the Prescott (31 stage) core. The number of stages affects the penalty for branch misprediction and cache misses. I'd bet if you compare a Prescott core at 3 GHz with a Northwood core at 3 GHz you'll find a noticeable difference even with the same size cache.

Jord

Joined: 26 Jan 05

Posts: 2952

Credit: 5893653

RAC: 248

9 Mar 2005 19:52:15 UTC

Message 7759 in response to message 7758

> Which "core" the processor uses is also important. AFAIK all of the Celerons
> use the older, shorter pipeline core (20 stages), while the newer P4s use the
> Prescott (31 stage) core. The number of stages affects the penalty for
> branch misprediction and cache misses. I'd bet if you compare a Prescott core
> at 3 GHz with a Northwood core at 3 GHz you'll find a noticeable difference even
> with the same size cache.
>
True that. Plus, I neglected to say that older Celerons use 256 KB of L2 cache, so they may even outpace you (errm, and me: I've got a Celeron 2.3 GHz based on the P4 Northwood). Even worse is that the new P4 Celeron range is using 256 KB of L2 cache again, so it would seem we bought the "cheapo" at the wrong time, John. ;)

senator2

Joined: 11 Nov 04

Posts: 19

Credit: 41547

RAC: 0

9 Mar 2005 20:05:41 UTC

Message 7760 in response to message 7757

> The Prescott does MORE AND FASTER work on s@h
> The 1.2 tbird does MORE AND FASTER work on Einstein
>
> These 2 compared per WU
>
> Still an explanation needed, and that's why I called it performance
Simple analogy: the P4 is a dragster, the Athlon is a sports car. On a quarter-mile track the P4 rules; on a rally course the Athlon wins.

The Prescott can perform more operations per second than the Athlon under ideal conditions. The real world is a different story. The P4 has a slow floating point unit (because the designers counted on SSE/2/3 speeding up most tasks... E@H is not one of those), about half the speed of the Athlon's, so even under ideal conditions a 3 GHz P4 is only a match for a 1.5 GHz Athlon.

To further complicate matters, the information that the processor needs must be available but often is not. That can be caused by a cache miss, which requires pulling the information from either L2 cache or main memory, which are tens or hundreds of times slower. Due to the higher clock speed and longer pipeline (more operations being processed at the same time), the P4 suffers a higher penalty for misses.

Branches are also a problem, since the processor does not know in advance which way the program will go. Both processors use "branch prediction" to make an educated guess as to which way the program will go and have those instructions ready (incorrect guesses often suffer a cache miss and a flush of "speculative" operations from the incorrectly assumed path). The two processors have different branch predictors, so they will get different "hit" rates for the same program, and again the Athlon suffers a smaller penalty for misses.
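To put rough numbers on the misprediction point, here is a toy Python model. It is a sketch under loud assumptions, not a measurement: the branch fraction, the mispredict rate, and the 10-cycle figure for the shorter pipeline are invented for illustration, the flush cost is simply taken to equal the stage count, and both function names are hypothetical. It ignores the FPU and cache effects entirely, so it only shows the direction of the effect, not the results seen in this thread.

```python
def stall_cycles_per_instruction(branch_fraction: float,
                                 mispredict_rate: float,
                                 flush_penalty_cycles: float) -> float:
    """Average cycles lost per instruction to mispredicted branches."""
    return branch_fraction * mispredict_rate * flush_penalty_cycles


def effective_cpi(base_cpi: float, stall_cpi: float) -> float:
    """Cycles per instruction once misprediction stalls are added."""
    return base_cpi + stall_cpi


# Hypothetical workload: 1 instruction in 5 is a branch, 5% are mispredicted.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# Flush cost assumed equal to pipeline depth (a simplification).
DEEP_PIPELINE_PENALTY = 31   # the 31-stage figure quoted in this thread
SHORT_PIPELINE_PENALTY = 10  # assumed depth for a shorter pipeline

deep_stall = stall_cycles_per_instruction(
    BRANCH_FRACTION, MISPREDICT_RATE, DEEP_PIPELINE_PENALTY)
short_stall = stall_cycles_per_instruction(
    BRANCH_FRACTION, MISPREDICT_RATE, SHORT_PIPELINE_PENALTY)

# Same ideal base CPI of 1.0 for both, so only the flush cost differs.
deep_cpi = effective_cpi(1.0, deep_stall)
short_cpi = effective_cpi(1.0, short_stall)

print(f"deep pipeline:  {deep_cpi:.2f} cycles per instruction")
print(f"short pipeline: {short_cpi:.2f} cycles per instruction")
```

With these made-up inputs the deep pipeline loses about three times as many cycles per instruction to bad guesses, which is the "higher penalty for misses" described above. Branch-heavy code with a poor hit rate widens the gap; straight-line code shrinks it.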

B52

Joined: 19 Feb 05

Posts: 45

Credit: 273899

RAC: 0

9 Mar 2005 20:16:35 UTC

Message 7761

Well thx m8

That was kinda the explanation I was looking for.

A well-founded tech explanation.

Owe you 1 m8

John Hunt

Joined: 4 Mar 05

Posts: 1227

Credit: 501906

RAC: 0

9 Mar 2005 20:21:34 UTC

Message 7762

Thanks from me too, Senator!
I always thought a processor was a processor! Nice way you explain things too; great for non-techies like me!
