Quad core vs dual core

Mahoujin Tsukai
Joined: 21 Jul 07
Posts: 7
Credit: 348900
RAC: 0
Topic 193043

How much performance gain does one get in Einstein@Home when a quad core CPU is used in place of a dual core CPU?

Assume that all other factors are the same (motherboard, RAM, etc.) and both CPUs run at the same frequency.

I'm just curious as to how much faster a PC will crunch when a quad core CPU is used.

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

Quad core vs dual core

Quote:
How much performance gain does one get in Einstein@Home when a quad core CPU is used in place of a dual core CPU?


Quad CPU means double performance.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2780916004
RAC: 740517

In theory, yes - as in, each

In theory, yes - as in, each WU takes the same amount of time, but you can do twice as many because you'll be running four at once.

But in practice, I find there are bottlenecks - probably with the FSB and memory access.

It depends on the application. The current 'relaxed' (i.e. non-optimised) Einstein app is quite good, but the optimised SETI apps get in each other's way badly. During the previous SETI run (neglecting the current changes to multibeam), I could do the shortest single WU in about 25 minutes. If I ran 8 of them at the same time (2 x Xeon 5320 quad processors), each WU took up to 40 minutes.
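To put those figures in perspective, here's a rough back-of-the-envelope throughput comparison, sketched in Python (the 25 and 40 minute times are the ones from my runs above; everything else is just arithmetic, not a benchmark):

    # Rough throughput comparison for the SETI case described above.
    single_wu_minutes = 25.0   # one short WU, one core, otherwise idle machine
    loaded_wu_minutes = 40.0   # per-WU time with all 8 cores running a WU each
    cores = 8

    serial_throughput = 60.0 / single_wu_minutes           # WU/hour on one core
    loaded_throughput = cores * 60.0 / loaded_wu_minutes    # WU/hour on eight cores

    # How close we get to a perfect 8x scale-up
    efficiency = loaded_throughput / (cores * serial_throughput)

    print(f"1 core, idle machine: {serial_throughput:.1f} WU/hour")
    print(f"{cores} cores, all busy: {loaded_throughput:.1f} WU/hour")
    print(f"scaling efficiency: {efficiency:.0%}")

Even with that contention the box still gets far more done overall - it just falls well short of the ideal one-WU's-worth per extra core.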

Akos will be able to advise us whether the next round of optimisations is likely to put extra strain on the memory bus: if it does, quad cores might be most efficient running a variety of projects, rather than every core running the same program.

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7073884931
RAC: 1307276

I've got one E6600 Dual Core

I've got one E6600 Dual Core and one Q6600 Quad Core. Same generation of the Conroe chip, both currently running at the same moderately overclocked speed (3.006 GHz).

For this configuration, with the current Einstein and the current SETI applications, I think I really do get double performance out of the Quad, in any mix of SETI and Einstein tasks from zero to four.

While it is true that memory interface sharing implies some potential for conflict, cache sharing implies some potential benefits, and the massive caches of the Conroe chips relative to the SETI working set mean that most folks who have checked have seen almost no performance impact from memory parameters, and a negligible shortfall from the full ratio.

That result does not necessarily apply to other processors and other configurations. But I think you'd find a Q6600 at current prices a great buy and be very pleased with its performance. And if you are a serious overclocker, you'll wind up going faster than I am.

Future algorithm revisions on either SETI or Einstein could make my comments out-of-date. In a previous version of the project, Akos once released Einstein code which had the interesting consequence that my Gallatin actually ran less productively hyperthreaded than single-threaded. However I think the Conroe multi-processors are much more robust against such effects than the Gallatin implementation of hyperthreading is.

hotze33
Joined: 10 Nov 04
Posts: 100
Credit: 368387400
RAC: 35

Hi, as long as there are 4

Hi,
as long as there are 4 independent tasks (as under BOINC), a quad core is twice as fast as a dual core.
If all 4 cores are working on the same problem (like computational fluid dynamics), then the Intel quad core does not scale as well. Typically you get about 1.5x the performance rather than 2x. This will hopefully change with the new AMD processors (a "true" quad core).
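As a rough illustration of why a shared job scales worse than independent tasks, here is a minimal Amdahl's-law sketch in Python (the 20% serial fraction is an assumed, purely illustrative number, not a measurement of any real CFD code):

    # Minimal Amdahl's-law sketch: why one shared job scales worse than
    # several independent jobs. The 20% serial fraction is assumed,
    # for illustration only.

    def amdahl_speedup(cores: int, serial_fraction: float) -> float:
        """Ideal speedup of a single job whose serial part cannot be split."""
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

    for cores in (1, 2, 4):
        independent = cores                    # n separate BOINC tasks: perfect scaling
        shared = amdahl_speedup(cores, 0.20)   # one job split across n cores
        print(f"{cores} cores: independent tasks {independent:.1f}x, "
              f"shared job {shared:.2f}x")

With independent BOINC tasks there is no shared serial part, so throughput simply scales with the number of cores (until memory bandwidth gets in the way).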

so long hotze

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 40

RE: Future algorithm

Message 71267 in response to message 71265

Quote:
Future algorithm revisions on either SETI or Einstein could make my comments out-of-date. In a previous version of the project, Akos once released Einstein code which had the interesting consequence that my Gallatin actually ran less productively hyperthreaded than single-threaded. However I think the Conroe multi-processors are much more robust against such effects than the Gallatin implementation of hyperthreading is.

That's because HT isn't dual core, even though it can fake it. It relies on the fact that very few applications can actually keep even half of a core's execution units busy at any given time, even when running at a 100% resource share. When a heavily optimized application is able to use the full CPU capacity, there aren't any free execution units to hand to a second thread; all that happens is that you hit your cache twice as hard.

Dave Burbank
Joined: 30 Jan 06
Posts: 275
Credit: 1548376
RAC: 0

RE: RE: Future algorithm

Message 71268 in response to message 71267

Quote:
Quote:
Future algorithm revisions on either SETI or Einstein could make my comments out-of-date. In a previous version of the project, Akos once released Einstein code which had the interesting consequence that my Gallatin actually ran less productively hyperthreaded than single-threaded. However I think the Conroe multi-processors are much more robust against such effects than the Gallatin implementation of hyperthreading is.

That's because HT isn't dual core, even though it can fake it. It relies on the fact that very few applications can actually keep even half of a core's execution units busy at any given time, even when running at a 100% resource share. When a heavily optimized application is able to use the full CPU capacity, there aren't any free execution units to hand to a second thread; all that happens is that you hit your cache twice as hard.

Interesting. I have a question sort of related to this; sorry if it's a bit off topic.

Before the app was optimized last year, it would still claim 100% CPU usage on my A64 3700+. But clearly it wasn't truly using 100% of the CPU's full potential.

Once the app was optimized it still claimed 100% CPU usage, but was doing the work in a fraction of the time.

So here's my question: is the CPU actually working harder (more transistors firing), or is it just that the code is more efficient and uses "faster" instructions... or something?

There are 10^11 stars in the galaxy. That used to be a huge number. But it's only a hundred billion. It's less than the national deficit! We used to call them astronomical numbers. Now we should call them economical numbers. - Richard Feynman

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7073884931
RAC: 1307276

RE: Before the app was

Message 71269 in response to message 71268

Quote:

Before the app was optimized last year, it would still claim 100% CPU usage on my A64 3700+. But clearly it wasn't truly using 100% of the CPU's full potential.

Once the app was optimized it still claimed 100% CPU usage, but was doing the work in a fraction of the time.

So here's my question: is the CPU actually working harder (more transistors firing), or is it just that the code is more efficient and uses "faster" instructions... or something?


The CPU has multiple units, which can work simultaneously. The CPU usage logging does not account for them independently, but simply charges the current process one second per second of time.

So if more units are in simultaneous use, on average, most likely more power will be burned. If the usage is efficient, most likely more useful work is being done.

However it is entirely possible for a more efficient algorithm to get more work done per second _without_ using more power. Think of the slogan "work smarter, not harder" for a human analogy.
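If you want to see that distinction concretely, here is a small Python sketch (the two functions are hypothetical stand-ins for the unoptimized and optimized apps, not anything from the real code). Both keep one core pegged at 100% for as long as they run, and the operating system simply charges them CPU seconds; only the wall-clock times reveal that one is doing the same job far more cleverly:

    # Sketch: CPU-time accounting cannot tell an efficient program from an
    # inefficient one -- both are simply charged one CPU-second per second.
    import time

    N = 20_000_000

    def busy_sum(n):
        """Brute-force loop: keeps one core at '100%' for the whole run."""
        total = 0
        for i in range(1, n + 1):
            total += i
        return total

    def smart_sum(n):
        """Closed-form n(n+1)/2: same answer, a tiny fraction of the work."""
        return n * (n + 1) // 2

    for fn in (busy_sum, smart_sum):
        cpu0, wall0 = time.process_time(), time.perf_counter()
        result = fn(N)
        cpu, wall = time.process_time() - cpu0, time.perf_counter() - wall0
        # While either function runs, the task monitor shows ~100% on one core;
        # the logger just charges CPU seconds and has no idea how useful they were.
        print(f"{fn.__name__}: result={result}, cpu={cpu:.3f}s, wall={wall:.3f}s")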

Dave Burbank
Joined: 30 Jan 06
Posts: 275
Credit: 1548376
RAC: 0

RE: The CPU has multiple

Message 71270 in response to message 71269

Quote:

The CPU has multiple units, which can work simultaneously. The CPU usage logging does not account for them independently, but simply charges the current process one second per second of time.

So if more units are in simultaneous use, on average, most likely more power will be burned. If the usage is efficient, most likely more useful work is being done.

However it is entirely possible for a more efficient algorithm to get more work done per second _without_ using more power. Think of the slogan "work smarter, not harder" for a human analogy.

So it's a bit of both, with efficient app design being the ideal goal.

So I guess compromises have to be made when writing an app to work across many different platforms, or you do what Akos did last year and optimize per CPU architecture.

Thanks

There are 10^11 stars in the galaxy. That used to be a huge number. But it's only a hundred billion. It's less than the national deficit! We used to call them astronomical numbers. Now we should call them economical numbers. - Richard Feynman

GoHack
Joined: 2 Jun 05
Posts: 37
Credit: 20602963
RAC: 0

What about the bigger L2

What about the bigger L2 caches that come with the quad cores? Is that a feature that can be utilized?

Also, could calculations be broken up, whereby each core takes a part of the calculation? At the machine language level, that would mean taking a 32-bit operation and breaking it into four 8-bit operations, one for each core to work on. I don't know if we are talking about parallelism at this level.

Remember, too, that each core also has a math coprocessor, which could be doing calculations.

Having first learned how to program computers at the machine language level, then assembly, FORTRAN, etc., with very limited memory, I learned how to use everything very efficiently, squeezing out as much as possible. Nowadays, I see programming as very sloppy because of all the memory resources available. Instead of removing unused code, they just leave it; I suppose it would take too much time to remove it. Programs, as well as operating systems, get bloated and slowed down.

adrianxw
Joined: 21 Feb 05
Posts: 242
Credit: 322654862
RAC: 0

I also programmed from the

I also programmed from the late '70s. The thing then was that hardware was really expensive relative to staff, so yes, you squeezed everything to fit the machine and used an army of programmers to do it. Today, the situation is the reverse. If you need another GB of RAM, you put it in; if you need a new staff member, you have to think big money.

Today, though, I mostly program deeply embedded systems, often with very small microcontrollers, so again it is a case of managing your resources efficiently.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
