Quad core vs dual core

Mahoujin Tsukai
Joined: 21 Jul 07
Posts: 7
Credit: 348900
RAC: 0
Topic 193043

How much performance gain does one get in Einstein@Home when a quad core CPU is used in place of a dual core CPU?

Assume that all other factors are the same (motherboard, RAM, etc.) and both CPUs run at the same frequency.

I'm just curious as to how much faster a PC will crunch when a quad core CPU is used.

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

Quad core vs dual core

Quote:
How much performance gain does one get in Einstein@Home when a quad core CPU is used in place of a dual core CPU?


A quad core means double the performance.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2958312917
RAC: 713081

In theory, yes - as in, each

In theory, yes - as in, each WU takes the same amount of time, but you can do twice as many because you'll be running four at once.

But in practice, I find there are bottlenecks - probably with the FSB and memory access.

It depends on the application. The current 'relaxed' (i.e. non-optimised) Einstein app is quite good, but the optimised SETI apps get in each other's way badly. During the previous SETI run (neglecting the current changes to multibeam), I could do the shortest single WU in about 25 minutes. If I ran 8 of them at the same time (2 x Xeon 5320 quad processors), each WU took up to 40 minutes.
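To put rough numbers on that, here is a back-of-the-envelope sketch in Python using the 25-minute and 40-minute figures above; the function and the numbers are purely illustrative, nothing from the apps themselves:

def throughput(wus_in_flight, minutes_per_wu):
    # Work units finished per minute when several run side by side.
    return wus_in_flight / minutes_per_wu

single = throughput(1, 25)    # one short WU alone: 0.04 WU/min
loaded = throughput(8, 40)    # eight at once on two quad Xeons: 0.20 WU/min
ideal = 8 * single            # perfect scaling would give 0.32 WU/min
print(f"scaling efficiency: {loaded / ideal:.0%}")    # roughly 62%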

Akos will be able to advise us whether the next round of optimisations is likely to put extra strain on the memory bus: if it does, quad cores might be most efficient running a variety of projects, rather than every core running the same program.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7224194931
RAC: 1013337

I've got one E6600 Dual Core

I've got one E6600 Dual Core and one Q6600 Quad Core. Same generation of the Conroe chip, currently running at the same moderately overclocked speed (3.006 GHz).

For this configuration, for the current Einstein and the current SETI application, I think I really do get double performance out of the Quad, in any split between SETI and Einstein from zero to four tasks of either.

While it is true that sharing the memory interface implies some potential for conflict, sharing the cache implies some potential benefits. The Conroe caches are massive relative to the SETI working set, so most folks who have checked have seen almost no performance impact from memory parameters, and a negligible shortfall from the full scaling ratio.

That result does not necessarily apply to other processors and other configurations. But I think you'd find a Q6600 at current prices a great buy and be very pleased with its performance. And if you are a serious overclocker, you'll wind up going faster than I am.

Future algorithm revisions on either SETI or Einstein could make my comments out-of-date. In a previous version of the project, Akos once released Einstein code which had the interesting consequence that my Gallatin actually ran less productively hyperthreaded than single-threaded. However I think the Conroe multi-processors are much more robust against such effects than the Gallatin implementation of hyperthreading is.

hotze33
Joined: 10 Nov 04
Posts: 100
Credit: 368387400
RAC: 0

Hi, as long as there 4

Hi,
as long as there are 4 independent tasks (as with BOINC), a quad core is twice as fast as a dual core.
If all 4 cores are working on the same problem (like computational fluid dynamics, etc.), then the Intel quad core does not scale so well. Typically you get 1.5x the performance rather than 2x. This will hopefully change with the new AMD processors (*true quad core*).
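That 1.5x figure is roughly what Amdahl's law predicts if about a fifth of the run is serialised on something shared (memory bus, synchronisation, etc.). A small illustrative Python sketch, not a measurement, and the 20% figure is only an assumption:

def speedup(n_cores, serial_fraction):
    # Amdahl's law: only the parallel part gets faster with more cores.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

s = 0.2    # assume 20% of the work is serialised/contended
print(f"quad over dual: {speedup(4, s) / speedup(2, s):.2f}x")    # about 1.50x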

so long hotze

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

RE: Future algorithm

Message 71267 in response to message 71265

Quote:
Future algorithm revisions on either SETI or Einstein could make my comments out-of-date. In a previous version of the project, Akos once released Einstein code which had the interesting consequence that my Gallatin actually ran less productively hyperthreaded than single-threaded. However I think the Conroe multi-processors are much more robust against such effects than the Gallatin implementation of hyperthreading is.

That's because HT isn't dual core, even though it can fake it. It relies on the fact that very few applications can actually keep even half of the core's execution units busy at any given time, even when running at a 100% resource share. When a heavily optimized application is able to use the full CPU capacity, there aren't any free execution slots left to hand to a second thread; all that happens is that you hit your cache twice as hard.

Dave Burbank
Joined: 30 Jan 06
Posts: 275
Credit: 1548376
RAC: 0

RE: RE: Future algorithm

Message 71268 in response to message 71267

Quote:
Quote:
Future algorithm revisions on either SETI or Einstein could make my comments out-of-date. In a previous version of the project, Akos once released Einstein code which had the interesting consequence that my Gallatin actually ran less productively hyperthreaded than single-threaded. However I think the Conroe multi-processors are much more robust against such effects than the Gallatin implementation of hyperthreading is.

That's because HT isn't dual core, even though it can fake it. It relies on the fact that very few applications can actually keep even half of the core's execution units busy at any given time, even when running at a 100% resource share. When a heavily optimized application is able to use the full CPU capacity, there aren't any free execution slots left to hand to a second thread; all that happens is that you hit your cache twice as hard.

Interesting. I have a question sort of related to this, sorry if it's a bit off topic.

Before the app was optimized last year, it would still claim 100% CPU usage on my A64 3700+. But clearly it wasn't truly using 100% of the CPU's full potential.

Once the app was optimized, it still claimed 100% CPU usage, but was doing the work in a fraction of the time.

So here's my question: is the CPU actually working harder (more transistors firing), or is it just that the code is more efficient and uses "faster" instructions... or something?

There are 10^11 stars in the galaxy. That used to be a huge number. But it's only a hundred billion. It's less than the national deficit! We used to call them astronomical numbers. Now we should call them economical numbers. - Richard Feynman

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7224194931
RAC: 1013337

RE: Before the app was

Message 71269 in response to message 71268

Quote:

Before the app was optimized last year, it would still claim 100% CPU usage on my A64 3700+. But clearly it wasn't truly using 100% of the CPU's full potential.

Once the app was optimized, it still claimed 100% CPU usage, but was doing the work in a fraction of the time.

So here's my question: is the CPU actually working harder (more transistors firing), or is it just that the code is more efficient and uses "faster" instructions... or something?


The CPU has multiple units, which can work simultaneously. The CPU usage logging does not account for them independently, but simply charges the current process one second per second of time.

So if more units are in simultaneous use, on average, most likely more power will be burned. If the usage is efficient, most likely more useful work is being done.

However it is entirely possible for a more efficient algorithm to get more work done per second _without_ using more power. Think of the slogan "work smarter, not harder" for a human analogy.
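One way to see that distinction for yourself: a minimal sketch using Python's standard timers, nothing Einstein-specific, with a deliberately naive busy loop standing in for the science app:

import time

def busy_loop(iterations):
    # Deliberately simple work; "efficient" or not, it keeps the core occupied.
    total = 0
    for i in range(iterations):
        total += i * i
    return total

wall_start = time.perf_counter()   # wall-clock seconds
cpu_start = time.process_time()    # CPU seconds charged to this process

busy_loop(10_000_000)

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print(f"CPU usage: {cpu / wall:.0%}")   # ~100% either way; the counter can't see how many execution units were busy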

Dave Burbank
Joined: 30 Jan 06
Posts: 275
Credit: 1548376
RAC: 0

RE: The CPU has multiple

Message 71270 in response to message 71269

Quote:

The CPU has multiple units, which can work simultaneously. The CPU usage logging does not account for them independently, but simply charges the current process one second per second of time.

So if more units are in simultaneous use, on average, most likely more power will be burned. If the usage is efficient, most likely more useful work is being done.

However it is entirely possible for a more efficient algorithm to get more work done per second _without_ using more power. Think of the slogan "work smarter, not harder" for a human analogy.

So it's a bit of both, with efficient app design being the ideal goal.

So I guess compromises have to be made when writing an app to work across many different platforms, unless you do what Akos did last year and optimize per CPU architecture.

Thanks

There are 10^11 stars in the galaxy. That used to be a huge number. But it's only a hundred billion. It's less than the national deficit! We used to call them astronomical numbers. Now we should call them economical numbers. - Richard Feynman

GoHack
Joined: 2 Jun 05
Posts: 37
Credit: 20602963
RAC: 0

What about the bigger L2

What about the bigger L2 caches that accompany the Quad Cores? Is that a feature that can be utilized?

Also, could calculations be broken up, whereby each core takes a part of the calculation? At the machine language level, taking a 32-bit operation and breaking it into four 8-bit pieces for each core to work on. I don't know if we are talking about parallelism at that level.

Remember too, each core also has a math coprocessor (FPU), which could also be doing calculations.

Having first learned how to program computers at the machine language level, then assembly, FORTRAN, etc., with very limited memory, I learned how to use everything very efficiently, squeezing out as much as possible. Nowadays I see programming as very sloppy, because of all the memory resources available. Instead of removing unused code, they just leave it; I suppose it would take too much time to remove it. Programs, as well as operating systems, get bloated and slowed down.

adrianxw
Joined: 21 Feb 05
Posts: 242
Credit: 322654862
RAC: 0

I also programmed from the

I also programmed from the late '70s. The thing then was that hardware was really expensive relative to staff, so yes, you squeezed everything to fit the machine and used an army of programmers to do it. Today the situation is the reverse: if you need another GB of RAM, you put it in; if you need a new staff member, you have to think big money.

Today though, I program mostly deeply embedded systems often with very small microcontrollers, so again, it is a case of managing your resources efficiently.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
