Are Bulldozers crunching here?

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 433,656,908
RAC: 11,670
Topic 196091

If so, could one post real performance facts here, please?

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 427,429,026
RAC: 134,722

From WUProp I get the following:

FX8150
GRPS: 7.6h
GW: 4.8h

FX8120
GRPS: 9.3h
GW: 4.7h

FX6100
GRPS: 8.3h
GW: 4.85h

However, we don't know at which clock speeds these guys run, how many samples are involved, or whether they run with CMT enabled. Using only one core in each module would improve times per task greatly, but reduce overall throughput.. greatly, I guess. For fp workloads it's very similar to Intel's HT, never mind the AMD marketing. It's only for integer tasks that the little kiddies are real cores.

Would be nice if someone else could add more precise data!

MrS

Scanning for our furry friends since Jan 2002

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,516
Credit: 485,631,016
RAC: 1,782

Hi!

Here's one 8150 which I found by randomly searching in BOINCstats host rankings:

http://einsteinathome.org/host/4242382/tasks&offset=0&show_names=0&state=3

CU
HBE

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 433,656,908
RAC: 11,670

THX for your help, it's appreciated.
Looks like my 4Core Phenom without L3-cache is faster ..

Will wait for the Ivy-Bridge then.

Alexander

Jonatan
Joined: 20 Jun 10
Posts: 66
Credit: 25,768,353
RAC: 1

I read AMD's announcement presenting its new processors, the FX series. Honestly, at the high end I prefer Intel processors, but I don't dispute that the FX series is very competitive in frequency and price...

Looking at the features of the FX, I prefer the Intel Core i7 2600 at 3.4 GHz over the AMD FX 8150. But we have to wait and see how it behaves when crunching...

Besides, Intel has the "king" of desktop processors: the Intel Core i7-3960X Extreme at 3.3 GHz... For me, the dream for crunching.

What's your opinion?

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 427,429,026
RAC: 134,722

Your Athlon II Phenom needs about 18 ks per S6GW-WU. The linked Bully needs about 22.5 ks per WU. So it does take longer and it's only got as many FPUs as your chip (4). But it actually crunches 8 of these WUs concurrently, so the overall throughput is much higher. It very likely also eats more power while doing so.. but discussing this would be a moot point without knowing even the clock speeds.

However, Sandy Bridge is still the king of Einstein... by far. My i7 2600K at 4.0 GHz slices through these WUs in about 13 ks. 8 at a time.
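To put the throughput argument in numbers, here is a rough sketch using the per-WU times quoted above. The concurrency of 4 for the Athlon II is an assumption (one task per core), and `wus_per_day` is just an illustrative helper name, not anything from BOINC.

```python
# Rough throughput comparison from the per-WU times quoted in this post.
# Assumption: the Athlon II runs 4 tasks at once (one per core); the FX-8150
# and the i7 2600K each run 8 tasks at once, as stated in the post.

SECONDS_PER_DAY = 86_400


def wus_per_day(seconds_per_wu: float, concurrent_tasks: int) -> float:
    """Work units completed per day when `concurrent_tasks` run in parallel."""
    return concurrent_tasks * SECONDS_PER_DAY / seconds_per_wu


hosts = {
    "Athlon II (4 tasks)": (18_000, 4),
    "FX-8150 (8 tasks)": (22_500, 8),
    "i7 2600K @ 4.0 GHz (8 tasks)": (13_000, 8),
}

for name, (seconds, tasks) in hosts.items():
    print(f"{name:30s} {wus_per_day(seconds, tasks):5.1f} WU/day")
```

Under these assumptions the Bulldozer finishes roughly 1.6 times as many WUs per day as the Athlon II, even though each individual WU takes longer.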

MrS

Scanning for our furry friends since Jan 2002

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4,527,270
RAC: 0

Quote:
So it does take longer and it's only got as many FPUs as your chip (4).


The Bulldozer modules have extended FPUs, so they
can execute two x87 and SSEx instructions in parallel.
But the result depends on the code, because of some
strong limitations.

Jonatan
Joined: 20 Jun 10
Posts: 66
Credit: 25,768,353
RAC: 1

Here is the difference in power:

Even with just a Core 2 Duo, the time to compute one WU of Gravitational Wave S6 GC search v1.01 (SSE2):

CPU time core2 duo (sec): 16,834.21

AMD's new processor, the FX 8150, time to compute the same WU:

CPU time (sec): 21,697.20

The data source is here :

http://einsteinathome.org/workunit/110473224

Dave
Joined: 17 Jan 09
Posts: 11
Credit: 1,127,822
RAC: 0

I'll give the "more precise data" a try.

But 1st, I'd like to state the obvious: crunch times on almost ANY machine w/o a GPU will be slower than almost any machine with a GPU. It's the apples & oranges thing. Likewise with GPUs. Usually a higher numerical model will outperform a lower numerical model. But not always, because the nVidia cards come in a wide range of clock speeds from the factory. I have an MSI GTX460N Hawk that came overclocked at 780 MHz. If I recall correctly the base clock is somewhere around 650 MHz. A slower machine with a very fast GPU will outperform a very fast machine w/o a GPU or with an older, slower GPU.

On 12/2 I bought an MSI 990FXA-GD65 socket AM3+ m/b and 8 GB of Kingston DDR3-1333 memory. My intention was to put my 3 year old AMD Phenom II X3 720 into it and replace the 720 chip with an FX-6100 in early 2012. I had been running the 720 in an ASUS M3N78-VM m/b for 3 years. Its base speed was 2.8 GHz but I ran it at 3.4 GHz on an OEM AMD 2-heatpipe air cooler that came with an AMD Athlon 6000+ X2 125W chip. The 720 is a 95W chip so the cooler worked really well. I am using it to cool my FX-6100, which is also a 95W chip. I discovered the new m/b made the 720 perform about 10% better even though I could only oc the 720 to 3255 MHz. I ran that configuration until 12/7 when I decided to upgrade immediately. I just couldn't wait and did not believe the reviews could really be so bad.

So I brought home the FX-6100 and 8 GB of Corsair DDR3-2000 memory. Installed those items & fired it up. Ran fine 1st time but it felt a little sluggish at its stock 3.3 GHz. Did some testing and it really was about 7% - 10% slower than my 720. So I enabled Overclocking Genie, which dialed it up to 3712 MHz. That felt much more normal. Timing runs showed it to be ALMOST, but not quite, as fast as my 720. BUT, I could run all 6 cores at that speed AND a GPU w/u at the same time without any noticeable degradation. If I ran a full load on the 720 and a GPU w/u as well, the response at the keyboard would be noticeably slower. However, now I feel almost no degradation at the keyboard while on the internet, doing e-mail or anything else while running a full load. I've decided that is a huge factor for me in a machine's overall performance.

So, I made a note of the setting from OC Genie and then turned it off and went to manual overclocking. I have run it as high as 4.2 GHz but it had a hard lock coincident with my furnace turning on and the lights dimming. Looks like I might need a UPS. So I backed it down to ONLY 4.0 GHz. And it runs at only 125 degrees F under a full load of 6 CPU w/us and a GPU w/u while I do other things. This chip REALLY likes to be overclocked.

Remember that DDR3-1333 memory I mentioned? While working out the new installation kinks I noticed memory was running at 1680 MHz for a while instead of its stock 1066 MHz. Amazing!! My final choice for memory was 8 GB of Corsair Vengeance DDR3-1866. I decided the new memory overclocks so wildly that there was no reason to go with memory rated faster than the 1866 MHz the FX chip calls for.

Now, about some runtime data I've collected in the past few days on the FX-6100 running at various speeds from 3712 MHz to 4.0 GHz. I run 3 projects: Seti, Milkyway and Einstein.

SETI: Last night I let it crunch all night for a change. 206 units in the range of 3:04 to 3:09 with a few longer ones (about 5) ranging from 14:17 to 14:21. These were all GPU units.

Milkyway: On 12/7, late in the day after initial setup, I ran 5 Milkyway units on the CPUs as well as a GPU unit while the machine was clocked at 3712 MHz by OC Genie. These units ran in the range of 2h 41m 35s to 2h 44m 43s. The 3 in the middle ran in 2h 42m 29s, 2h 42m 36s and 2h 42m 56s. I stopped a 6th unit for some reason and deleted it. Then I set my Milkyway preferences to not accept any more CPU units. Just had to try them out once. I've noticed that if I am running GPU units in the range of 3m 5s they seem to increase by about 5 to 10 seconds if I am doing something else that produces a lot of CPU activity. By this I mean I might also be running a test on a Prime Numbers program I wrote a long time ago and use for timing tests. With only 1 core running along with a single GPU unit, my CPU temp hovers between 104 & 107 degrees F. I like that a lot.

Einstein: While the machine was clocked at 3712 MHz by OC Genie, I ran 4 GPU units that turned in times ranging from 39m 36s to 39m 54s while I worked elsewhere. Then we went shopping and it continued to crunch away on the rest of those units and the uninterrupted range was 36m 25s to 36m 38s. So doing something else slowed the crunch time by about 10%. But I couldn't feel it at the keyboard. Running a similar batch of GPU units at 4.2 GHz drove the times down into the range of 35m 04s to 35m 11s. I also ran a single Gamma Ray Burst unit that took 7h 50m 48s. I aborted the rest of the GRB units that downloaded. I seem to recall doing that a long time back also.
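For anyone who wants to compare these timings with the hour-based averages quoted earlier in the thread, here is a small sketch that converts the "7h 50m 48s" style strings to seconds and checks the slowdown. The helper names `to_seconds` and `slowdown_percent` are made up for illustration.

```python
import re

# Seconds per unit letter used in the timing strings above.
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(text: str) -> int:
    """Convert a string such as '7h 50m 48s' or '39m 36s' to seconds."""
    parts = re.findall(r"(\d+)\s*([hms])", text)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


def slowdown_percent(busy: str, idle: str) -> float:
    """Extra runtime of the 'busy' run relative to the 'idle' run, in percent."""
    return (to_seconds(busy) / to_seconds(idle) - 1.0) * 100.0


# The long CPU unit, expressed in hours.
print(f"{to_seconds('7h 50m 48s') / 3600:.2f} h")

# GPU units: fastest run while the machine was in use vs. fastest idle run.
print(f"{slowdown_percent('39m 36s', '36m 25s'):.1f} % slower while in use")
```

With the fastest times from each batch this works out to a bit under 9%, which fits the "about 10%" estimate.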

UBUNTU 11.10: REALLY likes the overclocked FX-6100. I didn't really time anything specific but I noticed that it takes about 1/2 as long from when I press a button to start a process to when it is finished filling the screen and is waiting for me. It used to feel lazy and sluggish, now it is brisk.

My final thoughts about computing these days: I was a custom software designer, builder and maintainer for mainframes and micros for 25 years, self employed. During that time I dropped my mainframe business and concentrated on micro applications exclusively in the mid '80s. I started building my own hardware back when the 386-16s were out. I've had both Intel & AMD. My last Intel machine was a 486 DX-50 and then I switched to AMD and have been here ever since. No special reason except I've learned AMD and feel comfortable with them. Sort of like owning GM or Ford or Chrysler cars all one's life. The chips these days seem to have hit a ceiling. So performance must be gained elsewhere for now: the GPU, multi-core, memory & electronic improvements. So I've decided in my mind to stop hunting for the last MHz of CPU clocking and instead look at a machine for its value to me and what I do. Running these distributed processing systems is a way of using my vastly under-utilized machine to help the research community discover things that are of interest to me in hopes of reading about their findings later on. So I'm comfortable with my new FX-6100 for now, at least until the next "new & improved" model is released. I'm waiting for the software to catch up to me now.

I hope this helps and I did not mean to step on anyone's toes.

Dave

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,281
Credit: 150,204,889
RAC: 125,267

Quote:
The chips these days seem to have hit a ceiling. So performance must be gained elsewhere for now: the GPU, multi-core, memory & electronic improvements.


I agree. My take : there's only so much functionality that one can stuff on a single chip and yet still retain some element of choice in the behaviour of sub-components. So with an 'open' architecture it's a matter of mix'n'match to a specification to a purpose. So if your leaning is towards 'fast' then that really means a fast system without too much thumb twiddling by one part awaiting another. Hence a ( very ) old joke about putting a V8 motor in a Mini-Minor : it works, but not for long ...... :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 427,429,026
RAC: 134,722

Thanks for taking the time to write all of this down, Dave.

So now we've got:

FX6100 (from WUProp)
GRPS: 8.3h

FX6100 (Dave)
GRPS: 7.85h

We know Dave was running at 3.71 GHz. Stock clock for that chip is 3.30 GHz. Assuming perfect scaling, starting from Dave's number, running at stock would lead to an expected 8.8 h. Assuming 3.6 GHz on the other chips (1st turbo) would lead to an expected 8.1 h.

So it seems like whoever is running there has an average clock speed between these two steps.
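The scaling estimate can be reproduced with a few lines. This is only a sketch under the same assumption as above (runtime scales perfectly inversely with clock speed, which real hardware will not quite do); the function names are illustrative.

```python
# Clock-scaling estimate, assuming runtime is inversely proportional to clock.
# Reference point: Dave's GRPS run of 7.85 h at 3.71 GHz.

REF_HOURS = 7.85
REF_GHZ = 3.71


def scaled_hours(target_ghz: float) -> float:
    """Expected runtime at `target_ghz` under perfect clock scaling."""
    return REF_HOURS * REF_GHZ / target_ghz


def implied_ghz(observed_hours: float) -> float:
    """Clock speed that would explain `observed_hours` under the same model."""
    return REF_HOURS * REF_GHZ / observed_hours


for ghz in (3.30, 3.60):
    print(f"{ghz:.2f} GHz -> {scaled_hours(ghz):.1f} h expected")

# The WUProp average of 8.3 h corresponds to a clock between stock and turbo.
print(f"8.3 h -> about {implied_ghz(8.3):.2f} GHz")
```

Inverting the same relation for the 8.3 h WUProp average gives roughly 3.5 GHz, i.e. between the 3.30 GHz stock clock and the 3.6 GHz turbo step.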

For comparison: my i7 4.0 GHz with HT on and DDR3 1866 needs 5.7 h for these GRPS tasks. That means the Bully is only a factor of 1.28 slower per clock, per logical core. We've seen worse than this ;)
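The factor of 1.28 comes from comparing runtime multiplied by clock speed, i.e. a rough measure of clock cycles spent per task. A minimal sketch, taking the 7.85 h at 3.71 GHz figure for the FX-6100 and 5.7 h at 4.0 GHz for the i7 as given, with an illustrative function name:

```python
# Per-clock, per-logical-core comparison: runtime times clock speed is
# proportional to the number of cycles one task needs on one logical core.


def per_clock_factor(slow_hours: float, slow_ghz: float,
                     fast_hours: float, fast_ghz: float) -> float:
    """How many times more cycles per task the slower chip needs."""
    return (slow_hours * slow_ghz) / (fast_hours * fast_ghz)


factor = per_clock_factor(7.85, 3.71, 5.7, 4.0)
print(f"FX-6100 needs {factor:.2f}x the cycles per task of the i7")
```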

@Akos: I know you're the low-level stuff guy.. but didn't the Athlon (since XP) always feature 2 independent FP execution units? Like e.g. shown here. The 1st is for FADD and SSE and the 2nd one for FMUL and SSE. Together with the FStore, of course and the usual 3-wide decode and dispatch.

Dave wrote:
But 1st, I'd like to state the obvious: crunch times on almost ANY machine w/o a GPU will be slower than almost any machine with a GPU.

Yeah, that's why I left out the BRP searches in my comparison.

MrS

Scanning for our furry friends since Jan 2002
