Are Bulldozers crunching here?

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4,527,270
RAC: 0

Hi,

Quote:
but didn't the Athlon (since XP) always feature 2 independent FP execution units? The 1st is for FADD and SSE and the 2nd one for FMUL and SSE. Together with the FStore, of course, and the usual 3-wide decode and dispatch.


You are right. FADD + FMUL + FSTORE is the conventional
solution. Bulldozer, however, has 2 FMAC units instead.
That offers some advantages and lots of disadvantages...

One of the advantages may well be important in our case:
the double FMAC can execute 2 FADDs or 2 FMULs simultaneously.
(I haven't measured its effect on performance.)
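
A minimal sketch of why a multiply-accumulate unit can stand in for both an adder and a multiplier: an FMAC computes a*b + c, so an add or a multiply is just a special case with one operand fixed. The function names here are made up for illustration, and this is not a description of Bulldozer's actual datapath. Plain Python also rounds twice where real fused hardware rounds once.

```python
import math


def fmac(a: float, b: float, c: float) -> float:
    """Toy model of one FMAC operation: a * b + c.

    Assumption: real fused hardware rounds once at the end; this Python
    expression rounds after the multiply and again after the add.
    """
    return a * b + c


def fadd(x: float, y: float) -> float:
    # An add is a multiply-accumulate with the multiplier fixed at 1.0.
    return fmac(x, 1.0, y)


def fmul(x: float, y: float) -> float:
    # A multiply is a multiply-accumulate with a zero addend.
    # Using -0.0 keeps the sign of a zero product intact.
    return fmac(x, y, -0.0)


print(fadd(2.0, 3.0))
print(fmul(2.0, 3.0))
```

With two such units, either two adds or two multiplies can be in flight at once, whereas a dedicated FADD pipe plus a dedicated FMUL pipe can only pair one of each.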

ML1
Joined: 20 Feb 05
Posts: 340
Credit: 76,652,931
RAC: 44,233

Quote:
Quote:
but didn't the Athlon (since XP) always feature 2 independent FP execution units? The 1st is for FADD and SSE and the 2nd one for FMUL and SSE. Together with the FStore, of course, and the usual 3-wide decode and dispatch.

You are right. FADD + FMUL + FSTORE is the conventional
solution. Bulldozer, however, has 2 FMAC units instead.
That offers some advantages and lots of disadvantages...

One of the advantages may well be important in our case:
the double FMAC can execute 2 FADDs or 2 FMULs simultaneously.
(I haven't measured its effect on performance.)


Is a Bulldozer code optimisation worthwhile?

Or should development effort be better concentrated to better utilise GPUs?

Slight aside: How do the AMD "APU"s compare? Are the APU extras easily utilised?

Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

DanNeely
Joined: 4 Sep 05
Posts: 1,359
Credit: 2,929,188,194
RAC: 2,986,531

Quote:

Or should development effort be better concentrated to better utilise GPUs?

Slight aside: How do the AMD "APU"s compare? Are the APU extras easily utilised?

The APU is a standard AMD(ATI) GPU welded onto the side of the CPU.

The S6GC application would need a major redesign to work on a GPU. GPUs only work well for applications that either need just a tiny amount of memory per thread or have extremely sequential memory access patterns. S6GC uses a lot of memory and accesses it in a way that was sufficiently random that, back when CUDA was new and shiny and nVidia was using its engineers to help create flagship apps for promotional purposes, they couldn't get it running any faster than on a CPU. Adding a bigger GPU just resulted in more of the chip being stuck waiting for memory reads/writes to go through.
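
A rough illustration of why the access pattern matters so much: GPU hardware serves a group of threads with whole memory segments, so what counts is how many distinct segments the group touches. The sketch below assumes a 32-thread group and 128-byte segments (typical figures for nVidia hardware, but assumptions here), and the scattered stride of 4096 bytes is purely hypothetical. It says nothing about S6GC's real access pattern.

```python
SEGMENT_BYTES = 128  # assumed size of one memory transaction
GROUP_SIZE = 32      # assumed number of threads served together


def segments_touched(addresses, segment_bytes=SEGMENT_BYTES):
    """Count the distinct memory segments needed to serve all addresses."""
    return len({addr // segment_bytes for addr in addresses})


# Each thread reads one 4-byte value right next to its neighbour's.
sequential = [4 * t for t in range(GROUP_SIZE)]

# Each thread reads from a far-apart location (hypothetical stride).
scattered = [4096 * t for t in range(GROUP_SIZE)]

print("sequential:", segments_touched(sequential))  # 1 segment
print("scattered: ", segments_touched(scattered))   # 32 segments
```

In this toy model the scattered case needs 32 times as many memory transactions for the same amount of useful data, which is the "stuck waiting for memory" effect in miniature.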

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 427,431,026
RAC: 3,716

Quote:
Is a Bulldozer code optimisation worthwhile?

I don't think so. This core versus module issue is not something the current single-threaded app "sees". That's a matter entirely for the OS task scheduler, and maybe for hand-tuned multi-threaded apps.

What the project may be able to do is recompile with some kind of Bully optimization (if there is anything like that), or maybe add a separate code path for some hot loop. But I doubt there would be much to gain. AVX support might help (on the Intels as well), but I asked about this some time ago and apparently the potential gain in E@H doesn't look too tempting.

MrS

Scanning for our furry friends since Jan 2002

trastikata
Joined: 10 Dec 11
Posts: 1
Credit: 5,279,979
RAC: 0

Quote:
If so, could one post real performance facts here, please?

Hi, this is my profile: FX-6100 and GTX 550 Ti. Most of the time I am running other tasks on my desktop too.

http://einsteinathome.org/account/143295/computers

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320,378,898
RAC: 0

Quote:

Hi, this is my profile FX-6100 and GTX 550 Ti. Most of the time I am running other tasks on my Desktop too.

http://einsteinathome.org/account/143295/computers

Thank you!

Here are the details of your Bulldozer/GTX560TI [Win7_64] compared to my Phenom II x6/GT240 [Ubuntu 11.04_64] (http://einsteinathome.org/host/3805542) and an I7 2600K/GTX560 running 8 tasks at a time with HT [Ubuntu 11.10_64] (http://einsteinathome.org/host/4237123):

Phenom Bench Marks:

Measured floating point speed 2,480.14 million ops/sec
Measured integer speed 14,953.99 million ops/sec

Bulldozer Bench Marks:

Measured floating point speed 2,418.34 million ops/sec
Measured integer speed 9,494.01 million ops/sec

Intel I7 Bench Marks:

Measured floating point speed 3,261.51 million ops/sec
Measured integer speed 12,522.66 million ops/sec

Task times (most recent 5 tasks):

Binary Radio Pulsar Search (Arecibo) v1.00 (BRP3cuda32):

Phenom:
CPU Sec:    896.36,   893.43,   855.47,   865.10,   878.09
Run time: 6,750.72, 6,733.69, 6,764.34, 6,828.89, 6,864.85

Bulldozer:
CPU Sec:  1,060.73, 1,080.85, 1,071.34, 1,065.27, 1,064.07
Run time: 2,854.81, 2,896.77, 3,192.96, 2,849.72, 2,866.32

I7:
CPU Sec:    577.31,   577.10,   576.26,   573.68,   571.94
Run time: 3,179.28, 3,184.89, 3,177.90, 3,162.97, 3,161.87

Gravitational Wave S6 GC search v1.01 (SSE2):

Phenom:
CPU Sec: 20,100.59, 20,281.95, 20,169.62, 20,160.31, 20,118.88
Run time: 21,658.95, 21,554.15, 21,706.22, 21,504.95, 21,571.40

Bulldozer:
CPU Sec: 22,693.31, 22,834.50, 22,800.11, 22,874.71, 22,944.72
Run time: 24,903.33, 25,173.36, 25,157.99, 25,231.34, 25,608.83

I7:
CPU Sec: 18,002.43, 18,000.19, 18,125.64, 18,080.39, 18,058.62
Run time: 20,944.07, 20,967.78, 20,928.44, 20,673.16, 20,614.86

I'll leave the interpretation of these numbers as an exercise for the reader.
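
As a starting point for that exercise, here is a small sketch that averages the S6 GC CPU seconds listed above and expresses each host relative to the Phenom. The variable and function names are made up for illustration. Keep the caveats in mind: the I7 was running 8 tasks at a time with HT, the Bulldozer box was also busy with other desktop work, and the hosts run different operating systems, so this is not a clean per-core comparison.

```python
from statistics import mean

# CPU seconds for the five most recent S6 GC (SSE2) tasks, copied from above.
s6gc_cpu_sec = {
    "Phenom": [20100.59, 20281.95, 20169.62, 20160.31, 20118.88],
    "Bulldozer": [22693.31, 22834.50, 22800.11, 22874.71, 22944.72],
    "I7": [18002.43, 18000.19, 18125.64, 18080.39, 18058.62],
}


def relative_to(baseline, samples):
    """Return each host's mean time divided by the baseline host's mean."""
    base = mean(samples[baseline])
    return {host: mean(times) / base for host, times in samples.items()}


ratios = relative_to("Phenom", s6gc_cpu_sec)
for host, ratio in ratios.items():
    print(f"{host}: {ratio:.2f}x the Phenom's CPU time per task")
```

On these five tasks the Bulldozer needs roughly 13% more CPU time per task than the Phenom, and the I7 roughly 10% less.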

Joe

ps. I hate trying to get columns to line up in bb-code.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1,079
Credit: 341,280
RAC: 0

Quote:

[pre]Phenom Bench Marks:

Measured floating point speed 2,480.14 million ops/sec
Measured integer speed 14,953.99 million ops/sec

Bulldozer Bench Marks:

Measured floating point speed 2,418.34 million ops/sec
Measured integer speed 9,494.01 million ops/sec

Intel I7 Bench Marks:

Measured floating point speed 3,261.51 million ops/sec
Measured integer speed 12,522.66 million ops/sec
[/pre]

Task times (most recent 5 tasks):

[pre]Binary Radio Pulsar Search (Arecibo) v1.00 (BRP3cuda32).

Phenom:
CPU Sec: 896.36, 893.43, 855.47, 865.10, 878.09
Run time: 6,750.72, 6,733.69, 6,764.34, 6,828.89, 6,864.85

Bulldozer:
CPU Sec: 1,060.73, 1,080.85, 1,071.34, 1,065.27, 1,064.07
Run time: 2,854.81, 2,896.77, 3,192.96, 2,849.72, 2,866.32

I7:
CPU Sec: 577.31, 577.10, 576.26, 573.68, 571.94
Run time: 3,179.28, 3,184.89, 3,177.90, 3,162.97, 3,161.87

Gravitational Wave S6 GC search v1.01 (SSE2):

Phenom:
CPU Sec: 20,100.59, 20,281.95, 20,169.62, 20,160.31, 20,118.88
Run time: 21,658.95, 21,554.15, 21,706.22, 21,504.95, 21,571.40

Bulldozer:
CPU Sec: 22,693.31, 22,834.50, 22,800.11, 22,874.71, 22,944.72
Run time: 24,903.33, 25,173.36, 25,157.99, 25,231.34, 25,608.83

I7:
CPU Sec: 18,002.43, 18,000.19, 18,125.64, 18,080.39, 18,058.62
Run time: 20,944.07, 20,967.78, 20,928.44, 20,673.16, 20,614.86
[/pre]

I'll leave the interpretation of these numbers as an exercise for the reader.

Joe

ps. I hate trying to get columns to line up in bb-code.


On some project sites, the [pre][/pre] tags actually work. ;-)

Gruß,
Gundolf

Computer sind nicht alles im Leben. (Kleiner Scherz)

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320,378,898
RAC: 0

Quote:


On some project sites, the [pre][/pre] tags actually work. ;-)

Gruß,
Gundolf


Thank you.

Too late to edit this one, I'll TRY to remember that incantation for the next one.

BTW do we have a list of BB tags that work on this site?

Joe

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1,079
Credit: 341,280
RAC: 0

Quote:
BTW do we have a list of BB tags that work on this site?


Most of them do work on most sites. It's just the [pre][/pre] tag that doesn't on some sites.

Gruß,
Gundolf

Computer sind nicht alles im Leben. (Kleiner Scherz)

paul milton
Joined: 16 Sep 05
Posts: 329
Credit: 35,825,044
RAC: 0

Quote:

BTW do we have a list of BB tags that work on this site?

Joe

Upper left of the post window, under "message". If I recall right, someone had to point it out to me as well :)

Use BBCode tags to format your text

seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.
