Are Bulldozers crunching here?

Akos Fekete

Joined: 13 Nov 05

Posts: 561

Credit: 4527270

RAC: 0

Hi, RE: but didn't

15 Dec 2011 13:01:31 UTC

Message 107940 in response to message 107939


Hi,

Quote:

but didn't the Athlon (since XP) always feature 2 independent FP execution units? The 1st is for FADD and SSE and the 2nd one for FMUL and SSE. Together with the FStore, of course, and the usual 3-wide decode and dispatch.

You are right. FADD + FMUL + FSTORE is the conventional
layout. But the Bulldozer has 2 FMAC units instead.
That offers some advantages and lots of disadvantages...

One of the advantages may be important in our case:
the two FMAC units can execute 2 FADDs or 2 FMULs simultaneously.
(I haven't measured the effect on performance.)
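To illustrate why a multiply-accumulate unit can stand in for both an adder and a multiplier, here is a minimal conceptual sketch. The function names `fmac`, `fadd` and `fmul` are made up for this example, and how the real hardware routes plain adds and multiplies through the unit is an assumption, not something stated in this thread.

```python
def fmac(a, b, c):
    """Conceptual multiply-accumulate: returns a * b + c.

    Note: plain Python rounds after the multiply and again after the add;
    a real fused unit rounds only once. This models the data flow only.
    """
    return a * b + c


def fadd(x, y):
    # An add is a multiply-accumulate with the multiplier fixed to 1.
    return fmac(x, 1.0, y)


def fmul(x, y):
    # A multiply is a multiply-accumulate with the addend fixed to 0.
    return fmac(x, y, 0.0)


if __name__ == "__main__":
    print(fadd(2.0, 3.0))       # add via the FMAC model
    print(fmul(2.0, 3.0))       # multiply via the FMAC model
    print(fmac(2.0, 3.0, 4.0))  # combined multiply-add
```

With two such units, two independent adds or two independent multiplies can be in flight at once, whereas a FADD + FMUL pair can only overlap one of each.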

ML1

Joined: 20 Feb 05

Posts: 347

Credit: 86563414

RAC: 823

RE: RE: but didn't the

16 Dec 2011 1:17:16 UTC

Message 107941 in response to message 107940


Quote:

Quote:
but didn't the Athlon (since XP) always feature 2 independent FP execution units? The 1st is for FADD and SSE and the 2nd one for FMUL and SSE. Together with the FStore, of course, and the usual 3-wide decode and dispatch.

You are right. FADD + FMUL + FSTORE is the conventional
layout. But the Bulldozer has 2 FMAC units instead.
That offers some advantages and lots of disadvantages...

One of the advantages may be important in our case:
the two FMAC units can execute 2 FADDs or 2 FMULs simultaneously.
(I haven't measured the effect on performance.)

Is a Bulldozer code optimisation worthwhile?

Or should development effort be better concentrated to better utilise GPUs?

Slight aside: How do the AMD "APU"s compare? Are the APU extras easily utilised?

Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3562358667

RAC: 0

RE: Or should development

16 Dec 2011 1:53:55 UTC

Message 107942 in response to message 107941


Quote:

Or should development effort be better concentrated to better utilise GPUs?

Slight aside: How do the AMD "APU"s compare? Are the APU extras easily utilised?

The APU is a standard AMD(ATI) GPU welded onto the side of the CPU.

The S6GC application would need a major redesign to work on a GPU. GPUs only work well for applications that need only a tiny amount of memory per thread, or that have extremely sequential memory access patterns. S6GC uses a lot of memory and accesses it in a sufficiently random way that, back when CUDA was new and shiny and nVidia was using its engineers to help create flagship apps for promotional purposes, they couldn't get it running any faster than on a CPU. Adding a bigger GPU just resulted in more of the chip stuck waiting for memory reads/writes to go through.
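A rough way to picture the access-pattern point is to count how many memory segments one group of GPU threads touches in a single step. The sketch below is a simplified model, not S6GC code: the group size of 32 threads, 4-byte elements and 128-byte segments are illustrative assumptions, and `segments_touched` is a made-up helper.

```python
import random


def segments_touched(indices, element_bytes=4, segment_bytes=128):
    """Count the distinct memory segments hit by one group of threads.

    Each thread reads the element at its index; fewer distinct segments
    means fewer separate memory transactions in this simplified model.
    """
    return len({(i * element_bytes) // segment_bytes for i in indices})


GROUP = 32  # assumed number of threads issuing loads together

# Neighbouring threads read neighbouring elements.
sequential = list(range(GROUP))

# Each thread reads an element 32 slots away from its neighbour.
strided = [i * 32 for i in range(GROUP)]

# Each thread reads from an arbitrary place in a large array.
rng = random.Random(0)
scattered = [rng.randrange(1_000_000) for _ in range(GROUP)]

if __name__ == "__main__":
    for name, pattern in [("sequential", sequential),
                          ("strided", strided),
                          ("scattered", scattered)]:
        print(f"{name:10s} -> {segments_touched(pattern)} segment(s)")
```

In this model the sequential pattern is served from a single segment, while the strided pattern needs one segment per thread. That is the situation in which a bigger chip mostly adds more units waiting on memory.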

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 577510215

RAC: 196478

RE: Is a Bulldozer code

16 Dec 2011 22:46:07 UTC

Message 107943 in response to message 107941


Quote:

Is a Bulldozer code optimisation worthwhile?

I don't think so. This core versus module issue is not something the actual single-threaded app "sees". That's a matter entirely for the OS task scheduler, and maybe hand-tuned multi-threaded apps.

What the project may be able to do is recompile with some kind of Bully optimization (if there is anything like that), or maybe add a separate code path for some hot loop. But I doubt there would be much to gain. AVX support might help (on the Intels as well), but I asked about this some time ago and apparently the potential gain in E@H doesn't look too tempting.

MrS

Scanning for our furry friends since Jan 2002

trastikata

Joined: 10 Dec 11

Posts: 1

Credit: 5279979

RAC: 0

RE: If so, could one post

2 Jan 2012 0:27:20 UTC

Message 107944


Quote:

If so, could one post real performance facts here, please?

Hi, this is my profile: FX-6100 and GTX 550 Ti. Most of the time I am running other tasks on my desktop too.

http://einsteinathome.org/account/143295/computers

joe areeda

Joined: 13 Dec 10

Posts: 285

Credit: 320378898

RAC: 0

RE: Hi, this is my profile

2 Jan 2012 4:23:42 UTC

Message 107945 in response to message 107944


Quote:

Hi, this is my profile: FX-6100 and GTX 550 Ti. Most of the time I am running other tasks on my desktop too.

http://einsteinathome.org/account/143295/computers

Thank you!

Here are the details of your Bulldozer/GTX560TI [Win7_64] compared to my Phenom II x6/GT240 [Ubuntu 11.04_64] (http://einsteinathome.org/host/3805542) and an I7 2600K/GTX560, HT, 8 tasks at a time [Ubuntu 11.10_64] (http://einsteinathome.org/host/4237123).

Phenom Bench Marks:

Measured floating point speed 2,480.14 million ops/sec
Measured integer speed 14,953.99 million ops/sec

Bulldozer Bench Marks:

Measured floating point speed 2,418.34 million ops/sec
Measured integer speed 9,494.01 million ops/sec

Intel I7 Bench Marks:

Measured floating point speed 3,261.51 million ops/sec
Measured integer speed 12,522.66 million ops/sec

Task times (most recent 5 tasks):

Binary Radio Pulsar Search (Arecibo) v1.00 (BRP3cuda32).
  
Phenom:
CPU Sec:    896.36,   893.43,   855.47,   865.10,   878.09
Run time: 6,750.72, 6,733.69, 6,764.34, 6,828.89, 6,864.85

Bulldozer:
CPU Sec: 1,060.73, 1,080.85, 1,071.34, 1,065.27, 1,064.07
Run time: 2,854.81, 2,896.77, 3,192.96, 2,849.72, 2,866.32

I7:
CPU Sec: 577.31, 577.10, 576.26, 573.68, 571.94
Run time: 3,179.28, 3,184.89, 3,177.90, 3,162.97, 3,161.87

Gravitational Wave S6 GC search v1.01 (SSE2):

Phenom:
CPU Sec: 20,100.59, 20,281.95, 20,169.62, 20,160.31, 20,118.88
Run time: 21,658.95, 21,554.15, 21,706.22, 21,504.95, 21,571.40

Bulldozer:
CPU Sec: 22,693.31, 22,834.50, 22,800.11, 22,874.71, 22,944.72
Run time: 24,903.33, 25,173.36, 25,157.99, 25,231.34, 25,608.83

I7:
CPU Sec: 18,002.43, 18,000.19, 18,125.64, 18,080.39, 18,058.62
Run time: 20,944.07, 20,967.78, 20,928.44, 20,673.16, 20,614.86

I'll leave the interpretation of these numbers as an exercise for the reader.
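As a starting point for that exercise, here is a small sketch that averages the CPU seconds listed above and expresses each host relative to the Phenom. The dictionary layout and the `relative_to` helper are made up for this example; the figures are copied from the lists above. Keep in mind that the three hosts run different operating systems and workloads (the I7 runs 8 tasks at a time with HT), so the ratios are only a rough guide.

```python
from statistics import mean

# CPU seconds for the five most recent tasks, copied from the lists above.
cpu_sec = {
    "BRP3cuda32": {
        "Phenom": [896.36, 893.43, 855.47, 865.10, 878.09],
        "Bulldozer": [1060.73, 1080.85, 1071.34, 1065.27, 1064.07],
        "I7": [577.31, 577.10, 576.26, 573.68, 571.94],
    },
    "S6GC SSE2": {
        "Phenom": [20100.59, 20281.95, 20169.62, 20160.31, 20118.88],
        "Bulldozer": [22693.31, 22834.50, 22800.11, 22874.71, 22944.72],
        "I7": [18002.43, 18000.19, 18125.64, 18080.39, 18058.62],
    },
}


def relative_to(baseline, hosts):
    """Return each host's mean time divided by the baseline host's mean."""
    base = mean(hosts[baseline])
    return {name: mean(times) / base for name, times in hosts.items()}


if __name__ == "__main__":
    for app, hosts in cpu_sec.items():
        ratios = relative_to("Phenom", hosts)
        for name, ratio in ratios.items():
            print(f"{app:12s} {name:10s} "
                  f"mean={mean(hosts[name]):10.2f} s  "
                  f"x{ratio:.2f} vs Phenom")
```

A ratio above 1 means the host needed more CPU time per task than the Phenom; below 1 means it needed less.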

Joe

ps. I hate trying to get columns to line up in bb-code.

Gundolf Jahn

Joined: 1 Mar 05

Posts: 1079

Credit: 341280

RAC: 0

RE: [pre]Phenom Bench

2 Jan 2012 10:24:04 UTC

Message 107946 in response to message 107945


Quote:

[pre]Phenom Bench Marks:

Measured floating point speed 2,480.14 million ops/sec
Measured integer speed 14,953.99 million ops/sec

Bulldozer Bench Marks:

Measured floating point speed 2,418.34 million ops/sec
Measured integer speed 9,494.01 million ops/sec

Intel I7 Bench Marks:

Measured floating point speed 3,261.51 million ops/sec
Measured integer speed 12,522.66 million ops/sec
[/pre]

Task times (most recent 5 tasks):

[pre]Binary Radio Pulsar Search (Arecibo) v1.00 (BRP3cuda32).

Phenom:
CPU Sec: 896.36, 893.43, 855.47, 865.10, 878.09
Run time: 6,750.72, 6,733.69, 6,764.34, 6,828.89, 6,864.85

Bulldozer:
CPU Sec: 1,060.73, 1,080.85, 1,071.34, 1,065.27, 1,064.07
Run time: 2,854.81, 2,896.77, 3,192.96, 2,849.72, 2,866.32

I7:
CPU Sec: 577.31, 577.10, 576.26, 573.68, 571.94
Run time: 3,179.28, 3,184.89, 3,177.90, 3,162.97, 3,161.87

Gravitational Wave S6 GC search v1.01 (SSE2):

Phenom:
CPU Sec: 20,100.59, 20,281.95, 20,169.62, 20,160.31, 20,118.88
Run time: 21,658.95, 21,554.15, 21,706.22, 21,504.95, 21,571.40

Bulldozer:
CPU Sec: 22,693.31, 22,834.50, 22,800.11, 22,874.71, 22,944.72
Run time: 24,903.33, 25,173.36, 25,157.99, 25,231.34, 25,608.83

I7:
CPU Sec: 18,002.43, 18,000.19, 18,125.64, 18,080.39, 18,058.62
Run time: 20,944.07, 20,967.78, 20,928.44, 20,673.16, 20,614.86
[/pre]

I'll leave the interpretation of these numbers as an exercise for the reader.

Joe

ps. I hate trying to get columns to line up in bb-code.

On some project sites, the [pre][/pre] tags actually work. ;-)

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

joe areeda

Joined: 13 Dec 10

Posts: 285

Credit: 320378898

RAC: 0

RE: On some project sites,

2 Jan 2012 19:10:33 UTC

Message 107947 in response to message 107946


Quote:

On some project sites, the [pre][/pre] tags actually work. ;-)

Regards,
Gundolf

Thank you.

Too late to edit this one; I'll TRY to remember that incantation for the next one.

BTW do we have a list of BB tags that work on this site?

Joe

Gundolf Jahn

Joined: 1 Mar 05

Posts: 1079

Credit: 341280

RAC: 0

RE: BTW do we have a list

2 Jan 2012 20:39:43 UTC

Message 107948 in response to message 107947


Quote:

BTW do we have a list of BB tags that work on this site?

Most of them do work on most sites. It's just the [pre][/pre] tag that doesn't on some sites.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

paul milton

Joined: 16 Sep 05

Posts: 329

Credit: 35825044

RAC: 0

RE: BTW do we have a list

3 Jan 2012 2:21:20 UTC

Message 107949 in response to message 107947


Quote:

BTW do we have a list of BB tags that work on this site?

Joe

Upper left of the post window, under "message". If I recall correctly, someone had to point it out to me as well :)

Use BBCode tags to format your text

seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.
