Another of Moore's miracles

Matt Giwer

Joined: 12 Dec 05

Posts: 144

Credit: 6891649

RAC: 0

14 Sep 2013 22:55:44 UTC

Topic 197194

After four years to the month I got around to upgrading.

Year  Clock    Cores  Price  core*GHz  core*GHz per $100
2009  3.4 GHz  4      $750   13.6      1.8
2013  3.8 GHz  6      $350   22.8      6.5

If you have been putting off upgrading, STOP THAT!
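The figures in the table follow from the thread's "clock*cores" rule of thumb: multiply per-core GHz by the core count, then divide by the system price. A minimal sketch reproducing them (the helper names are ours, and treating GHz × cores as throughput is the rule of thumb, not a benchmark):

```python
# Reproduce the "core*GHz per $100" figures from the upgrade table.
# Assumption: aggregate clock (cores x GHz) is used as a rough stand-in
# for throughput, as in the thread; it ignores architecture entirely.

systems = {
    2009: {"ghz": 3.4, "cores": 4, "price_usd": 750},
    2013: {"ghz": 3.8, "cores": 6, "price_usd": 350},
}


def core_ghz(ghz: float, cores: int) -> float:
    """Aggregate clock: per-core GHz multiplied by the number of cores."""
    return ghz * cores


def core_ghz_per_100usd(ghz: float, cores: int, price_usd: float) -> float:
    """Aggregate clock bought per $100 of system price."""
    return core_ghz(ghz, cores) / price_usd * 100


for year, s in systems.items():
    total = core_ghz(s["ghz"], s["cores"])
    per_100 = core_ghz_per_100usd(s["ghz"], s["cores"], s["price_usd"])
    print(f"{year}: {total:.1f} core*GHz, {per_100:.1f} core*GHz per $100")
```

With these inputs the 2009 system works out to 13.6 core*GHz (1.8 per $100) and the 2013 system to 22.8 core*GHz (6.5 per $100), roughly 3.6 times as much aggregate clock per dollar.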

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 575543573

RAC: 181561

Another of Moore's miracles

15 Sep 2013 10:51:16 UTC

Message 118707

I can only see Phenom II quads in your profile, which are rather dated indeed. Being dated is not bad in itself, but the 45 nm process is showing its age regarding energy efficiency. Is the new one an FX "6-core"? If so, be aware that for floating-point workloads (like Einstein and most other BOINC projects) it's actually more like a 3-core with HT, just with somewhat better scaling than Intel's HT.

MrS

Scanning for our furry friends since Jan 2002

Matt Giwer

RE: I can only see Phenom

21 Sep 2013 3:25:22 UTC

Message 118708 in response to message 118707

Quote:

I can only see Phenom II quads in your profile, which are rather dated indeed. Being dated is not bad in itself, but the 45 nm process is showing its age regarding energy efficiency. Is the new one an FX "6-core"? If so, be aware that for floating-point workloads (like Einstein and most other BOINC projects) it's actually more like a 3-core with HT, just with somewhat better scaling than Intel's HT.

MrS

To-do list:
1) update profile

2) post on mystery machine which arrived on the 19th

To wit, item 2: the mystery machine, aka "famous maker". On installing Linux, it turns out to have all the internals of an HP, and all the rest matches their ENVY line, save that the RAM is 6 GB vice 10. That model runs $750 on Amazon. It has been running some 30 hours and swap space is still unused, so 6 GB is not an issue. An older quad would have used a little bit of so so many having all three L cache levels works. That would make up a bit for the shared floating-point work.

I can't comment on throughput yet, but top shows six instances of boinc running. It has the usual Radeon graphics card, which I will switch out for a CUDA card once I get some idea of its throughput. That should take care of any floating-point issues.

ExtraTerrestria...

RE: ... switch out for a

21 Sep 2013 12:29:22 UTC

Message 118709 in response to message 118708

Quote:

... switch out for a CUDA card once I get some idea of its throughput. That should take care of any floating point issues.

Not really for BOINC. The GPU is just another processor which runs its own tasks, almost independent of the properties of the CPU, which itself just continues to crunch the same CPU tasks as it did before. I don't know about rendering software under Linux that might use CUDA as a coprocessor, though.

Anyway, your first task completed in 29 ks, whereas your Phenom X4 965 takes 21 to 26 ks for such tasks. That's actually good: with two tasks running per module, one module of your new FX is finishing a WU every 14.5 ks, which is much faster than one core of the old Phenom. But it's 3 modules vs. 4 cores, so overall throughput should only be about 20% higher than that Phenom's.
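The "about 20%" estimate can be checked by converting the quoted run times into whole-chip throughput. A small sketch, assuming (as the 14.5 ks figure implies) that each FX module runs two tasks at once; the helper name is ours:

```python
# Whole-chip throughput from the per-task run times quoted above.
# Assumption: the FX runs two tasks per module concurrently, so each of its
# 3 modules completes 2 tasks every 29 ks (one every 14.5 ks on average).


def tasks_per_ks(units: int, ks_per_task: float, tasks_per_unit: int = 1) -> float:
    """Completed tasks per kilosecond for the whole chip."""
    return units * tasks_per_unit / ks_per_task


fx = tasks_per_ks(units=3, ks_per_task=29.0, tasks_per_unit=2)  # 3 modules

# Fast end, midpoint and slow end of the Phenom X4 965's 21 to 26 ks range.
for phenom_ks in (21.0, 23.5, 26.0):
    phenom = tasks_per_ks(units=4, ks_per_task=phenom_ks)  # 4 cores, 1 task each
    print(f"Phenom at {phenom_ks} ks/task: FX throughput {fx / phenom - 1:+.0%}")
```

Against the midpoint of the Phenom's range this gives roughly +22%, with about +9% to +34% across the whole range, which brackets the estimate above.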

The module architecture is working really well, though. Which is not all that surprising, since Einstein used to scale very well with HyperThreading too. It's just that you need more modules ;)

BTW: I hope $350 was not the price of the CPU but rather of some system or upgrade kit?

MrS

Matt Giwer

RE: RE: ... switch out

22 Sep 2013 5:21:31 UTC

Message 118710 in response to message 118709

Quote:

Quote:
... switch out for a CUDA card once I get some idea of its throughput. That should take care of any floating point issues.

Not really for BOINC. The GPU is just another processor which runs its own tasks, almost independent of the properties of the CPU, which itself just continues to crunch the same CPU tasks as it did before. I don't know about rendering software under Linux that might use CUDA as a coprocessor, though.

Anyway, your first task completed in 29 ks, whereas your Phenom X4 965 takes 21 to 26 ks for such tasks. That's actually good: with two tasks running per module, one module of your new FX is finishing a WU every 14.5 ks, which is much faster than one core of the old Phenom. But it's 3 modules vs. 4 cores, so overall throughput should only be about 20% higher than that Phenom's.

Overhead is always a bitch to estimate, which is what led to the CUDA performance question. I figure there is generally about 10% overhead on multicores, so a quad behaves like 3.6 cores. Where I was able to test, with POVRay's release of an otherwise identical but multithreaded version, the throughput was more like 2.6 cores.
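The "effective cores" figures translate directly into scaling efficiency. As an illustration only, the sketch below also backs out the parallel fraction that Amdahl's law would imply for the 2.6-of-4 POVRay result; Amdahl's law is our modelling assumption here, not something measured in the thread, and the helper names are ours:

```python
# Scaling efficiency from "effective cores", plus the parallel fraction that
# Amdahl's law (our assumed model) would imply for an observed speedup.


def efficiency(effective_cores: float, cores: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return effective_cores / cores


def amdahl_speedup(p: float, cores: int) -> float:
    """Amdahl's law: S = 1 / ((1 - p) + p / N), p = parallel fraction."""
    return 1 / ((1 - p) + p / cores)


def amdahl_parallel_fraction(speedup: float, cores: int) -> float:
    """Solve Amdahl's law for p given an observed speedup on N cores."""
    return (1 - 1 / speedup) / (1 - 1 / cores)


print(f"10% overhead rule: {efficiency(3.6, 4):.0%} of linear scaling")
print(f"POVRay test:       {efficiency(2.6, 4):.0%} of linear scaling")

p = amdahl_parallel_fraction(2.6, 4)
print(f"2.6x on 4 cores implies about {p:.0%} of the work runs in parallel")
print(f"under that model, 6 cores would give about {amdahl_speedup(p, 6):.2f}x")
```

Under this model, a speedup of 2.6 on 4 cores corresponds to roughly 82% of the run time being parallelizable, which is why adding more cores quickly hits diminishing returns for such an app.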

Quote:

The module architecture is working really well, though. Which is not all that surprising, since Einstein used to scale very well with HyperThreading too. It's just that you need more modules ;)

I quit following architecture at least a decade ago. As best I could tell, they were faking it to make their specs look better than the other guy's. And once you put a computer around the CPU, damn near anything can screw it all up. Anyway, I defaulted to clock*cores; it's good enough for government work.

Quote:

BTW: I hope $350 was not the price of the CPU but rather of some system or upgrade kit?

MrS

Yes, that's the price of the whole system. And despite the ad, it does have 10 GB of RAM. Usually these bargain systems have been refurbished by swapping failed components for lesser ones, or at times have been half cannibalized. This one looks like the entire process was covering the HP logo, disposing of all the printed material and sticking it in a brown box. Found it on Woot. The WiFi was beyond flaky, but adding a USB WiFi stick has been rock solid. That is about par for bargains. Of course, keyboards and rats are the usual bottom of the line in all of them.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c03639067&prodSeriesId=5330773

If it continues to perform I'll pick up another if I can find one.

I have been thinking of the CUDA card as an MPU (math coprocessor) from the good old days when AMD wiped the floor with Intel. Under that assumption the project app just directs the math to the CUDA. It was transparent in the MPU days, as it was handled by the CPU.

ExtraTerrestria...

POVRay should scale quite

22 Sep 2013 20:10:51 UTC

Message 118711 in response to message 118710

POVRay should scale quite well, as rendering always does, but it is not very indicative of any other real-world performance.

Not caring too much about CPU architecture can save you quite some time ;)
But regarding AMD's current CPUs, I feel it is worth keeping in mind. Their current CPUs are just not as capable as if they had put 6 or 8 of the older cores onto a chip. They were never intended to match that in floating-point apps, so it's not a bad thing... but people have to keep their expectations in check or be disappointed by a closer look at the achieved throughput.
It's the same as not calling an Intel quad with HT an 8-core.

If you can and want to, I'd suggest going straight for the 8-"core" instead of another 6-"core". It doesn't cost much more, especially once you factor in the mainboard etc.

Quote:

Under that assumption the project app just directs the math to the CUDA.

No, it's really an additional app which runs alongside the regular CPU tasks and runs almost entirely on the GPU. Run 2 of these per GPU on all but the most high-end cards for maximum throughput and almost perfect GPU utilization.

MrS

Matt Giwer

RE: POVRay should scale

23 Sep 2013 5:54:02 UTC

Message 118712 in response to message 118711

Quote:

POVRay should scale quite well, as rendering always does, but it is not very indicative of any other real-world performance.

The problem is how to test. It is hard to find apps whose only difference is that threading has been added. POVRay had one release candidate (RC) that did only that. But given the limits on what can be multithreaded in any app, 2.6 out of 4 isn't too shabby. Running on the same machine is the only way to get an "all else being equal" situation.

The limitation is always the nature of the app, even in the best of cases. Running three short calculations while waiting for one long one to complete doesn't buy much performance improvement. The major beneficiaries are people who use Windows, where the OS eats up about 10% no matter what tasks are running. Just a second core takes care of that.
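The "three short calculations waiting on one long one" point is a load-imbalance effect: if an app splits its work into independent pieces, one per core, the wall-clock time is set by the longest piece. A minimal sketch with hypothetical piece lengths (same total work in both cases; the function name is ours):

```python
# Load imbalance: with one independent piece of work per core, the parallel
# run takes as long as the longest piece. Piece lengths are hypothetical.
from typing import Sequence


def parallel_speedup(piece_times: Sequence[float]) -> float:
    """Serial time (sum of pieces) divided by parallel time (longest piece).

    Assumes each piece gets its own core and there is no other overhead.
    """
    return sum(piece_times) / max(piece_times)


balanced = [2.5, 2.5, 2.5, 2.5]    # 10 units of work split evenly
unbalanced = [1.0, 1.0, 1.0, 7.0]  # same 10 units: three short, one long

print(f"balanced:   {parallel_speedup(balanced):.2f}x on 4 cores")
print(f"unbalanced: {parallel_speedup(unbalanced):.2f}x on 4 cores")
```

The evenly split case reaches the full 4x, while the lopsided one manages only about 1.43x, even though both keep four cores nominally busy at the start.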

In this, Intel has always struck me as a benchmark-test-oriented design. I have had both Intel and AMD and never noticed a dime's worth of difference in performance, only about $100 in purchase price. And last I looked at Intel's "8", the real throughput was more like 6 anyway. You need a specialized benchmark to see more than that. Then again, if you run a server farm, those specialized cases are probably your bread and butter. There seems to be a competition over how many scripts can be stuck into a single web page.

ExtraTerrestria...

RE: The problem is how to

24 Sep 2013 18:46:43 UTC

Message 118713 in response to message 118712

Quote:

The problem is how to test.

And why would you want to, since, as you correctly say, the result wouldn't apply to any other program anyway?

Quote:

In this, Intel has always struck me as a benchmark-test-oriented design. I have had both Intel and AMD and never noticed a dime's worth of difference in performance, only about $100 in purchase price. And last I looked at Intel's "8", the real throughput was more like 6 anyway.

Call them "efficiency oriented". This includes building efficient products, as well as making and selling them efficiently: if they can get away with their prices, they'll continue to charge them.

And by Intel's "8" you probably mean a quad-core with HT, which Intel calls exactly that, not an 8-core. And if it performs like a regular 6-core, that is actually brilliant, since adding HT only costs ~5% more die space per core. So it's practically free performance :)

Or do you mean the big server CPUs?
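The HT trade-off described above can be put in numbers: a quad-core performing like a 6-core is a 50% throughput gain, bought for roughly 5% more die area per core. A back-of-the-envelope sketch using only those two estimates from the post (the function name is ours, and the inputs are estimates, not measurements):

```python
# Throughput per unit of die area for HT, using the post's rough estimates:
# a quad-core with HT behaves like a 6-core, at ~5% extra die area per core.


def ht_perf_per_area(equivalent_cores: float, physical_cores: int,
                     area_overhead: float) -> float:
    """Relative throughput gain divided by relative die-area cost."""
    throughput_gain = equivalent_cores / physical_cores
    return throughput_gain / (1 + area_overhead)


gain = 6 / 4
ratio = ht_perf_per_area(equivalent_cores=6, physical_cores=4, area_overhead=0.05)
print(f"throughput +{gain - 1:.0%} for ~5% more area per core: "
      f"{ratio:.2f}x throughput per unit of die area")
```

With these inputs the chip delivers about 1.43 times the throughput per unit of die area compared with the same quad without HT.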

MrS

mmstick

Joined: 6 Jun 12

Posts: 14

Credit: 2066411

RAC: 0

RE: I can only see Phenom

21 Oct 2013 8:23:10 UTC

Message 118714 in response to message 118707

Quote:

I can only see Phenom II quads in your profile, which are rather dated indeed. Being dated is not bad in itself, but the 45 nm process is showing its age regarding energy efficiency. Is the new one an FX "6-core"? If so, be aware that for floating-point workloads (like Einstein and most other BOINC projects) it's actually more like a 3-core with HT, just with somewhat better scaling than Intel's HT.

MrS

I wouldn't really call it HT or anything like it, as technically it is still 8 cores. The FPUs are double the size of the FPUs in Phenom II, as they are actually like two Phenom II FPUs merged into one larger one, which can either execute one large 256-bit instruction or two small 128-bit instructions simultaneously. The integer units are the same size as Phenom II's, so having 8 of them is better than having 4.
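The "one 256-bit or two 128-bit instructions" description comes down to lane arithmetic: either way, the shared FPU covers the same total width per issue, split across two cores or devoted to one. A bookkeeping sketch under that description (the function name is ours; it says nothing about latency or real throughput):

```python
# Lane counts for the two FPU modes described above: one 256-bit instruction
# versus two 128-bit instructions issued together. Pure bookkeeping.


def lanes(vector_bits: int, element_bits: int, instructions: int = 1) -> int:
    """Number of data elements covered by `instructions` vector operations."""
    return instructions * (vector_bits // element_bits)


for name, elem_bits in (("double (64-bit)", 64), ("float (32-bit)", 32)):
    one_wide = lanes(256, elem_bits)
    two_narrow = lanes(128, elem_bits, instructions=2)
    print(f"{name}: 1 x 256-bit = {one_wide} lanes, "
          f"2 x 128-bit = {two_narrow} lanes")
```

Both modes cover 4 doubles or 8 floats per issue; the difference is whether one core gets the full width or each of the module's two cores gets half.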

The real problem with FX lies in the fact that it only has one decode unit per module, plus some scheduling inefficiencies; the cache issues were mostly fixed with the 2nd-gen FX (Piledriver). The 3rd-gen FX is supposed to fix the decode bottleneck by adding a second decode unit per module, fix the scheduling, and improve the cache further over the 2nd gen, which could raise performance significantly for single- and multithreaded tasks.

The second real problem is the effect of the Windows/Intel alliance, dirty Intel compiler tricks, and horribly coded and outdated Microsoft software. I find that on Linux, AMD CPUs can compete quite well with much more expensive Intel CPUs, as it's a much fairer playing field.

That said, 2nd-gen FX processors actually have the same performance as Phenom II cores at the same frequency due to the cache improvements alone, and they can clock much higher than Phenom IIs as a bonus.

I'm more interested in the 4th-gen FX in 2015, though, since that's when they will use their 'High Density Library', which will net as much benefit as an entire node shrink, on top of an actual node shrink (if the 3rd gen doesn't come with one). It should then be feasible to see 16- to 32-core desktop CPUs, or really powerful AMD APUs utilizing OpenCL 2.0, which might be their ultimate goal here.
