Adventures with an Athlon 200GE processor + integrated Vega graphics

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 4,835
Credit: 28,398,992,931
RAC: 35,965,253
Topic 218466

Last night, I was browsing a couple of local online computer stores and ran across an interesting 'bundle' deal with a $AUD55 discount - Asus Prime A320M-E board + AMD 200GE processor + 8GB Crucial RAM for $AUD189 (equivalent to ~$US130).  I had seen favourable reviews about the processor a while ago and I was interested to see if the internal Vega graphics would crunch FGRPB1G tasks so I picked up a bundle this morning to see how it would go.

With a donated 500GB failed drive from an Apple Mac that I 'refurbished', a 2006 vintage 250W Seasonic PSU (with a couple of new capacitors) and one of my favourite old style (ex-business PC) desktop cases, I now have a completely operational system.  My Linux distro of choice had just released a new ISO with the latest kernel (5.0.2) so I thought I'd install that together with my home build of BOINC 7.14.2.

I also installed the extracted bits for legacy OpenCL from the Red Hat AMDGPU-PRO 18.30 package.  I do have the latest 18.50 version but I haven't got around yet to working out what has changed and how to incorporate the necessary mods for those changes into my home built OpenCL installation script, so the 18.30 version will have to do for the moment.

Seeing as I'd only ever used this procedure for providing OpenCL capability to Polaris GPUs, I thought the internal Vega GPU might be a no-op.  I was pleasantly surprised when it all fired up and started downloading some FGRPB1G tasks without a murmur of complaint.  I had enabled both that search and the O1OD1 search (I was hoping for some 'Engineering run' tasks) but only got FGRPB1G tasks (FGRP5 was disabled).

I then enabled the setting for 'non-preferred apps' to see if that would do the trick but instead I got the non-selected FGRP5 CPU tasks initially.  In a later work fetch I did get some GW engineering run tasks so, until there is an official entry for that run, the non-preferred apps route seems to be the only option and even then you may get either O1OD1E or FGRP5 at the discretion of the servers.

The main reason for creating this thread was to report some performance numbers.  Firstly, I allowed a single FGRPB1G task to crunch to completion with nothing else running.  The crunch time was bang on 2 hrs which I thought was quite reasonable for an internal GPU.  I then allowed two tasks to co-exist, 1 CPU and 1 GPU.  The GPU task took 2 hrs 12 min (so a 12 min penalty) whilst the FGRP5 CPU task has now finished, taking 3 hrs 58 mins to get to the start of the follow-up stage and only a further 1 min 35 sec to full completion.  I was expecting a much longer follow-up stage than that :-).

Now that I also have some O1OD1E tasks, I've set up to run one of those with a FGRPB1G simultaneously from now on.  I may as well get similar information for that combination.  It's getting late in the day and I have two GW tasks with a much larger number of GPU tasks so this setup will continue through the night and I should have some results to add in the morning.  If anyone is interested, here is the details page for this host.  There are currently 4 returned tasks and a GPU task has validated - which is nice to see :-).

 

Cheers,
Gary.

koschi
Joined: 17 Mar 05
Posts: 83
Credit: 162,963,840
RAC: 618,466

Back in October last year I had a 2400G with 11 CUs that needed 40 min for an FGRPB1G task. I have no clue whether times back then and now are comparable, though. Assuming they are somewhat in the same range, the 3 CU 200GE performs quite well!
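
As a rough way to put those two numbers side by side, here is a small Python sketch that multiplies compute units by minutes per task. It assumes the roughly 2 hr single-task time reported for the 200GE in the opening post and the roughly 40 min figure above for the 2400G. The `cu_minutes` helper is only an illustrative name, and the comparison ignores GPU clock differences and any change in task type since October.

```python
def cu_minutes(compute_units: int, minutes_per_task: float) -> float:
    """Crude cost metric: compute units multiplied by wall-clock minutes per task."""
    return compute_units * minutes_per_task


# Figures taken from this thread; both are single observations, not averages.
apus = {
    "Ryzen 5 2400G (11 CUs, ~40 min)": cu_minutes(11, 40),
    "Athlon 200GE (3 CUs, ~120 min)": cu_minutes(3, 120),
}

for name, cost in apus.items():
    print(f"{name}: {cost:.0f} CU-minutes per task")
```

By this crude measure the 200GE spends fewer CU-minutes per task (360 versus 440), which fits the impression that the small part does well for its size.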

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 4,835
Credit: 28,398,992,931
RAC: 35,965,253

Thanks for the info about the 2400G.  The current tasks are of the slowest variety anyway so your time from last year couldn't have been for anything even slower.  Most probably the results are directly comparable so I'm happy to see that the 200GE seems to do OK.

When paired with O1OD1E tasks, the GPU crunch time has improved from 2hr 12min to now be only about 2hr 4min and the first O1OD1E task took 8hr 26min.  After testing out the internal GPU, the plan was to see how an RX 570 would perform in this budget system.  I'll do some power measurements as is and later on with a beefier PSU and the RX 570.  Sooner or later I'll need to replace some very old 2008/9 systems as they fail and this combination seems like a reasonable replacement to have available.
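
For anyone who wants the GPU slowdown as a percentage, here is a minimal Python sketch using only the times reported so far in this thread (2 hr alone, 2 hr 12 min next to an FGRP5 task, about 2 hr 4 min next to an O1OD1E task). The names are illustrative, and these are single-sample timings, so treat the output as indicative only.

```python
BASELINE_MIN = 2 * 60  # FGRPB1G task running alone on the Vega iGPU

# GPU task time (minutes) when a CPU task runs alongside it.
paired_runs = {
    "with FGRP5 CPU task": 2 * 60 + 12,
    "with O1OD1E CPU task": 2 * 60 + 4,
}


def slowdown_percent(baseline: float, observed: float) -> float:
    """Extra time relative to the baseline, as a percentage."""
    return (observed - baseline) / baseline * 100.0


for label, minutes in paired_runs.items():
    pct = slowdown_percent(BASELINE_MIN, minutes)
    print(f"{label}: {minutes} min ({pct:.1f}% slower than alone)")
```

That works out to about a 10% penalty next to FGRP5 and roughly 3% next to O1OD1E.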

 

Cheers,
Gary.

Van De Kaap
Joined: 4 Feb 17
Posts: 6
Credit: 3,495,720
RAC: 0

The internal GPUs were not made for crunching and will damage the CPU over time. There are loads of horror stories from people who did that while mining crypto.

Peter van Kalleveen
Joined: 15 Jan 19
Posts: 20
Credit: 61,352,416
RAC: 774,038

I have noticed that the 2400G and 200GE seem to be quite efficient. I think they come a lot closer to their maximum theoretical FLOPS than bigger systems. That's strange, because I always thought memory speed was very important, and DDR4 is no racehorse compared to HBM.

QuantumHelos
Joined: 5 Nov 17
Posts: 134
Credit: 39,348,130
RAC: 1

Interesting. Unfortunately I bought an Athlon that is too old and has no GPU, but it does run Linux nicely.

Peter van Kalleveen
Joined: 15 Jan 19
Posts: 20
Credit: 61,352,416
RAC: 774,038

I destroyed my 2400G trying to delid it without tools. Clumsy me.

But the new 3200G and 3400G have been partially leaked online and it seems that the Vega GPU gets a hefty speed bump, so I'm going to buy one of those as soon as it is released.

The 2200G GPU is clocked at 1000 MHz; the new 3200G will clock at 1250 MHz.

Also, the new 3000 series mobile parts have 6 MB of L3 cache instead of the previous generation's 4 MB. Hopefully the desktop variants have this improvement as well.

Mad_Max
Joined: 2 Jan 10
Posts: 135
Credit: 1,288,228,596
RAC: 699,448

Peter van Kalleveen wrote:
I have noticed that the 2400G and 200GE seem to be quite efficient. I think they come a lot closer to their maximum theoretical FLOPS than bigger systems. That's strange, because I always thought memory speed was very important, and DDR4 is no racehorse compared to HBM.

Yes, DDR4 is of course much slower than GDDR5 or HBM2. But if you think about proportions, integrated GPUs can actually have more RAM bandwidth per compute unit, so they can achieve real speeds closer to the theoretical peak.

For example:
The RX 580 has 36 compute units and 256-bit GDDR5 running at an effective 8 GHz (8 Gbps per pin).
This gives up to 256 GB/s of total peak RAM bandwidth, or about 7.1 GB/s per CU.
The Vega 56 has 56 CUs and 410 GB/s of HBM2 bandwidth, which is about the same at 7.3 GB/s per CU.

The Athlon 200GE has only dual-channel DDR4, so about 64*2/8*2.8 = 44.8 GB/s of total peak bandwidth is available (the 2.8 factor corresponds to DDR4-2800).
But it also has only 3 CUs (the CUs themselves are pretty similar to Vega/Polaris CUs). So that is 14.9 GB/s per CU.
That's about double the bandwidth per CU if no heavy task is running on the CPU part, and still somewhat better if there is CPU load.
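
The arithmetic above can be reproduced with a short Python sketch. It uses only the figures quoted in this post (bus widths, transfer rates, CU counts, and the 410 GB/s Vega 56 number); the function names are illustrative, and these are theoretical peaks rather than measured bandwidth.

```python
def peak_bandwidth_gb_s(bus_bits: int, gigatransfers_per_s: float) -> float:
    """Peak bandwidth = bus width in bytes times transfers per second."""
    return bus_bits / 8 * gigatransfers_per_s


def per_cu(total_gb_s: float, compute_units: int) -> float:
    """Share of the peak bandwidth available to each compute unit."""
    return total_gb_s / compute_units


gpus = {
    # name: (total peak GB/s, compute units)
    "RX 580": (peak_bandwidth_gb_s(256, 8.0), 36),
    "Vega 56": (410.0, 56),
    # Dual channel = 2 x 64-bit, at the 2.8 GT/s used in the post.
    "Athlon 200GE": (peak_bandwidth_gb_s(2 * 64, 2.8), 3),
}

for name, (total, cus) in gpus.items():
    print(f"{name}: {total:.1f} GB/s total, {per_cu(total, cus):.1f} GB/s per CU")
```

Note that the 200GE figure is an upper bound: the CPU cores draw from the same DDR4 channels, so the GPU's real share shrinks under CPU load.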

Also, CPUs/APUs have a much better (bigger and faster) RAM cache system than current GPUs: about 2-4 times bigger and 1.5-2 times faster.

Peter van Kalleveen
Joined: 15 Jan 19
Posts: 20
Credit: 61,352,416
RAC: 774,038

Awesome, thanks for your detailed explanation and calculation. I had not thought about it in that context.

What do you mean by the RAM cache system at the end?

I'm fairly certain the GPU part does not have access to any of the L1, L2 or L3 caches of the CPU. Does the GPU part have its own extra-large L2 cache compared to the graphics card version?

I wish they would do a trick like Intel did with Iris Pro and integrate an HBM stack on the package as an L4 for the top-bin APU :-)

Mad_Max
Joined: 2 Jan 10
Posts: 135
Credit: 1,288,228,596
RAC: 699,448

AFAIK the L1 and L2 caches are dedicated to their corresponding cores, and the GPU part has its own L1 and L2 caches, the same as discrete GPUs.
But the L3 cache in the Zen architecture is a shared cache and can be used by any core, including the GPU CUs, via the fast internal Infinity Fabric bus. I'm not sure if it actually works this way in the current generation, though.

Discrete GPUs do not have L3 at all.

As for integrated HBM2, such an option actually exists but... not in AMD APUs yet.
Google for Kaby Lake-G: Intel CPU cores + Vega GPU CUs + HBM RAM in a single package.

P.S.
Vega 8/11 in the 2200G / 2400G APUs easily beats Iris Pro even without any additional caches/eDRAM, using plain DDR4.
But Kaby Lake-G is currently the fastest integrated GPU.

Peter van Kalleveen
Joined: 15 Jan 19
Posts: 20
Credit: 61,352,416
RAC: 774,038

Ah, I did not know the L3 cache in Zen is or can be shared with the GPU, although with it being a victim cache only, I don't know how much of an extra boost that will give.

I know, Intel GPUs are terribly slow, even with the bells and whistles. I have the NUC with the Iris Pro 655, but it's still far from the AMD APU.

That said, architectural differences aside, even the Iris Pro 655 with 48 EUs has only 384 streaming processors, compared to AMD's maximum of 704. Intel's GPU dies also have far fewer transistors than AMD's.
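
The shader counts can be checked with a quick Python sketch. It assumes 8 ALU lanes per Intel EU (which is how 48 EUs map to 384) and 64 stream processors per AMD GCN compute unit, with 11 CUs for the Vega 11 in the 2400G. The constant and function names are illustrative, and raw lane counts say nothing about clocks or architecture.

```python
INTEL_LANES_PER_EU = 8   # assumption: 8 ALU lanes per execution unit
AMD_SP_PER_CU = 64       # assumption: 64 stream processors per GCN compute unit


def shader_count(units: int, lanes_per_unit: int) -> int:
    """Total ALU lanes ('streaming processors') for a GPU."""
    return units * lanes_per_unit


iris_655 = shader_count(48, INTEL_LANES_PER_EU)
vega_11 = shader_count(11, AMD_SP_PER_CU)

print(f"Intel 48 EU part: {iris_655} lanes")
print(f"AMD Vega 11:      {vega_11} lanes")
```

With those assumptions the Intel part comes out at 384 lanes and the Vega 11 at 704.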

But still they are very weak.

The pain point with Kaby Lake-G was that they used this whole advanced stacking method for the HBM-to-GPU link, but connected the CPU and GPU over 8 ordinary PCIe lanes. I thought that was a really lame approach.

That said, all of AMD's image and video rendering options are far superior to Intel's, so it's about more than just brute horsepower.

But I would still love to see AMD use advanced stacking/packaging methods for an obviously more expensive APU.

The good news is that the upcoming 3000 series APUs will be soldered, just like Ryzen. (That way I don't have to destroy mine again by delidding.)
