GPU Crunching - What setup works 'best', what does it cost, what comparison metric is best to use?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117243748693
RAC: 36169650
Topic 198031

This is a topic for your thoughts about this and for the discussion of presented ideas. Please feel free to have your say, either positive or negative. If the opinion is to proceed and some guidelines can be agreed on, there would be a followup thread for presentation of any data people wish to share.

From time to time in the past, there have been various attempts to come up with some sort of a metric for working out what hardware performs 'best' - for various definitions of 'best' :-). Without putting too much thought into it, I'm wondering if comparisons should be based on TCO - total cost of ownership in achieving some sort of output, perhaps based on awarded credit for a particular science run - the new BRP6 run when the beta app becomes standard would be ideal. I'd certainly be interested in what others think.

To me, TCO should include the capital cost of the hardware spread over some agreed upon 'expected' life. At the end of that period, the residual value could be estimated, but I think it would be simpler to assume nil residual value. This cost should only include items essential for crunching. For example, you don't need a $1K monitor - in fact you don't really need a monitor at all - so let's just ignore such items. For things like cases, once again you don't absolutely need one and there is such a wide range of potential cost, much of which is for cosmetic purposes, that the simplest thing is either to leave it out or just include a standard base amount for a simple no-frills case.

My own preference would probably be to assume the hardware cost is just coming from the motherboard, CPU, RAM, GPU, HDD and PSU, with the PSU cost being that of, say, an 80+ certified PSU that will power your particular hardware mix - e.g. a higher cost for high-end (or multiple) GPUs. It would also be my preference to assume that such hardware could have a 5 year life with a residual value of zero.

The other essential component of TCO would be the running cost. To do it really properly, this would include power, maintenance, and environmental factors like aircon costs or heating savings, but my preference would be just to look at power consumption by the machine. The only way to do that properly is to measure it at the wall when the machine is running at full load.
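
To make that concrete, here's a rough sketch in Python of how such a TCO figure might fall out. Every number in it is purely illustrative - your own hardware cost, measured wattage, local tariff and credit rate would go in their place:

[pre]
# Rough TCO-per-credit sketch - all numbers are illustrative only.
hardware_cost = 800.00   # $ for board, CPU, RAM, GPU, HDD and PSU
life_years    = 5        # assumed life, nil residual value
wall_watts    = 250.0    # measured at the wall under full load
price_per_kwh = 0.25     # your local tariff, $/kWh
credits_day   = 40000.0  # awarded credit per day (e.g. BRP6)

life_days   = life_years * 365
capital_day = hardware_cost / life_days                  # ~$0.44/day
power_day   = wall_watts / 1000.0 * 24 * price_per_kwh   # ~$1.50/day

tco_day = capital_day + power_day
print("TCO per 1000 credits: $%.3f" % (tco_day / credits_day * 1000))
[/pre]

Interesting that, at those made-up numbers, the power cost dwarfs the amortised capital cost - which is partly why I think measuring at the wall matters so much.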

As I said, I haven't put much thought into this so there could be lots of things I'm overlooking. I have lots of data on what my hosts are producing and I do own a power meter so it wouldn't be too hard to work things out, once the methodology is agreed on. I guess I'm asking if anyone else would be interested, not just in my results, but in actually contributing data as well?

If so, what do you think the ground-rules should be?

Cheers,
Gary.

Phil
Joined: 8 Jun 14
Posts: 579
Credit: 228493502
RAC: 0

@Gary

I would be highly interested in such a study, although my time is limited. I was just about finished picking out new components for some new crunchers when beta 1.52 came out and removed the PCIe bottleneck.

I was going to post some results for my machines running 1.52 but haven't got to it yet. Here is some basic info.

An E5400 CPU with an Nvidia 750 is now doing just shy of 40k RAC, up from 23k.
An i3 CPU with an Nvidia 760 is doing low 70k RAC, up from 50k.

Ground rules, I think, should be basically what you have already stated, with one modification: count only those components that affect crunching. While a PSU does affect TCO, I would be more inclined to treat it like a case - more of a support item. Just my two cents. That being said, I'll go with whatever rules are defined.

I'm also researching some solar to offset electrical costs, but it will be next year before I can do anything on that front. The latter half of this year I will be in training which is a bit of a pay cut until I'm done, so some things are going on the back burner.

I think it would be very cool, especially for newcomers, to be able to look up some builds with known results, knowing what they will get in return.

Phil

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7212434931
RAC: 964357

Quote:
If so, what do you think the ground-rules should be?


It is an interesting idea. Better yet, it might actually be useful to some people.

To get reasonable participation, the rules need to be pretty simple, at the admitted cost of making the comparisons less perfect.

While I am hugely supportive of including lifetime power cost in cost considerations (I know this does not surprise you), there is a really basic obstacle to combining power with all other costs into a single number--the Einstein user base faces an extremely wide range of personal power costs. To those who live in "utility-supplied" apartments, or who plug into corporate power, the power cost may be deemed zero--ignoring green arguments. To our participants living with German power prices, the cost will seem much higher than to people using cheap hydro in the Pacific NW part of the USA, and so on.

The problem is so fundamental that at the risk of losing simplicity I suggest you actually split the cost number into two pieces--everything else, valued at acquisition cost per unit Einstein productivity over life, and power consumption per unit Einstein productivity (in power units, not in currency). This lets people apply their own local estimated price for power and their own guess at operating life to get a total cost of ownership.

There would be nothing wrong with suggesting that people posting rigs who care to do so also state their assumed power rate and lifetime, and then combine the two base numbers into a total cost over lifetime as you originally suggested. But I suggest the focus should be on the two component numbers which have the broadest meaning: acquisition price and power consumption, reported separately.
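
Purely for illustration (every figure below is invented), the two component numbers, and the optional local combination, might be computed like this:

[pre]
# Two base numbers plus an optional local combination - figures invented.
price       = 600.0    # acquisition cost: CPU, RAM, board, GPU, boot drive
credits_day = 50000.0  # Einstein productivity
wall_watts  = 220.0    # measured at the wall under full load

# The two numbers with the broadest meaning:
print("hardware: $%.1f per 1000 credits/day" % (price / credits_day * 1000))
print("power:    %.1f Wh per 1000 credits"
      % (wall_watts * 24 / credits_day * 1000))

# Each reader then applies their own power price and assumed life:
my_kwh_price = 0.30      # e.g. German-ish rates; near zero if power is 'included'
my_life_days = 3 * 365
total = price + wall_watts / 1000.0 * 24 * my_life_days * my_kwh_price
print("lifetime: $%.4f per 1000 credits"
      % (total / (credits_day * my_life_days) * 1000))
[/pre]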

Regarding some of the other problems--as this seems mainly aimed at builders, the question of what to assume is available or reused is a bit vexed. I think the guiding principle should be to include those items most sensible users would buy afresh for a new build, and leave out items which are truly optional, or quite likely to be salvaged from recently retired builds.

Based on my personal recent experience, and my guess as to the situation of others, my suggestion deviates slightly from yours.

I'd include the price of CPU, RAM, motherboard, GPU, boot drive adequate to run the system.

I'd not include case, power supply, extra fans or other cooling hardware, OS or other licenses, monitor, keyboard, optical drive or anything else people choose to include but which is an accessory from the point of view of running Einstein.

Price source is a nice question. If the goal is to guide others, it is pointless to celebrate one's personal shopping victories. Instead a readily obtained but competitive price should be used--even though not the actual price paid by the person posting the information. Speaking from a US perspective, the two pricing sources which currently seem most appropriate to me for this purpose are Newegg and Amazon. Both have very wide selection, and for current hardware are usually representative of a current competitive price. I'd suggest that just one be chosen--perhaps one of these two, perhaps somewhere else.

One advantage of splitting power cost from the hardware cost is that it would allow you to do away with the problematic assumed lifetime, and just use Einstein productivity per unit acquisition cost. Personally, I think 5 years is on the long side. If you made me choose something without deep thought, and wanted to stick with an assumed life, I'd suggest three years. I personally have two perfectly good GTX460 cards sitting on my floor, pushed out by the superior power productivity of the GTX 660 cards which replaced them, with the nail in their coffin coming from the greatly superior power productivity of the Maxwell generation cards. If Nvidia gets to the TSMC finFET processes anytime soon, even the current Maxwells may look hard to justify running well under five years after purchase.

[Totally off-topic--while I've had an intention to eBay my 460s, I've procrastinated endlessly. If someone here wants to use them for Einstein, make me a (low) offer by personal message. Probably I'll accept for not much above shipping cost from New Mexico, USA. More details on request]

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1885
Credit: 1396581460
RAC: 1125220

It depends on what is considered "cheap" as far as the power used to run these Einstein GPU tasks.

I live in the NW of Washington State and it costs me $75 a month just to run my GPU Einstein tasks, which adds up when you do it non-stop for several years.

$900 a year is quite a bit to pretty much everyone here if they have to pay it.
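
For anyone wanting to sanity-check their own bill, the arithmetic is simple. The wattage and tariff below are assumptions for illustration, not my actual figures:

[pre]
# Back-of-the-envelope monthly power cost - numbers are assumptions only.
gpu_watts     = 1000.0   # assumed total extra draw of the GPU hosts
price_per_kwh = 0.10     # assumed local tariff, $/kWh
hours_month   = 24 * 30

cost_month = gpu_watts / 1000.0 * hours_month * price_per_kwh
print("$%.0f/month, $%.0f/year" % (cost_month, cost_month * 12))
# -> $72/month, $864/year: roughly the size of bill quoted above
[/pre]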

Phil
Joined: 8 Jun 14
Posts: 579
Credit: 228493502
RAC: 0

@Gary

I urge you to consider moving the study/builds conversation to a new thread. :-)

Phil

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117243748693
RAC: 36169650

Quote:

I urge you to consider moving the study/builds conversation to a new thread. :-)


You are quite right - I should have been more thoughtful about how I started this. Hopefully all fixed now - apologies to all who find their contributions not quite exactly where they left them :-).

Cheers,
Gary.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:

I think it would be very cool, especially for newcomers, to be able to look up some builds with known results, knowing what they will get in return.

My 0.02 GBP... or about 2300 BRP6 cobblestones at current power costs.

I do agree. Some time back a league table of BRP4 times by GPU was published, and I found that useful.

Things it lacked were a link to the computer/host which generated each result, and a date.

I don't know if it is possible, but it would be very useful to be able to search http://einstein.phys.uwm.edu/results.php for all validated tasks for a specific application, matching some user parameters such as CPU type, Coprocessors, Operating System etc., and then show some sort of basic statistical summary, allowing the user to find the hosts with the lowest times.

Of course it gets muddled by hosts running other projects and tasks, and running x1 to x2, but at least you can find "similar hosts".

This would then stay up to date with the application versions; new, better hardware would automatically become visible and old hardware would eventually disappear.

This would also help the large numbers of CPU-only crunchers.
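
I don't know the server side at all, but the kind of summary I mean could be mocked up in a few lines. The export file and its column names below are invented for illustration - nothing like it exists today as far as I know:

[pre]
# Hypothetical sketch: summarise validated task times per GPU model.
# 'tasks.csv' and its columns (app, validated, coprocessor, run_time)
# are invented - there is no such export at present.
import csv, statistics
from collections import defaultdict

times = defaultdict(list)
with open("tasks.csv") as f:
    for row in csv.DictReader(f):
        if row["app"] == "BRP6" and row["validated"] == "1":
            times[row["coprocessor"]].append(float(row["run_time"]))

for gpu, t in sorted(times.items(), key=lambda kv: statistics.median(kv[1])):
    print("%-20s n=%4d  median=%7.0fs  best=%7.0fs"
          % (gpu, len(t), statistics.median(t), min(t)))
[/pre]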

The major cost for most folks will be the incremental power usage (above idle), and obtaining this wattage information is non-trivial, but valuable.

Measuring at the wall is good, but PSU efficiency can vary by, I'd guess, 20% or so. If power usage is recorded, the PSU model and efficiency should be noted, and the wattage recorded at both idle and full load.

I'm not so sure about recording monetary values.

Hardware costs in the IT equipment world have a very short lifetime of usefulness, not to mention changing exchange rates.

Likewise, electricity prices vary from place to place and from time to time.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117243748693
RAC: 36169650

Peter,

Thank you very much for taking the trouble to give such a detailed response. It's much appreciated.

Quote:
... there is a really basic obstacle to combining power with all other costs into a single number ...


I entirely agree with all your points about power "cost" and am very happy to see kWh as the unit rather than actual $$. If people want to give themselves a potential fright, they can quite simply multiply the units used by the local cost of a unit :-).

Quote:
The problem is so fundamental that at the risk of losing simplicity I suggest you actually split the cost number into two pieces--everything else, valued at acquisition cost per unit Einstein productivity over life, and power consumption per unit Einstein productivity (in power units, not in currency).


I agree, and I would consider there's little risk of losing any simplicity.

Quote:
... I think the guiding principle should be to include those items most sensible users would buy afresh for a new build, and leave out items which are truly optional, or quite likely to be salvaged from recently retired builds ... I'd include the price of CPU, RAM, motherboard, GPU, boot drive adequate to run the system ...


I would have no problem agreeing with that list of components and ignoring all else.

Quote:
... a readily obtained but competitive price should be used ...


Absolutely -- and I would have no trouble with using Newegg for all the reasons you suggest. Prices would probably be higher for the many others who don't live in the US or have easy access to US prices, but that is quite unimportant from a comparative point of view. Chances are that whatever the 'premium' for a particular product and location, it would be much the same for competing products, so the comparison or ranking of different products should be largely unaffected. Newegg appeals to me because I have often browsed it and like its layout. Settling on one source also avoids fluctuating conversion rates if a contributor would otherwise have used local product prices.

Quote:
One advantage of splitting power cost from the hardware cost is that it would allow you to do away with the problematic assumed lifetime, and just use Einstein productivity per unit acquisition cost.


Indeed!!

Quote:
Personally, I think 5 years is on the long side.


It's certainly too long for calculating data about current, state-of-the-art hardware. I guess I was thinking more about the situation where you want to understand what it's going to cost if you decide to keep running the same hardware for as long as reasonably possible. I have plenty of examples of 6 year old hardware still running, such as Q8400 CPUs and AMD HD4850 GPUs, the latter on Milkyway. When purchased, I hoped they would last 5 years, and so they did :-). However, I agree that the best course is to remove lifetime altogether by calculating a productivity figure (credits per day??) you will achieve for this hardware cost plus that power consumption (kWh per day). Is this the sort of thing you have in mind?

It strikes me that this would suit 'upgraders' just as well as 'builders'. It should be possible to fairly accurately determine the cost/benefit of adding or upgrading a particular type of GPU. You would just need to measure power consumption and calculate credits per day, both with and without the GPU. This is more what I've had in mind for a while now. I've quite often seen examples of machines that were built for standard 'office' type use, using the onboard GPU or a relatively low-end external card like the GT 600/700 series, when the cost increment of going to a low-to-medium 'GTX' version is quite modest. I presume there will be a GTX950 in the wings somewhere (or there is even the GTX750) which might be an ideal upgrade target for someone who is crunching CPU tasks and GPU tasks (very slowly) on a standard 'office' type machine with something like a GT630.
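
As a sketch of the sort of cost/benefit sum I have in mind (every figure below is made up for illustration):

[pre]
# Upgrade cost/benefit sketch - all figures invented for illustration.
# Measure the machine with and without the candidate GPU installed.
base_watts,  base_credits = 110.0,  8000.0   # office box, CPU tasks only
gpu_watts,   gpu_credits  = 210.0, 45000.0   # same box plus a mid-range GTX
card_price = 150.0                           # assumed price of the card

extra_kwh_day = (gpu_watts - base_watts) / 1000.0 * 24   # 2.4 kWh/day
extra_credits = gpu_credits - base_credits               # 37000/day

print("upgrade adds %.1f kWh/day for %.0f extra credits/day"
      % (extra_kwh_day, extra_credits))
print("hardware: $%.2f per 1000 extra credits/day"
      % (card_price / extra_credits * 1000))
[/pre]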

Once again, thanks very much for sharing your thoughts.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117243748693
RAC: 36169650

Thanks very much for taking the time to send a response. Lots of good points for discussion.

Quote:
I don't know if it is possible, but it would be very useful to be able to search http://einstein.phys.uwm.edu/results.php for all validated tasks for a specific application, matching some user parameters such as CPU type, Coprocessors, Operating System etc., and then show some sort of basic statistical summary, allowing the user to find the hosts with the lowest times.


I would be surprised if that were even partially possible, even in the latest iterations of the server software. With Einstein running on a customised old version, you would need to convince the Devs to invest more time in further customisations. I don't know how difficult the coding would be - I'm not a programmer - but even if it were a simple job, there are other issues. The 'retention time' in the online database is quite short, so you would get only a short-duration snapshot of the current situation. The search required to extract the data matching the suggested parameters across all hosts sounds computationally expensive, so there are potential 'extra hardware required' issues as well. It certainly would be nice to have, so maybe the BOINC Devs could be convinced to put something like this on their 'To Do' list, if it's not already there :-).

Quote:
The major cost for most folks will be the incremental power usage (above idle), and obtaining this wattage information is non-trivial, but valuable.


It's reasonably trivial, since adequate power meters exist at quite low cost. I would imagine you would just measure (with no GPU) the power draw at idle and again at full CPU crunching load, then add the GPU and measure the same again, with full load being both CPU tasks and GPU tasks - of course having started the exercise with a PSU quite capable of powering the lot.
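
In other words, something like the following, where the four wall readings are hypothetical:

[pre]
# The four wall readings described above - values are hypothetical.
idle_no_gpu   =  45.0   # W: no GPU fitted, machine idle
load_no_gpu   = 105.0   # W: no GPU, all CPU cores crunching
idle_with_gpu =  60.0   # W: GPU fitted but idle
load_with_gpu = 215.0   # W: CPU tasks plus GPU tasks running

cpu_crunch_watts = load_no_gpu - idle_no_gpu     # 60 W for CPU crunching
gpu_crunch_watts = load_with_gpu - load_no_gpu   # 110 W added by card + tasks
print("GPU adds %.2f kWh/day" % (gpu_crunch_watts / 1000.0 * 24))  # 2.64
[/pre]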

Quote:
Measuring at the wall is good, but PSU efficiency can vary by, I'd guess, 20% or so. If power usage is recorded, the PSU model and efficiency should be noted, and the wattage recorded at both idle and full load.


I'm not sure this would be much of an issue, even if the PSU efficiency weren't disclosed. People who choose their PSU deliberately are highly likely to know its efficiency profile. Stock-standard 'office' type machines, either a 'name brand' unit or something assembled from parts by a reasonably tech-savvy system builder, would likely contain a PSU approaching (if not already exceeding) 80% efficiency at the midpoint of its power range. So, where the PSU model is not disclosed, an assumption of, say, 78% efficiency is likely to be quite conservative. In any case, using power at the wall just ensures that a worst-case figure is reported, and someone going for 80+ Platinum will have a lower power draw than the reported value.
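
If anyone does want to correct a wall reading back to the DC load, the arithmetic is trivial. The 78% figure is just the conservative assumption mentioned above:

[pre]
# Wall power vs DC power - 78% is an assumed conservative efficiency.
wall_watts = 250.0
efficiency = 0.78
dc_watts   = wall_watts * efficiency
print("~%.0f W actually delivered to the components" % dc_watts)  # ~195 W
# A better PSU (80+ Platinum is ~92% at mid load) would draw about
# 195 / 0.92 ~= 212 W at the wall for the same DC load, not 250 W.
[/pre]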

Quote:

I'm not so sure about recording monetary values.

Hardware costs in the IT equipment world have a very short lifetime of usefulness, not to mention changing exchange rates.

Likewise, electricity prices vary from place to place and from time to time.


As long as the date and the price from a reputable (disclosed identity) seller are used, recording the capital cost at the time the data is published should be mandatory. People can then make their own adjustments for price variations over time. Stating the kWh consumption per day avoids the variable power prices. You can't really rank different manufacturers and models if you don't include the current typical dollar costs. If you avoid 'new product' premiums and EOL discounts, there is probably a middle period where prices are relatively static for a while.
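
A submission might then boil down to a handful of fields - the layout below is only a suggestion:

[pre]
Date:         yyyy-mm-dd (date the price was checked)
Seller:       Newegg (or other disclosed seller, with part URLs)
Parts cost:   $xxx (board + CPU + RAM + GPU + boot drive)
Credits/day:  xxxxx (BRP6, averaged over a steady week)
kWh/day:      x.xx (measured at the wall under full load)
PSU:          model / 80+ rating, if known
[/pre]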

Again, thanks very much for your thoughts.

Cheers,
Gary.

mikey
Joined: 22 Jan 05
Posts: 12657
Credit: 1839054099
RAC: 4412

Quote:

As long as the date and the price from a reputable (disclosed identity) seller are used, recording the capital cost at the time the data is published should be mandatory. People can then make their own adjustments for price variations over time. Stating the kWh consumption per day avoids the variable power prices. You can't really rank different manufacturers and models if you don't include the current typical dollar costs. If you avoid 'new product' premiums and EOL discounts, there is probably a middle period where prices are relatively static for a while.

I have been following this and think there are a LOT of variables that make it hard to come up with a definitive table, i.e. which GPU card maker, which RAM maker, the speed the RAM is actually running at, the operating system, etc., etc. YES, I think a chart could be helpful - as in, buying this GPU over that GPU will give you more bang for your buck, or buying more than this type of RAM is not cost effective - but it would only apply to Einstein and the current workunits. The next version of the application software could optimize this or that, throwing a wrench into the efficiency numbers.

At PrimeGrid, for instance, the faster the RAM the faster you crunch, and at PG an Intel CPU is much faster than an AMD CPU. Since a lot of people run multiple projects, your data would be useful but only specific to Einstein. At some projects under-clocking the GPU is better; at others over-clocking is better.

The one other wrench is the other software people run on their PCs. Yes, you, Gary, have a bunch of BOINC-only machines, as do I, but not everyone does - that is another variable to consider when judging the efficiency of a component. Does running MS Outlook in the background all day long slow down the crunching more on a GTX 760 than it does on a GTX 960? How about running the speakers that are playing your favorite music? What if you use Pandora to do it?

YES, your chart could be a GREAT guideline, but there are sooo many variables that it can't be the final answer. It's a lot like this chart I use:
[pre]
Card          DP   Shaders/SPs
Nvidia 760    yes     1152
Nvidia 770    yes     1536
Nvidia 780    yes     2880
Nvidia 790    yes     3072
Nvidia 960    yes     1024
Nvidia 970    yes     1664
Nvidia 980    yes     2048
ATI 4650 AGP  no       320
ATI 4670      no       320
ATI 5550      no       320
AMD 5770      no       800
AMD 5870      yes     1600
[/pre]

It tells me that an Nvidia 770 is faster than an Nvidia 760 - but is it really? There are many more factors in crunching than shaders or stream processors, and YOUR chart would help sort those things out. BTW, the 'DP' column is whether the card has double precision or not.

Anonymous

Quote:
.... Please feel free to have your say, either positive or negative.

OK. When someone new to distributed computing joins a project, I believe their primary consideration is putting together a machine that can generate the maximum RAC. I know we say we aren't interested in credits and that we really want to find a solution for the particular project, but really, let us be honest. Power consumption is a consideration, but again it does not fit the "primary consideration".

We should establish some basic operating parameters so that we are as close as possible to comparing apples to apples. For example:

1. a dedicated machine crunching E@H WUs only (no time slicing between multiple projects) - can be a mix of E@H CPU/GPU WUs.
2. this machine runs 24 hours a day.
3. any operating system is acceptable
4. single GPU

The data to be displayed can be similar to the following:

[pre]
Motherboard   CPU      Memory  GPU (memory)     OS      BOINC   Avg.     Concurrent  Concurrent
                                                        version credit   GPU tasks   CPU tasks

Asus Z87-A    i7-4770  31 GB   AMD Radeon       Ubuntu  7.2.42  110,427      5           4
                               HD 7850/7870
                               (2048 MB)

Asus P8Z77-V  i7-3770  15 GB   NVIDIA GeForce   Ubuntu  7.2.42   80,497      4           8
                               GTX 770
                               (2047 MB)
[/pre]

The above is taken from my "hosts" page with additional information added. Now, which configuration would you choose based upon the primary consideration?

This gives a fairly straightforward look at GPU performance. If power consumption or heat factors into a member's environment, then other considerations would need to be weighed in the choice of hardware. Also pay attention to case size so that you can accommodate the larger GPUs.

There are many other factors to consider, but if we make the thread too complicated then you will lose readers and possibly crunchers.
