To celebrate my host reaching the Top 10 (named users) for the first time, I wanted to share a budget build guide. Many of the top hosts here at E@H are quite impressive machines boasting fleets of very expensive GPUs, and some of them would cost more than my car to build. I wanted to start contributing to E@H in a more substantial manner, but as a graduate student I do not have the means to build such a machine. Yet I have managed (however briefly) to get a spot in the Top 10. So here is my guide to building the most bang-for-your-buck E@H computer.
Using a second-hand marketplace like Craigslist/eBay/Kijiji etc.:
1.) Find an old workstation PC with as many PCIe slots as possible, preferably dual socket.
2.) Find the highest core count CPUs that exist for that platform.
3.) Get at least 32-64 GB of memory; the speed is not so important.
4.) Get a bunch of natively single-slot GPUs.
The hardware does not need to be new. My version of this has hardware over 10 years old that still works great. The point of this build is to not spend a lot of money, so GPU waterblocks/riser cables/mining rigs are to be avoided, hence the single-slot GPUs. In fact, you might be surprised to find that two single-slot cards often deliver better performance with less power draw than one double-slot card. I will assume this is a dedicated E@H machine, so we will aim to use all of the CPUs and GPUs we can muster. Right now, workstations based on the LGA 2011-v3 Intel Xeon socket are pretty cheap. Examples of workstation platforms for the LGA 2011-v3 socket:
HP Z840 (dual socket LGA2011, 5 directly useable PCIe, one half length PCIe, one 1x PCIe)
Lenovo P710 (dual socket LGA2011, 5 directly useable PCIe, one PCI slot)
Dell T7810 (dual socket LGA2011, 4 directly useable PCIe, one PCI slot, one 1x PCIe)
Some of these PCIe slots are 16x, some 8x, some even 4x, but that hardly matters to us as the All-Sky GW apps don't really care about PCIe bandwidth, and workstations usually have open-ended slots, so any card will theoretically fit. Even 1x slots can be used with the addition of a 1x to 16x PCIe mining riser, assuming you can fit that in the case somewhere, but that might be pushing the limit on PCIe bandwidth. PCI to PCIe 1x adapters also exist. Sometimes a full length GPU won't fit due to offset CPU sockets, so keep that in mind. You also need to be very careful that the power supply that comes in a system can indeed handle the total power draw of all the devices you plan to use.
The best CPU you can find for this socket is the Intel Xeon E5-2699A v4, which in my experience is easy to find second hand for a reasonable price, since they were used in datacenters years ago. With dual sockets and 22 cores per CPU, this gives us 44 total cores to feed the GPU apps as well as Gamma-ray CPU tasks.
As for the GPUs, Nvidia has a line of workstation-grade, natively single-slot GPUs, the Quadro x4000 series. These are the Quadro M4000, P4000, RTX 4000, A4000, etc. The highest I would dare to go on this list from a price standpoint is the P4000. Thanks to GPU Boost 3.0, the P4000 gets a core clock uplift of about 2.5x over the M4000, so it makes a lot of sense not to go any lower on the list either.
I spent a while shopping around and bartering, and with quite a bit of luck, here is what I picked up:
1.) An HP Z840 workstation, enough DDR4 memory included ($750 CAD, ~$550 USD)
2.) Two Intel Xeon E5-2699A v4 CPUs ($175 CAD, ~$130 USD each; knocked $30 CAD off by trading in the CPU the workstation came with)
3.) Five Nvidia Quadro P4000 GPUs ($100 CAD, ~$70 USD each)
Together with some accessories, like PCIe 6-pin to dual 6-pin power adapters, I spent around $1650 CAD (about $1200 USD) on my machine. In this HP Z840 chassis, 5 of these GPUs fit in the 3 16x and 2 8x PCIe slots, and the power supply can handle all of them and more according to the spec sheet.
My HP Z840 came with an Nvidia K620 GPU, which I have fitted into the top half-length slot for display output (which is why BOINC reports 6 GPUs). It can't do much with its 2 GB of VRAM except the BRP7 app. I find that I can keep all of the temps below 70C at less than 100% fans. It pulls a total of about 800 Watts during crunching; good thing I run it on university electricity ;)
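If you want to keep an eye on temps and power draw while it crunches, a standard nvidia-smi query (refreshing every 5 seconds here) does the job:

# per-GPU temperature, power draw and utilization, updated every 5 seconds
nvidia-smi --query-gpu=index,name,temperature.gpu,power.draw,utilization.gpu --format=csv -l 5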
I did get lucky with the seller on the P4000 GPUs, who accepted my offer of a bulk discount. These GPUs can be found for $200-300 CAD on eBay, so your mileage may vary. I currently run this machine with 3x All-Sky GW tasks per GPU (the most tasks I can fit in the 8 GB of VRAM), which uses 15 of the cores; the rest are dedicated to Gamma-ray CPU tasks. I find that running as many GPU tasks at once as possible is crucial to overcome the CPU bottleneck in the Recalc region of the app, so that the GPUs stay busy almost all of the time. If you want to replicate this build, be sure you get a CPU that can at least handle the GPU tasks you plan to run, as those have the best credit. Make sure to use Linux so that you get the CUDA version of the app. I haven't experimented with the GPU-only CUDA version yet.
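For the 3x task multiple itself, one way to pin it locally is an app_config.xml in the Einstein@Home project folder. This is just a sketch: the app name and data directory path below are placeholders, so check client_state.xml for the exact All-Sky GW app name on your host and adjust the path to your install.

# write an app_config.xml into the E@H project directory (path is an example)
cat > /var/lib/boinc-client/projects/einstein.phys.uwm.edu/app_config.xml <<'EOF'
<app_config>
  <app>
    <name>einstein_O3AS</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
EOF
# gpu_usage 0.33 -> 3 tasks per GPU, cpu_usage 1.0 -> one core reserved per GPU task
# tell the running client to pick up the new config
boinccmd --read_cc_config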
Do you have any better ideas or suggestions? Feel free to discuss your budget builds!
I will caution something that burned me with my build. If you use an HP Z840 workstation, DO NOT attempt to use the bottom PCIe 1x slot while all of the other slots are populated. You WILL destroy your motherboard. This happened to me, and I had to get another one. I have no idea why this happens, but it is likely related to the fact that this is the only slot that is wired through the chipset when 2 CPUs are installed.
Thanks for sharing. Very impressive. During the time that E@H still supported external GPUs connected to Intel Macs, my budget uber-cruncher consisted of a hex-core i7 MacMini connected to four AMD RX 6800 XT eGPUs via Thunderbolt 3. Because I always have several eGPU enclosures lying around, the cost of replicating that setup today would be about $2100. My best recollection is that rig had an E@H RAC of about 3 million credits a day. Unfortunately, it made my office too hot (~30 C) to be habitable. Worse yet, the office next door was controlled by the thermostat in MY office, so my colleague was freezing. So, my boss made me dismantle that setup.
My fastest rig now consists of a 2019 MacPro with an internal AMD RX 580X MPX, two internal AMD RX 6800 XTs, and an AMD RX 6800 XT in an eGPU enclosure. My Amicable Numbers at Home RAC is approximately 4.5 million credits a day. Better yet, it produces a fraction of the heat of the uber MacMini setup. Because I already had some RX 6800 XTs sitting around, this setup cost me only $1900 (for the 2019 MacPro with 3.2 GHz 16-core Xeon W, 96 GB RAM, 2 TB SSD, and RX 580X MPX).
"I was born in a small town, and I live in a small town." - John Mellencamp
Single-slot GPUs. So I could get 7 P4000s into my ASRock EPYCD8 server motherboard.
You reported 3 tasks per GPU, so I could gain 1 task total, and if I sold the 4 Titan Vs I might gain a couple hundred bucks back, if I could replicate your bargain-basement P4000 pricing.
It's an intriguing idea. I would also have to make a guesstimate about the level of production of each card. They would pretty much need to exceed 1.1M RAC per card to make the changeover a break-even deal in production.
Something to ponder.
One potential speedup is to limit your total CPU load to no more than 50 percent of your available threads.
Another is to try Nvidia's MPS server.
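If you prefer setting that from the command line rather than in the Manager, a local preferences override is one way; this is just a sketch, and the data directory path is an example for a typical Linux package install:

# cap BOINC at 50% of the CPU threads via a local prefs override
cat > /var/lib/boinc-client/global_prefs_override.xml <<'EOF'
<global_preferences>
  <max_ncpus_pct>50.0</max_ncpus_pct>
</global_preferences>
EOF
# have the client re-read the override without restarting
boinccmd --read_global_prefs_override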
My rig with 5 P4000s is settling at 4M RAC, so not quite your cutoff, but close-ish.
I do have the CPU threads fixed to 50%.
I have tried the MPS server, but I have run into some issues running it: I can't seem to get all 5 GPUs to like it, and then I get errors. Ultimately, Ian&Steve said that it isn't that great on Pascal anyway.
it should be fine on pascal, it's just "better" on Volta and newer.
are you setting CUDA_VISIBLE_DEVICES to all 5 GPUs?
a simple MPS setup script for your 5x P4000s (assuming they are the only GPUs in the system) just needs to configure MPS to use all 5 GPUs, set the active thread percentage to 40%, and start MPS.
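a minimal sketch of such a script (assuming the P4000s enumerate as CUDA devices 0-4 and that it runs as the same user as the BOINC client) could look like this:

#!/bin/bash
# expose all 5 P4000s to the MPS control daemon (device IDs assumed to be 0-4)
export CUDA_VISIBLE_DEVICES=0,1,2,3,4
# default active thread percentage handed to MPS clients (tune to your task multiple)
export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=40
# start the MPS control daemon in background mode
nvidia-cuda-mps-control -d

and to shut it down again: echo quit | nvidia-cuda-mps-control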
you can play with the active thread percentage to suit your needs. I run 40% with 5x tasks per GPU on the 1.08(1.15) app. but if you're only running 1x task per GPU, there's no benefit to running MPS. 70% could be appropriate when running 2 or 3x per GPU. it just really depends, and you'll have to play with it.
also be cautious with MPS settings with the 1.14 (GPU recalc) app, it has a higher percentage of invalid results with certain MPS configurations. but the 1.08/1.15 app with CPU recalc basically has no invalids.
Yeah, my experience is that 4 out of 5 GPUs act normally, but the 5th one "crashed". I had to completely reinstall the drivers and some Linux utilities. I think this has something to do with the fact that running nvidia-smi returns a warning:
WARNING: infoROM is corrupted at GPU 0000:04:00.0
I read about some solutions to this. I have another GPU I could swap with this one, I just haven't gotten around to it.
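For anyone hitting the same thing: you can match that PCI bus ID to a specific card with a quick query, for example:

# list each GPU's index, PCI bus ID and name to identify the card with the corrupted infoROM
nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv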
I took a similar approach for GPU-only E@H: running it 24/7 as a main PC & workstation for BOINC.
Found a P4000 up for bid, low enough to take a shot at it.