Most RAC for least electricity

tictoc
Joined: 1 Jan 13
Posts: 33
Credit: 6020523566
RAC: 5083349


One of my Radeon VIIs is power capped at 180W. Actual power usage is ≈160W-170W. Daily average on that GPU is ≈1.7M PPD. That's within range of the Titan V, and it still has some headroom for increased clocks. The GPU is in a machine with 3 other Radeon VIIs, so it would take a bit of work to get exact numbers.

 

I haven't properly profiled or tuned that card for Einstein, but it can be very efficient running Einstein if the power is dialed back from the default limit. 

Exard3k
Joined: 25 Jul 21
Posts: 66
Credit: 56155179
RAC: 0


When talking about CPU RAC (although their RAC is minor in comparison), either mobile CPUs or the very high-end Threadripper/EPYC parts have the best compute per watt. Ryzen's new Eco mode competes well with those two classes, but in general you want either 20-30W CPUs or 64-core CPUs.

I heard Intel's next-gen desktop lineup features some 35W variants that are worth a look when considering compute per watt. 80W less heat to deal with in your case might allow for an extra GPU without major investment in airflow or alternative cooling.

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 216
Credit: 8430436878
RAC: 1814746


Good afternoon! I am also new to all of this but wanted to post here since someone asked. I also just posted in the WCG forum looking for advice on their work units, and I will ask the same questions here. Here is some background:

I am a high school science teacher at a public school. Over a year ago, I had a crazy idea to bring a "slice of supercomputing" into my high school for students and teachers to use. I knew that I would never be able to set up an honest "supercomputer" that is massively parallel, but I believed that I could bring in some of the same hardware found in various high-end computers/supercomputers. Students could use the workstation for ANY academic purpose (they can schedule time on it for AI, coding, design work, etc.) and teachers could log in remotely during the day and use it for any classroom purpose. Free access for all.

I did not know how much money I could raise for this project, but we ended up being extremely successful (and I want the project to keep expanding!). The workstation is now fully operational, and we are working to implement it fully for student use. We are also putting in a 220V line to maximize electrical efficiency and trying our best to come up with a good cooling solution (it has to be in a secured case).

My goal for this project is to ALWAYS have this workstation benefiting someone, with an obvious focus on student/teacher use. That being said, it is too great a tool to sit idle when not in use. Because I am a science teacher, I was already aware of WCG. I have formally received the "okay" to implement this software in classrooms in my school, with a focus on the science of the projects and the computing aspects.

Although I have tested the workstation on WCG, I am now working to "fine-tune" the performance. We want to maximize the science being completed. Because of the lack of GPU work on WCG, we are also participating in EAH.

Here is my question- what advice can you all offer to maximize science output of this new workstation? This workstation has only one purpose- benefit the community and world. So, how can we best do that? Here are the specs:

Chassis: Dell Precision Tower 7920
CPU: Dual Intel Xeon Gold 6258R, 2.7 GHz (4.0 GHz Turbo, 28 cores each) = 56 cores, 112 threads
RAM: 512GB DDR4 2933MHz ECC
GPU: Dual Nvidia RTX A6000, 48GB each, soon NVLinked = 96GB
Storage: Four (4) 1TB M.2 PCIe NVMe Class 50 solid state drives in RAID 10, plus a storage array with ten (10) 1.92TB SATA AG Enterprise solid state drives in RAID 10
Display: 8K, 32 inches

I know some of you might wonder why I went Intel: we did not have much of a choice when it came to allowable options.

Thank you for any advice you can offer. We also have two other workstations with RTX 6000s in them. I am comfortably running 2 work units per GPU on all of them. The A6000s get really, really warm (hot) with 3 or more. I have tried it and have seen great results but it's just too hot. As I stated, I am working on a cooling solution but probably won't have it for a month or two. 

It's a monster. Let's do science!

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33817864301
RAC: 37828735


Boca Raton Community HS wrote:
GPU: DUAL Nvidia RTX A6000, 48GB each, soon NVLinked = 96GB

Don't bother running NVLink unless you have other software that will use it. BOINC will not use it and will always treat both GPUs as individual GPUs; tasks will run independently.

Boca Raton Community HS wrote:

Thank you for any advice you can offer. We also have two other workstations with RTX 6000s in them. I am comfortably running 2 work units per GPU on all of them. The A6000s get really, really warm (hot) with 3 or more. I have tried it and have seen great results but it's just too hot. As I stated, I am working on a cooling solution but probably won't have it for a month or two. 

It's a monster. Let's do science!

For Einstein Gamma-Ray tasks, Nvidia GPUs do not benefit from running multiple tasks per GPU, and in most cases this actually slightly slows down overall production (each task runs at slower than half speed).

I would recommend moving back to single tasks per GPU to maximize production.

I would also recommend power limiting these a little. By default they have a 300W power limit; by enforcing a limit of, say, 275-285W, you will see a small drop in total performance, but not one proportional to the power reduction. This will help with power efficiency as well as heat produced.
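In case a concrete example helps, here is a minimal sketch of applying such a limit on Linux by calling nvidia-smi from Python. The GPU indices and the 280W figure are just assumptions for a two-GPU box, and nvidia-smi needs root for this:

    import subprocess

    GPU_INDICES = [0, 1]   # assumption: the two A6000s show up as devices 0 and 1
    LIMIT_WATTS = 280      # anywhere in the 275-285W range discussed above

    for idx in GPU_INDICES:
        # enable persistence mode so the limit sticks until reboot
        subprocess.run(["nvidia-smi", "-i", str(idx), "-pm", "1"], check=True)
        # set the board power limit in watts
        subprocess.run(["nvidia-smi", "-i", str(idx), "-pl", str(LIMIT_WATTS)], check=True)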

Lastly, Einstein GR tasks benefit greatly from memory performance, and in particular latency. Increasing the clock speed of your GPU memory might boost production a little. Overclock in small increments. If you find that it becomes unstable or starts producing a lot of errors or invalids, then back off the memory overclock a bit. You might find that you can't reliably OC the mem, which I can understand, since the back-side memory modules (this GPU has 12 modules on the back of the PCB, opposite the cooler) are sure to get VERY hot and start throttling without exceptional airflow directly over the back of the card.
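A rough sketch of that incremental approach on Linux, assuming Coolbits is enabled and the driver exposes the usual GPUMemoryTransferRateOffset attribute (the performance-level index and the offset steps are assumptions; judge stability from your own task times and invalid counts):

    import subprocess

    # Assumption: performance level 4 is the one active under load; check yours first.
    ATTR = "[gpu:0]/GPUMemoryTransferRateOffset[4]"

    for offset_mhz in (200, 400, 600, 800):          # small increments, as above
        subprocess.run(["nvidia-settings", "-a", f"{ATTR}={offset_mhz}"], check=True)
        answer = input(f"+{offset_mhz} MHz: still stable after a day of tasks? [y/n] ")
        if answer.lower() != "y":
            # back off to the last known-good step (or 0 if the first step failed)
            subprocess.run(["nvidia-settings", "-a", f"{ATTR}={max(offset_mhz - 200, 0)}"],
                           check=True)
            break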

It's a shame that those A6000s only have GDDR6 memory (albeit at 16Gbps) vs the 19Gbps GDDR6X memory on the higher-end GeForce RTX cards. My well-cooled RTX GDDR6X cards run very stably at 19.5Gbps on the memory.

_________________________________________________________________________

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 216
Credit: 8430436878
RAC: 1814746


Thanks for the advice! I know NVLink is not the most helpful piece of hardware since the GPUs already have a massive amount of memory. I think we will keep the NVLink handy in case we ever need it, but I am skeptical.

I am going to record run times for work units and test running 1, 2, 3 and 4 at a time (only for a short time when running 3/4). I am curious to see the times and efficiency. Right now, I am running 2 work units per GPU and it is taking ~5 min per work unit.
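For anyone trying the same comparison, the throughput arithmetic is straightforward once you have an average wall-clock time per task at each concurrency. A small sketch, where the times below are placeholders rather than measurements (substitute your own):

    SECONDS_PER_DAY = 86_400

    # tasks running at once -> average wall-clock seconds per task (placeholder values)
    measured = {
        1: 280,
        2: 600,
    }

    for concurrency, sec_per_task in measured.items():
        tasks_per_day = concurrency * SECONDS_PER_DAY / sec_per_task
        print(f"{concurrency} at a time: {tasks_per_day:.0f} tasks/day per GPU")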

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33817864301
RAC: 37828735


One of my systems has been running with a 2080Ti + 3060Ti for a while now, but I also just picked up a 3070Ti to compare.

2080Ti @ 230W (14Gbps GDDR6) = 192s/task = ~450 tasks/day = ~1.95 tasks/day/Watt

3060Ti @ 200W (14Gbps GDDR6) = 211s/task = ~409.5 tasks/day = ~2.05 tasks/day/Watt

Making the 3060Ti about 10% slower than a 2080Ti, but about 5% more power efficient.

 

Moving to the 3070Ti @ 230W (19 Gbps GDDR6X) = 165s/task = ~523 tasks/day = ~2.27 tasks/day/Watt

Making the 3070Ti about 16% faster than the 2080Ti at the same power, and hence about 16% more power efficient. 

Also, the 3070Ti is about 27% faster than the 3060Ti, and about 10% more power efficient.
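For anyone who wants to reproduce these figures from their own run times, the arithmetic is just seconds-per-day divided by seconds-per-task, then divided by the power limit. A quick sketch using the numbers above:

    SECONDS_PER_DAY = 86_400

    cards = {                # card -> (seconds per task, power limit in watts)
        "2080Ti": (192, 230),
        "3060Ti": (211, 200),
        "3070Ti": (165, 230),
    }

    for name, (sec_per_task, watts) in cards.items():
        tasks_per_day = SECONDS_PER_DAY / sec_per_task
        print(f"{name}: ~{tasks_per_day:.0f} tasks/day, ~{tasks_per_day / watts:.2f} tasks/day/W")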

_________________________________________________________________________

Joseph Stateson
Joined: 7 May 07
Posts: 173
Credit: 2921580977
RAC: 1451596


tictoc wrote:

One of my Radeon VIIs is power capped at 180W. Actual power usage is ≈160W-170W. Daily average on that GPU is ≈1.7M PPD. That's within range of the Titan V, and it still has some headroom for increased clocks. The GPU is in a machine with 3 other Radeon VIIs, so it would take a bit of work to get exact numbers.

 

I haven't properly profiled or tuned that card for Einstein, but it can be very efficient running Einstein if the power is dialed back from the default limit. 

 

I have an app that can make the calculations easier if you are using Boinctasks to control your system.  The app is listed on my profile page here.

 

My S9000 AMD cards are power limited to 175W according to GPU-Z. Looking at the wall power meter, the three cards and motherboard are drawing 420-460 watts on 220V, so they are not hitting the maximum.

Clearly, Ian&Steve C.'s Nvidia systems win the efficiency contest hands down. I will throw out my stats for the S9000 board anyway. It is an older board, but its FP64 performance is superior to even the RTX 3090. Statistics for two projects are shown (assuming 146 watts per card):

Milkyway, 4 concurrent tasks per card: 2997 credits per watt
Einstein, 2 concurrent tasks per card: 3227 credits per watt

Note that the work unit credit and power usage are comparable, suggesting the projects use the same metric to generate points (IMHO). I ran Einstein for only 4 hours straight before switching to Milkyway.

 

[Screenshots: Einstein stats and Milkyway stats]

 
