How much VRAM for 2x Gravitational wave work units?

Ben Scott
Joined: 30 Mar 20
Posts: 54
Credit: 1753029793
RAC: 3152040
Topic 229563

Does anyone know how much VRAM I would need to reliably run two Gravitational Wave work units at the same time? Thank you in advance.

 

alex
Joined: 8 Apr 21
Posts: 6
Credit: 2608884916
RAC: 5806473

In my experience you need around 10 GB to be safe. Most of the time 8 GB is enough, but I have seen multiple cases where a single task used more than 4 GB. I have an RX 6600 as well, so I stick to one task at a time, which is not ideal because it doesn't fully utilize the GPU.
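The arithmetic behind that estimate can be sketched roughly as below. The 4.5 GB peak per task and the 1 GB reserve for the desktop are assumptions picked for illustration, not measured values; the only observation here is that a task sometimes goes above 4 GB.

```python
def max_concurrent_tasks(vram_gb, peak_per_task_gb, reserve_gb=1.0):
    """Return how many tasks fit in VRAM if each one hits its peak at once.

    reserve_gb is an assumed allowance for the desktop and driver.
    """
    usable = vram_gb - reserve_gb
    if usable <= 0 or peak_per_task_gb <= 0:
        return 0
    return int(usable // peak_per_task_gb)


# Assumed peak of 4.5 GB per task ("more than 4 GB" in the worst case).
for vram in (8, 10, 16):
    print(vram, "GB ->", max_concurrent_tasks(vram, 4.5), "task(s)")
```

With these assumed numbers an 8 GB card fits one task at peak and a 10 GB card fits two, which matches the "10 GB to be safe" rule of thumb.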

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118621300705
RAC: 18125632

alex wrote:
... I have RX 6600 as well and so I stick to only one concurrent task which is not ideal because it doesn't utilize the GPU completely.

You have two machines showing AMD GPUs.  One has 3 GPUs with the 'most capable' being shown as RX 5700XT.  Is your RX 6600 GPU one of the other two GPUs on that machine?

FGRPB1G crunch times seem to vary a lot.  Are you running CPU tasks (from other projects) on your CPU cores?  If so, how many do you run concurrently?  The only reason for asking is to understand why the large variation for FGRPB1G tasks which are normally quite uniform.  Perhaps CPU tasks are impacting GPU performance.  The variations seem too large to be just due to two different GPU models.

Cheers,
Gary.

alex
Joined: 8 Apr 21
Posts: 6
Credit: 2608884916
RAC: 5806473

Gary Roberts wrote:
You have two machines showing AMD GPUs.  One has 3 GPUs with the 'most capable' being shown as RX 5700XT.  Is your RX 6600 GPU one of the other two GPUs on that machine?

Indeed, on this computer the 6600 is the main GPU, one 5700xt is in a secondary PCIe 3.0 x4 slot, and another 5700xt is connected via a riser.

Gary Roberts wrote:
FGRPB1G crunch times seem to vary a lot.

You raise a very good point, and that's partly why I'm here on the forum. For the last several months I just let BOINC do its work and didn't look closely at task execution times. Recently I bought myself a nicely priced used 6800xt, which is a complete beast in 4K gaming by the way, and decided to compare the performance of the 6600, 5700xt, 6800xt and 2060s (I have a couple of those too, but right now they are on different projects) in BOINC. Just yesterday I noticed that the 5700xt running on a riser appears to perform only half as well in Einstein (I'm not sure about other projects yet, but Amicable Numbers seems fine).

Exact measurements are a bit complicated because my Einstein GPU tasks run together with Amicable Numbers tasks (usually 0.5 Amicable Numbers + 2 * 0.25 Einstein), which actually makes the Amicable Numbers tasks run 2-3 times slower, so I'm not sure whether it is the best arrangement.
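For anyone following along, fractions like 0.5 and 0.25 are normally set with the gpu_usage value in an app_config.xml file placed in the project's folder inside the BOINC data directory. Below is a minimal sketch that generates such a file. The app name "example_gpu_app" is a placeholder and not a real application name; the real short name has to be looked up for each project (for example in client_state.xml), and the cpu_usage of 1.0 is an assumption as well.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage, cpu_usage=1.0):
    """Build app_config.xml text for one app.

    gpu_usage is the fraction of a GPU each task claims,
    so 0.25 lets BOINC schedule up to four tasks per GPU.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# "example_gpu_app" is a hypothetical app name used only for illustration.
xml_text = build_app_config("example_gpu_app", 0.25)
print(xml_text)
```

After saving the file, the client has to re-read its config files (or be restarted) before the new fractions take effect.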

Also, yes, I run a lot of different CPU tasks from a bunch of projects, and it is my work computer, so I often compile code, run an IDE and do other CPU-heavy work-related stuff, but I don't think it affects GPU tasks much.

I'm planning to investigate it more thoroughly in the near future by running only one project at a time and collecting statistics. I'm especially interested in how AMD GPUs compare to Nvidia's. So far the 2060s appears to be faster than the 5700xt in Amicable Numbers (considerably), Einstein (probably, but I'm not sure) and PrimeGrid (it uses CUDA, so no surprise I guess), which is very sad considering that the 5700xt should be at the 2070/2070s level of performance. At least it is in games. Unfortunately I'm not sure it is correct to compare amd-opencl and nvidia-opencl tasks even if they have the same name and the same credit. It's especially weird to me because, on sites like techpowerup, the theoretical FP16, FP32 and FP64 numbers for RDNA GPUs are really great, usually much higher than those of their NVIDIA counterparts.

I would very much appreciate any thoughts you have on the above.

Thanks!

Tom M
Joined: 2 Feb 06
Posts: 6662
Credit: 9688139752
RAC: 2506463

Part of the confusion may be because most of the top performing Nvidia GPU systems are running highly optimized versions of the grp#1 application.

This customized app is not available for AMD Radeon GPUs.

It is not clear whether the OpenCL implementations are equally efficient across both brands.

If I understand it right, only single precision floating point is being used on the GPUs.

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4079
Credit: 48558386673
RAC: 34663580

AMD does not usually have "much higher" FP16 or FP32 numbers, and only sometimes has better FP64 numbers (it depends on the model). Both Nvidia and AMD have been pulling back on FP64 performance, Nvidia more so, but only one or two projects rely heavily on FP64 anyway. Einstein is mostly FP32, as are most projects.

Nvidia performs better in most cases, even without the highly optimized application. The v1.28 Nvidia app gives Nvidia cards a slight lead over AMD cards in the same product category. The optimized app allows Nvidia to walk away from AMD.


Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118621300705
RAC: 18125632

alex wrote:
...Last several months I just let boinc do its work and I didn't look closely at tasks execution time.

A very good recipe for always having sub-optimal performance:-).

It doesn't take very long to run (individually) a few of each task type to get a performance baseline.  Without doing that you will never know what you are aiming for, performance wise.

alex wrote:
Exact measurements are a bit complicated because my Einstein GPU tasks run together with Amicable Numbers tasks (usually 0.5 Amicable Numbers + 2 * 0.25 Einstein), which actually makes the Amicable Numbers tasks run 2-3 times slower, so I'm not sure whether it is the best arrangement.

Exact measurements are not complicated.  Even if you want to run multiple tasks per device, test with a single task first and then try multiples to see if it's really worth the effort.  When you have data for each individual task type, start (slowly and carefully) adding a second type and note what effect that has on the times for the first type.  Rinse and repeat until you really understand when you are overloading to the point where performance is really being impacted.
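A minimal sketch of that comparison, assuming you have timed one task running alone and then a batch of tasks running together on the same GPU.  The timings in the example are made up purely to show the arithmetic; they are not measurements from any card in this thread.

```python
def tasks_per_hour(concurrent, minutes_per_batch):
    """Throughput when `concurrent` tasks finish together in the given time."""
    return concurrent * 60.0 / minutes_per_batch


def worth_running_multiple(single_minutes, concurrent, batch_minutes):
    """True if running several tasks at once beats running them one by one."""
    return tasks_per_hour(concurrent, batch_minutes) > tasks_per_hour(1, single_minutes)


# Hypothetical baseline: one task alone takes 10 minutes,
# two tasks side by side take 16 minutes for the pair.
print(tasks_per_hour(1, 10.0))
print(tasks_per_hour(2, 16.0))
print(worth_running_multiple(10.0, 2, 16.0))
```

If the batch time more than doubles when you go from one task to two, the second task is costing you throughput rather than adding it.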

With 3 GPUs and with GPU tasks from different projects, you should investigate whether or not a particular task type runs best on a particular GPU type.  You can investigate BOINC's configuration options to see how to direct a particular task type to a particular GPU.  If you mix different task types on the one GPU performance may very well suffer.
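One of those options is the exclude_gpu block in cc_config.xml, which keeps a given project (and optionally a given app) off a given device.  A rough sketch that generates such a block is below.  The URL and device number are placeholders chosen for illustration: the URL must match the project's URL as the client knows it, and the device number must match what BOINC reports in its startup messages.

```python
import xml.etree.ElementTree as ET


def build_exclude_gpu(project_url, device_num, gpu_type=None, app=None):
    """Build cc_config.xml text with a single exclude_gpu entry.

    gpu_type and app are optional; when omitted, the exclusion applies
    to every GPU type and every app of that project on that device.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "device_num").text = str(device_num)
    if gpu_type:
        ET.SubElement(exclude, "type").text = gpu_type
    if app:
        ET.SubElement(exclude, "app").text = app
    return ET.tostring(root, encoding="unicode")


# Placeholder URL and device number, for illustration only.
# BOINC uses the type name "ATI" for AMD cards.
cc_text = build_exclude_gpu("https://project.example/", 1, gpu_type="ATI")
print(cc_text)
```

An existing cc_config.xml usually has other options in it, so in practice the exclude_gpu element would be merged into the existing options section rather than replacing the whole file.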

You have a 6C/12T CPU.  If you allow BOINC to run CPU tasks on all available threads, performance will probably suffer alarmingly.  That's the reason I asked for the exact number of concurrent CPU tasks you were running.

Only you can determine what the best mix of CPU and GPU tasks will be.  Nobody else can do it for you.

alex wrote:
Also, yes, I run a lot of different CPU tasks from a bunch of projects, and it is my work computer, so I often compile code, run an IDE and do other CPU-heavy work-related stuff, but I don't think it affects GPU tasks much.

You can't know that unless you do the tests.  You might be surprised at how much improvement you can make.

Cheers,
Gary.

alex
Joined: 8 Apr 21
Posts: 6
Credit: 2608884916
RAC: 5806473

Thanks everybody for your input on this topic. I can confirm that an overloaded CPU can noticeably affect GPU task performance, so I took measures to avoid it. I've also bought a decent used 3080ti and spent the last couple of weeks comparing performance across different projects.

I won't go into much detail, such as exact task names, but just to give a general idea: the 3080ti is power-limited to 250W, the 2060s to 140W, the 6800xt to 250W, the 6600 to 120W and the 5700xt to 150W. It's interesting that utilization is always better on the NVIDIA cards: running one task is almost always enough to load the card to 100% and hit the power limit. It's different for AMD: for Einstein 3-4 tasks are ideal and consumption stays a bit below the power limit (at least on the 6800xt); for Amicable Numbers 2 tasks are ideal and consumption is the lowest; only PrimeGrid can really load these cards.

  1. I was totally wrong about Einstein. One of my 5700xt cards was on a riser and, as it turned out, my motherboard (ASRock B550 Steel Legend) is terrible for running 3 GPUs. The card on the riser performs 30-40% worse. I do not know why, and it does not happen on my other motherboards. So AMD GPUs are actually very good for Einstein. The 5700xt and 6600 are about 10% faster than the 2060s. The 6800xt is 20-25% faster than the 3080ti, which is kind of weird; I got that result for MeerKAT, but I didn't spend much time retesting it. The point is that Einstein is by far the best project for AMD.

  2. Amicable Numbers is just bad for AMD. The mighty 6800xt performs at the 2060s level. The 6800xt finishes two tasks in 18 minutes, the 2060s in 18 minutes as well, the 3080ti in 8 minutes, and the 5700xt/6600 in 40 minutes.

  3. PrimeGrid: comparable NVIDIA card is 30% faster than AMD.

Based on my testing I put my AMD cards exclusively on Einstein. NVIDIA does everything else.

AMD cards are not only cheaper and have more memory, they are also much more convenient to use in Linux. It's possible to monitor junction and memory temperatures, undervolt, and set a fan curve for each card, and thanks to CoreCtrl all of that is very convenient. Nvidia doesn't even provide the memory temperature (that's just crazy), there is no undervolting, and setting a fan curve is very painful because GreenWithEnvy supports only one GPU. But Nvidia has a cool program called nvtop which even shows PCIe utilization; I didn't find anything similar for AMD. So it's a tough choice.

Keith Myers
Joined: 11 Feb 11
Posts: 5043
Credit: 19025927389
RAC: 6762803

You are controlling your Nvidia card clocks and fans the hard way.  The driver all by itself supports core clock control, fan control and power limits, and reports core temps through the standard sysfs interface.  As you learned, though, Nvidia does not expose memory temps in Linux.

I don't know what Linux distro you are using, but it seems very deficient compared to plain Ubuntu.

The standard nvidia-smi terminal utility installed with the drivers exposes almost everything.  The Nvidia X Server Settings app controls the fan speeds and sets clocks.
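As a rough sketch of what nvidia-smi can report from a script, the snippet below asks for temperature, power, utilization and memory use in CSV form and parses the result.  The sample line is made-up data so the parser can be tried on a machine without an Nvidia card; the query_gpus helper is only defined here, not called, and it needs nvidia-smi on the PATH.

```python
import csv
import io
import subprocess

FIELDS = ["index", "name", "temperature.gpu", "power.draw",
          "power.limit", "utilization.gpu", "memory.used", "memory.total"]


def parse_smi_csv(text):
    """Turn nvidia-smi CSV output (no header, no units) into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if record:  # skip blank lines
            rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per GPU."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_smi_csv(out)


# Made-up sample line, not real output from any card in this thread.
sample = "0, NVIDIA GeForce RTX 3080 Ti, 61, 248.10, 250.00, 99, 5120, 12288\n"
rows = parse_smi_csv(sample)
print(rows)
```

The memory.used column (in MiB with these format options) is also a handy way to watch how much VRAM the tasks really take while they run.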

For a monitoring program other than GreenWithEnvy, you can use the gpu-utils utility on GitHub.

Ricks-Lab gpu-utils

Does both AMD and Nvidia.

 
