Multi GPU rig errors

Pokey
Pokey
Joined: 7 Jan 16
Posts: 14
Credit: 6,868,359,386
RAC: 15,838

@mikey, re your question, I use NVIDIA X Server Settings to keep an eye on temps and such, and I use nvidia-smi in a terminal for snapshots and for setting power limits and persistence mode.  Also, I do have a bunch of 750 watt PSUs sitting around, but my goal is a six-GPU rig.  You might say it's a project I've set for myself.  Some years back I would put four GPUs together and use a second PSU with a jumper to make it think it was connected to a computer, but I'm not a fan of having to manually turn that PSU on and off.  Now I use a harness to interconnect the two PSUs.
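For anyone who hasn't used nvidia-smi this way, the sort of commands described above look roughly like this.  The GPU index and wattage are just placeholder examples; check the supported range for your own cards first:

```shell
# Enable persistence mode so the driver stays loaded between jobs (needs root)
sudo nvidia-smi -pm 1

# Show the supported power-limit range and current draw for GPU 0
nvidia-smi -i 0 -q -d POWER

# Cap GPU 0 at 200 W (example value; must be within the supported range)
sudo nvidia-smi -i 0 -pl 200

# One-shot snapshot of temps, utilization, and power draw for all GPUs
nvidia-smi --query-gpu=index,name,temperature.gpu,utilization.gpu,power.draw --format=csv
```

Note that power limits set with `-pl` do not survive a reboot, so people usually put the command in a startup script.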

@GWGeorge007, you have given me a lot of homework... ;)  I will follow through on that.

@Ian&Steve C. I did reinstall BOINC and reattached to Einstein, and I am pretty sure it helped on the software side, although I did lose petri's software.  Now I am running the stock client for HSGammaPulsar.

In the meantime I have micromanaged the placement of the cards to more evenly balance the load until I can get a proper primary PSU.

I appreciate everyone jumping in with help.  Thanks all.

GWGeorge007
GWGeorge007
Joined: 8 Jan 18
Posts: 2,790
Credit: 4,596,402,208
RAC: 3,251,469

Out of curiosity (I am known as Curious George if you didn't know), what are you running your 5 GPUs on?

A mining rig?  Any chance you could give a picture or two?

If you were to switch over to an AMD Threadripper Pro and go with a custom-loop water cooling solution, you could have an ASUS Pro WS WRX80E-SAGE SE WIFI AMD Threadripper Pro EATX workstation motherboard, which has 7 PCIe slots.

TR-Pro motherboard here

George

Proud member of the Old Farts Association

Tom M
Tom M
Joined: 2 Feb 06
Posts: 5,629
Credit: 7,715,497,306
RAC: 2,223,309

Pokey wrote:

I appreciate everyone jumping in with help.  Thanks all.

I know there are PSU cables that allow you to "daisy-chain" multiple PSUs together for high GPU count rigs.

I have a 3 PSU set.

The main trick appears to be making sure all the PCIe slots and the one GPU driving your video output are powered from the same PSU.  If the MB has an extra power connector to support the PCIe slots, it needs to come off that same PSU too.  Otherwise you can spread the top-line power around to wherever it fits.

The OTHER trick, which took me years to realize I was screwing up, was using a single top-side cable from a PSU to drive a GPU instead of one cable per top-side plug.

Tom M

 

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Pokey
Pokey
Joined: 7 Jan 16
Posts: 14
Credit: 6,868,359,386
RAC: 15,838

@mikey: I didn't even know Hardware Sensors Indicator was a thing, so I tried it.  Did not fall in love, but thanks for the idea.

@GWGeorge007: I looked at the Threadripper link, but I will pass for now.  Maybe next project.  I got away from AMD years ago over poor drivers, but I know everyone is embracing their CPUs now.  I just haven't yet.

BTW, below is a link to my two big rigs.

DC Rigs

 

@Tom M:  I think I'm OK on the first count, but not sure about the second.  I have always used one cable per card: a PCIe cable with a single end to the PSU and a double end to the GPU, so basically one cable per GPU.  The exception is the 3090 Ti, which explicitly calls for one cable per plug.

mint-1 is running quietly right now so I will sneak off and do something else for a while. 

Appreciate you all.

GWGeorge007
GWGeorge007
Joined: 8 Jan 18
Posts: 2,790
Credit: 4,596,402,208
RAC: 3,251,469

Pokey wrote:

@GWGeorge007: I looked at the threadripper link but I will pass for now.  Maybe next project.  I got away from AMD years ago over poor drivers.  But I know everyone is embracing their CPU's now.  I just haven't yet.

BTW, below is link to my two big rigs.

DC Rigs

mint-1 is running quietly right now so I will sneak off and do something else for a while. 

Appreciate you all.

Thanks for the pics; now I can better visualize what you're doing with them.  I need some visual clues now and then.  ;*)

I too will do something else... like more FOOTBALL!

George

Proud member of the Old Farts Association

mikey
mikey
Joined: 22 Jan 05
Posts: 11,927
Credit: 1,831,602,001
RAC: 212,700

Pokey wrote:

@mikey: I didn't even know Hardware Sensors Indicator was a thing, so I tried it.  Did not fall in love, but thanks for the idea.

@GWGeorge007: I looked at the threadripper link but I will pass for now.  Maybe next project.  I got away from AMD years ago over poor drivers.  But I know everyone is embracing their CPU's now.  I just haven't yet.

BTW, below is link to my two big rigs.

DC Rigs

 

@tom m:  I think I'm ok on the first count but not sure about the second.  I have always used one cable per card.  A PCIe cable single end to PSU with double end to the GPU.  Basically one cable per GPU.  Except the 3090 ti, which explicitly calls for one cable per plug.

mint-1 is running quietly right now so I will sneak off and do something else for a while. 

Appreciate you all. 

I got a rig like that two Christmases ago and got it up and running, but it was not fun to keep track of and it had a massive (for me) footprint, so now it's back in the box. I used an i5 CPU since I was just running NVIDIA 1GB 750 Ti GPUs, so not high-end ones by any means.

Tom M
Tom M
Joined: 2 Feb 06
Posts: 5,629
Credit: 7,715,497,306
RAC: 2,223,309

GWGeorge007 wrote:

I too will do something else... like more FOOTBALL!

Before this thread gets hijacked, should you and I decamp to the Seti Bistro?

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Rodrigo
Rodrigo
Joined: 5 Aug 17
Posts: 22
Credit: 184,574,819
RAC: 554,116

Hi! This seemed to be the most appropriate thread to ask this question; sorry if it's the wrong place.

I have a small spare GPU, an R5 240, that I want to add to my computer (13071221). I used this GPU in the past with FGRPB1G and BRP7 tasks; the FGRPB1G tasks run just fine and get validated, but the BRP7 tasks get a lot of invalidated results.

Is there any way I can make this specific GPU (R5 240) crunch only FGRPB1G tasks, while the others are free to crunch both FGRPB1G and BRP7?

I imagine this could be done by editing cc_config.xml, but I don't know how! I use a very simple version I copied and pasted from the forums. This one:

<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
    </options>
</cc_config>

 

Thanks!!

 

Keith Myers
Keith Myers
Joined: 11 Feb 11
Posts: 4,731
Credit: 17,655,870,083
RAC: 6,176,579

Use an <exclude_gpu> statement in your cc_config.xml file to exclude the R5 240 GPU from the BRP7 app.  You need to use the client's numbering of the GPUs from the startup section of the event log.  It looks like that host has an RX 580 identified as the most capable GPU, which would be numbered GPU #0, so your R5 240 must be labeled as GPU #1.

The docs for this are here: Client configuration

I just did something similar for GPUGrid on my multi-GPU hosts.  Here is a snippet from one of my configurations to give you an idea of the syntax and structure.

<exclude_gpu>
   <url>https://einstein.phys.uwm.edu/</url>
   <device_num>0</device_num>
   <app>einsteinbinary_BRP7</app>
</exclude_gpu>

In your case, where I think the R5 240 is identified as GPU #1, your statement would be:

<exclude_gpu>
   <url>https://einstein.phys.uwm.edu/</url>
   <device_num>1</device_num>
   <app>einsteinbinary_BRP7</app>
</exclude_gpu>

This snippet goes into the Options section of the cc_config.xml between the <options> and </options> delimiters.
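For completeness, merged with the simple cc_config.xml posted above, the whole file would look like this (assuming the R5 240 really is device 1 on your host):

```
<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
        <exclude_gpu>
            <url>https://einstein.phys.uwm.edu/</url>
            <device_num>1</device_num>
            <app>einsteinbinary_BRP7</app>
        </exclude_gpu>
    </options>
</cc_config>
```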

Stop BOINC, make the edit, and then restart BOINC.  At the beginning of the event log startup you should see the exclude statement printed out, like this:

[Einstein@Home] Config: excluded GPU.  Type: all.  App: einsteinbinary_BRP7.  Device: 0

 

Rodrigo
Rodrigo
Joined: 5 Aug 17
Posts: 22
Credit: 184,574,819
RAC: 554,116

Awesome!! I edited the cc_config file using your example and it solved the problem!! Thanks!!!

2/10/2023 3:37:30 PM | Einstein@Home | Config: excluded GPU.  Type: all.  App: einsteinbinary_BRP7.  Device: 1

Just one drawback: to make BOINC see the GPUs properly I had to use the default Windows drivers, so now I have no fan control on the RX 580 and it's running a bit hotter.

If I install the AMD drivers, Windows detects all GPUs properly, but BOINC shows five GPUs: the integrated GPU and the RX 580 each show up twice with different drivers. I forgot to copy/paste, but it's like this:

GPU 0: RX580 - Driver 3444

GPU 1: Vega 3 - Driver 3444

GPU 3: R5 240 - Driver 3240

GPU 4: RX580 - Driver 3240

GPU 5: Vega 3 - Driver 3240

 

Initially I tried editing the cc_config file to exclude GPUs 0 and 1 entirely so they wouldn't be used. When it starts running, it uses the RX 580 correctly, but it won't use the R5 240; instead it uses the Vega 3 integrated GPU, running two tasks at the same time: one BRP7, as it should, but also a FGRPB1G that should be running on the R5 240. I also tried excluding only GPUs 4 and 5, with the same results.

If I install the R5 240 driver (which is older), Windows and BOINC only show the R5 240.

I think it has something to do with the OpenCL versions: the Vega 3 and the RX 580 use 2.0 and the R5 240 uses 1.2, but I'm not sure.
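For what it's worth, BOINC's cc_config.xml also has an <ignore_ati_dev> option that tells the client to skip an AMD device entirely, which may behave differently from <exclude_gpu> in this duplicated-driver situation.  A sketch, assuming the duplicate RX 580 and Vega 3 entries really are devices 4 and 5 on this host:

```
<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
        <ignore_ati_dev>4</ignore_ati_dev>
        <ignore_ati_dev>5</ignore_ati_dev>
    </options>
</cc_config>
```

One <ignore_ati_dev> element is needed per device to ignore, and BOINC must be restarted for it to take effect.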

 
