Troubleshooting multiple GPU setups that use riser cards

mikey
Joined: 22 Jan 05
Posts: 11888
Credit: 1828033366
RAC: 208115

Tom M wrote:

a BSOD (Win10) error message: "Driver Overran Stack Buffer"

This error message has shown up on both my Ryzen systems (MSI X570-A Pro) and my Intel system (MSI B360-A Pro).

It also will show up during a driver install if "too many" GPUs are plugged in.

The only fix that seems to work consistently is to reduce the number of GPUs below some threshold: probably 5, and certainly 4, which is where my top performer (Intel) is now sitting.

Tom M

That sounds like a motherboard limitation on the number of GPUs it was designed to handle. I know that years ago, when we first started using GPUs to crunch in BOINC, two GPUs would crash a system and three would BSOD it. Then newer MBs came out and things started to change, and 2, 3, and even 4 GPUs were okay in some MBs. BUT some people still had problems running multiple tasks at a time on all those GPUs, as the MB couldn't handle all that data going through at once. In short, it could just be a matter of reducing the number of tasks per GPU to keep things running longer.

Tom M
Joined: 2 Feb 06
Posts: 5585
Credit: 7672689570
RAC: 1731683

Ian&Steve C. wrote:

Fitting that this post hit my Reddit dashboard today.

Friendly reminder to stop using SATA to power risers

What about jumping to ribbon cables? I now have a motherboard that would seem to support that.

And it would reduce complexity considerably.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33814201187
RAC: 37825615

Tom M wrote:

Ian&Steve C. wrote:

Fitting that this post hit my Reddit dashboard today.

Friendly reminder to stop using SATA to power risers

What about jumping to ribbon cables? I now have a motherboard that would seem to support that.

And it would reduce complexity considerably.

Tom M

That works. I'm a big fan of these, and it does simplify things considerably. Make sure you are running a competent PSU to handle the load through the motherboard, and make sure you plug a 6-pin PCIe cable from the PSU (NO SATA ADAPTERS!) into the 6-pin plug on the bottom edge of the motherboard (on the EPYCD8 you have).

Just get a reasonable-quality riser cable. Do not get those generic grey unshielded cables; get the ones that are PCIe gen 3 capable, usually in a thicker black coating. I use these ones with relatively good success. It looks like they aren't available at the moment, but others like them exist; just search around on Amazon. For most of my cards I'm using the 20cm cable on my mining frames (the furthest away use 30cm), but you should figure out what length you need if you're not using a mining frame. Also keep in mind that you might need a right-angle cable or a right-angle adapter for the 2 GPUs that will sit over the CPU area if you're using a tall-ish CPU heatsink.

Edit: it looks like the same brand of riser is available under a new part number. I guess they updated it: https://www.amazon.com/EZDIY-FAB-Express-Flexible-Extension-Upgrade/dp/B07MDLYBJ4/ref=sr_1_5?dchild=1&keywords=pcie+3.0+riser+cable+20cm&qid=1622216502&s=electronics&sr=1-5

_________________________________________________________________________

Tom M
Joined: 2 Feb 06
Posts: 5585
Credit: 7672689570
RAC: 1731683

Ian&Steve C. wrote:

Tom M wrote:

Ian&Steve C. wrote:

Fitting that this post hit my Reddit dashboard today.

Friendly reminder to stop using SATA to power risers

What about jumping to ribbon cables? I now have a motherboard that would seem to support that.

And it would reduce complexity considerably.

Tom M

make sure you plug in a 6-pin PCIe cable from the PSU (NO SATA ADAPTERS!) to the 6-pin plug on the bottom edge of the motherboard (on the EPYCD8 you have).

I know I have power connectors (8-pin and 4-pin) next to the 24-pin power connector, and I have seen 6-pin power connectors on other MBs. I guess I will have to get out the manual. I have mislaid the 6-pin :(

==edit== Found it. It is white and horizontal instead of vertical :)

Tom M

Tom M
Joined: 2 Feb 06
Posts: 5585
Credit: 7672689570
RAC: 1731683

Ian&Steve C. wrote:

edit: looks like the same brand riser is available under a new part number. I guess they updated it: https://www.amazon.com/EZDIY-FAB-Express-Flexible-Extension-Upgrade/dp/B07MDLYBJ4/ref=sr_1_5?dchild=1&keywords=pcie+3.0+riser+cable+20cm&qid=1622216502&s=electronics&sr=1-5

I know I have 4 full-length slots and 3 half-length slots. Am I missing something, or does this riser work in both the full and half slots? And should I buy a "spare"?

Tom M

Keith Myers
Joined: 11 Feb 11
Posts: 4699
Credit: 17541636269
RAC: 6358136

An auxiliary PCIe slot power connector is a must, in my opinion, for any motherboard I purchase that is intended to host 4 or more GPUs.

It goes a long way toward avoiding pulling too much +12V power through the 24-pin connector and burning it up.
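
A rough back-of-the-envelope sketch of that concern, in Python. It uses the PCIe card specification's limit of 5.5 A on +12V for an x16 graphics slot and the fact that the ATX 24-pin connector carries +12V on two pins. The per-pin current rating (6 A) is an assumption that depends on the connector terminals and wire gauge, not a figure from this thread, and real cards often draw less than the slot maximum, so treat the result as a worst case only.

```python
# Worst-case +12V current pulled through the PCIe slots versus what the
# 24-pin ATX connector can plausibly deliver without auxiliary slot power.

SLOT_12V_AMPS_MAX = 5.5      # PCIe limit for an x16 graphics slot on +12V (~66 W)
ATX24_12V_PINS = 2           # the 24-pin ATX connector has two +12V pins
ASSUMED_AMPS_PER_PIN = 6.0   # ASSUMPTION: depends on terminal type and wire gauge


def worst_case_slot_amps(num_gpus: int) -> float:
    """+12V current if every GPU drew the full slot allowance."""
    return num_gpus * SLOT_12V_AMPS_MAX


def atx24_12v_budget_amps() -> float:
    """+12V current the 24-pin connector can carry under the assumed pin rating."""
    return ATX24_12V_PINS * ASSUMED_AMPS_PER_PIN


if __name__ == "__main__":
    budget = atx24_12v_budget_amps()
    for gpus in range(1, 8):
        draw = worst_case_slot_amps(gpus)
        status = "within" if draw <= budget else "OVER"
        print(f"{gpus} GPU(s): {draw:4.1f} A worst case, {status} the {budget:.0f} A assumed budget")
```

Under these assumptions the worst case exceeds the 24-pin budget at three GPUs, which is consistent with wanting a dedicated slot power connector on boards that host four or more.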

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33814201187
RAC: 37825615

Tom M wrote:

Ian&Steve C. wrote:

edit: looks like the same brand riser is available under a new part number. I guess they updated it: https://www.amazon.com/EZDIY-FAB-Express-Flexible-Extension-Upgrade/dp/B07MDLYBJ4/ref=sr_1_5?dchild=1&keywords=pcie+3.0+riser+cable+20cm&qid=1622216502&s=electronics&sr=1-5

I know I have 4 full length slots.  And 3 half length slots.  Am I missing something or does this work for both the full and half slots? And should I buy a "spare"?

Tom M

Look at your board. Look at the slots. The "half" slots (these are x8 slots) are open-ended. You can put an x16 device in them and the extra just hangs out the end of the slot. The device will work, but obviously only at x8 lanes, since that's all that is making a connection.

For reference, this is how all PCIe devices work: any PCIe device can work in a reduced-lane configuration, just with less bandwidth. PCIe 3.0 x8 is more than enough for any project, though. I run all my EPYCD8 boards like this with no problem.
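
To put numbers on "less bandwidth", here is a small Python sketch of the theoretical per-direction PCIe 3.0 link rate. It uses the published PCIe 3.0 figures (8 GT/s per lane, 128b/130b encoding) and ignores packet and protocol overhead, so real throughput is somewhat lower. The function name is just for illustration.

```python
# Theoretical one-direction PCIe 3.0 bandwidth for a given lane count.

GT_PER_SEC_PER_LANE = 8.0        # PCIe 3.0 raw signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding
BITS_PER_BYTE = 8


def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Return theoretical GB/s in one direction, before protocol overhead."""
    return lanes * GT_PER_SEC_PER_LANE * ENCODING_EFFICIENCY / BITS_PER_BYTE


if __name__ == "__main__":
    for lanes in (1, 4, 8, 16):
        print(f"x{lanes:<2}: {pcie3_bandwidth_gb_s(lanes):6.2f} GB/s")
```

An x16 card in an open-ended x8 slot therefore gets about half the x16 figure, roughly 7.9 GB/s each way.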

_________________________________________________________________________

Tom M
Joined: 2 Feb 06
Posts: 5585
Credit: 7672689570
RAC: 1731683

Ian&Steve C. wrote:
look at your board. look at the slots. the "half" slots (these are x8 slots) are open ended.

Slap forehead!!!! It's been so long since I have had a MB with open backs on the slots that I forgot about them...

Tom M

Tom M
Joined: 2 Feb 06
Posts: 5585
Credit: 7672689570
RAC: 1731683

I am now consistently seeing 2 GPUs pause processing after less than 24 hours.

It's not the same two physical GPUs after a reboot.

Suspending/resuming the tasks doesn't help "clear/reboot" the GPUs that have paused.

Rebooting does.

Any ideas of where I should look next?

===edit====

When I look up an "important error" from the log on Google, I get back to the dreaded "pci=nommconf" fix. (The kernel option pci=nommconf disables Memory-Mapped PCI Configuration Space.)

I have added that option to the kernel command line and run the GRUB update. Now to reboot, again.
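
After the reboot it is worth confirming that the option actually reached the running kernel. A minimal Python sketch, assuming a Linux host where the boot parameters are exposed in /proc/cmdline; the function names are hypothetical. It also handles the comma-separated form (for example pci=noaer,nommconf), since the pci= parameter accepts several flags at once.

```python
# Check whether the running kernel was booted with a given pci= flag.
from pathlib import Path


def has_pci_option(cmdline: str, flag: str) -> bool:
    """Return True if `flag` appears in any pci=... token of the command line."""
    for token in cmdline.split():
        if token.startswith("pci="):
            if flag in token[len("pci="):].split(","):
                return True
    return False


def running_kernel_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the current kernel was booted with (Linux only)."""
    return Path(path).read_text().strip()


if __name__ == "__main__":
    active = has_pci_option(running_kernel_cmdline(), "nommconf")
    print("pci=nommconf is", "active" if active else "NOT active")
```

If it reports NOT active, the usual suspects on Debian/Ubuntu-style systems are an edit to /etc/default/grub that was not followed by update-grub, or a typo in the GRUB_CMDLINE_LINUX_DEFAULT line.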

Tom M

Tom M
Joined: 2 Feb 06
Posts: 5585
Credit: 7672689570
RAC: 1731683

Ian&Steve C. wrote:

edit: looks like the same brand riser is available under a new part number. I guess they updated it: https://www.amazon.com/EZDIY-FAB-Express-Flexible-Extension-Upgrade/dp/B07MDLYBJ4/ref=sr_1_5?dchild=1&keywords=pcie+3.0+riser+cable+20cm&qid=1622216502&s=electronics&sr=1-5

Where are those "90 degree" ribbons? It looks like I could use 2 for better clearance.

===mumble==== :) Found something, I think.

But it looks like my furthest-away slot would need a ribbon about 14" long? What am I missing?

It looks like I could reasonably manage 3 GPUs on ribbon cables of the above brand. But apparently they don't sell ones long enough to reach from my MB to the 2 or 3 furthest-away GPU locations on my mining rack. Hmmmm.....
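
For the length question, a quick Python sketch that converts the 14" estimate to centimetres and checks it against the cable lengths mentioned earlier in this thread (20cm and 30cm). That list covers only the lengths named here, not the seller's full catalogue, and the function names are hypothetical.

```python
# Convert a required riser length to cm and pick the shortest cable that reaches.
from typing import Optional, Sequence

CM_PER_INCH = 2.54
LENGTHS_MENTIONED_CM = [20, 30]  # only the lengths named in this thread


def inches_to_cm(inches: float) -> float:
    return inches * CM_PER_INCH


def shortest_cable_that_reaches(required_cm: float,
                                available_cm: Sequence[float]) -> Optional[float]:
    """Return the shortest available length >= required_cm, or None if none reach."""
    candidates = [length for length in available_cm if length >= required_cm]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    needed = inches_to_cm(14)
    choice = shortest_cable_that_reaches(needed, LENGTHS_MENTIONED_CM)
    print(f"Need about {needed:.1f} cm; best fit from the thread's lengths: {choice}")
```

A 14" run is about 35.6 cm, so neither of the two lengths mentioned reaches it; the furthest positions would need a longer cable or a rearranged frame.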

Tom M
