Troubleshooting multiple-GPU setups that use riser cards

Tom M
Joined: 2 Feb 06
Posts: 5585
Credit: 7672862903
RAC: 1740007

Ian&Steve C. wrote:

And this bit, which you might have to do, from page 7 of the redux thread: 

 

Check the /etc/OpenCL/vendors/ directory.

You should find a similar .icd file there as before, but this time named "amdocl64_40200.icd". Is it the only file in this directory? If you're on a fresh install and have only tried the ROCm install, I imagine it's the only file there right now. If you have any other files in this directory, please post what they are and their contents (if applicable).

Next, check your /opt/rocm/opencl/lib/ directory and verify that the libamdocl64.so file is in there. If not, please let me know.
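For anyone following along, the two checks above can also be scripted. This is a minimal Python sketch, not part of Ian's instructions: the two paths are the ones quoted in this post, and the function name `inspect_icd_setup` is hypothetical. It lists every file in the vendors directory (not just .icd files, so leftovers from other drivers show up) along with its contents, and reports whether libamdocl64.so exists.

```python
from pathlib import Path
from typing import Dict, Tuple

# Paths from the instructions above; adjust if your install differs.
VENDORS_DIR = Path("/etc/OpenCL/vendors")
ROCM_LIB = Path("/opt/rocm/opencl/lib/libamdocl64.so")


def inspect_icd_setup(vendors_dir: Path = VENDORS_DIR,
                      rocm_lib: Path = ROCM_LIB) -> Tuple[Dict[str, str], bool]:
    """Return {file name: contents} for the vendors directory and whether the ROCm library exists."""
    vendor_files = {}
    if vendors_dir.is_dir():
        # List every regular file, not just *.icd, so stray entries from other drivers show up too.
        for entry in sorted(vendors_dir.iterdir()):
            if entry.is_file():
                vendor_files[entry.name] = entry.read_text(errors="replace").strip()
    return vendor_files, rocm_lib.is_file()


if __name__ == "__main__":
    files, lib_present = inspect_icd_setup()
    for name, contents in files.items():
        print(f"{name}: {contents}")
    print(f"{ROCM_LIB} present: {lib_present}")
```

On a fresh ROCm-only install you would expect a single line for amdocl64_40200.icd (or whatever suffix your ROCm version uses) and `present: True`.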

Open the amdocl64_40200.icd file with nano to edit it:

sudo nano /etc/OpenCL/vendors/amdocl64_40200.icd

The contents are likely just "libamdocl64.so".

Change this to "/opt/rocm/opencl/lib/libamdocl64.so" (without the quotes).

Press [Ctrl]+[x] to exit. You will be prompted to save: enter [y], then hit [Enter] to confirm the filename (don't change it), and it will save and close.

Then reboot and retry.

 

Note: the suffix of the amdocl64 .icd file (in the above instructions it’s “40200”, referring to ROCm v4.2) might be different if ROCm has been updated in the repository. Be sure to check which file names exist and adjust the instructions accordingly.
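The same edit can be made without nano. Below is a minimal Python sketch, assuming the paths from the instructions above; `point_icd_at_rocm` is a hypothetical helper, not something ROCm ships. It globs for `amdocl64_*.icd` so a different ROCm version suffix is picked up automatically, and it only rewrites files that don't already contain the absolute library path.

```python
from pathlib import Path
from typing import List

# Paths from the instructions above; adjust if your ROCm install differs.
VENDORS_DIR = Path("/etc/OpenCL/vendors")
ROCM_LIB = Path("/opt/rocm/opencl/lib/libamdocl64.so")


def point_icd_at_rocm(vendors_dir: Path = VENDORS_DIR,
                      rocm_lib: Path = ROCM_LIB) -> List[str]:
    """Rewrite each amdocl64_*.icd file so it names the ROCm library by absolute path.

    Returns the names of the files that were changed.
    """
    if not vendors_dir.is_dir():
        return []
    changed = []
    # The wildcard matches any ROCm version suffix, e.g. 40200 for ROCm 4.2.
    for icd in sorted(vendors_dir.glob("amdocl64_*.icd")):
        if icd.read_text().strip() != str(rocm_lib):
            icd.write_text(f"{rocm_lib}\n")
            changed.append(icd.name)
    return changed
```

Writing under /etc needs root, so this would have to be run with sudo, followed by the same reboot. An empty return list means nothing needed changing, or no amdocl64 .icd file was found at all, which is worth investigating on its own.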

Done. Getting ready to reboot.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33815987749
RAC: 37813474

After checking, the 5.11 kernel shouldn't cause issues with the ROCm install. I just upgraded my test bench from kernel 5.4 to kernel 5.11 over top of my existing ROCm 4.2 driver install and it all went fine. It might be an issue with the AMDGPU-Pro package, though. So if you stay with the 5.11 kernel, ROCm should still work.

 

Did you get any errors during the install steps? It's very possible that you missed an error in the terminal output if you were just blindly copy/pasting commands one after the other.

 

I would run the uninstall commands for ROCm, then try the install again. Pay attention to the output after each command, and report any errors that show up along the way.

_________________________________________________________________________

Tom M

Ian&Steve C. wrote:

After checking, the 5.11 kernel shouldn't cause issues with the ROCm install. I just upgraded my test bench from kernel 5.4 to kernel 5.11 over top of my existing ROCm 4.2 driver install and it all went fine. It might be an issue with the AMDGPU-Pro package, though. So if you stay with the 5.11 kernel, ROCm should still work.

Did you get any errors during the install steps? It's very possible that you missed an error in the terminal output if you were just blindly copy/pasting commands one after the other.

I would run the uninstall commands for ROCm, then try the install again. Pay attention to the output after each command, and report any errors that show up along the way.

The .icd file edit was enough to get BOINC to recognize the RX 5700 I have plugged into the motherboard.

I am now waiting to see how fast it processes the GPU task.

===edit===

Looking good. 33% done at around 3 minutes, which means I have gotten past the slowdown issue. I think. Next I will try adding back the 3 "working" GPUs on their ribbon cables.
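"33% done at around 3 minutes" implies a total of roughly 3 / 0.33 ≈ 9 minutes per task, assuming progress is reported linearly (BOINC progress is not always linear, so treat it as a rough estimate). As a small sketch, with a hypothetical function name:

```python
def projected_runtime(elapsed_minutes: float, fraction_done: float) -> float:
    """Linearly extrapolate a task's total runtime from the progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_minutes / fraction_done


print(f"{projected_runtime(3, 0.33):.1f} minutes")  # prints "9.1 minutes"
```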

Tom M

 

Ian&Steve C.

Tom M wrote:

The .icd file edit was enough to get BOINC to recognize the RX 5700 I have plugged into the motherboard.

I am now waiting to see how fast it processes the GPU task.

===edit===

Looking good. 33% done at around 3 minutes, which means I have gotten past the slowdown issue. I think. Next I will try adding back the 3 "working" GPUs on their ribbon cables.

Tom M

 

 

yup I just saw. congrats. glad it's detecting the GPUs now.

_________________________________________________________________________

Tom M

Tom M wrote:

The .icd file edit was enough to get BOINC to recognize the RX 5700 I have plugged into the motherboard.

I am now waiting to see how fast it processes the GPU task.

===edit===

Looking good. 33% done at around 3 minutes, which means I have gotten past the slowdown issue. I think. Next I will try adding back the 3 "working" GPUs on their ribbon cables.

Sat 28 Aug 2021 12:29:18 PM CDT |  | Starting BOINC client version 7.16.5 for x86_64-pc-linux-gnu
Sat 28 Aug 2021 12:29:18 PM CDT |  | log flags: file_xfer, sched_ops, task, sched_op_debug
Sat 28 Aug 2021 12:29:18 PM CDT |  | Libraries: libcurl/7.68.0 GnuTLS/3.6.13 zlib/1.2.11 brotli/1.0.7 libidn2/2.2.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0 librtmp/2.3
Sat 28 Aug 2021 12:29:18 PM CDT |  | Data directory: /home/tom/Desktop/BOINC
Sat 28 Aug 2021 12:29:27 PM CDT |  | OpenCL: AMD/ATI GPU 0: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (driver version 3305.0 (HSA1.1,LC), device version OpenCL 2.0, 8176MB, 8176MB available, 8064 GFLOPS peak)
Sat 28 Aug 2021 12:29:27 PM CDT |  | OpenCL: AMD/ATI GPU 1: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (driver version 3305.0 (HSA1.1,LC), device version OpenCL 2.0, 8176MB, 8176MB available, 8064 GFLOPS peak)
Sat 28 Aug 2021 12:29:27 PM CDT |  | OpenCL: AMD/ATI GPU 2: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (driver version 3305.0 (HSA1.1,LC), device version OpenCL 2.0, 8176MB, 8176MB available, 8064 GFLOPS peak)
Sat 28 Aug 2021 12:29:27 PM CDT |  | libc: Ubuntu GLIBC 2.31-0ubuntu9.2 version 2.31
Sat 28 Aug 2021 12:29:27 PM CDT |  | Host name: EPYC-Moonshot
Sat 28 Aug 2021 12:29:27 PM CDT |  | Processor: 48 AuthenticAMD AMD EPYC 7401P 24-Core Processor [Family 23 Model 1 Stepping 2]
Sat 28 Aug 2021 12:29:27 PM CDT |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall sev_es fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
Sat 28 Aug 2021 12:29:27 PM CDT |  | OS: Linux Ubuntu: Ubuntu 20.04.3 LTS [5.11.0-27-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.2)]
Sat 28 Aug 2021 12:29:27 PM CDT |  | Memory: 62.79 GB physical, 2.00 GB virtual
Sat 28 Aug 2021 12:29:27 PM CDT |  | Disk: 915.40 GB total, 845.33 GB free

Tom M

Ian&Steve C.

context? commentary?

is this a good thing or a bad thing?

I see 3 GPUs recognized.
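To count the recognized GPUs without eyeballing the log, a small sketch like this could work. It assumes the startup lines keep the "OpenCL: <vendor> GPU <n>: <name>" shape shown in the log above; the regex and the `detected_gpus` name are illustrative, not a BOINC API. The text could be pasted from the Event Log or read from the client's log file (typically stdoutdae.txt in the BOINC data directory).

```python
import re
from typing import List, Tuple

# Matches startup lines such as
# "... | OpenCL: AMD/ATI GPU 0: Navi 10 [...] (driver version ...)"
GPU_LINE = re.compile(r"OpenCL: .+? GPU (?P<index>\d+): (?P<name>[^(\n]+)")


def detected_gpus(log_text: str) -> List[Tuple[int, str]]:
    """Return (index, name) for each OpenCL GPU the BOINC client reported."""
    return [(int(m["index"]), m["name"].strip()) for m in GPU_LINE.finditer(log_text)]
```

Run against the log above, this would return three entries (indices 0, 1 and 2), matching the 3 GPUs seen by eye.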

_________________________________________________________________________

Tom M

Tom M wrote:

Looking good. 

It appears that the current 4-GPU setup (1 on the MB, 3 on ribbon cables) is working.

And it is processing with the Beta -> Faster version of the Gamma Ray #1 GPU task.

I am going to see if I can get the 5th GPU installed directly on the MB.

If that works, I will start trying to move everything back onto ribbon cables for cooler processing.

Tom M

Ian&Steve C.

According to your tasks, they are all the 1.18 app. I don’t see any beta tasks from that system (app version 1.28).
 

can you clarify? Or do you mean you’re using libsleep? 

_________________________________________________________________________

Tom M

Ian&Steve C. wrote:

According to your tasks, they are all the 1.18 app. I don’t see any beta tasks from that system (app version 1.28).
 

can you clarify? Or do you mean you’re using libsleep? 

My mistake then.

I got up this morning to 4 GPUs on ribbon cables and 4 GPUs running but not making any progress.

Unplugged the GPU from the last change and restarted the system.
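When tasks show as running but make no progress, comparing two snapshots of each task's fraction done, taken a few minutes apart, shows which ones are actually stuck. A minimal sketch with a hypothetical `stalled_tasks` helper; collecting the snapshots (for example from the BOINC Manager Tasks tab or `boinccmd --get_tasks`) is left out and assumed to produce a mapping of task name to fraction done.

```python
from typing import Dict, List


def stalled_tasks(before: Dict[str, float], after: Dict[str, float],
                  min_gain: float = 1e-4) -> List[str]:
    """Return tasks present in both snapshots whose fraction done did not advance."""
    return sorted(
        name for name, done in after.items()
        if name in before and done - before[name] < min_gain
    )
```

Tasks that only appear in the second snapshot are ignored, since they may simply have just started.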

Tom M

Ian&Steve C.

can you clarify the run configuration now?

how many tasks at a time? 1x? 2x?
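For reference, BOINC sets tasks per GPU through an app_config.xml file in the project's directory: a `gpu_usage` of 1 runs 1x, 0.5 runs 2x, and so on. Below is a Python sketch that builds such a file. The helper name is hypothetical, and the app name in the example call (`hsgamma_FGRPB1G`) is an assumption that should be checked against the app names listed in client_state.xml.

```python
import xml.etree.ElementTree as ET


def app_config_xml(app_name: str, tasks_per_gpu: int, cpu_usage: float = 1.0) -> str:
    """Build a BOINC app_config.xml that runs tasks_per_gpu tasks on each GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # Each task claims this fraction of a GPU, so 0.5 lets two tasks share one card (2x).
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:g}"
    return ET.tostring(root, encoding="unicode")


# Assumed app name; verify it in client_state.xml before using.
print(app_config_xml("hsgamma_FGRPB1G", 2))
```

With 4 GPUs at 2x, that would mean 8 tasks running at once. After saving the file, the client has to re-read its config files (or be restarted) for the change to take effect.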

_________________________________________________________________________
