Troubleshooting multiple GPU setups that use riser cards

Ian&Steve C.
Joined: 19 Jan 20
Posts: 922
Credit: 6,785,738,370
RAC: 17,322,063

And this bit, which you might have to do, from page 7 of the redux thread: 

 

check the /etc/OpenCL/vendors/ directory.

you should find a .icd file there similar to the one from before, but this time named "amdocl64_40200.icd". is it the only file in this directory? if you're on a fresh install and have only tried the ROCm install, I imagine it's the only file there right now. if you have any other files in this directory, please post what they are and their contents (if applicable).

next check your /opt/rocm/opencl/lib/ directory and verify that the libamdocl64.so file is in there. if not please let me know.

open the amdocl64_40200.icd file with nano to edit:

sudo nano /etc/OpenCL/vendors/amdocl64_40200.icd

contents is likely just "libamdocl64.so"

change this to "/opt/rocm/opencl/lib/libamdocl64.so" (without the quotes)

[Ctrl]+[x] to exit, you will be prompted to save, enter [y], and hit [Enter] to verify filename (don't change it) and it will save and close.

Then reboot and retry.

 

note: the suffix of the amdocl64 .icd file name (in the above instructions it’s “40200”, referring to ROCm v4.2) might be different if ROCm has been updated in the repository. Be sure to check the file names that exist and adjust the given instructions accordingly.
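
The ICD steps above can be sketched as a small shell helper. This is only a rough sketch, not the thread author's script: the function name `fix_amd_icd` is made up for illustration, and the two default paths are simply the ones quoted in the post. The glob is there so that a suffix other than 40200 is still picked up.

```shell
#!/usr/bin/env bash
# Sketch only: rewrite a bare "libamdocl64.so" entry in the AMD ICD file
# to the full library path. Both defaults are the paths quoted in the post.
VENDORS_DIR="${VENDORS_DIR:-/etc/OpenCL/vendors}"
LIB_PATH="${LIB_PATH:-/opt/rocm/opencl/lib/libamdocl64.so}"

# fix_amd_icd is a hypothetical helper name, not part of ROCm.
fix_amd_icd() {
    local dir="$1" lib="$2" icd found=0

    # Step from the post: verify the library exists before pointing at it.
    if [ ! -f "$lib" ]; then
        echo "missing library: $lib" >&2
        return 1
    fi

    # The glob covers suffixes other than 40200 (other ROCm releases).
    for icd in "$dir"/amdocl64_*.icd; do
        [ -e "$icd" ] || continue
        found=1
        if [ "$(tr -d '[:space:]' < "$icd")" = "libamdocl64.so" ]; then
            printf '%s\n' "$lib" > "$icd"
            echo "updated: $icd"
        else
            echo "left unchanged: $icd"
        fi
    done

    if [ "$found" -eq 0 ]; then
        echo "no amdocl64_*.icd file found in $dir" >&2
        return 1
    fi
}
```

Nothing runs until the function is called. Writing under /etc needs root, so after saving the block to a file and sourcing it in a root shell, the call would be `fix_amd_icd "$VENDORS_DIR" "$LIB_PATH"`, followed by the reboot described above.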

_____________________________________________

Ian&Steve C.

I also just noticed that you’ve updated your system to the 5.11 kernel. When did you do that? Did that coincide with your recent issues? The AMD driver install seems to be sensitive to kernel version, and if the update from 5.8 to 5.11 coincided with your issues, then that might be a reason for all of this.

_____________________________________________

Tom M
Joined: 2 Feb 06
Posts: 1,153
Credit: 2,167,239,166
RAC: 4,624,044

Got an "unable to open, not a member of render group" error, so I added myself.

 

tom@EPYC-Moonshot:~$ sudo usermod -a -G render $LOGNAME
[sudo] password for tom:
tom@EPYC-Moonshot:~$ /opt/rocm/bin/rocminfo
ROCk module is loaded
Unable to open /dev/kfd read-write: Permission denied
tom is member of render group
tom@EPYC-Moonshot:~$

Clearly I missed something.

I can run the rocminfo command with sudo and get the result I think we were expecting. But that probably means I will have to launch BOINCMGR from a command line under "sudo", something I didn't have to do before.

Darn it.

Tom M

As a self-interested person, I aspire to be a Humane.
In detail, I am a BIG Picture person.

 

 

 

 

Ian&Steve C.

Tom M wrote:

Got an "unable to open, not a member of render group" error, so I added myself.

 

tom@EPYC-Moonshot:~$ sudo usermod -a -G render $LOGNAME
[sudo] password for tom:
tom@EPYC-Moonshot:~$ /opt/rocm/bin/rocminfo
ROCk module is loaded
Unable to open /dev/kfd read-write: Permission denied
tom is member of render group
tom@EPYC-Moonshot:~$

Clearly I missed something.

I can run the rocminfo command with a sudo and get the result I think we were expecting.  But that probably means I will have to run BOINCMGR from a command line to invoke under "sudo".  Something I didn't have to do before.

Darn it.

Tom M

did you add yourself to the other group as well? you should do both.

 

did you follow up with the additional instructions about editing the icd file?

 

what about the kernel question?

_____________________________________________

Tom M

Ian&Steve C. wrote:

did you add yourself to the other group as well? you should do both.

 

did you follow up with the additional instructions about editing the icd file?

 

what about the kernel question?

The directions had me adding myself to the video group.

I don't see any directions for editing the icd file.

I am running Ubuntu 20 LTS with a kernel of 5.8 (I think).

So maybe my directions were not complete?

Tom M

 


 

 

 

 

Tom M

Ian&Steve C. wrote:

I also just noticed that you’ve updated your system to the 5.11 kernel. When did you do that? Did that coincide with your recent issues? The AMD driver install seems to be sensitive to kernel version, and if the update from 5.8 to 5.11 coincided with your issues, then that might be a reason for all of this.

That was due to the last install and "complete" updating after the install.

Tom M


 

 

 

 

Tom M

Ian&Steve C. wrote:

And this bit, which you might have to do, from page 7 of the redux thread: 

The files are missing from all locations. There is an Nvidia file in the /etc folder.

It looks to me like, after I get done with this morning's activities, I need to reinstall the OS with the 5.8 kernel only (assuming I can).

Or maybe just uninstall ROCm and try again?

Tom M


 

 

 

 

Tom M

Sorry, I missed the previous posts your reply was speaking of.

Tom M wrote:

Ian&Steve C. wrote:

did you add yourself to the other group as well? you should do both.

 

did you follow up with the additional instructions about editing the icd file?

 

what about the kernel question?

The directions had me adding myself to the video group.

I don't see any directions for editing the icd file.

I am running Ubuntu 20 LTS with a kernel of 5.8 (I think).

So maybe my directions were not complete?

Tom M

 


 

 

 

 

Ian&Steve C.

Tom M wrote:

The directions had me adding myself to the video group.

the directions have TWO commands that add you to two groups. the render group command is right below the video command. you see that right?

Ian&Steve C. wrote:

Setting Permissions for Groups

sudo usermod -a -G video $LOGNAME

sudo usermod -a -G render $LOGNAME
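
One thing worth checking after running both commands, which the thread itself does not spell out: group changes made with usermod normally only apply to new login sessions, so a shell that was already open can still lack the group even though the account has it. The sketch below compares the two views. The helper names `missing_groups` and `check_gpu_groups` are made up for illustration.

```shell
# missing_groups and check_gpu_groups are hypothetical helper names.
# Print every required group that is absent from a space-separated list.
missing_groups() {
    local have=" $1 " g
    shift
    for g in "$@"; do
        case "$have" in
            *" $g "*) ;;          # group present, nothing to report
            *) echo "$g" ;;
        esac
    done
}

# Compare the groups active in this session with those recorded for the account.
check_gpu_groups() {
    local user="${LOGNAME:-$(id -un)}"
    echo "missing in this session: $(missing_groups "$(id -nG)" video render | tr '\n' ' ')"
    echo "missing for the account: $(missing_groups "$(id -nG "$user")" video render | tr '\n' ' ')"
}
```

Calling `check_gpu_groups` prints both lists. If a group is missing only in the session line, logging out and back in (or rebooting) is the likely fix rather than running anything under sudo.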

 

Tom M wrote:

I don't see any directions for editing the icd file.

at the top of this page... the directions are split across two posts, since I copy/pasted them from two posts. you should follow the instructions in both posts.

 

Tom M wrote:

I am running Ubuntu 20 LTS with a kernel of 5.8 (I think).

No, you're running 5.11. just look at your own host details: https://einsteinathome.org/host/12896211

or run "uname -r" in the terminal, and you'll see.
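
A minimal sketch of that kernel check, for anyone who wants it scripted. It assumes the release string has the usual "major.minor.patch-..." shape that Ubuntu kernels report; `kernel_series` and `check_kernel` are made-up helper names, and 5.8 is used only because it is the series mentioned in this thread.

```shell
# kernel_series and check_kernel are hypothetical helper names.
# Reduce a release string such as the output of `uname -r` to "major.minor".
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Report whether the running kernel is the 5.8 series discussed in the thread.
check_kernel() {
    local series
    series="$(kernel_series "$(uname -r)")"
    if [ "$series" = "5.8" ]; then
        echo "kernel $series: matches the series in use before the update"
    else
        echo "kernel $series: not 5.8, so the driver install may be affected"
    fi
}
```

Running `check_kernel` in a terminal gives a one-line answer without having to look up the host details page.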

 

Tom M wrote:

So maybe my directions were not complete?

They're complete. Either the 5.11 kernel is causing issues, or you're not reading all of the posts/content.

_____________________________________________

Tom M

Tom M wrote:

Ian&Steve C. wrote:

And this bit, which you might have to do, from page 7 of the redux thread: 

===edited===

Just discovered I was looking at wrong system.  Sorry.

 


 

 

 

 
