Somehow I've gotten one of my two hosts that currently run Windows with an AMD RX 570 installed (https://einsteinathome.org/host/10706295) into a condition in which BOINC believes it has two RX 570 cards installed.
Einstein was until recently instructed by my preferences to run 3X Gamma Ray Pulsar GPU tasks. If I unsuspend one, two or three GPU tasks, these start and run normally, but with the abnormal indication that they are running on Device 0, as indicated by the status column in the BOINC Manager Task tab. But if I unsuspend a fourth task, BOINC Manager indicates that it is running on Device 1, and productivity plummets.
The startup messages for BOINC manager contain these lines:
5/5/2019 7:07:11 AM | | OpenCL: AMD/ATI GPU 0: Radeon RX 570 Series (driver version 2766.5, device version OpenCL 2.0 AMD-APP (2766.5), 8192MB, 8192MB available, 4833 GFLOPS peak)
5/5/2019 7:07:11 AM | | OpenCL: AMD/ATI GPU 1: Radeon RX 570 Series (driver version 2348.3, device version OpenCL 2.0 AMD-APP (2348.3), 8192MB, 8192MB available, 4833 GFLOPS peak)
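Note that the two lines above report different driver versions (2766.5 and 2348.3) for what is supposed to be the same card. A minimal Python sketch of how one might scan a saved copy of the startup messages for that pattern follows. The regular expression is modelled only on the two lines quoted here, so other BOINC versions may need a different pattern, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Pattern modelled on the two event-log lines quoted in this thread;
# it is an assumption that other BOINC versions use the same layout.
GPU_LINE = re.compile(
    r"OpenCL: (?P<vendor>.+?) GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>[^,]+),"
)

SAMPLE_LOG = """\
5/5/2019 7:07:11 AM | | OpenCL: AMD/ATI GPU 0: Radeon RX 570 Series (driver version 2766.5, device version OpenCL 2.0 AMD-APP (2766.5), 8192MB, 8192MB available, 4833 GFLOPS peak)
5/5/2019 7:07:11 AM | | OpenCL: AMD/ATI GPU 1: Radeon RX 570 Series (driver version 2348.3, device version OpenCL 2.0 AMD-APP (2348.3), 8192MB, 8192MB available, 4833 GFLOPS peak)
"""


def find_gpus(log_text):
    """Return (device index, model name, driver version) for each GPU line."""
    gpus = []
    for line in log_text.splitlines():
        match = GPU_LINE.search(line)
        if match:
            gpus.append(
                (int(match.group("index")), match.group("name"), match.group("driver"))
            )
    return gpus


def mixed_driver_models(gpus):
    """Return models that were reported with more than one driver version."""
    versions = defaultdict(set)
    for _index, name, driver in gpus:
        versions[name].add(driver)
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


if __name__ == "__main__":
    for name, drivers in mixed_driver_models(find_gpus(SAMPLE_LOG)).items():
        print(f"{name}: reported with driver versions {', '.join(drivers)}")
```

A model that shows up with two driver versions is only a hint that leftover driver files may be involved, not proof.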
I opened a recent copy of client_state.xml and find these troublesome lines:
<n_usable_coprocs>2</n_usable_coprocs>
and a bit later
<coproc_ati>
<count>2</count>
<name>Radeon RX 570 Series</name>
There is only one coproc_ati tag in the file.
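For anyone who wants to look at these values without opening the file in an editor, here is a read-only Python sketch. It only reads `n_usable_coprocs` and the `coproc_ati` block quoted above and never writes anything back. The wrapper elements in the sample (`client_state`, `host_info`, `coprocs`) are my assumption about the surrounding layout, and a real client_state.xml may not always parse cleanly with a strict XML parser.

```python
import xml.etree.ElementTree as ET

# Sample built from the fragments quoted in this thread. The wrapper tags
# (client_state, host_info, coprocs) are assumed for illustration only.
SAMPLE_XML = """\
<client_state>
<host_info>
<n_usable_coprocs>2</n_usable_coprocs>
<coprocs>
<coproc_ati>
<count>2</count>
<name>Radeon RX 570 Series</name>
</coproc_ati>
</coprocs>
</host_info>
</client_state>
"""


def gpu_summary(xml_text):
    """Read the coprocessor counts from client_state.xml text (read-only)."""
    root = ET.fromstring(xml_text)
    usable = root.findtext(".//n_usable_coprocs")
    ati_blocks = []
    for node in root.iter("coproc_ati"):
        ati_blocks.append(
            {
                "count": int(node.findtext("count", default="0")),
                "name": node.findtext("name", default="").strip(),
            }
        )
    return {
        "n_usable_coprocs": int(usable) if usable is not None else None,
        "coproc_ati": ati_blocks,
    }


if __name__ == "__main__":
    print(gpu_summary(SAMPLE_XML))
```

To run it against a real file, read a copy of client_state.xml into a string first and pass that string to `gpu_summary`, so the live file is never touched.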
Windows Control Panel Device Manager lists only one display adapter, an RX 570 with driver 25.20.15031.9002.
So far I have tried a "clean install" of the current AMD driver, which did not fix things.
I then tried uninstalling the AMD driver, rebooting, and doing a clean install--no joy.
I tried doing a "repair install" of BOINC itself--no joy.
It seems clear I have somehow gotten to a state where something is corrupt. Possibly it could be fixed by appropriate editing of client_state.xml, but I am a bit frightened to tamper in there in my ignorance.
Suggestions?
Copyright © 2024 Einstein@Home. All rights reserved.
I'm aware of this type of issue having arisen a few times recently, and indeed it happened to me just before Christmas. At the time the afflicted host was running 2 Vega64 GPUs, but after I chose to upgrade the GPU drivers I ended up with both BOINC and Windows 10 reporting 4 Vega64s, with BOINC running 2 tasks on each of them.
I also went down the clean uninstall and reinstall route with no success. I then tried DDU (in safe mode), which uninstalled all 4 instances of the GPUs, and after reboot Windows 10 reported the correct number of GPUs (2) and loaded the default Windows driver - which we should all know by now is no good for crunching... Installing the AMD driver afresh resulted in the return of 4 reported and used GPUs!
The machine in question is also my daily driver and due to a lack of time to properly investigate and the extreme frustration I resolved the issue by buying another SSD and reinstalling Windows... Drastic measures I know!
From speaking to others that have had the problem the common factor appears to be AMD driver upgrades whether they be in situ or clean. There is also the possibility that Windows updates are interfering. Have you or Microsoft run updates recently?
Gavin wrote: There is also the ...
Not coincident with the "extra coprocessors" matter. But a few days ago, yes.
As I have another Windows RX 570 host without the extra coprocessors problem, I'm tempted to compare the BOINC xml files for the two and repair the obvious differences such as the discrepancies I cited in the initial post.
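A comparison like that can be done without editing either file. Below is a rough Python sketch that pulls out only the coprocessor-related fields from two client_state.xml texts and lists where they differ. The values in the "healthy" sample are hypothetical (I have not posted that host's file), and the wrapper tags are assumed as well.

```python
import xml.etree.ElementTree as ET

# Both samples are illustrative. The "healthy" values are hypothetical,
# and the wrapper tags (client_state, coprocs) are assumed.
HEALTHY_XML = """\
<client_state>
<n_usable_coprocs>1</n_usable_coprocs>
<coprocs><coproc_ati><count>1</count><name>Radeon RX 570 Series</name></coproc_ati></coprocs>
</client_state>
"""

AFFLICTED_XML = """\
<client_state>
<n_usable_coprocs>2</n_usable_coprocs>
<coprocs><coproc_ati><count>2</count><name>Radeon RX 570 Series</name></coproc_ati></coprocs>
</client_state>
"""


def gpu_fields(xml_text):
    """Collect n_usable_coprocs and the simple fields of each coproc_* block."""
    root = ET.fromstring(xml_text)
    fields = {}
    usable = root.findtext(".//n_usable_coprocs")
    if usable is not None:
        fields["n_usable_coprocs"] = usable.strip()
    for section in root.iter():
        if section.tag.startswith("coproc_"):
            for child in section:
                if len(child) == 0:  # keep leaf elements only
                    fields[f"{section.tag}/{child.tag}"] = (child.text or "").strip()
    return fields


def diff_fields(first, second):
    """Return {field: (first value, second value)} for fields that differ."""
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k)) for k in keys if first.get(k) != second.get(k)}


if __name__ == "__main__":
    differences = diff_fields(gpu_fields(HEALTHY_XML), gpu_fields(AFFLICTED_XML))
    for field, (good, bad) in differences.items():
        print(f"{field}: healthy={good} afflicted={bad}")
```

If a file ever held more than one block with the same `coproc_*` tag, this sketch would keep only the last one, which is acceptable here since there is only one coproc_ati tag in the file.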
BeemerBiker has been dealing with this exact issue many times now. Too many drivers loaded for the card from different sources. A clean DDU uninstallation and reinstall seems to fix it.
Keith Myers wrote: A clean DDU ...
I'll try that before I get brave and attempt to edit an xml file myself.
The actions I took in response to Keith's suggestion appear to have worked.
I performed this sequence:
0. suspended all BOINC work units
1. downloaded latest DDU from wagnardmobile
2. disconnected PC from the internet
3. ran the Windows Control panel uninstaller for "AMD software"
4. used msconfig to set up a safe minimal reboot
5. ran DDU in the most suggested mode while in the safe reboot
6. ran the AMD current graphics installer after DDU initiated a reboot
7. rebooted when the installer was finished
8. started up BOINC and unsuspended one WU.
9. reconnected the PC to the Internet
I was happy to see that the BOINC event log during startup no longer showed the alarming indication of two separate graphics units I posted earlier. I was also happy to see when I started running a task that the status column in BOINCMgr no longer displayed a device number.
It is early going, but so far my particular situation appears to be fixed. I don't know how I got here.
Also, I don't know why both of my two RX 570 machines have recently gotten into a sloth mode with no obvious trigger in which the reported power consumption and GPU temperature go abruptly way down, and the Einstein productivity drops by at least a factor of three. The truly odd thing is that the GPU operating frequency, as reported to GPU-Z or Wattman, DOES NOT drop during these episodes. But I think that is a different problem, and that I probably accidentally got into the subject problem of this thread by finger error during the driver re-installs I used to try to recover from the sloth mode state (when reboot was not enough, which it sometimes has been).
BeemerBiker says the problem occurs when different versions of the ATI/AMD drivers get installed, whether you install them from AMD yourself or Windows does its thing of forcing graphics driver updates on you even when you don't allow updates.
Only a complete DDU cleanout before installing the desired drivers fixes things.
Keith Myers wrote: BeemerBiker ...
I suspect his situation differed from mine.
However, the stated procedure seems to have worked in my case. Thanks. It is still holding up.
Btw, what have you set on the Windows Control Panel - System and Security - System - Advanced system settings - Hardware - Device installation settings ? Is it Enabled or Disabled?
I'm not sure if that option could be related to this kind of problem, but any larger Windows update will always reset that thing back to 'Enabled'.
Richie wrote: Device ...
Just now I checked on the offending machine. It was set to yes.
When I ran DDU, I got into a configuration screen and selected the option that I wanted DDU to turn off Windows automatic update of graphics drivers. But when I actually ran the DDU operation, a box announced that it was leaving that setting at the Windows default (a couple of years ago DDU was configurable on the main operation screen, but not for many revisions back).
Anyway, I've now set it to no.
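Since larger Windows updates reportedly flip this setting back, it may be handy to check it from a script now and then. Here is a read-only Python sketch. It assumes the Device installation settings choice is stored in the registry value `SearchOrderConfig` under `HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\DriverSearching`, with 0 meaning no and 1 meaning yes. That mapping is my assumption and is worth verifying on your own Windows version before trusting the answer.

```python
import sys

# Assumed registry location and value name for the "Device installation
# settings" choice; verify on your own Windows version.
KEY_PATH = r"SOFTWARE\Microsoft\Windows\CurrentVersion\DriverSearching"
VALUE_NAME = "SearchOrderConfig"


def describe(value):
    """Translate the assumed DWORD value into a readable answer."""
    if value == 0:
        return "no: Windows Update should not install drivers automatically"
    if value == 1:
        return "yes: Windows Update may install drivers automatically"
    return f"unrecognised value {value!r}"


def read_setting():
    """Read the value from HKEY_LOCAL_MACHINE (Windows only, read-only)."""
    import winreg  # standard library, but only present on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return value


if __name__ == "__main__":
    if sys.platform == "win32":
        print(describe(read_setting()))
    else:
        print("This check only applies to Windows.")
```

The script only reads the value. Changing the setting is still best done through the Control Panel dialog mentioned above.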