Understood, Bernd. And thanks for the clear answer. I have removed my gpu_exclude for my RTX 2080 for now and will watch for any of the offending task types, hopefully aborting them in time.
My RTX 2080 has come up off the floor and has been back in the box for some days now. It currently shares the box with a GTX 1070. My (relatively) low-labor procedure for processing the non-Turing tasks is:
1. I carry an appreciable queue depth, so I have well over a day's time in which to detect and act on the troublesome tasks.
2. Once a day I go to the Tasks tab in BoincTasks for this machine, double-click the "ready to start" list to expand it, and sort it by task name. That currently puts any offending tasks of the 21nnL flavor at one end of the list, and of the 0104? flavor at the other.
3. If it were a Turing-only machine, I'd just abort any offending tasks at that point. But as I have a Pascal card in the machine, I instead suspend all tasks older than the one I wish to process, then suspend the task currently executing on the Pascal card. The offending task is then promptly started on the Pascal card, so I can undo all the suspensions and go about my other interests until the next time I choose to reduce my offending-task inventory.
They have come in irregular clusters. The scheduler tends to punish owners of machines with dissimilar cards by issuing no work for a while, then sending a big gulp. If some other host (one that also gulps, but doesn't get its work done by deadline) has just generated a bunch of "time's up" returns, I've gotten half a dozen or more at a time, but then gone days with none.
I'm just happy that my Turing is working for Einstein again, and that Bernd sees some daylight for possible ways it might continue to do so with less operator intervention.
I keep NNT set all the time for Einstein; when I'm getting low, I request work. I can then review the tasks received for the offending types and abort them then and there. Depending on the work mix, I can reset NNT if I got enough work, or let the scheduler fetch me more and hope the next download gains me more of the good type.
It requires manual intervention, like your method, but my resource share is low enough that a slug of Einstein work will last for several days ... as long as Seti doesn't have any extended upsets. (wishful thinking)
So is this the new plan of attack for Turing/Volta? Prevent sending incompatible tasks rather than fixing the application itself?
Actually we will do both. The limiting factor is manpower, and producing and sending compatible workunits is the most efficient thing we can do right now.
The reason for this problem appears to be a new feature of Turing/Volta ("independent thread scheduling"), which we have very limited control of in OpenCL. We might again intensify our efforts to develop a CUDA version that will likely give us more performance on NVidia cards and solve this problem as well. But we need more time for this than what we currently have.
Have you been in contact with NVidia about the issue? I know someone on the board opened a trouble ticket with them over it crashing on Turing and IIRC got it escalated to engineering support. That suggests that they care about the failures at some level and may be able to provide assistance, either by helping with a workaround, or by getting sufficient detail of the code from you to understand what needs to be changed in their OpenCL libraries, if it needs fixing on their end.
My quick read of "independent thread scheduling" suggests that it may leave open the possibility of deadlock with locks/mutexes, leaving other threads perpetually waiting for access they will never get. As each thread in a warp now has its own program counter and call stack, opportunities for such 'stalls' abound. This requires someone (the developer, if CUDA) or something (the scheduler, if OpenCL) to be smart enough to avoid the problem. Since NVidia implements OpenCL on top of CUDA, it is a case of CUDA's advances outreaching their current OpenCL library development. Sucks.
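For what it's worth, NVIDIA's Volta documentation describes a related hazard in the other direction too: code written for the old lockstep warp model can silently assume a warp-wide synchronization that independent thread scheduling no longer guarantees. A minimal illustration (sketch only, nothing from the Einstein@Home app; `lane` and `buf` are made up for the example):

```cuda
__global__ void warp_sync_example(int *data) {
    __shared__ int buf[32];
    int lane = threadIdx.x % 32;

    buf[lane] = data[lane];
    // Pre-Volta: all 32 lanes of a warp reached this point together, so
    // reading a neighbour's slot "just worked" without any barrier.
    // Volta/Turing: with per-thread program counters, lanes may run ahead
    // of each other, so the implicit warp-synchronous assumption is broken
    // and an explicit intra-warp sync is required:
    __syncwarp();
    data[lane] = buf[lane ^ 1];  // safe only after the sync
}
```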
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
We'll try to figure out a way to prevent the scheduler from sending older tasks to such cards.
This should be in effect now. Such cards should get app versions with plan class "FGRPopenclTV-nvidia" and tasks from "old" WUs for these app versions should be rejected.
There is still the possibility that the problem in the app has nothing to do with the "independent thread scheduling", but I increasingly doubt it.
If you tell NVidia's OpenCL compiler not to optimize at all (by passing "-cl-opt-disable"), a previously problematic task runs through (although noticeably slower). [FWIW, even with optimization level 1 (instead of the default 3) the task locks up.]
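For reference, that switch goes into the options string of `clBuildProgram()`. A minimal sketch, assuming nothing about the project's actual build code; only the options string is assembled here, so it runs without an OpenCL runtime:

```cpp
#include <string>

// Illustrative helper (hypothetical name): chooses the compiler options
// passed to clBuildProgram(). A real call would look roughly like:
//   clBuildProgram(program, 1, &device, options.c_str(), nullptr, nullptr);
std::string clBuildOptionsFor(bool disableOptimization) {
    // NVidia's OpenCL compiler optimizes by default; per the test above,
    // only disabling optimization entirely avoided the lockup.
    return disableOptimization ? "-cl-opt-disable" : "";
}
```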
How does the scheduler handle mixed Pascal and Turing cards in the same host? Will it see a Turing card and just send the "FGRPopenclTV-nvidia" type only from then on?
For some purposes, the BOINC software seems to report only one type of card. If the cards are not at the same compute capability level, it is the higher one that gets reported. If so, in a mixed system a Turing card (level 7.n) would be mentioned rather than a Pascal (level 6.n), and only Turing-suitable tasks should be downloaded.
Of course, I'm half guessing and would be happy to be corrected.
Good guess - you're right, plus there are extra factors considered lower down the priority order.
from https://github.com/BOINC/boinc/blob/master/client/gpu_nvidia.cpp#L134
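For readers who don't want to dig through the linked source, the idea can be sketched as a comparison that ranks the host's GPUs, with compute capability as the primary key. This is a toy version with hypothetical field names; the real criteria and their priority order live in gpu_nvidia.cpp:

```cpp
#include <string>

// Hypothetical GPU descriptor for illustration only (not BOINC's COPROC_NVIDIA).
struct Gpu {
    std::string model;
    int cc_major;   // compute capability major, e.g. 7 for Turing
    int cc_minor;   // compute capability minor
    double ram_gb;  // stand-in for the lower-priority tie-breakers
};

// Returns true if a is "more capable" than b: compute capability decides
// first; extra factors (here just RAM, as a placeholder) break ties.
bool moreCapable(const Gpu& a, const Gpu& b) {
    if (a.cc_major != b.cc_major) return a.cc_major > b.cc_major;
    if (a.cc_minor != b.cc_minor) return a.cc_minor > b.cc_minor;
    return a.ram_gb > b.ram_gb;
}
```

On a mixed host this ranking would report the Turing card (7.5) over the Pascal (6.1), matching the behavior described above.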