System 2:
Pending (352)
Valid (1030)
Invalid (110)
Error (0)
As suggested earlier, there is a fair number of invalids, but not an unacceptable amount (~10%). Running 2x work units on each.
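The ~10% figure can be checked from the counts above. This is a minimal Python sketch. It assumes the rate is taken over tasks that have already been validated, so Pending is left out. Counting Errors in the denominator is a choice, and they are 0 here anyway. The function name is illustrative.

```python
def invalid_rate(valid: int, invalid: int, error: int = 0) -> float:
    """Fraction of judged tasks that came back invalid.

    Pending tasks are excluded because they have not been validated yet.
    Errors, if supplied, are counted in the denominator.
    """
    judged = valid + invalid + error
    if judged == 0:
        raise ValueError("no validated tasks yet")
    return invalid / judged


# Counts reported for System 2 above; Pending (352) is deliberately left out.
rate = invalid_rate(valid=1030, invalid=110, error=0)
print(f"invalid rate: {rate:.1%}")  # invalid rate: 9.6%
```

So "~10%" is really about 9.6% of validated tasks.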
Boca Raton Community HS
Out of curiosity, are you using "NVIDIA X Server Settings"?
And if so, what do you have your PowerMizer settings at?
And if not, maybe you should. I find that with the Graphics Clock offset set to something less than +50 (in your case, maybe it should just stay at +0), the invalids stay around 5% (mine are at 5.2%).
Just a thought.
Proud member of the Old Farts Association
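Outside the GUI, the same graphics clock offset can be set from the command line with nvidia-settings. Below is a minimal Python sketch that only builds the command. It assumes Coolbits 8 is enabled and an X session is running. It also assumes the offset should go on PowerMizer level 3, the level these cards sit at under compute later in this thread. The "below +50" guard encodes the rule of thumb above, not a driver limit, and the helper name is hypothetical.

```python
import shlex

# Rule of thumb from this thread, not a hardware or driver limit.
SUGGESTED_MAX_GRAPHICS_OFFSET = 50


def graphics_offset_command(gpu: int, offset: int, perf_level: int = 3) -> list[str]:
    """Return an nvidia-settings argv that assigns a graphics clock offset.

    perf_level defaults to 3 (assumed compute level for these cards).
    """
    if offset >= SUGGESTED_MAX_GRAPHICS_OFFSET:
        raise ValueError(
            f"offset {offset} is not below the suggested +{SUGGESTED_MAX_GRAPHICS_OFFSET}"
        )
    target = f"[gpu:{gpu}]/GPUGraphicsClockOffset[{perf_level}]"
    return ["nvidia-settings", "-a", f"{target}={offset}"]


# Keep the stock graphics clock (+0) on GPU 0.
cmd = graphics_offset_command(gpu=0, offset=0)
print(shlex.join(cmd))  # nvidia-settings -a '[gpu:0]/GPUGraphicsClockOffset[3]=0'
```

Returning an argument list rather than a shell string means it can be passed straight to `subprocess.run` without quoting problems from the square brackets.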
I do use it. It is set to "Adaptive". That is the default setting when I power on, and I usually never remember to change it. Either way, I do not normally increase the settings. It seems to hang out around level 3 (GC max = 3165 and MTR max = 20502). Even when I switch it to maximum performance, it still stays at level 3 (I'm not sure if it is supposed to force it to level 4, but it does not move there).
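One way to see what the cards actually run at while BOINC is crunching is to poll `nvidia-smi`. This is a minimal Python sketch. The query fields are standard `--query-gpu` fields. Note that nvidia-smi reports P-states (P0, P2, ...), not the PowerMizer level numbers X Server Settings shows. Its memory clock may also be on a different scale from the "MTR" figure, so compare readings from the same tool. The sample rows are made-up placeholders, not measurements.

```python
import csv
import io
import subprocess
from dataclasses import dataclass

# Standard fields; `nvidia-smi --help-query-gpu` lists them all.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,pstate,clocks.gr,clocks.mem",
    "--format=csv,noheader,nounits",
]


@dataclass
class GpuClocks:
    index: int
    pstate: str        # P-state such as "P2", not a PowerMizer level number
    graphics_mhz: int
    memory_mhz: int


def parse_clocks(text: str) -> list[GpuClocks]:
    """Parse the CSV printed by QUERY, one row per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # tolerate blank lines
        index, pstate, graphics, memory = row
        gpus.append(GpuClocks(int(index), pstate, int(graphics), int(memory)))
    return gpus


def read_clocks() -> list[GpuClocks]:
    """Run nvidia-smi (call this while tasks are running) and parse the result."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_clocks(result.stdout)


# Placeholder rows (not real readings) showing the expected format.
sample = "0, P2, 1000, 2000\n1, P2, 1100, 2100\n"
for gpu in parse_clocks(sample):
    print(gpu)
```

If a field prints `[N/A]` on some driver/GPU combination, the `int()` conversion will fail, so that field would need special handling.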
If you're satisfied with it, or maybe 'content' is a better choice of words, then that'll be okay... I guess.
Otherwise, are they working well for your students? I should say "Are they ALL working well for your students?"
I am not sure I am content with ~10%, I would be happy to decrease that number but I think that has been in line with what the 4090 GPUs have been doing here (not that it is a good thing).
And yes, they are working well, overall. We are using Mint for BOINC running E@H and then using a VM of Clear Linux for BOINC running WCG. Clear is just so fast for WCG tasks on these Threadripper systems. I tried Clear on a VM on a Xeon system and it was drastically slower.
The other workstations (the TR Pro and Xeon workstations) are used more for the day-to-day by students while these have been pretty much designated to BOINC. Students can still use them for anything they need, but they are just not as familiar with Linux.
Have you used, or tried to use, Clear on any of your Threadripper Pro boards?
I'm curious how fast Clear is with BOINC, Einstein in particular.
Also, did your students have any difficulties setting up your TR-Pro boards to run BOINC?
How are they doing, BTW, with setting up BOINC in general?
We have not yet. Those were the Dell systems. When I have the time, I will try running a VM of Clear on one of them and do a comparison vs Windows 11 Pro.
The students did not build the TR-Pro systems, only the TR 2970WX 24-Core systems. There were not too many issues along the way until we tried to get Clear to install on the system as the main OS. Clear did NOT want to boot with the 4090 GPUs and there is not a lot of support out there for how to get the Nvidia drivers to work with Clear. The support that was out there for the Nvidia driver install on Clear was admittedly above my head.
The students have REALLY enjoyed our BOINC/WCG work. We have been able to talk to the director of WCG along with reaching out and communicating with some of the individual researchers who do work on WCG. We then focused on building the two new workstations to run BOINC projects.
The only systems on which we are running Petri's app are the student-built ones, and I have everyone here to thank for helping us overcome the technical issues to get them working!
I don't know what the Level 4 clocks of the Ada Lovelace cards are. It's pretty impressive that, even under the drivers' compute penalty, the MTR is still above 20 GT/s.
You can't force any Nvidia card to use its highest performance level when doing compute. Nvidia deliberately hamstrings all consumer cards except the very lowest-performance ones.
Normally the penalty is a couple hundred megahertz of the MTR clock. It is only about 600 MHz for the Ampere cards, and likely similar for Ada. It can be as high as 1400 MHz in the MTR clock for Pascal-generation cards.
The only thing you can do is add a positive offset to the MTR clock to bring it back up to Level 4 speed while in Level 3, which is the mode all cards run in under a compute load.
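The arithmetic behind that offset is just the gap between the Level 3 rate and the Level 4 rate. Below is a small Python sketch using the 20502 MTR reported above. The Ada Level 4 value isn't known in this thread, so the 600 penalty is an assumption borrowed from the Ampere figure. The helper names are hypothetical.

```python
def offset_needed(current_mtr: int, target_mtr: int) -> int:
    """MTR offset that moves current_mtr to target_mtr (negative slows it down)."""
    return target_mtr - current_mtr


def mtr_offset_assignment(gpu: int, offset: int, perf_level: int = 3) -> str:
    """nvidia-settings -a argument applying an MTR offset at one PowerMizer level."""
    return f"[gpu:{gpu}]/GPUMemoryTransferRateOffset[{perf_level}]={offset}"


# Level 3 MTR max reported for the 4090s in this thread.
level3_mtr = 20502
# ASSUMPTION: Ada's compute penalty is close to the ~600 quoted for Ampere;
# the real Level 4 MTR for these cards is not known here.
assumed_penalty = 600
estimated_level4_mtr = level3_mtr + assumed_penalty

offset = offset_needed(level3_mtr, estimated_level4_mtr)
print(offset)                            # 600
print(mtr_offset_assignment(0, offset))  # [gpu:0]/GPUMemoryTransferRateOffset[3]=600
```

The same helper with a negative value expresses the opposite change, slowing the memory below stock.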
I was actually going to suggest a further negative offset on the memory speed, as a test to see whether the stock memory speeds are too fast and causing the invalids on the 4090s. Or even a negative offset on the graphics clock too. If those don't help at all, then maybe it's the kernel parameters that aren't well tuned for Ada.
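To tell whether a negative offset actually changed anything, compare the invalid rates before and after, not single days. A two-proportion z-test is one standard way to do this with counts in the hundreds. Below is a minimal Python sketch. The baseline uses System 2's counts from the start of the thread. The "after" counts are hypothetical placeholders to be replaced with real ones.

```python
import math


def compare_invalid_rates(invalid_a: int, total_a: int,
                          invalid_b: int, total_b: int) -> tuple[float, float]:
    """Two-proportion z-test on invalid rates from two settings.

    total_* should count validated tasks only (valid + invalid), not pending.
    Returns (z, two_sided_p_value).
    """
    p_a = invalid_a / total_a
    p_b = invalid_b / total_b
    pooled = (invalid_a + invalid_b) / (total_a + total_b)
    variance = pooled * (1 - pooled) * (1 / total_a + 1 / total_b)
    if variance == 0:
        raise ValueError("both runs are all-valid or all-invalid; nothing to compare")
    z = (p_a - p_b) / math.sqrt(variance)
    # Two-sided p-value under the normal approximation:
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Baseline: System 2 at stock memory speed (110 invalid out of 1030 + 110 validated).
baseline = (110, 1030 + 110)
# HYPOTHETICAL counts after a run with a negative memory offset; replace with real ones.
after_offset = (55, 1100)

z, p = compare_invalid_rates(*baseline, *after_offset)
print(f"z = {z:.2f}, p = {p:.3g}")
```

A small p-value suggests the drop is unlikely to be chance alone. It does not say whether memory speed or the kernel parameters are the underlying cause.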
My bet is on the kernel parameters that Petri came up with in the EAH_SLEEP file. The SM count is so dramatically larger for Ada than for any previous generation that I almost wonder if some of the parameters run into some kind of overflow condition.
Question: I have been watching my results (valid/invalid) on the 4090 systems, and I would like to slow the memory clock down, but I have been having trouble doing this, even though I don't think it should be that complicated.
What I have done:
In Nvidia X Server, the clock options were disabled.
I then tried GreenWithEnvy and it told me to enable Coolbits 8. I did this successfully.
In Nvidia X Server, the clock options now show up. I can type in new numbers but I don't think it saves or changes anything.
I tried using GreenWithEnvy to slow down the memory clock by 200. System crashed.
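Coolbits is a bitmask, which is why the value 8 specifically made the clock fields appear. It is normally set with `Option "Coolbits" "8"` in the Device section of xorg.conf, or with `nvidia-xconfig --cool-bits=8`. Below is a tiny Python decoder for checking what a given value turns on. The bit meanings are paraphrased from NVIDIA's Linux driver README. The exact wording, and which bits apply, can differ between driver versions.

```python
# Coolbits bit meanings, paraphrased from the NVIDIA Linux driver README.
# Bit 0 (value 1) only applied to older GPUs.
COOLBITS = {
    1: "legacy clock controls (older GPUs only)",
    2: "SLI with GPUs that have different amounts of video memory",
    4: "manual GPU fan speed control",
    8: "per-clock-domain, per-performance-level clock offsets on the PowerMizer page",
    16: "GPU overvoltage via the nvidia-settings command line",
}


def decode_coolbits(value: int) -> list[str]:
    """List the features enabled by a Coolbits bitmask, lowest bit first."""
    if value < 0:
        raise ValueError("Coolbits is a non-negative bitmask")
    return [desc for bit, desc in COOLBITS.items() if value & bit]


for value in (8, 28):
    print(value, decode_coolbits(value))
```

Bits can be combined. For example, 28 = 4 + 8 + 16 turns on fan control, clock offsets, and overvoltage together.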
Why can't I save changes in Nvidia X Server? I would like to get under the ~10% invalid mark I am at with these systems.
Thank you to everyone!
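On the "won't save" part: offsets typed into the X Server Settings GUI may only take effect once applied (e.g. by pressing Enter in the field). They are also not kept across a reboot or X restart on their own. Driving nvidia-settings from a script makes the change explicit and lets you read it back. Below is a minimal Python sketch. It assumes Coolbits 8 is active, an X session is running (over SSH you may need `DISPLAY=:0`), and `-t` (terse) makes the query print just the number. The function names are hypothetical.

```python
import subprocess


def attribute_target(gpu: int, attribute: str, perf_level: int) -> str:
    """Target string such as [gpu:0]/GPUMemoryTransferRateOffset[3]."""
    return f"[gpu:{gpu}]/{attribute}[{perf_level}]"


def parse_terse_int(output: str) -> int:
    """Read the first non-empty line of `nvidia-settings -t -q ...` as an int."""
    for line in output.splitlines():
        if line.strip():
            return int(line.strip())
    raise ValueError("empty nvidia-settings output")


def apply_and_verify(gpu: int, attribute: str, perf_level: int, value: int) -> int:
    """Assign an offset with nvidia-settings, then query it back to confirm.

    nvidia-settings may not return a non-zero exit code on every failure,
    so the read-back comparison is the real check.
    """
    target = attribute_target(gpu, attribute, perf_level)
    subprocess.run(["nvidia-settings", "-a", f"{target}={value}"],
                   check=True, capture_output=True, text=True)
    query = subprocess.run(["nvidia-settings", "-t", "-q", target],
                           check=True, capture_output=True, text=True)
    actual = parse_terse_int(query.stdout)
    if actual != value:
        raise RuntimeError(f"{target} reads back {actual}, expected {value}")
    return actual


# Example (not run here): lower the memory transfer rate on GPU 0 at Level 3.
# apply_and_verify(0, "GPUMemoryTransferRateOffset", 3, -200)
```

Because the offset does not persist by itself, something like this would likely need to run again after each X restart. One option is to launch it from the desktop session's startup applications.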