Einstein FGRPB1G Linux/Nvidia Special app "AIO"

GWGeorge007
Joined: 8 Jan 18
Posts: 3061
Credit: 4965347686
RAC: 1415377

Boca Raton Community HS wrote:

Quick update on the 4090 GPUs. Keep in mind, I am only running these systems during business hours right now. 

System 1:

Pending (374)
Valid (1201)
Invalid (139)
Error (0)

 

System 2:
Pending (352)
Valid (1030)
Invalid (110)
Error (0)
 

As suggested earlier, a good amount of invalids but not unacceptable (~10%). Running 2x work units on each. 
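
As a rough check on the "~10%" figure, the sketch below computes the invalid rate from the counts posted above. It assumes the rate is invalid / (valid + invalid) and leaves pending and errored tasks out; the function name `invalid_rate` is only illustrative, and the project may count things differently.

```python
def invalid_rate(valid: int, invalid: int) -> float:
    """Fraction of validated tasks that came back invalid (pending tasks ignored)."""
    decided = valid + invalid
    if decided == 0:
        raise ValueError("no validated tasks yet")
    return invalid / decided

# Counts copied from the two 4090 systems in the post.
systems = {"System 1": (1201, 139), "System 2": (1030, 110)}

for name, (valid, invalid) in systems.items():
    print(f"{name}: {invalid_rate(valid, invalid):.1%} invalid")
```

Under that assumption both systems land close to the quoted ~10%.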

Out of curiosity, are you using "NVIDIA X Server Settings"?

And if so, what do you have your PowerMizer settings at?

And if not, maybe you should. I find that with the Graphics Clock offset set to something less than +50 (in your case maybe it should just remain at +0), the invalids stay around 5% (mine are at 5.2%).

Just a thought.

George

Proud member of the Old Farts Association

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 240
Credit: 10555145586
RAC: 25276346

GWGeorge007 wrote:

Boca Raton Community HS wrote:

Quick update on the 4090 GPUs. Keep in mind, I am only running these systems during business hours right now. 

System 1:

Pending (374)
Valid (1201)
Invalid (139)
Error (0)

 

System 2:
Pending (352)
Valid (1030)
Invalid (110)
Error (0)
 

As suggested earlier, a good amount of invalids but not unacceptable (~10%). Running 2x work units on each. 

Out of curiosity, are you using "NVIDIA X Server Settings"?

And if so, what do you have your PowerMizer settings at?

And if not, maybe you should. I find that with the Graphics Clock offset set to something less than +50 (in your case maybe it should just remain at +0), the invalids stay around 5% (mine are at 5.2%).

Just a thought.

 

I do use it. It is set to "adaptive". That is the normal setting when I power on and I usually never remember to change it. Either way, I do not increase the settings normally. It seems to hang out around level 3 (GC max = 3165 and MTR max = 20502). Even when I turn it to max performance, it still stays at level 3 (not sure if it is supposed to force it to level 4, but it does not move there).

GWGeorge007
Joined: 8 Jan 18
Posts: 3061
Credit: 4965347686
RAC: 1415377

If you're satisfied with it, or maybe 'content' is a better choice of words, then that'll be okay... I guess.

Otherwise, are they working well for your students? I should say, "Are they ALL working well for your students?"

George

Proud member of the Old Farts Association

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 240
Credit: 10555145586
RAC: 25276346

I am not sure I am content with ~10%. I would be happy to decrease that number, but I think it has been in line with what the 4090 GPUs have been doing here (not that that is a good thing).

And yes, they are working well, overall. We are using Mint for BOINC running E@H and then using a VM of Clear Linux for BOINC running WCG. Clear is just so fast for WCG tasks on these Threadripper systems. I tried Clear on a VM on a Xeon system and it was drastically slower. 

The other workstations (the TR Pro and Xeon workstations) are used more for the day-to-day by students while these have been pretty much designated to BOINC. Students can still use them for anything they need, but they are just not as familiar with Linux.  

GWGeorge007
Joined: 8 Jan 18
Posts: 3061
Credit: 4965347686
RAC: 1415377

Have you used, or tried to use, Clear on any of your Threadripper Pro boards?

I'm curious how fast Clear is with BOINC, Einstein in particular.

Also, did your students have any difficulties setting up your TR-Pro boards to run BOINC?

How are they doing, BTW, with setting up BOINC in general?

George

Proud member of the Old Farts Association

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 240
Credit: 10555145586
RAC: 25276346

GWGeorge007 wrote:

Have you used, or tried to use, Clear on any of your Threadripper Pro boards?

We have not yet. Those were the Dell systems. When I have the time, I will try running a VM of Clear on one of them and do a comparison vs Windows 11 Pro. 

The students did not build the TR-Pro systems, only the TR 2970WX 24-Core systems. There were not too many issues along the way until we tried to get Clear to install on the system as the main OS. Clear did NOT want to boot with the 4090 GPUs and there is not a lot of support out there for how to get the Nvidia drivers to work with Clear. The support that was out there for the Nvidia driver install on Clear was admittedly above my head. 

The students have REALLY enjoyed our BOINC/WCG work. We have been able to talk to the director of WCG along with reaching out and communicating with some of the individual researchers who do work on WCG. We then focused on building the two new workstations to run BOINC projects. 

The only systems on which we are running Petri's app are the student-built systems, and I can thank everyone for helping us overcome the technical issues to get them working!

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18715986062
RAC: 6371702

Boca Raton Community HS wrote:

I do use it. It is set to "adaptive". That is the normal setting when I power on and I usually never remember to change it. Either way, I do not increase the settings normally. It seems to hang out around level 3 (GC max = 3165 and MTR max = 20502). Even when I turn it to max performance, it still stays at level 3 (not sure if it is supposed to force it to level 4, but it does not move there).

I don't know what the Level 4 clocks of the Ada Lovelace cards are. Pretty impressive that, even under the compute penalty from the drivers, the MTR is still above 20 GT/s.

You can't force any Nvidia card to use its highest performance level when doing compute.  Nvidia deliberately hamstrings all consumer cards except for the lowest of the low performance cards.

Normally the penalty is a couple hundred megahertz of the MTR clock. It is only about 600 MHz for the Ampere cards and likely similar for Ada. It can be as high as a 1400 MHz penalty in the MTR clocks for Pascal-generation cards.

The only thing you can do is add back a positive offset in the MTR clock to take the clocks back to Level 4 performance in Level 3 mode, which is where all cards run under a compute load.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46760702642
RAC: 64111521

I was actually going to suggest a further negative offset for memory speed, as a test to see whether the stock memory speeds are too fast and are causing the invalids on the 4090s. Or even a negative offset on the clock speed too. If those don’t help at all, then maybe it’s the kernel parameters that aren’t well tuned for Ada.
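
For anyone wanting to try that test from a terminal instead of the GUI, here is a minimal sketch that only builds and prints `nvidia-settings` assignment commands; it does not run them. The attribute name, the GPU index 0, and the step values are assumptions for illustration: check which offset attributes your driver exposes with `nvidia-settings -q all`, and note that the offsets generally need Coolbits enabled in the X configuration first.

```python
import shlex

def offset_command(gpu_index: int, attribute: str, offset: int) -> list[str]:
    """Build (but do not run) an nvidia-settings assignment for one GPU."""
    target = f"[gpu:{gpu_index}]/{attribute}={offset}"
    return ["nvidia-settings", "-a", target]

# Assumed attribute name; confirm it on your driver with `nvidia-settings -q all`.
MEM_ATTR = "GPUMemoryTransferRateOffsetAllPerformanceLevels"

# Hypothetical step sizes: go down in small increments and watch the
# invalid rate at each step rather than applying one large change.
for step in (-50, -100, -150, -200):
    print(shlex.join(offset_command(0, MEM_ATTR, step)))
```

Printing the commands first makes it easy to review them, then paste one at a time and let a batch of tasks validate before going further.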

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18715986062
RAC: 6371702

My bet is on the kernel parameters that Petri came up with in the EAH_SLEEP file. The SM count is so dramatically larger for Ada than for any previous generation that I almost wonder whether some of the parameters fall into some kind of overflow condition.

 

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 240
Credit: 10555145586
RAC: 25276346

Question: I have been watching my results (valid/invalid) on the 4090 systems and I would like to slow the memory clock down, but I have been having trouble with this, even though I don't think it should be that complicated.

What I have done: 

In Nvidia X Server, the clock options were disabled. 

I then tried GreenWithEnvy and it told me to enable Coolbits 8. I did this successfully. 

In Nvidia X Server, the clock options now show up. I can type in new numbers but I don't think it saves or changes anything. 

I tried using GreenWithEnvy to slow down the memory clock by 200. System crashed. 

Why can't I save changes in Nvidia X Server? I would like to get under the ~10% invalid mark I am at with these systems. 

Thank you to everyone!
