FGRPB1G downclocks memory

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0
Topic 217922

I read that Einstein@home is bottlenecked by GPU memory bandwidth. However, I discovered that FGRPB1G downclocks the memory when it runs.

The default memory clock of GTX 1080 Ti is 5600 MHz. When FGRPB1G starts it downclocks to 5100 MHz. If I overclock the memory by 1000 MHz in MSI Afterburner, the memory clock becomes 6600 MHz. As soon as FGRPB1G starts the memory clock drops to 6100 MHz.

https://drive.google.com/file/d/1ZR00QFjqqFtzTb0_4Fjp_lj6Q2NNsD2u/view

Another example: the default memory clock of the GTX 980 Ti is 3500 MHz. When FGRPB1G runs it downclocks to 3300 MHz. If I overclock the memory by 400 MHz, the memory clock becomes 3900 MHz in a gaming benchmark. But when FGRPB1G runs it stays at 3300 MHz.

Is this a bug? Einstein@home is definitely memory bandwidth dependent. I don't know why it downclocks the memory. Below are the test results of my GTX 1080 Ti on LATeah2008L tasks. Numbers are averaged over a couple of tasks. With 3 concurrent tasks, overclocking the memory by 20% (5100 MHz to 6100 MHz) increases power consumption by 6%, but shortens the time by 4.5%.

Concurrency  Memory Clock/MHz  Power/W  Temperature/℃  Total Time/s  Time per WU/s
3            6100              212      47             966           322
3            5100              200      46             1010          337
2            6100              201      45             689           345
2            5100              191      44             713           357
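The percentages can be re-derived from the table with a few lines of Python. This is only a sketch: the tuple layout and the helper names `per_wu` and `compare` are made up for illustration, and time per WU is taken as total time divided by concurrency, which is how the last column appears to have been computed.

```python
# Rows from the table: (concurrency, memory clock MHz, power W, total time s)
runs = [
    (3, 6100, 212, 966),
    (3, 5100, 200, 1010),
    (2, 6100, 201, 689),
    (2, 5100, 191, 713),
]


def per_wu(total_s, concurrency):
    """Time per work unit when several tasks share the GPU."""
    return total_s / concurrency


def compare(fast, slow):
    """Return (% time saved per WU, % extra power) of `fast` relative to `slow`."""
    t_fast = per_wu(fast[3], fast[0])
    t_slow = per_wu(slow[3], slow[0])
    time_saved = 100 * (1 - t_fast / t_slow)
    extra_power = 100 * (fast[2] / slow[2] - 1)
    return time_saved, extra_power


for fast, slow in [(runs[0], runs[1]), (runs[2], runs[3])]:
    saved, power = compare(fast, slow)
    print(f"{fast[0]} concurrent: {saved:.1f}% less time per WU, {power:.1f}% more power")
```

With unrounded times, the saving for 3 concurrent tasks comes out close to the quoted 4.5%, and the saving for 2 concurrent tasks is somewhat smaller.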

BTW Vega 64 has a memory bandwidth of 484 GB/s and a RAC between 110k and 150k. Since Radeon VII has a memory bandwidth of 1024 GB/s, should we expect a RAC of about 250k on a Radeon VII?
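As a back-of-the-envelope check of that guess, here is the scaling written out. It assumes RAC scales linearly with memory bandwidth, which is an assumption and not something measured in this thread; `scale_rac` is a hypothetical helper name.

```python
def scale_rac(rac, bw_old_gbs, bw_new_gbs):
    """Scale a RAC figure by the bandwidth ratio (assumes purely linear scaling)."""
    return rac * bw_new_gbs / bw_old_gbs


# Vega 64: 484 GB/s, RAC 110k to 150k; Radeon VII: 1024 GB/s
low = scale_rac(110_000, 484, 1024)
high = scale_rac(150_000, 484, 1024)
print(f"Estimated Radeon VII RAC: {low:,.0f} to {high:,.0f}")
```

Under that assumption 250k sits near the lower end of the estimated range; if the application is not purely bandwidth bound, the real figure would be lower.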

 

Added 15 Jan 2019 18:54:45 UTC

Apparently a stable overclock for gaming is not a stable overclock for crunching. Since Jan 14th I have gotten 169 invalid results.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

It's not the work units that downclock the memory, it's Nvidia. Nvidia has stated on their website that when the GPU recognizes a scientific work unit, it is moved to the P2 state, which by default runs the GPU and memory at lower clocks, so that scientific work units don't get corrupted by the normal P0 state clock levels. AMD doesn't have this issue.
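One way to see which performance state the card is in while a task runs is to poll `nvidia-smi`, which ships with the Nvidia driver. The sketch below assumes `nvidia-smi` is on the PATH and uses its `pstate`, `clocks.mem` and `clocks.sm` query fields; the function names and the sample line in the usage note are illustrative only. The memory clock reported here may use a different convention than the "effective" figure shown by MSI Afterburner.

```python
import subprocess

QUERY = "pstate,clocks.mem,clocks.sm"


def parse_status(line):
    """Parse one CSV line such as 'P2, 5103, 1911' into a dict."""
    pstate, mem_mhz, sm_mhz = [field.strip() for field in line.split(",")]
    return {"pstate": pstate, "mem_mhz": int(mem_mhz), "sm_mhz": int(sm_mhz)}


def read_status():
    """Query all GPUs once; needs the Nvidia driver's nvidia-smi on the PATH."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_status(line) for line in out.splitlines() if line.strip()]
```

Calling `read_status()` once with the GPU idle and once with FGRPB1G running should show whether the state changes to P2. The parser can be tried without a GPU, for example `parse_status("P2, 5103, 1911")` with made-up numbers.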

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

I'm not sure if this works with GTX 10xx series cards but with 9xx you can use a program called Nvidia Inspector to adjust the P2 mem clock to max. https://www.guru3d.com/files-details/nvidia-inspector-download.html

I'm still using version 1.9.7.8 as I recall there was something strange with installing or using the newer version, but might have been just a user error by me.

Steps: Show overclocking, then under Overclocking : Performance Level (2)-(P2), move the Memory Clock Offset slider to max and click Apply Clocks & Voltage. You can see the mem bandwidth (Bus Width GB/s) on the left side change. Then Exit.

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0

Thank you for pointing it out. Could you elaborate on "get corrupted by the normal P0 state levels"?

Now that I got 169 invalid results since yesterday, I think the P2 state has a point. An overclocked memory at 6100 MHz should be stable for games. But obviously Einstein@home doesn't think so. Need to find a sweet spot between shorter finished time and failure rate.
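The sweet spot can be framed as valid results per hour. A rough sketch, assuming the stock P2 clock produces no invalid results and that an invalid result is simply lost work; the per-WU times are the 3-concurrent figures from the first post, and both function names are hypothetical.

```python
def valid_per_hour(seconds_per_wu, invalid_fraction):
    """Valid work units per hour at a given per-WU time and invalid rate."""
    return 3600 / seconds_per_wu * (1 - invalid_fraction)


def break_even_invalid_fraction(t_stock, t_oc):
    """Invalid fraction at which the overclock yields no more valid work
    than the stock clock (stock assumed to have zero invalids)."""
    return 1 - t_oc / t_stock


# 337 s per WU at 5100 MHz versus 322 s per WU at 6100 MHz
limit = break_even_invalid_fraction(337, 322)
print(f"Overclock only pays off below {100 * limit:.1f}% invalid results")
```

In other words, with a speedup of only a few percent, an invalid rate of a few percent already cancels the gain, before counting the work wasted by the other hosts that have to re-run the task.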

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0

Thanks. It seems the P2 state can be disabled.

https://www.reddit.com/r/RenderToken/comments/9w2rd9/how_to_use_maximum_p0_power_state_with_nvidia/

However I got lots of invalid results by going 500 MHz above the base memory clock (1000 MHz above P2). Do you have any recommendations on a safe overclock relative to the base clock?

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

shuhui1990 wrote:

Thank you for pointing it out. Could you elaborate on "get corrupted by the normal P0 state levels"?

Now that I got 169 invalid results since yesterday, I think the P2 state has a point. An overclocked memory at 6100 MHz should be stable for games. But obviously Einstein@home doesn't think so. Need to find a sweet spot between shorter finished time and failure rate.

 

Without going into too much detail, P0 states are fine for gaming. No one really cares if you are dropping small bits of data here and there. It will get overshadowed quickly as the screen changes. However, when doing scientific work, any error in the calculations will corrupt the entire work unit. Nvidia knows this, so to prevent errors from occurring they restrict scientific processes to the P2 state with slower speeds so that no corruption gets incorporated in the analysis. Remember these are gaming cards, not scientific cards like the Tesla cards. If you want more info you can google P0 vs P2 states and come up with miners talking about the difference, etc. This is just a quick explanation in a nutshell.

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0

Thanks. I do understand that once the memory clock crosses a certain point, the error rate increases exponentially with clock speed. So it seems to me the P2 memory clock is the absolutely safe clock with zero errors, while the P0 memory clock is already "officially overclocked", with a few errors but fine for gaming.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023174931
RAC: 1830249

The only way to find the failure threshold for a particular sample of a particular card running a particular application is large-scale testing.

I've done this.  The answer varies from card to card of the same make and model.  So don't trust anyone who gives you a number.

The other side of the coin is "how much benefit?".  Several generations ago when the fact that Maxwell2 generation cards downclocked memory a lot became known, there was an appreciable gain available for Einstein performance by tampering.  I had sworn off CPU overclock several years before but got into GPU overclocking for the first time on that occasion.

I think you may find that the current Einstein application gives less performance improvement with memory overclocking than you might suppose, making it a bit questionable whether it is worth the time and effort to find a safe overclock.

Also, as not all the data sets are the same, there is no guarantee that a carefully found just barely safe operating point will stay safe into the future.

Warnings aside, I personally do overclock, but I do it by slowly creeping up in clock rate until I find an error, then backing down until I find a rate that gives zero errors in 24 hours, then backing down two more increments.

Your preferred method will vary, naturally.
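The creep-up, back-down procedure described above can be sketched as a small search routine. Everything here is hypothetical: `quick_ok` and `soak_ok` stand in for whatever short test and 24-hour test you actually run by hand, and the step size, ceiling and margin are placeholder values, not recommendations.

```python
def find_safe_offset(quick_ok, soak_ok, step=25, max_offset=1000, margin_steps=2):
    """Return a memory clock offset (MHz) chosen by the three phases below.

    quick_ok(offset) -> True if a short run at this offset shows no errors.
    soak_ok(offset)  -> True if a 24-hour run at this offset shows no errors.
    """
    # Phase 1: creep up one step at a time until the next step shows an error.
    offset = 0
    while offset < max_offset and quick_ok(offset + step):
        offset += step

    # Phase 2: back down until a 24-hour run comes out clean.
    while offset > 0 and not soak_ok(offset):
        offset -= step

    # Phase 3: back off a further safety margin of `margin_steps` increments.
    return max(0, offset - margin_steps * step)
```

For example, with a simulated card where short runs pass up to +500 MHz and 24-hour runs pass up to +400 MHz, `find_safe_offset(lambda o: o <= 500, lambda o: o <= 400)` settles two increments below +400 MHz. As noted above, the thresholds differ from card to card, so the numbers in that example mean nothing for a real GPU.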

 

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0

I was gonna take your method. I do hope that there is an application that tests VRAM fidelity, like MemTest does for RAM, so I don't need to screw up E@H tasks.

Did you see performance improvement with core clock overclocking?

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023174931
RAC: 1830249

shuhui1990 wrote:
Did you see performance improvement with core clock overclocking?

Yes, but again somewhat less than one might suppose.  But more than the memory clock for recent cards and the current Einstein application, if memory serves.

rjs5
Joined: 3 Jul 05
Posts: 32
Credit: 404850962
RAC: 1308454

Do you know if there are any app_config options to pass a command line option to disable the throttling?

Do you think this might be related to the Nvidia Series 20 problems?

I took a GPU-Z log at the point where the EAH screen blanks, and it seems like the GPU and memory frequencies drop by more than 1 GHz. I was not aware that EAH messed around with the frequency. If EAH lowers the frequency too much, it might be the source of the timeout.

Thoughts?

 

Date     GPU Core Clock [MHz]  GPU Memory Clock [MHz]  GPU Temperature [°C]
26:30.2  1350                  1750                    64
26:30.5  1350                  1750                    64
26:30.9  1350                  1750                    64
26:31.2  435 *                 101.3 *                 63 *
26:31.5  435                   101.3                   63
26:31.8  435                   101.3                   63
26:32.1  330                   101.3                   63
26:32.4  330                   101.3                   63
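For longer GPU-Z logs, the moment of the drop can be located automatically instead of by eye. A minimal sketch using the samples above; the tuple layout, the `first_drop` name and the 50% threshold are arbitrary choices for illustration.

```python
# (timestamp, core clock MHz, memory clock MHz, temperature C) from the log
rows = [
    ("26:30.2", 1350, 1750.0, 64),
    ("26:30.5", 1350, 1750.0, 64),
    ("26:30.9", 1350, 1750.0, 64),
    ("26:31.2", 435, 101.3, 63),
    ("26:31.5", 435, 101.3, 63),
    ("26:31.8", 435, 101.3, 63),
    ("26:32.1", 330, 101.3, 63),
    ("26:32.4", 330, 101.3, 63),
]


def first_drop(samples, min_ratio=0.5):
    """Return the timestamp of the first sample whose memory clock is below
    `min_ratio` times the previous sample's memory clock, or None."""
    for prev, cur in zip(samples, samples[1:]):
        if cur[2] < min_ratio * prev[2]:
            return cur[0]
    return None


print(first_drop(rows))
```

On this log it picks out the same sample that is marked with asterisks.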

 

