Perhaps I have very good news.
Short version: the major update to Windows 10 that I installed today seems to have fixed the leak behavior.
Long Version:
I've been rebooting every day or two, partly as a means to manage the memory leak problem on my 1060/1070 machine. After today's reboot, Windows advised me it had an update ready to install, and that I could either restart, accept the offered scheduled install time, or schedule a different one. Thinking this would be a short matter, I selected immediate restart. As soon as it got going, it warned me that this would take a while, and involve multiple reboots. In the event it took almost an hour. I think this may be the "Anniversary Update" which has been much in the news. But the description actually given was "Feature update to Windows 10, version 1607".
After a couple of hours of steady Einstein running (3X BRP6/CUDA55 tasks on each of a 1070 and 1060) after completing the update, I see no progressive increase in paged pool bytes, no steady decline in available bytes, and no relentless accumulation of Vi12 tagged pool bytes. Maybe somehow this update fixed this behavior.
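For anyone who wants to check the "progressive increase" symptom from logged counter readings rather than by eye, here is a minimal sketch. It assumes you have already collected (seconds, bytes) samples of paged pool usage by some means. The function names, the threshold, and the sample values are all invented for illustration and are not measurements from these machines.

```python
def trend_slope(samples):
    """Least-squares slope of (seconds, bytes) samples, in bytes per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den


def looks_like_leak(samples, threshold_bytes_per_hour):
    """True when the fitted growth rate exceeds the (hypothetical) threshold."""
    return trend_slope(samples) * 3600 > threshold_bytes_per_hour


# Invented readings: one steadily climbing series, one that only wobbles.
leaking = [(0, 300_000_000), (1800, 340_000_000),
           (3600, 380_000_000), (5400, 420_000_000)]
steady = [(0, 300_000_000), (1800, 301_000_000),
          (3600, 299_500_000), (5400, 300_200_000)]

print(looks_like_leak(leaking, 10_000_000))  # True
print(looks_like_leak(steady, 10_000_000))   # False
```

A fitted slope is less sensitive to a single noisy reading than comparing only the first and last samples, which is why it is used here. The same check applied to available bytes would look for a negative slope instead.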
I have another three Windows 10 machines, but only one includes a Maxwell2 or Pascal card and has been showing the memory leak symptoms. I'll do this update on that machine soon, and report whether the leak behavior has also changed.
It would be helpful if others here who have observed similar memory leak symptoms on Windows 10 Maxwell2 or Pascal systems would check before and after this update to see if they have any such improvement.
The second of my two machines which was having the Vi12 tagged pool memory leak has also apparently been healed by the Windows 10 Anniversary Update. Oddly enough, when I first ran that machine after the update it still leaked, with the 359.06 driver then on it (installed at the suggestion of someone here). But I've since rebooted after updating to the current driver (372.54), and now see none of the three primary symptoms of the problem.
The possibility that this particular update made a difference is strengthened by the observation that it installed new versions of both of the .sys driver files which contain the Vi12 tag.
driver    old_size  new_size
dxgmms1   393,056   402,272
dxgmms2   576,864   658,784
The Windows 10 Anniversary Update has been in slowly rolling distribution since August 2. Published reports show under 20% of systems converted as of a couple of days ago, and as "born" Windows 10 systems are getting the update sooner than upgraded ones, it is likely you don't have the update if you have not noticed it. If you want to try it, you might consider this rather detailed description of a simple means to force the process, together with running commentary on how it tends to look.
As we have had some inconsistent results on "switches" which enable or disable this behavior, I'm cautious about the optimistic view that the combination of an up-to-date Nvidia driver plus the Windows 10 Anniversary Update heals the problem, but these are the best results I've gotten to date.
Both machines are currently running BRP6/CUDA55 work with the "test" or "beta" application designation.
archae86 wrote:
...it installed new versions of both of the .sys driver files which contain the Vi12 tag.
driver    old_size  new_size
dxgmms1   393,056   402,272
dxgmms2   576,864   658,784
Hmm, build 14905.1000 seems to have those file sizes closer to new_size than old_size.
dxgmms1 398,096, dxgmms2 651,024
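The "closer to new_size than old_size" observation is simple arithmetic on the figures quoted in this thread, and can be checked with a short sketch like the one below. The dictionary and function names are made up for the example. Only the numbers come from the posts.

```python
# old_size and new_size as quoted from the Anniversary Update post.
reference = {
    "dxgmms1": (393_056, 402_272),
    "dxgmms2": (576_864, 658_784),
}
# Sizes reported for build 14905.1000.
observed = {"dxgmms1": 398_096, "dxgmms2": 651_024}


def closer_to(old, new, value):
    """Return which reference size the observed value is nearer to."""
    return "new_size" if abs(value - new) < abs(value - old) else "old_size"


for name, value in observed.items():
    old, new = reference[name]
    print(f"{name}: {value - old:+,} vs old, {value - new:+,} vs new "
          f"-> closer to {closer_to(old, new, value)}")
# dxgmms1: +5,040 vs old, -4,176 vs new -> closer to new_size
# dxgmms2: +74,160 vs old, -7,760 vs new -> closer to new_size
```

File size alone says nothing about which code changed, so this only confirms that the build 14905 files are not the pre-update versions.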
My host with a GTX 670 and driver 372.54 is currently showing the memory leak with these files, but this week should bring out a new test build. I'll monitor whether anything changes.
edit:
Build 14915.1000, Nvidia driver 372.70, GTX 670. Memory still leaking.
dxgmms1 399,632, dxgmms2 649,488
edit 2:
Build 14926.1000, Nvidia driver 372.70, GTX 670. Memory still leaking.
dxgmms1 399,632, dxgmms2 648,464
Just a little update.
A few versions later... build 14955 and Nvidia driver 375.70 on five hosts.
While running the current app (Binary Radio Pulsar Search (Arecibo, GPU) v1.57 (BRP4G-Beta-cuda55) windows_intelx86), paged pool is being devoured on all three machines that have a GTX 960. Hosts with a GTX 760 or GTX 670 are not affected. This was also happening with Nvidia driver 375.63.
Interestingly, at some point in the past the situation was the other way around with those cards.