I tried suspending all CPU computation and running 2 concurrent BRP7s offset by about 1/3, but it made no difference; they still took over 15 min. to complete.
I just got to looking at Marcelo's Windows Titan V system.
It looks like he is also running his Brp7/meerKat at 1x, which is a disturbing indicator.
I may have to take a Titan V out of my Epyc box and put it into my Windows box to see what I can see.
Hmmmm......
Tom M
===edit==
The vote of the Top 50 systems running Nvidia Windows boxes is nearly unanimous: 1x for Brp7/meerKat. The one exception I ran across was taking more than twice the usual ~300s on his Brp7/meerKat tasks.
So I am baffled and bemused.
Respectfully,
A Proud member of the O.F.A. (Old Farts Association).
Why do you have a project max of zero? I assume the per-task max settings are a way of shutting off specific tasks? If not, I would take those out too.
==edit==
If you are not restarting BOINC after the change, please run "Read config files" again from the boincmgr menu.
==end edit===
Are you running the current Nvidia drivers for Windows?
How about not Overclocking the Titan V GPU?
Under Linux we have not had much luck OCing this Titan V.
Still scratching my head about it being slower at 2x.
Respectfully.
0 (zero) in max concurrent means no limit. I don't think there's a way to shut off specific tasks via app_config; that has to be done through the website. I tend to keep those lines in app_config files for ease of adjustment.
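For anyone following along, here is a minimal sketch of how such an app_config.xml could be generated. It assumes the usual BOINC elements (`max_concurrent`, `project_max_concurrent`, `gpu_versions`/`gpu_usage`/`cpu_usage`); the app name `einsteinbinary_BRP7` and the helper `build_app_config` are assumptions for illustration, so check the real app name in your own client_state.xml before using it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int,
                     max_concurrent: int = 0,
                     project_max_concurrent: int = 0) -> str:
    """Return app_config.xml text that runs `tasks_per_gpu` tasks per GPU.

    A value of 0 for either max setting means "no limit", as noted above.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    gpu = ET.SubElement(app, "gpu_versions")
    # 1 task per GPU -> 1, 2 tasks -> 0.5, 3 tasks -> 0.333333
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = "1"

    ET.SubElement(root, "project_max_concurrent").text = str(project_max_concurrent)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Assumed app name, 2 tasks per GPU, no concurrency limits.
example_xml = build_app_config("einsteinbinary_BRP7", tasks_per_gpu=2)
```

Save the result as app_config.xml in the project folder, then restart BOINC or use "Read config files" so the client picks it up.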
My drivers are pretty recent, I updated them within the last 2 weeks or so.
I tried overclocking to see if I can improve anything since I couldn't run more than 1 concurrent. Tweaking GPU clock didn't make a difference but boosting GPU RAM clock did have a significant difference (for BRP7) so I kept it. I use MSI Afterburner.
I'll get some GPU stats for BRP7 and post them later.
I tried doubling up on O3AS again and it seems to be break even. I'll let it run for a bit longer to get more samples to see if it might be slightly better on average. Last time I tried doubling up on O3AS was before the recent change with that sub-project.
I have temporarily moved a Titan V to my Daily Driver.
I did a custom install of both the Content Creation and the Gamer drivers.
Neither of them shows a "Cuda" entry in the Task Manager. The only entry showing any activity is 3D, occasionally. The GPU load shows as zero in the Task Manager.
===edit==
If you disable the "hardware gpu accelerate" option in Settings, the Cuda entry and GPU load come back. Watching now to see if anything changes in the actual processing times for 1x Brp7/meerKat.
===end edit==
The Titan V is running brp7/meerkat 1x in about the right time frame.
nvidia-smi from the command line does show gpu load etc.
Microsoft Windows [Version 10.0.26100.2894]
(c) Microsoft Corporation. All rights reserved.
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 17268 C ...9_windows_x86_64__BRP7-cuda102W.exe N/A |
+-----------------------------------------------------------------------------------------+
C:\Users\tlgal>
I will switch to 2x later.
I am currently running: Binary Radio Pulsar Search (MeerKAT) 0.19 (BRP7-cuda102)
Just tried some MeerKAT variety on our Windows machines running nvidia A4500 GPUs. Here is what I saw (wall clock times):
1x = 440 sec/task
2x = 410 sec/task
3x = 410 sec/task
I did not test this over a massive amount of work units, only about 10 each.
I know the Titan V is not the same as the A4500, but I think you should still see improvement running 2x. There could be a different factor at play that I/we are not seeing.
you mean "effective" task times. not wall clock times.
if it ran ~410s wall clock at 2x, that would be ~205s effective time.
but i can see from your recent tasks that wall clock was actually 800-ish and 1200-ish wall clock time for the 2x and 3x tests.
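To make the wall clock vs. effective distinction concrete, here is a small sketch. The function names are mine, and the 820 s and 1230 s inputs are only illustrative stand-ins for the "800-ish" and "1200-ish" wall clock times mentioned above.

```python
def effective_seconds(wall_clock_seconds: float, concurrent_tasks: int) -> float:
    """Effective time per task = wall clock time of one task / tasks run at once."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return wall_clock_seconds / concurrent_tasks


def speedup_vs_1x(effective_1x: float, effective_nx: float) -> float:
    """Throughput ratio relative to 1x; values above 1.0 mean Nx is faster."""
    return effective_1x / effective_nx


# A true 410 s wall clock at 2x would be 205 s effective.
eff_if_wall_was_410 = effective_seconds(410, 2)

# Illustrative wall clock times consistent with the reported 410 s effective.
eff_2x = effective_seconds(820, 2)    # 410.0
eff_3x = effective_seconds(1230, 3)   # 410.0

# Reported 1x time was 440 s, so 2x is only about 7% better in throughput.
gain_2x = speedup_vs_1x(440, eff_2x)
```

When comparing 1x, 2x and 3x, always compare effective times, since wall clock per task naturally grows as more tasks share the GPU.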
Yes, sorry about that. Miswording on my part. Those times were effective, not wall clock.
I am currently getting between 308s and 360s on my Windows/Titan V at 1x.
Will report more later.
===edit===
Additional conversation has convinced me that 1x is the only choice for Windows and Brp7/meerKat.
I have tinkered with the currently available Microsoft Windows driver, and the current releases of both the gamer and content creator drivers.
I THINK the gamer driver might be slightly faster. Those results were 1x.
I am now getting to run the "other" application at 1x. It used to get better mileage at 2x, back when there was a significant pause in the GPU processing for CPU processing (twice). It is entirely possible that advantage has gone away. But even at 1x it might produce more Windows production (aka: higher RAC) than Brp7/meerKat is producing.
I want to wring all the data I can out of this Windows/Titan V combo before I move it back to my main crunching machine.
Respectfully,
Can you post the GPU stats (gpu usage, mem usage, watts, etc) when running at 1x and then at 2x when the work units are mid-run?
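If screenshots are awkward, one possible way to grab these numbers mid-run is to poll nvidia-smi. This is only a sketch: the helper names are mine, it assumes nvidia-smi is on the PATH, and some fields may come back as N/A on Windows, in which case they are stored as None.

```python
import subprocess

# Standard nvidia-smi --query-gpu properties.
QUERY_FIELDS = [
    "utilization.gpu",     # GPU load, percent
    "utilization.memory",  # memory controller load, percent
    "memory.used",         # MiB
    "power.draw",          # watts
    "clocks.sm",           # MHz
    "clocks.mem",          # MHz
]


def parse_gpu_stats(csv_line: str) -> dict:
    """Turn one CSV line from nvidia-smi into a dict of floats (None if unavailable)."""
    values = [v.strip() for v in csv_line.split(",")]
    if len(values) != len(QUERY_FIELDS):
        raise ValueError(f"expected {len(QUERY_FIELDS)} fields, got {len(values)}")
    stats = {}
    for field, value in zip(QUERY_FIELDS, values):
        try:
            stats[field] = float(value)
        except ValueError:
            stats[field] = None  # e.g. "[N/A]" or "[Not Supported]"
    return stats


def read_gpu_stats(gpu_index: int = 0) -> dict:
    """Query one GPU once; call this while the work units are mid-run."""
    result = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout.strip().splitlines()[0])
```

Calling `read_gpu_stats()` a few times at 1x and again at 2x, mid-task, gives numbers that can be pasted straight into a post.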
Here are links to screenshots of HWiNFO GPU data from running BRP7 single and double for about half an hour each; doubles were staggered at about 50%. BRP7 2x, BRP7 1x
I got a big enough sample size of O3AS running doubles (started out staggered at about 50%). The average of almost 60 tasks is 1525 sec/task. That's 0.2% slower than the fastest single task I have (1522 sec) and 15.7% faster than the slowest (1809 sec). So on average it seems to be ~8.5% faster per task running doubles with this new version of O3AS tasks.
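The percentages above can be checked with a few lines. Comparing against the midpoint of the fastest and slowest single tasks is my assumption about how the average figure was reached; it comes out near 8.4%, close to the ~8.5% quoted.

```python
def percent_faster(reference_seconds: float, measured_seconds: float) -> float:
    """Positive result: measured is faster than reference. Negative: slower."""
    return (reference_seconds - measured_seconds) / reference_seconds * 100


average_2x = 1525      # sec/task, average of almost 60 doubled-up tasks
fastest_single = 1522  # sec, fastest single task
slowest_single = 1809  # sec, slowest single task

vs_fastest = percent_faster(fastest_single, average_2x)   # about -0.2 (slower)
vs_slowest = percent_faster(slowest_single, average_2x)   # about 15.7 (faster)

# Assumed comparison point: midpoint of fastest and slowest single tasks.
midpoint_single = (fastest_single + slowest_single) / 2
vs_midpoint = percent_faster(midpoint_single, average_2x)  # about 8.4
```

A proper average over all the single-task times would be a better baseline than the midpoint, if those times are available.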
AndreyOR, if my arithmetic is right, O3AS at 2x could produce ~1.3M RAC on your system.
I just got done running it overnight at 1x, and it looks like it could reach ~997,000 RAC on my Windows system.
That is significantly better than the very best Brp7/meerKat RAC on my Windows box, which I calculated to be ~934,000.
Looks like I will switch to 2x on O3AS. Then we can probably say, "It looks like the new O3AS at 2x is currently the way to go on any Windows Nvidia GPU system".
:)
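For reference, a rough sketch of the kind of arithmetic behind these RAC estimates. It assumes steady crunching, so that RAC settles near the credit earned per day; the 3333 credits per BRP7 task is an assumption on my part, so check the credit shown in your own task list.

```python
SECONDS_PER_DAY = 86_400


def steady_state_rac(credit_per_task: float, effective_seconds: float) -> float:
    """Approximate long-run RAC: credit per task times tasks finished per day."""
    if effective_seconds <= 0:
        raise ValueError("effective_seconds must be positive")
    return credit_per_task * SECONDS_PER_DAY / effective_seconds


def percent_gain(new_rac: float, old_rac: float) -> float:
    """How much higher new_rac is than old_rac, in percent."""
    return (new_rac - old_rac) / old_rac * 100


# Assumed 3333 credits per BRP7 task at the best reported 1x time of 308 s:
# lands at roughly 935,000, in line with the ~934,000 estimate.
brp7_rac = steady_state_rac(3333, 308)

# O3AS at 1x (~997,000) versus the best Brp7/meerKat figure (~934,000).
o3as_gain = percent_gain(997_000, 934_000)  # a bit under 7 percent
```

For 2x or 3x runs, pass the effective time per task (wall clock divided by the number of concurrent tasks), not the wall clock time.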
Tom M wrote: ... Additional conversation has convinced me that 1x is the only choice for Windows and Brp7/meerKat.
Maybe I'm off track again.
My results (rounded up more or less) for running tasks only on one Titan V are:
1 task 420 sec
2 tasks 330 sec
3 tasks 320 sec | more or less
4 tasks 320 sec | the same time
So what am I missing out on?
sfv