Additional conversation has convinced me that 1x is the only choice for Windows and BRP7/MeerKAT.
...
Maybe I'm off track again.
My results (rounded up more or less) for running tasks only on one Titan V are:
1 task 420 sec
2 tasks 330 sec
3 tasks 320 sec | more or less
4 tasks 320 sec | the same time
So what am I missing out on ?
sfv
You may be missing out on nothing.
If those numbers hold up on your Windows/Titan V machines, go with them. 4x would calculate out to be about 3.5M Rac / gpu. Since this is an outlier, I am going to guess that your numbers will get worse.
It looks like you may be running two boxes with 3 Titan Vs each. If that is true, I would just let them "bang heads" and see which operating system comes out on top.
I am currently claiming that the new Windows (beta) O3AS running at 2x is likely the top RAC generator for Windows-based Nvidia machines.
And if you are running MPS under Linux, with the optimized O3AS then 2x is the top RAC generator.
I don't have any data for high end Radeon gpus under Windows. Only my dinky iGpu Radeon.
Respectfully,
A Proud member of the O.F.A. (Old Farts Association).
Here are links to screenshots of HWiNFO GPU data from running BRP7 single and double for about half an hour each; the doubles were staggered at about 50%: BRP7 2x, BRP7 1x
I got a big enough sample size of O3AS running doubles (started out staggered at about 50%). The average of almost 60 tasks is 1525 sec/task. That's 0.2% slower than the fastest single task I have (1522 sec) and 15.7% faster than the slowest (1809 sec). So on average the time per task seems to be ~8.5% faster running doubles with this new version of O3AS tasks.
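For anyone who wants to redo the percentage arithmetic in the post above, here is a minimal sketch. The helper name pct_faster is made up for illustration, and using the midpoint of the fastest and slowest single-task times as the "average" baseline is an assumption: the post does not say how the ~8.5% figure was derived, and the midpoint gives a value close to it.

```python
def pct_faster(reference_sec, observed_sec):
    """Percent by which observed_sec is shorter than reference_sec.

    A negative result means the observed time is slower than the reference.
    """
    return (reference_sec - observed_sec) / reference_sec * 100.0


avg_2x = 1525.0      # average per-task time running doubles (from the post)
fastest_1x = 1522.0  # fastest single task (from the post)
slowest_1x = 1809.0  # slowest single task (from the post)

# Assumption: use the midpoint of the two singles as the 1x baseline.
midpoint_1x = (fastest_1x + slowest_1x) / 2

print(round(pct_faster(fastest_1x, avg_2x), 1))   # -0.2 (0.2% slower)
print(round(pct_faster(slowest_1x, avg_2x), 1))   # 15.7
print(round(pct_faster(midpoint_1x, avg_2x), 1))  # 8.4
```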
I am confused about these data. If both work units (on 2x) were mid-run then why is the core load and wattage lower when running 2x? The memory usage makes sense, but that seems odd to me. Ian&Steve C, is that normal for the Titan V, since you have been using them for a long time?
San-Fernando-Valley wrote:
Maybe I'm off track again.
My results (rounded up more or less) for running tasks only on one Titan V are:
1 task 420 sec
2 tasks 330 sec
3 tasks 320 sec | more or less
4 tasks 320 sec | the same time
So what am I missing out on ?
sfv
This is also what I see: 2x is better, and then there is no improvement past 2x in Windows with the A4500.
O3AS might be more productive on a given system. There is no substitute for just letting it run to see what happens in the long term.
If those numbers hold up on your Windows/Titan V machines, go with them. 4x would calculate out to be about 3.5M Rac / gpu. Since this is an outlier, I am going to guess that your numbers will get worse.
It looks like you may be running two 3 Titan V boxes. If that is true. I would just let them "bang heads" and see which Operating system comes out on top?
...
What do you mean by an "outlier" ?
I'm running way more than 2 boxes on Titans.
I was "told" that Ubuntu is way faster than Windows ... ?
So, now what ?
cheers
sfv
==edit==
Lots of high-falutin' discussion deleted.
==end edit===
I don't have a clue. All my experience shows that on the same hardware Windows crunches "slower" than Linux. But that doesn't mean it's some kind of "law". It just means we haven't had any counter-examples show up.
Respectfully,
A Proud member of the O.F.A. (Old Farts Association).
Does anyone that uses the Titan V or Quadro GPUs also implement TCC mode (Windows only if multiple GPUs are installed)? We switched one of the A4500 GPUs to this mode and we think we do see an uplift in performance from this GPU. We do not have many work units completed yet in this mode, so we will see.
It took us some changes in the BIOS to make the second GPU show up after switching to TCC, but it makes sense that this mode would be superior (apparently it launches CUDA instances faster).
If we do see an improvement, we will post more info/instructions for what we had to do to make this work on our machine.
If those numbers hold up on your Windows/Titan V machines, go with them. 4x would calculate out to be about 3.5M Rac / gpu. Since this is an outlier, I am going to guess that your numbers will get worse.
It looks like you may be running two 3 Titan V boxes. If that is true. I would just let them "bang heads" and see which Operating system comes out on top?
...
What do you mean by an "outlier" ?
I'm running way more than 2 boxes on Titans.
I was "told" that Ubuntu is way faster than Windows ... ?
So, now what ?
cheers
sfv
Linux is absolutely still faster, both for the better, more optimized application and for the ability to use MPS.
Tom is getting confused by your numbers. He's misreading your stated runtimes as wall clock times instead of the effective per-task runtime. Your Titan V will not do the 3.5M ppd that Tom is claiming. I mean, you can just look at the results yourself and see it's not doing that. Your runtimes show clearly that your 3x config ran in ~1300s "wall clock", which makes perfect sense to me.
Linux optimized + MPS is closer to 200s per task "effective" on a Titan V.
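To make the wall clock vs. effective distinction concrete, here is a rough sketch using the rounded per-task figures sfv posted earlier in the thread, treated as "effective" times as described above. The function names are made up for illustration, and the outputs are only as good as those rounded inputs; actual results on the host may differ.

```python
SECONDS_PER_DAY = 86400


def wall_clock_sec(effective_sec, concurrency):
    """Elapsed time one task shows when `concurrency` tasks share the GPU."""
    return effective_sec * concurrency


def tasks_per_day(effective_sec):
    """Tasks one GPU finishes per day at a given effective per-task time."""
    return SECONDS_PER_DAY / effective_sec


# sfv's rounded figures: concurrency -> effective seconds per task
reported = {1: 420, 2: 330, 3: 320, 4: 320}

for n, eff in reported.items():
    print(n, wall_clock_sec(eff, n), round(tasks_per_day(eff)))
# 1 420 206
# 2 660 262
# 3 960 270
# 4 1280 270

# Misreading 320 s as the wall clock time of a 4x run implies an effective
# time of 80 s per task, which overstates the daily task count by 4x.
print(round(tasks_per_day(320 / 4)))  # 1080
```

Throughput is driven by the effective time, so by these figures 3x and 4x finish about the same number of tasks per day.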
I am confused about these data. If both work units (on 2x) were mid-run then why is the core load and wattage lower when running 2x? The memory usage makes sense, but that seems odd to me. Ian&Steve C, is that normal for the Titan V, since you have been using them for a long time?
The GPU core load is higher at 2x (avg. 98.5% vs. 83.5%), but I did notice too that the wattage is lower at 2x. Memory controller load is lower at 2x too. Seems odd to me too.
Another thing that looks peculiar to me is the times you and another user posted. Mine are the reverse: I get 270-300 sec/task running BRP7 1x and it slows down to 420+ sec at 2x. It seems like my times at 1x are faster than you-all's at 2x or 3x.
+1
Ideas are not fixed, nor should they be; we live in model-dependent reality.
"What me confused? Nah."
Attributed to Alfred E Neuman of Mad Magazine.
A Proud member of the O.F.A. (Old Farts Association).
When you run 2X or 3X, the elapsed times shown for each task need to be divided by that multiplier to get the 'effective' elapsed times.
So your 420+ second tasks are actually completing in about 210 seconds each; IOW they are faster ("more productive") than your 270 second tasks at 1X.
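The rule above can be sketched in a few lines. This is only an illustration: effective_sec is a made-up helper name, and the inputs are the 420 s (2X) and 270 s (1X) figures quoted in the thread.

```python
def effective_sec(elapsed_sec, concurrency):
    """Effective per-task time: elapsed time divided by tasks run at once."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return elapsed_sec / concurrency


at_2x = effective_sec(420, 2)  # elapsed 420 s with two tasks per GPU
at_1x = effective_sec(270, 1)  # elapsed 270 s with one task per GPU

print(at_2x)          # 210.0
print(at_1x)          # 270.0
print(at_2x < at_1x)  # True: 2X is the more productive setting here
```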