The performance differences may not be the application's to fix.
There are often quite large differences between the two OSs (look around the rendering community that uses OpenCL).
There are OS / library / compiler / driver differences the application has no chance of "improving".
We also seem to be setting GPU expectations based on the CUDA experience in BRP4 and BRP6. We have not run OpenCL on nVidia here in the past, so perhaps that is something to consider as well.
That does not seem to be the case over at Seti. I asked the question:
SOG is an OpenCL app, as I understand it. Is there an appreciable difference in run time between running OpenCL on Windoz vs Linux? The reason I ask is that Einstein recently came out with an OpenCL app and it runs 5 to 10 times faster on Linux than Windoz. Is that something inherent in Linux, or just the way the app was written?
This is the answer I got:
When I moved one of my crunchers from Windows to Linux it briefly ran SoG applications. I found very little difference in run times, but there was a reduction in the demands on the CPU.
I ran a quick 8 FGRP v1.17 tasks at X2 with a Win 10 OS and a GeForce 660 Ti, at 53 mins each with X2 running.
I have started and stopped the next X2 a couple of times and they restarted fine (busy running some VB tasks).
I plan on running the rest I have tonight, and when they finally put vLHC to sleep I can get all my GPU cards back to work here before I start my 13th year here in a couple of weeks.
Betreger wrote: Einstein recently came out with an OpenCL app and it runs 5 to 10 times faster on Linux than Windoz.
The very first Windows application released for FGRBP1 was rushed out with only a very limited subset of the computation moved to the GPU (FFT only, I think), so it was almost a pure CPU application, with the GPU providing speed-up for roughly half of the original pure-CPU work. That was followed very quickly by a release which moved much more work to the GPU.
I doubt the current released application runs 5 to 10 times faster on Linux than the 1.17 Windows application. I think it would be well to compare elapsed times on comparable hardware, comparably configured. Because the high CPU requirement inhibits many configurations from running at high multiplicity, and because the intermittent GPU use makes 1X running unusually inefficient, I suggest we compare systems which are running 2X and which have a light enough CPU load to avoid crippling the FGRBP1 application by CPU starvation.
Here are numbers averaged over several days running. They come from a single system, which is my most modern and productive. A single i5-4690K CPU, running stock, is supporting a total of 4 GPU tasks, as the system has both a GTX 1070 and a 6 GB GTX 1060, both running at the fastest overclocks I believe to be long-term safe. No BOINC CPU tasks are running, and although I use the system as my daily driver, I believe the Einstein productivity is very little reduced by that. While recent Einstein applications have suffered little performance loss in sharing a capable host among two GPUs, this application is much more demanding of host resources, and I suspect my times are degraded by this sharing appreciably.
I've removed from the averages a few remaining WUs of the much shorter type, which get 693 credits and have names generally starting with LATeah2003 instead of LATeah0010.
I hope some people reading this will report GTX 1070 or 1060 elapsed time averages on Windows and Linux machines running 2X. Perhaps we can put a sounder comparison in place for the current applications.
Card      CPU       Mult  ET (h:mm:ss)
1070      i5-4690K  2X    0:52:21
6GB 1060  i5-4690K  2X    1:48:40
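For anyone who wants to reproduce this kind of average, here is a minimal sketch of the filtering and averaging described above. It assumes elapsed times have been copied by hand into a CSV named tasks.csv with columns card, task_name and elapsed_seconds; that file and its layout are assumptions for illustration, not anything Einstein@Home or BOINC exports.

```python
# Minimal sketch: average elapsed time per card, excluding the short
# LATeah2003* work units mentioned above. Assumes a hand-made CSV
# ("tasks.csv") with columns card, task_name, elapsed_seconds; that
# layout is an assumption, not an Einstein@Home export format.
import csv
from collections import defaultdict

totals = defaultdict(lambda: [0.0, 0])  # card -> [sum of seconds, task count]

with open("tasks.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["task_name"].startswith("LATeah2003"):
            continue  # skip the short 693-credit WUs
        totals[row["card"]][0] += float(row["elapsed_seconds"])
        totals[row["card"]][1] += 1

for card, (total, count) in sorted(totals.items()):
    hours, rem = divmod(int(total / count), 3600)
    minutes, seconds = divmod(rem, 60)
    print(f"{card}: {hours}:{minutes:02d}:{seconds:02d} average over {count} tasks")
```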
I also don't think that there is that much difference between similar platforms running either Windows or Linux.

archae86 wrote: I hope some people reading this will report GTX 1070 or 1060 elapsed time averages on Windows and Linux machines running 2X. Perhaps we can put a sounder comparison in place for the current applications.
All of my computers are running Windows 10 and these are the run times I am getting for FGRP 1.17:
Card      CPU       Mult  Run Time (h:mm:ss)
1070      i5-6600   2X    0:29:06
1070      Q9450     2X    0:38:32
6GB 1060  i5-4690   2X    0:41:46
3GB 1060  i7-4790K  2X    0:43:34
My times are significantly different from the times you are getting. I am using the median run time over the last 7 days to compare the relative performance of my computers. Is the elapsed time you are using the same as the run time I am using? I am pulling the run time from the task webpage. This is an example from my fast 1070 machine: https://einsteinathome.org/task/599111630
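For reference, a minimal sketch of the median comparison described above. The host labels and run times below are hypothetical placeholders in seconds, roughly in the ballpark of the figures reported, not data pulled from the task pages.

```python
# Minimal sketch: median run time per host, as used for the comparison above.
# The times below are hypothetical placeholders in seconds, not real task data.
from statistics import median

run_times_by_host = {
    "1070 / i5-6600":     [1746, 1751, 1740, 1762, 1749],
    "6GB 1060 / i5-4690": [2506, 2511, 2498, 2520, 2502],
}

for host, secs in run_times_by_host.items():
    med = int(median(secs))
    hours, rem = divmod(med, 3600)
    minutes, seconds = divmod(rem, 60)
    print(f"{host}: median {hours}:{minutes:02d}:{seconds:02d} over {len(secs)} tasks")
```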
Keith Myers wrote:
Card      CPU       Mult  Run Time (h:mm:ss)
1070      i5-6600   2X    0:29:06
1070      Q9450     2X    0:38:32
6GB 1060  i5-4690   2X    0:41:46
3GB 1060  i7-4790K  2X    0:43:34
As the computers visible from your ID are all Windows, I assume these are Windows timings. As they also list two GPUs for every host, I assume you are also suffering some harm from host sharing. But your timings are appreciably better than mine for the same model of card (1070). Perhaps your motherboard/CPU combination is doing a better job of supporting the GPU than mine is, for some reason.
Also, are you overclocking your GPU cards?

I am running one AVX and one 1.17 at a time on a quad core. I am getting pretty severe lag on any inputs. This goes away as soon as I snooze the GPU. I am sure there is some non-BOINC software mucking things up too, but I wonder if the swapping between the GPU and CPU is just too much. It's only a 1 GB card.

Yes, all Windows. Two Windows 7 64-bit and one Windows 10 64-bit. Yes, I am overclocking the cards in their P2 state to the original P0 state timings for GPU core and memory frequencies via NVI. Other than that, I am just letting the cards do their normal GPU Boost thing. If anything, I am running handicapped on the CPU front, since I run AMD FX chips. They are slightly overclocked to 4.0 GHz for the 8300 and 4.4 GHz for the 8350/8370.