Observations on FGRBP1 1.17 for Windows

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513,211,304
RAC: 0

The performance differences may not be the application's to fix. 

There are often quite large differences between the two OSes (look around the rendering community, which uses OpenCL).

There are OS / library / compiler / driver differences the application has no chance of "improving". 

We also seem to be setting GPU expectations based on the CUDA experience in BRP4 and BRP6. We have not run OpenCL on NVIDIA here in the past, so perhaps that is something to consider as well.

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1,556,948,437
RAC: 727,801

That does not seem to be the case over at Seti.

I asked the question:

SoG is an OpenCL app, as I understand it. Is there an appreciable difference in run time between running OpenCL on Windoz vs Linux? The reason I ask is that Einstein recently came out with an OpenCL app and it runs 5 to 10 times faster on Linux than on Windoz. Is that something inherent in Linux or just the way the app was written?

This is the answer I got:

When I moved one of my crunchers from Windows to Linux it briefly ran SoG applications. I found very little difference in run times, but there was a reduction in the demands on the CPU.
MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1,858
Credit: 1,355,705,853
RAC: 1,554,656

I ran a quick 8 FGRP v1.17 tasks at X2 on Win 10 with a GeForce 660Ti, at 53 mins each while running X2.

I have started and stopped the next X2 a couple times and they restarted fine (busy running some VB tasks)

I plan on running the rest I have tonight, and when they finally put the vLHC to sleep I can get all my GPU cards back to work here before I start my 13th year here in a couple of weeks.

 
 

archae86
Joined: 6 Dec 05
Posts: 3,157
Credit: 7,181,734,931
RAC: 734,383

Betreger wrote:
Einstein recently came out with an open cl app and it runs 5 to 10 times faster on Linux than Windoz.

The very first Windows application released for FGRBP1 was rushed out with only a very limited subset of the computation moved to the GPU (FFT only, I think). It was thus almost a pure CPU application, with the GPU supplying speed-up for roughly half of the original pure CPU work. That was followed very quickly by a release which moved much more work to the GPU.

I doubt the current released application runs 5 to 10 times faster on Linux than the 1.17 Windows application. I think it would be well to compare elapsed times on comparable hardware, comparably configured. Because of the high CPU requirement, which inhibits many configurations from running high multiplicity, and because of the intermittent GPU use, which renders 1X running unusually inefficient, I suggest we compare systems which are running 2X and which have a light enough CPU load to avoid crippling the FGRBP1 application by CPU starvation.

Here are numbers averaged over several days running.  They come from a single system, which is my most modern and productive.  A single i5-4690K CPU, running stock, is supporting a total of 4 GPU tasks, as the system has both a GTX 1070 and a 6 GB GTX 1060, both running at the fastest overclocks I believe to be long-term safe.  No BOINC CPU tasks are running, and although I use the system as my daily driver, I believe the Einstein productivity is very little reduced by that.  While recent Einstein applications have suffered little performance loss in sharing a capable host among two GPUs, this application is much more demanding of host resources, and I suspect my times are degraded by this sharing appreciably.

I've removed from the averages a few remaining WUs of the much shorter type which get 693 credits and have names generally starting LATeah2003 instead of LATeah0010.
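
A minimal sketch of that kind of filtering in Python, assuming the task records are available as (name, elapsed seconds) pairs. The task names and times below are hypothetical and only illustrate the prefix test; they are not real results.

```python
from statistics import mean

# Hypothetical task records: (workunit name, elapsed seconds).
# Names and values are made up for illustration only.
tasks = [
    ("LATeah0010_hypothetical_a", 3150.0),
    ("LATeah0010_hypothetical_b", 3130.0),
    ("LATeah2003_hypothetical_c", 900.0),  # shorter type, excluded below
]

def average_elapsed(records, exclude_prefix="LATeah2003"):
    """Mean elapsed seconds over records whose name does not start with exclude_prefix."""
    kept = [secs for name, secs in records if not name.startswith(exclude_prefix)]
    if not kept:
        raise ValueError("no tasks left after filtering")
    return mean(kept)

print(average_elapsed(tasks))
```

Filtering on the name prefix keeps the shorter work units from pulling the average down and making a card look faster than it is on the long tasks.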

I hope some people reading this will report GTX 1070 or 1060 elapsed time averages on Windows and Linux machines running 2X.  Perhaps we can put a sounder comparison in place for the current applications.

Card    CPU      mult ET(h:mm:ss)
1070    i5-4690K 2X   0:52:21
6GB1060 i5-4690K 2X   1:48:40
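
Since elapsed times at 2X are per task while two tasks run at once, it can help to convert them to throughput before comparing against hosts at other multiplicities. A rough sketch in Python using the two rows above; the helper names are made up here, and it assumes the tasks at a given multiplicity start and finish at about the same rate.

```python
def parse_hms(text):
    """Convert an 'h:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def tasks_per_day(elapsed_hms, multiplicity):
    """Tasks finished per day when `multiplicity` tasks run concurrently,
    each taking `elapsed_hms` of wall-clock time."""
    return multiplicity * 86400 / parse_hms(elapsed_hms)

for card, elapsed in [("1070", "0:52:21"), ("6GB1060", "1:48:40")]:
    print(card, round(tasks_per_day(elapsed, 2), 1))
```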
Keith Myers
Joined: 11 Feb 11
Posts: 4,918
Credit: 18,470,962,529
RAC: 5,973,350

I also don't think that there is that much difference between similar platforms running either Windows or Linux.

Card    CPU      mult ET(h:mm:ss)
1070    FX-8350  2X   0:42:56
1070    FX-8370  2X   0:43:40
970     FX-8300  2X   0:51:13

Each GPU task supported by 1 full CPU core.

 

archae86
Joined: 6 Dec 05
Posts: 3,157
Credit: 7,181,734,931
RAC: 734,383

Keith Myers wrote:

Card    CPU      mult ET(h:mm:ss)
1070    FX-8350  2X   0:42:56
1070    FX-8370  2X   0:43:40
970     FX-8300  2X   0:51:13

Each GPU task supported by 1 full CPU core.

As the computers visible from your ID are all Windows, I assume these are Windows timings.  As they also list two GPUs for every host, I assume you are also suffering some harm from host sharing.  But your timings are appreciably better than mine for the same model of card (1070).  Perhaps your motherboard/CPU combination is doing a better job of supporting the GPU than is mine for some reason.  

Also, are you overclocking your GPU cards?

Chris
Joined: 9 Apr 12
Posts: 61
Credit: 45,056,670
RAC: 0

I am running one AVX and one 1.17 at a time on a quad core. Getting pretty severe lag on any inputs. This goes away as soon as I snooze the GPU. I am sure there is some non-BOINC stuff mucking things up too, but I wonder if the swapping between the GPU and CPU is just too much. It's only a 1 GB card.

Keith Myers
Joined: 11 Feb 11
Posts: 4,918
Credit: 18,470,962,529
RAC: 5,973,350

Yes, all Windows. Two Windows 7 64-bit and one Windows 10 64-bit. Yes, I am overclocking the cards in their P-2 states to the original P0 state timings for GPU core and memory frequencies via NVI. Other than that, I am just letting the cards do their normal GPU Boost thing. If anything, I am running handicapped on the CPU front since I run AMD FX chips. They are slightly overclocked to 4.0 GHz for the 8300 and 4.4 GHz for the 8350/8370.

 

DF1DX
Joined: 14 Aug 10
Posts: 105
Credit: 3,666,040,617
RAC: 4,383,204

Dualboot Dell T20, ECC Memory, NVidia 750Ti, no overclocking, no CPU Tasks, 2x FGRP 1.17:

Card      CPU        mult  ET(h:mm:ss)  Driver  OS               Hostid
750Ti     E3-1225v3  2X    ~2:18:45(!)  367.57  Linux Mint 17.3  12241921
750Ti     E3-1225v3  2X    ~1:57:30     376.57  Win 7/64 Bit     12247194

and my other pc (Windows only) with no CPU Tasks, 2x FGRP 1.17:

Card      CPU        mult  ET(h:mm:ss)  Driver  OS               Hostid
1060/6GB  i5-3570K   2x    ~0:43:00     372.70  Win 7/64 Bit     6742381
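
To put a number on the dual-boot comparison, here is a small Python sketch that turns the two approximate 750Ti times above into a Linux/Windows elapsed-time ratio. The function name is just for illustration, and the inputs are the rounded "~" figures from the table, so the result is only as good as those estimates.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to seconds."""
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s

linux = to_seconds("2:18:45")    # 750Ti, Linux Mint 17.3
windows = to_seconds("1:57:30")  # 750Ti, Win 7/64 Bit

ratio = linux / windows
print(f"Linux/Windows elapsed ratio: {ratio:.2f}")
```

A ratio above 1 means the Linux boot took longer per task on this host.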

 
n12365
Joined: 4 Mar 16
Posts: 26
Credit: 6,491,436,572
RAC: 0

archae86 wrote:

I hope some people reading this will report GTX 1070 or 1060 elapsed time averages on Windows and Linux machines running 2X.  Perhaps we can put a sounder comparison in place for the current applications.

Card    CPU      mult ET(h:mm:ss)
1070    i5-4690K 2X   0:52:21
6GB1060 i5-4690K 2X   1:48:40

All of my computers are running Windows 10 and these are the run times I am getting for FGRP 1.17:

Card      CPU        Multi   Run Time
1070      i5-6600    2X      0:29:06
1070      Q9450      2X      0:38:32
6GB1060   i5-4690    2X      0:41:46
3GB1060   i7-4790K   2X      0:43:34

 

My times are significantly different from the times you are getting.  I am using the median run time over the last 7 days to compare the relative performance of my computers.  Is the elapsed time you are using the same as the run time I am using?  I am pulling the run time from the task webpage.  This is an example from my fast 1070 machine:   https://einsteinathome.org/task/599111630
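
For anyone wanting to reproduce the "median run time over the last 7 days" figure, a minimal Python sketch follows. It assumes the run times have already been copied off the task pages into (reported time, run-time seconds) pairs; the dates and values below are hypothetical, not taken from any host.

```python
from datetime import datetime, timedelta
from statistics import median

def median_recent_run_time(records, now, days=7):
    """Median run time in seconds of records reported within `days` before `now`."""
    cutoff = now - timedelta(days=days)
    recent = [secs for when, secs in records if cutoff <= when <= now]
    if not recent:
        raise ValueError("no tasks in the window")
    return median(recent)

# Hypothetical sample data for illustration only.
now = datetime(2017, 1, 15)
records = [
    (datetime(2017, 1, 14), 1740.0),
    (datetime(2017, 1, 12), 1750.0),
    (datetime(2017, 1, 10), 1760.0),
    (datetime(2016, 12, 30), 4000.0),  # older than 7 days, ignored
]

print(median_recent_run_time(records, now))
```

Unlike a mean, the median is not shifted much by a few unusually short or long tasks, so the two statistics can differ even on the same host.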

 

 
