Observations on FGRPB1 1.16 for Windows

Richie
Joined: 7 Mar 14
Posts: 384
Credit: 1,516,477,284
RAC: 278,625
Topic 203787

GTX 780 (347.88) + Intel Q9550 @ 3.7GHz, running 1 task : 17min

GTX 760 (376.33), 2 cards, Intel X56xx @ 4GHz, running 1 task per card : 32min

GTX 760 (376.33), 2 cards, Intel X56xx @ 4GHz, running 2 tasks per card : 60min

AMD R9 270X (16.12.1), Intel X56xx @ 4GHz, running 1 task : 9min


- A mediocre AMD GPU seems to be much faster than any 700-series Nvidia card

- not much benefit from running 2 tasks in parallel on 700-series Nvidia

- the progress bar may jump to 100% from whatever point (25, 40, 60 for example)
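For reference, running more than one task per card as in the timings above is normally configured with an app_config.xml in the project directory. A minimal sketch; the app name "hsgamma_FGRPB1G" is an assumption here, so check your client_state.xml for the actual name before using it:

```xml
<!-- app_config.xml in the Einstein@Home project directory -->
<!-- app name is an assumption; verify it in client_state.xml -->
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage> <!-- 0.5 GPU per task = 2 tasks per card -->
      <cpu_usage>1.0</cpu_usage> <!-- reserve one CPU core per task -->
    </gpu_versions>
  </app>
</app_config>
```

The client picks the file up on a "Read config files" or restart.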

Mumak
Joined: 26 Feb 13
Posts: 312
Credit: 1,652,812,637
RAC: 747,844

Posting here as well, as this is the proper thread...

FGRPB1G v1.16, times in seconds:

              1x      2x
------------------------
Fury X       330     420
HD 7950      460     560
GTX 1050 Ti  730    1180
Tesla K20c  1500

GTX 1080     350     480  (from user Matt)

There's something odd happening on the Tesla: the GPU is well utilized, but the CPU is at 99% too. I didn't have any problems with Milkyway or BRP there...

-----

archae86
Joined: 6 Dec 05
Posts: 2,557
Credit: 1,860,107,806
RAC: 2,584,346

As this application is much more CPU-hungry than the mature BRP4G application was, a host that was well balanced for high BRP4G productivity is likely to be underpowered in CPU for FGRPB1 1.16.  One consequence is that people with a lot of GPU capability on a host may need to get used to lower GPU load than they have liked.  One may also wish to reconsider sharing a host between GPU and pure-CPU work.

My most productive machine has one GTX 1070 and one 6GB GTX 1060 on a machine with a 4-core i5-4690K CPU.

Running 2X on 1.16 (so four total 1.16 tasks), on that machine I am seeing GPU-Z reported GPU load of about 87%, memory controller load of about 47%, and Bus Interface load of 15%.

In my current configuration the 1.16 tasks running 2X on the 1070 are completing in about 9:20 elapsed time, and the 1060 in about 12:02, both at a reported CPU utilization of about 90%.  I'm not sure whether the 90% is near the limit of their hunger, or was limited by availability.

I urge people not to judge likely productivity from the reported completion percentage of partially processed work.  Not only is 1.16 leaping to 100% from some far less point, but the jump point varies, a lot, as Richie reported in this thread.

Jim1348
Joined: 19 Jan 06
Posts: 312
Credit: 175,269,044
RAC: 7,748

The results are in for my two GTX 750 Ti's running under Win7 64-bit.  These are minimally factory-overclocked cards running at 1210 MHz according to GPU-Z.  They each reserve a core of an i7-4771 (Z87 MB).  I have four other cores running the Gravity Wave CV work units, so in essence four cores are free to support the GPUs, which should be enough.

Note also that these are the "LATeah2003L_236" series; the times for the other series may vary slightly.

One work unit per card:  1306 seconds average per work unit (averaged over six work units)

Two work units per card: 1088 seconds average per work unit (averaged over four work units)

The average GPU usage was about 84% for one WU at a time and 90% for two at a time.  However, the power draw was almost the same: about 49% TDP for one at a time versus about 51% TDP for two.  Hence, the temps were consistent and low, at 50 to 52 C.
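Those averages translate into throughput like this (a quick sketch; the helper function is mine, not anything from BOINC):

```python
# Throughput comparison from the 750 Ti averages quoted above:
# 1306 s per WU at one-at-a-time vs 1088 s per WU at two-at-a-time.
def throughput_per_hour(sec_per_wu: float) -> float:
    """Work units completed per hour at a given average per-WU time."""
    return 3600.0 / sec_per_wu

print(f"1x: {throughput_per_hour(1306):.2f} WU/h")
print(f"2x: {throughput_per_hour(1088):.2f} WU/h")
print(f"gain: {1306 / 1088 - 1:.0%}")
```

So two-at-a-time buys roughly a 20% throughput gain here, consistent with the modest rise in GPU usage and power.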

Also note: I use the 359.06 drivers, the last of the CUDA 7.5 series.  These are faster than the later ones on CUDA, as previously discussed on this forum, though I don't know if they make any difference for OpenCL.

archae86
Joined: 6 Dec 05
Posts: 2,557
Credit: 1,860,107,806
RAC: 2,584,346

I have some timings to report from some combinations of card types and host CPUs.

In all cases save the 1050, the host is supporting two cards.  As this application is more CPU-dependent than earlier ones here, I suspect host capability materially modulates the reported card performance, and sharing the host across two cards probably hurts somewhat as well (it was not hurting enough to notice for the late Cuda55 applications).

The CPUs are all running at stock clock.  However all the GPUs are overclocked in both core clock and memory clock, running at clocks I found to be long-term stable on the previous major applications here, a couple of ticks slower than maximum observed success.

Card      CPU       mult  ET (mm:ss)
1070      i5-4690K  2X      8:30
6GB 1060  i5-4690K  2X     11:50
3GB 1060  i5-2500K  2X     14:00
970       i3-4130   2X     15:00
1050      E5620     2X     21:30
750Ti     i5-2500K  2X     29:15
750Ti     i3-4130   2X     30:10

archae86
Joined: 6 Dec 05
Posts: 2,557
Credit: 1,860,107,806
RAC: 2,584,346

I'm happy to report that I already have 35 validations across my four hosts running this application, with zero error or invalid results so far.

Holmis
Joined: 4 Jan 05
Posts: 933
Credit: 544,902,589
RAC: 821,881

archae86 wrote:
I urge people not to judge likely productivity from the reported completion percentage of partially processed work.  Not only is 1.16 leaping to 100% from some far less point, but the jump point varies, a lot, as Richie reported in this thread.

Richie wrote:
- the progress bar may jump to 100% from whatever point (25, 40, 60 for example)

I think this might have to do with how Boinc behaves when an app doesn't report progress: Boinc then estimates progress (based on the estimated runtime, I think), and the displayed progress converges on 100% done but never fully reaches it.
I think the new FGRP GPU app is either "broken" when it comes to progress reporting, or progress reporting isn't there at all.
Hopefully the project will release a new version when the dust of the initial deployment has settled.
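If the client really does fall back to estimating progress from elapsed versus estimated runtime, the displayed percentage would approach 100% asymptotically without reaching it. A toy model of that behavior (my own sketch, not the actual BOINC source):

```python
import math

def estimated_progress(elapsed_s: float, estimated_runtime_s: float) -> float:
    """Toy asymptotic estimate: hits ~63% at the estimated runtime,
    then creeps toward (but never reaches) 100% afterwards."""
    return 1.0 - math.exp(-elapsed_s / estimated_runtime_s)

# The displayed percentage rises quickly, then stalls just under 100%.
for t in (500, 1000, 2000, 5000):
    print(f"{t:>5}s: {estimated_progress(t, 1000.0):.1%}")
```

That would also explain why the "jump to 100%" point varies: it depends on where the estimate happened to be when the task actually finished.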

Zalster
Joined: 26 Nov 13
Posts: 2,923
Credit: 3,058,991,780
RAC: 1,582,169

Mumak wrote:

Posting here as well, as this is the proper thread...

FGRPB1G v1.16, times in seconds:

              1x      2x
------------------------
Fury X       330     420
HD 7950      460     560
GTX 1050 Ti  730    1180
Tesla K20c  1500

GTX 1080     350     480  (from user Matt)

There's something odd happening on the Tesla: the GPU is well utilized, but the CPU is at 99% too. I didn't have any problems with Milkyway or BRP there...

i7 6950X @ 4.0GHz, DDR4 3.47GHz, 4x GTX 980Ti

              1x      3x                      4x
GTX 980Ti    337s    690s (230s per WU)      910s (227.5s per WU)

Can't test 5 at a time, not enough cores...

WhiteWulfe
Joined: 3 Mar 15
Posts: 31
Credit: 61,177,387
RAC: 0

Running one task at a time on a 4770k that's running at 3.9GHz, paired with my GTX 980 Ti Golden Edition... My tasks are completing around the 7:20 mark or so.  Oh right: 16GB of DDR3-2400 CL10, running Windows 7 Home Premium 64-bit.

What's moderately annoying though is that I have my queue set to 0.45 days (with 0.05 extra) and it downloaded 498 work units, which by a quick estimate pins it at around two and a half days.  Even better?  It's constantly trying to get more, and the server is now deferring me 24 hours with the message "reached daily quota of (amount of work units remaining in my queue)", so I'm having to manually report finished work units every couple of hours.

1.16 is a definite improvement for Windows users compared to 1.15 though!

Interestingly enough, all eight threads on my rig are pinned at 100% usage, despite me having BOINC set to only use 75%.

archae86
Joined: 6 Dec 05
Posts: 2,557
Credit: 1,860,107,806
RAC: 2,584,346

WhiteWulfe wrote:
... the server is deferring me 24 hours now with the message "reached daily quota of

I hit the daily quota limit (640 in my case).  Initially, the deferral time BOINC displayed appeared to run until twenty minutes or so after midnight UTC, so I figured midnight UTC was when a new day's quota would kick in.

But my host has gotten zero new work since the quota limit message first displayed, and now displays deferral until about half an hour after midnight UTC for another day.

Does anyone reading this actually know just how the reached daily limit system works?  Midnight in Germany?  Midnight in the American Midwest?  24 hours after the quota was reached? 24 hours after the first unit downloaded in the group that hit the limit?   ...

I figure I'll try an update now and again, hoping the server has a different opinion than the BOINC client on my machine as to when my timeout is over.  At the current rate of progress, I'll have run out of work about nine hours before the indicated deferral end.

Mad_Max
Joined: 2 Jan 10
Posts: 124
Credit: 1,185,694,607
RAC: 1,090,085

Yes, v1.16 is MUCH faster. Runtimes dropped from 28-30k seconds (about 8 hours) to 900-1000s on an AMD HD 7870 with a single WU running.
That's ~30 times faster. Now it's real GPU speed.
Good work.

GPU VRAM use is almost 3 times higher though (700-800 MB per task), so 1 GB cards can run only one task and 2 GB cards only two.
Actually, I tried 2 WUs on a 1 GB card (HD 7850) and it worked OK, but slower - it seems to start using system RAM via PCI-E.
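That VRAM observation amounts to a simple bound (figures from the post above; the helper function is mine):

```python
# Max concurrent tasks that fit in VRAM at ~800 MB per task (the upper
# end of the 700-800 MB figure above).  This ignores the driver's own
# VRAM overhead, which is why a 1 GB card running two tasks spills
# into system RAM over PCI-E and slows down.
def max_concurrent_tasks(vram_mb: int, per_task_mb: int = 800) -> int:
    return vram_mb // per_task_mb

print(max_concurrent_tasks(1024))  # 1 GB card
print(max_concurrent_tasks(2048))  # 2 GB card
```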
