Observations on FGRPB1 1.16 for Windows

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0
Topic 203787

GTX 780 (347.88) + Intel Q9550 @ 3.7GHz, running 1 task : 17min

GTX 760 (376.33), 2 cards, Intel X56xx @ 4GHz, running 1 task per card : 32min

GTX 760 (376.33), 2 cards, Intel X56xx @ 4GHz, running 2 tasks per card : 60min

AMD R9 270X (16.12.1), Intel X56xx @ 4GHz, running 1 task : 9min


- A mid-range AMD GPU seems to be much faster than any 700-series Nvidia card

- Not much benefit from running 2 tasks in parallel on a 700-series Nvidia card (see the sketch below)

- The progress bar may jump to 100% from an arbitrary point (25%, 40%, 60%, for example)
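
A quick way to compare multiplicities is effective time per task (elapsed time divided by the number of tasks running together). A minimal sketch in Python, using the GTX 760 numbers above:

def per_task_minutes(elapsed_min, tasks_in_parallel):
    # Effective minutes per task when several tasks share one GPU
    return elapsed_min / tasks_in_parallel

print(per_task_minutes(32, 1))  # 32.0 min/task at 1x
print(per_task_minutes(60, 2))  # 30.0 min/task at 2x -> only ~7% throughput gain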

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3496677370
RAC: 2084663

Posting here as well, as this is the proper thread...

FGRPB1G v1.16, times in seconds:

                 1x      2x
---------------------------
Fury X          330     420
HD 7950         460     560
GTX 1050 Ti     730    1180
Tesla K20c     1500
GTX 1080        350     480  (from user Matt)

 
There's something odd happening on the Tesla: the GPU is well utilized, but the CPU is at 99% too. I didn't have any problems with Milkyway or BRP there...

-----

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7205854931
RAC: 928988

As this application is much more CPU-hungry than the mature BRP4G application was, a host that was well balanced for high BRP4G productivity is likely to be underpowered in CPU for FGRPB1 1.16.  One consequence is that people with a lot of GPU capability on a host may need to get used to a lower GPU load than they have liked.  One may also wish to reconsider sharing a host between GPU and pure-CPU work.
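
For anyone wanting to reserve a full CPU core per GPU task (or to set the task multiplicity), the usual tool is an app_config.xml in the Einstein@Home project directory. A minimal sketch follows; the app name hsgamma_FGRPB1G is my assumption, so verify it against your client_state.xml before relying on it:

<!-- app_config.xml, placed in the Einstein@Home project folder.
     The app name below is an assumption; check client_state.xml. -->
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>  <!-- 0.5 GPU per task = 2 tasks per GPU -->
      <cpu_usage>1.0</cpu_usage>  <!-- reserve one full CPU core per task -->
    </gpu_versions>
  </app>
</app_config>

After editing, have the client re-read its config files (or restart it) for the change to take effect.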

My most productive machine has one GTX 1070 and one 6 GB GTX 1060, driven by a 4-core i5-4690K CPU.

Running 2X on 1.16 (so four 1.16 tasks in total), I am seeing on that machine a GPU-Z-reported GPU load of about 87%, a memory controller load of about 47%, and a bus interface load of 15%.

In my current configuration, the 1.16 tasks running 2X on the 1070 are completing in about 9:20 elapsed time, and on the 1060 in about 12:02, both at a reported CPU utilization of about 90%.  I'm not sure whether that 90% is near the limit of their appetite, or was capped by CPU availability.

I urge people not to judge likely productivity from the reported completion percentage of partially processed work.  Not only is 1.16 leaping to 100% from some far lower point, but the jump point varies a lot, as Richie reported in this thread.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

The results are in for my two GTX 750 Tis running under Win7 64-bit.  These are minimally factory-overclocked cards running at 1210 MHz according to GPU-Z.  They each reserve a core of an i7-4771 (Z87 motherboard).  I have four other cores running the Gravity Wave CV work units, so in essence four cores are free to support the GPUs, which should be enough.

Note also that these are the "LATeah2003L_236" series; the times for the other series may vary slightly.

One work unit per card:  1306 seconds average per work unit (averaged over six work units)

Two work units per card: 1088 seconds average per work unit (averaged over four work units)

The average GPU usage was about 84% for one WU at a time, and 90% for two at a time.  However, the power draw was almost the same, at about 49% TDP for one at a time versus about 51% TDP for two at a time.  Hence, the temps were consistent and low, at 50 to 52 °C.

Also note: I use the 359.06 drivers, the last of the CUDA 7.5 series.  These are faster than the later ones on CUDA, as previously discussed on this forum, though I don't know whether they make any difference for OpenCL.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7205854931
RAC: 928988

I have some timings to report from some combinations of card types and host CPUs.

In all cases save the 1050, the host is supporting two cards.  As this application is more CPU-dependent than earlier ones here, I suspect the host's CPU capability materially modulates the reported card performance, and sharing the host across two cards probably hurts somewhat as well (it was not hurting enough to notice with the late CUDA55 applications).

The CPUs are all running at stock clock.  However, all the GPUs are overclocked in both core clock and memory clock, running at clocks I found to be long-term stable on the previous major applications here, a couple of ticks below the fastest settings at which I observed success.

Card       CPU       mult  ET (mm:ss)
1070       i5-4690K  2X     8:30
1060 6GB   i5-4690K  2X    11:50
1060 3GB   i5-2500K  2X    14:00
970        i3-4130   2X    15:00
1050       E5620     2X    21:30
750Ti      i5-2500K  2X    29:15
750Ti      i3-4130   2X    30:10
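
Since every row is at 2X, a convenient way to compare cards is effective per-task time (half the elapsed time), or tasks per day. A small sketch of that conversion, using the numbers from the table above:

def tasks_per_day(et_mmss, multiplicity=2):
    # Convert an elapsed time like "8:30" at a given multiplicity
    # into completed tasks per day.
    m, s = et_mmss.split(":")
    elapsed_s = int(m) * 60 + int(s)
    return 86400 * multiplicity / elapsed_s

print(round(tasks_per_day("8:30")))   # ~339 tasks/day on the 1070
print(round(tasks_per_day("30:10")))  # ~95 tasks/day on the 750Ti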

 

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7205854931
RAC: 928988

I'm happy to report that I already have 35 validations across my four hosts running this application, with zero errors or invalid results so far.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

archae86 wrote:
I urge people not to judge likely productivity from the reported completion percentage of partially processed work.  Not only is 1.16 leaping to 100% from some far lower point, but the jump point varies a lot, as Richie reported in this thread.

Richie wrote:
- The progress bar may jump to 100% from an arbitrary point (25%, 40%, 60%, for example)

I think this might have to do with how BOINC behaves when an app doesn't report progress: BOINC then estimates the task's progress (based on the estimated runtime, I think), and the displayed progress converges on 100% but never fully reaches it.
I think the new FGRP GPU app's progress reporting is either broken or simply not implemented.
Hopefully the project will release a new version once the dust of the initial deployment has settled.
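
For illustration, a toy model of that kind of client-side estimate (this is not BOINC's actual code, just the asymptotic behaviour described above):

import math

def estimated_fraction_done(elapsed_s, estimated_runtime_s):
    # Toy estimate: approaches 1.0 asymptotically but never reaches it,
    # mimicking what the client shows when the app reports no progress.
    return 1.0 - math.exp(-elapsed_s / estimated_runtime_s)

for t in (300, 600, 1200, 2400):
    print(t, round(estimated_fraction_done(t, 600), 3))
# 300 0.393, 600 0.632, 1200 0.865, 2400 0.982 -- creeping toward 100%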

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Mumak wrote:

Posting here as well, as this is the proper thread...

FGRPB1G v1.16, times in seconds:

                 1x      2x
---------------------------
Fury X          330     420
HD 7950         460     560
GTX 1050 Ti     730    1180
Tesla K20c     1500
GTX 1080        350     480  (from user Matt)

 
There's something odd happening on the Tesla: the GPU is well utilized, but the CPU is at 99% too. I didn't have any problems with Milkyway or BRP there...

 

i7-6950X @ 4.0 GHz, DDR4 @ 3.47 GHz, 4x GTX 980 Ti

              1x        3x                          4x
GTX 980Ti     337 sec   690 sec (230 sec per WU)    910 sec (227.5 sec per WU)

 

Can't test 5 at a time, not enough cores...

WhiteWulfe
Joined: 3 Mar 15
Posts: 31
Credit: 62249506
RAC: 0

Running one task at a time on a 4770K that's running at 3.9 GHz, paired with my GTX 980 Ti Golden Edition...  My tasks are completing around the 7:20 mark or so.  Oh right: 16 GB of DDR3-2400 CL10, running Windows 7 Home Premium 64-bit.

 

What's moderately annoying, though, is that I have my queue set for 0.45 days (with 0.05 extra) and it downloaded 498 work units, which by quick estimates pins it at around two and a half days of work.  And even better?  It's constantly trying to get more, and the server is now deferring me 24 hours with the message "reached daily quota of (number of work units remaining in my queue)", so I'm having to manually upload finished work units every couple of hours.
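
A quick sanity check of that two-and-a-half-day estimate, using the ~7:20 per task figure above and one GPU running one task at a time:

tasks = 498
seconds_per_task = 7 * 60 + 20           # ~7:20 per task, from above
days = tasks * seconds_per_task / 86400  # 86400 seconds per day
print(round(days, 2))                    # ~2.54 days -- matches the estimate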

 

1.16 is a definite improvement for Windows users compared to 1.15 though!

 

 

Interestingly enough, all eight threads on my rig are pinned at 100% usage, despite my having BOINC set to use only 75% of the CPUs.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7205854931
RAC: 928988

WhiteWulfe wrote:
... the server is deferring me 24 hours now with the message "reached daily quota of

I hit the daily quota limit (640 in my case).  Initially, the deferral time BOINC displayed appeared to run until twenty minutes or so after midnight UTC, so I assumed midnight UTC was when a new day's quota would kick in.

But my host has gotten zero new work since the quota-limit message first appeared, and it now displays a deferral until about half an hour after midnight UTC a day later.

Does anyone reading this actually know just how the reached daily limit system works?  Midnight in Germany?  Midnight in the American Midwest?  24 hours after the quota was reached? 24 hours after the first unit downloaded in the group that hit the limit?   ...

I figure I'll try an update now and again, hoping the server will have a different opinion than the BOINC client on my machine as to when my timeout is over.  At the current rate of progress, I'll have run out of work about nine hours before the indicated deferral end.

Mad_Max
Joined: 2 Jan 10
Posts: 154
Credit: 2206968178
RAC: 393955

Yes, v1.16 is MUCH faster. Runtimes dropped from 28-30k seconds (about 8 hours) to 900-1000 s on an AMD HD 7870 running a single WU.
That's roughly 30 times faster. Now it's real GPU speed.
Good work.

GPU VRAM use is almost 3 times higher, though (700-800 MB per task), so 1 GB cards can run only one task and 2 GB cards only two.
I actually tried 2 WUs on a 1 GB card (HD 7850) and it worked OK, but slower - it seems to start using system RAM over PCI-E.
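
A simple way to estimate how many tasks fit in VRAM without spilling over PCI-E, taking the 800 MB upper end of the range above:

def max_tasks(vram_mb, per_task_mb=800):
    # Tasks that fit entirely in GPU memory; the per-task figure is the
    # 700-800 MB observation above, taken at its pessimistic end.
    return vram_mb // per_task_mb

print(max_tasks(1024))  # 1 -> a 1 GB card should stick to a single task
print(max_tasks(2048))  # 2
print(max_tasks(4096))  # 5, though CPU support usually limits you first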
