GTX 780 (347.88) + Intel Q9550 @ 3.7GHz, running 1 task : 17min
GTX 760 (376.33), 2 cards, Intel X56xx @ 4GHz, running 1 task per card : 32min
GTX 760 (376.33), 2 cards, Intel X56xx @ 4GHz, running 2 tasks per card : 60min
AMD R9 270X (16.12.1), Intel X56xx @ 4GHz, running 1 task : 9min
- A mediocre AMD GPU seems to be much faster than any 700-series Nvidia card
- not much benefit from running 2 tasks in parallel on 700-series Nvidia
- the progress bar may jump to 100% from any point (25, 40, 60 for example)
Copyright © 2024 Einstein@Home. All rights reserved.
Posting here as well, as this is the proper thread...
FGRPB1G v1.16, times in seconds:
               1x      2x
-----------------------------
Fury X         330     420
HD 7950        460     560
GTX 1050 Ti    730     1180
Tesla K20c     1500
GTX 1080       350     480    (from user Matt)
There's something odd happening on the Tesla, GPU is well utilized, but CPU at 99% too. I didn't have any problems with Milkyway or BRP there...
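To compare the 1x and 2x columns as throughput, here is a minimal Python sketch. It assumes the 2x figure is the elapsed time of each of two tasks running concurrently on one card (the table does not say so explicitly), so the effective time per completed task is the elapsed time divided by two. The function names are made up for illustration.

```python
# Times in seconds copied from the table above: (elapsed at 1x, elapsed at 2x).
# Assumption: the 2x figure is the elapsed time of each of two concurrent tasks.
times = {
    "Fury X": (330, 420),
    "HD 7950": (460, 560),
    "GTX 1050 Ti": (730, 1180),
    "GTX 1080": (350, 480),
}


def effective_seconds_per_task(elapsed, concurrent):
    """Wall-clock seconds per completed task when `concurrent` tasks share a GPU."""
    return elapsed / concurrent


def throughput_gain(t1, t2):
    """Ratio of 2x throughput to 1x throughput (above 1.0 means 2x wins)."""
    return t1 / effective_seconds_per_task(t2, 2)


for card, (t1, t2) in times.items():
    eff = effective_seconds_per_task(t2, 2)
    gain = throughput_gain(t1, t2)
    print(f"{card:12s} 1x: {t1} s/task   2x: {eff:.0f} s/task   gain: {gain:.2f}x")
```

The Tesla K20c is left out because no 2x time was reported for it.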
As this application is much more CPU-hungry than was the mature BRP4G application, a host which was well-balanced for high BRP4G productivity is likely to be underpowered in CPU for FGRPB1 1.16. One consequence is that people with a lot of GPU capability on a host may need to get used to lower GPU load than they have liked. Also one may wish to reconsider sharing a host between GPU and pure-CPU work.
My most productive machine has one GTX 1070 and one 6GB GTX 1060 on a machine with a 4-core i5-4690K CPU.
Running 2X on 1.16 (so four total 1.16 tasks), on that machine I am seeing GPU-Z reported GPU load of about 87%, memory controller load of about 47%, and Bus Interface load of 15%.
In my current configuration the 1.16 tasks running 2X on the 1070 are completing in about 9:20 elapsed time, and the 1060 in about 12:02, both at a reported CPU utilization of about 90%. I'm not sure whether the 90% is near the limit of their hunger, or was limited by availability.
I urge people not to judge likely productivity from the reported completion percentage of partially processed work. Not only is 1.16 leaping to 100% from some far lower point, but the jump point varies a lot, as Richie reported in this thread.
The results are in for my two GTX 750 Ti's running under Win7 64-bit. These are minimally factory-overclocked cards running at 1210 MHz according to GPU-Z. They each reserve a core of an i7-4771 (Z87 MB). I have four other cores running the Gravity Wave CV work units, so in essence four cores are free to support the GPUs, which should be enough.
Note also that these are the "LATeah2003L_236" series; the times for the other series may vary slightly.
One work unit per card: 1306 seconds average per work unit (averaged over six work units)
Two work units per card: 1088 seconds average per work unit (averaged over four work units)
The average GPU usage was about 84% for one WU at a time, and 90% for two at a time. However, the power was almost the same, at about 49% TDP for one at a time versus about 51% TDP for two at a time. Hence, the temps were consistent and low, at 50 to 52 C.
Also note: I use the 359.06 drivers, being the last of the CUDA 7.5 series. These are faster than the later ones on CUDA, as previously discussed on this forum, though I don't know if they make any difference for OpenCL.
I have some timings to report from some combinations of card types and host CPUs.
In all cases save the 1050, the host is supporting two cards. As this application is more CPU-dependent than earlier ones here, I suspect there will be material host capability modulation of the reported card performance, and sharing the host across two cards probably hurts some also (it was not hurting enough to notice for the late Cuda55 applications).
The CPUs are all running at stock clock. However all the GPUs are overclocked in both core clock and memory clock, running at clocks I found to be long-term stable on the previous major applications here, a couple of ticks slower than maximum observed success.
I'm happy to report that I already have 35 validations across my four hosts running this application, with zero error or invalid results so far.
archae86 wrote:I urge people
I think this might have to do with how BOINC behaves when an app doesn't report progress: BOINC then estimates the task's progress (based on the estimated runtime, I think), and the progress converges on 100% done but never fully reaches it.
I think the new FGRP GPU app's progress reporting is either "broken" or not there at all.
Hopefully the project will release a new version when the dust of the initial deployment has settled.
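To illustrate what "converges on 100% but never fully reaches it" can look like, here is a small Python sketch of one function with that shape. This is an assumption chosen for illustration, not BOINC's actual code: it simply maps elapsed time and an estimated runtime to a fraction that approaches 1.0 asymptotically. The function name is hypothetical.

```python
import math


def estimated_fraction_done(elapsed_s, estimated_runtime_s):
    """Illustrative progress estimate for a task that reports no progress.

    Assumption: an exponential approach to 1.0, chosen only because it
    converges on 100% without reaching it. Not taken from the BOINC source.
    """
    if estimated_runtime_s <= 0:
        raise ValueError("estimated runtime must be positive")
    return 1.0 - math.exp(-elapsed_s / estimated_runtime_s)


# Progress at a few multiples of a 600 s runtime estimate.
for multiple in (0.5, 1, 2, 5):
    frac = estimated_fraction_done(multiple * 600, 600)
    print(f"{multiple}x estimated runtime: {frac:.1%}")
```

With an estimate like this, a task that actually finishes early would show a sudden jump from the estimated value to 100%, which would look much like what people are reporting.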
Mumak wrote:Posting here as
i7 6950X @ 4.0 GHz, DDR4 3.47GHz, four 980 Tis
             1x         3x                                   4x
GTX 980Ti    337 sec    690 sec (230 sec per work unit)      910 sec (227.5 sec per work unit)
Can't test 5 at a time, not enough cores...
Running one task at a time on a 4770k that's running at 3.9GHz and paired with my GTX 980 Ti Golden Edition..... My tasks are completing around the 7:20 mark or so. Oh right, 16GB of DDR3-2400 CL10, running Windows 7 Home Premium 64-Bit Edition.
What's moderately annoying though is I have my queue set for 0.45 days (with 0.05 extra) and it downloaded 498 work units, which by quick estimates pins it at around two and a half days or so. And even better? It's constantly trying to get even more, and the server is deferring me 24 hours now with the message "reached daily quota of (amount of work units remaining in my queue)", so I'm having to manually upload finished work units every couple of hours.
1.16 is a definite improvement for Windows users compared to 1.15 though!
Interestingly enough, all eight threads on my rig are pinned at 100% usage, despite me having BOINC set to only use 75%.
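The "two and a half days" estimate in this post can be checked with a few lines of Python. This is a rough sketch using only the numbers quoted above (498 work units, about 7:20 each, one task at a time, a requested queue of 0.45 + 0.05 days); the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def queue_days(tasks, seconds_per_task, concurrent=1):
    """Days of wall-clock time needed to work through `tasks` work units."""
    return tasks * seconds_per_task / concurrent / SECONDS_PER_DAY


per_task = 7 * 60 + 20          # 7:20 per task, in seconds
requested_days = 0.45 + 0.05    # queue setting plus the extra buffer
days = queue_days(498, per_task)

print(f"queue holds about {days:.2f} days of work")
print(f"that is about {days / requested_days:.1f}x the requested {requested_days:.1f} days")
```

So the download is roughly five times the requested buffer, assuming the 7:20 runtime holds for the whole batch.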
WhiteWulfe wrote:... the
I hit the daily quota limit (640 in my case). Initially it appeared that the deferral time BOINC displayed was until twenty minutes or so after midnight UTC. I figured that it figured that midnight UTC was when a new day's quota would kick in.
But my host has gotten zero new work since the quota limit message first displayed, and now displays deferral until about half an hour after midnight UTC for another day.
Does anyone reading this actually know just how the reached daily limit system works? Midnight in Germany? Midnight in the American Midwest? 24 hours after the quota was reached? 24 hours after the first unit downloaded in the group that hit the limit? ...
I figure I'll try an update now and again hoping the server will have a different opinion than does boinc on my machine as to when my timeout is over. At the current rate of progress, I'll have gone to zero work about nine hours before the indicated deferral end.
Yes, v1.16 is MUCH faster. Runtimes dropped from 28-30k sec (about 8 hours) to 900-1000 s on an AMD HD 7870 with a single WU running.
That's ~30 times faster. Now it's real GPU speed.
Good work.
GPU VRAM use is almost 3 times higher though (700-800 MB per task), so 1 GB cards can run only one task and 2 GB cards only two.
Actually, I tried 2 WUs on a 1 GB card (HD 7850) and it works OK, but slower; it seems to start using system RAM via PCI-E.
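The VRAM limit above is just integer division, as this small Python sketch shows. It assumes 1 GB = 1024 MB and that the whole card memory is available to the tasks (in practice the desktop and driver take some, so treat the result as an upper bound). The function name is hypothetical.

```python
def max_tasks_in_vram(vram_mb, per_task_mb):
    """Upper bound on concurrent tasks that fit entirely in GPU memory."""
    return int(vram_mb // per_task_mb)


for vram_gb in (1, 2):
    for per_task_mb in (700, 800):
        n = max_tasks_in_vram(vram_gb * 1024, per_task_mb)
        print(f"{vram_gb} GB card, {per_task_mb} MB per task: {n} task(s)")
```

Running more tasks than this bound is still possible, as the HD 7850 test shows, but the overflow presumably spills into system RAM and slows things down.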