I don't know what's best to do for multi-GPU setups. Firstly, I don't have any, so I haven't put much thought into it. I might be totally wrong, but I tend to think the interactions might be such that all GPUs sharing the bus end up with somewhat 'crippled' performance compared to a single unit. This might also vary over time, so the stats might show more variation than is directly attributable to the way the beta app itself now works.
I operate two hosts that have two add-on card GPUs each. In both cases one is a GTX 660 and the other some flavor of GTX 750.
I can affirm that when I set them up on Perseus, the sharing of system resources materially impaired the productivity of each individual card compared to standalone operation. Further, I suspect that the heavy consumption of system resources by WUs in the high-CPU-consumption tail will impair work on the other GPU, which will broaden the overall distribution of elapsed times. On the plus side, the non-tail population of WUs has dramatically lower CPU consumption (and PCIe bus traffic) than the pre-beta app, so it is reasonable to suppose that for work in the base population the cross-effects may be much smaller in Parkes 1.50 work than in 1.39, and even the tail may get better if Bikeman's next effort enjoys success.
Regarding the attribution to a particular GPU: I stumbled this week on the fact that BOINCTasks, which I run on one PC in my flotilla as a sort of control panel to monitor and adjust BOINC work across the flotilla, labels completed work shown in the history tab by device number. As data copied from that tab was already my preferred source for productivity computations, a simple text-to-columns isolation of the device ID from the column in which it appears allows easy separation. As it only lists one device, this won't work in the case that a WU restarted on a different device, but such cases don't seem to occur in normal, undisturbed operation.
Regarding the device labelling, in BoincTasks version 1.61, which I currently run, it shows up in the history list in the "use" column.
For example:
[pre]PM0008_01011_316_0 03:14:23 (00:13:13) 0.2C + 0.5NV (d1)
PM0008_01011_158_0 03:13:36 (00:13:18) 0.2C + 0.5NV (d1)
PM0008_01011_304_0 04:22:34 (00:16:41) 0.2C + 0.5NV (d0)[/pre]
is a subset of the history columns, showing information for two completed WUs which ran on the GTX 660 in a little over 3 hours ET with a bit over 13 minutes of CPU, plus one WU which ran on a GTX 750 Ti and took about an hour longer.
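For anyone who would rather script that split than use text-to-columns, a minimal sketch along these lines should work. This is just my own illustration against the line format shown above, not anything built into BoincTasks, and the column layout may differ in other versions or with different column selections:

[pre]import re
from collections import defaultdict

# One history line looks like:
#   PM0008_01011_316_0 03:14:23 (00:13:13) 0.2C + 0.5NV (d1)
LINE = re.compile(
    r"^(?P<wu>\S+)\s+(?P<elapsed>\d+:\d+:\d+)\s+\((?P<cpu>\d+:\d+:\d+)\).*\(d(?P<dev>\d+)\)"
)

def seconds(hms):
    h, m, s = (int(x) for x in hms.split(":"))
    return 3600 * h + 60 * m + s

def mean_elapsed_by_device(lines):
    per_dev = defaultdict(list)
    for line in lines:
        m = LINE.match(line.strip())
        if m:  # skip anything that doesn't look like a history line
            per_dev[m.group("dev")].append(seconds(m.group("elapsed")))
    # mean elapsed time per device, in hours
    return {dev: sum(v) / len(v) / 3600.0 for dev, v in per_dev.items()}

history = [
    "PM0008_01011_316_0 03:14:23 (00:13:13) 0.2C + 0.5NV (d1)",
    "PM0008_01011_158_0 03:13:36 (00:13:18) 0.2C + 0.5NV (d1)",
    "PM0008_01011_304_0 04:22:34 (00:16:41) 0.2C + 0.5NV (d0)",
]
for dev, hours in sorted(mean_elapsed_by_device(history).items()):
    print(f"device d{dev}: mean elapsed {hours:.2f} h")[/pre]

On the three lines above that gives roughly 3.23 h for d1 (the GTX 660) and 4.38 h for d0 (the GTX 750 Ti).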
I have seen a nice boost in performance with the latest 1.50 Beta application for my NVIDIA card.
[...]
Jeroen
Interesting, this confirms my suspicion that NVIDIA cards will benefit much more from the recent optimization than AMD cards.
People who run both NVIDIA and AMD cards, on E@H as well as other GPU projects, might be able to give a rough indication of whether BRP6-Beta has leveled the playing field between NVIDIA and AMD, in the following sense:
For every project P_i, a given NVIDIA card of yours performs at some factor k_i relative to a given AMD card of yours. How does that factor for your other projects compare with the factor for E@H?
Cheers
HB
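To put that in concrete terms with made-up numbers (purely illustrative, none of these are real measurements): k_i is simply the ratio of elapsed times for the same kind of task on the two cards, so k_i > 1 means the NVIDIA card is faster on that project. Something like:

[pre]# Purely illustrative elapsed times in hours, not real measurements.
elapsed = {
    "OtherProject_1": {"nvidia": 1.0, "amd": 1.3},
    "OtherProject_2": {"nvidia": 2.0, "amd": 2.4},
    "E@H_BRP6":       {"nvidia": 3.3, "amd": 3.0},
}

for project, t in elapsed.items():
    k = t["amd"] / t["nvidia"]   # k_i = AMD time / NVIDIA time
    print(f"{project}: k = {k:.2f}")[/pre]

If the k you get for E@H with BRP6-Beta has moved toward the k you see on your other projects, then the playing field has been leveled in the sense asked above.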
I do not have any recent data from other projects for an accurate comparison. I do recall that with Seti@Home, my 780 Ti was slightly faster than my 7970 with both Seti v7 and Astropulse v6. The latest BRP6 optimizations have helped level the playing field between my NVIDIA and AMD cards. My NVIDIA card is now within reach of my AMD card for daily production.
Thanks for the work done on optimizing the application.
Jeroen
Great, that was the kind of qualitative statement I had hoped for, thanks.
HBE
I do not think there is any firm correlation between the "content"/scientific value of a WU and the run time.
HB
My 1st beta 1.52 is running on my Intel HD4000 (1.30 GHz, i7 3770K with dual-channel DDR3-2400)! It's going to take a long time (no worries), so collecting stats is going to be tedious. However, what I can say immediately: with one 1.52 and one 1.39 running, the runtime of the 1.39 tasks has dropped from 21 minutes to a bit over 16 minutes! It's going to get even more interesting once 2 of the new WUs are running.
Besides: the utilization of my GTX970 running GPU-Grid seems to be up, from 92-94% to 94-95%. Getting better numbers is also tedious here and may be rather pointless with one 1.39 still running.
Currently I have 2 physical cores dedicated to feeding the iGPU and the GTX970 (also running 2 WUs with constant polling). Due to the optimizations I may be able to free up more CPU resources for other tasks - a very welcome side effect of the new app :)
MrS
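As an aside, for anyone wanting to adjust how much CPU BOINC budgets per GPU task (the 0.2C + 0.5NV seen in the history lines further up), the usual place is an app_config.xml in the project directory. A sketch only, with the caveats that the app name below is my assumption (check client_state.xml or the project's application list for the exact name) and that these values just tell the scheduler what to reserve, they don't change what the app actually consumes:

[pre]<app_config>
  <app>
    <name>einsteinbinary_BRP6</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>   <!-- run 2 tasks per GPU -->
      <cpu_usage>0.2</cpu_usage>   <!-- CPU budgeted per GPU task -->
    </gpu_versions>
  </app>
</app_config>[/pre]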
The PCIe bandwidth usage is highly data dependent. I hope to make a new beta app version soon (this week) that will further optimize this and, hopefully, reduce the variance between different WUs.
From recent posts in Tech News, that seems to be upon us already!! :-).
I'll modify the template in the RESULTS thread to include 2 BRP6-beta lines, one for betas up to 1.50 and the other for 1.52 - unless you'd like it done some other way?
Cheers,
Gary.
Actually we are not sure yet whether we want to keep serving the rather big BRP6 work units to the Intel iGPUs. At least on the less powerful among them, like the HD 2500, the tasks will take longer to complete than we usually like. It is quite possible that we'll stop BRP6 beta on the Intel iGPUs after some initial tests.
HB
I'll modify the template in the RESULTS thread to include 2 BRP6-beta lines, one for betas up to 1.50 and the other for 1.52 - unless you'd like it done some other way?
That would be perfect, thanks
HB
I agree, the tasks are going to take very long on weak iGPUs. This includes the lesser tiers and the mobile ones which can't clock as high. However, future iGPUs are only going to become more powerful. As long as there is enough short work which I can crunch with the new app, I'm completely fine.
If this poses a problem you could offer both (the current short tasks and the regular long ones) and let users decide to opt in to the long ones.
MrS
I'm well into running my first on my Haswell HD 4600. It looks to be heading for ~9 hours, which is long but not unmanageable. That's running a single task, CPU 75% loaded, so a 'free core' (which BRP4 liked very much). Tomorrow I'll try 100% CPU, followed by 100% + realtime priority.