Defender_2 wrote: Since E@H is very bandwidth-hungry, I guess it's caused by the different PCIe lanes.
Could someone please confirm this? I would have thought as much data as possible would be transferred to GPU memory initially, where it would then be crunched... This would mean PCIe bandwidth wouldn't have such a large effect on performance.
Tia,
Kailee.
I don't know the BRP code off the top of my head, but I believe the FGRP GPU app transfers less data between CPU and GPU memory than BRP does. I wouldn't expect PCIe bandwidth to be an issue here.
There is some indication, however, that the clFinish() calls in the current code cause a lot of CPU load via the driver. I'm really not sure how much data transfer this involves; it depends on the implementation in the driver. We'll work on getting rid of these clFinish() calls as much as possible, which should also reduce the CPU utilization.
BM
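To illustrate what's being discussed above: some OpenCL drivers implement clFinish() as a busy-wait on the CPU, so replacing queue-wide finishes with per-kernel event waits can reduce host load. A minimal sketch, assuming a queue and kernel set up elsewhere; this is not the actual FGRP/BRP code, and run_kernel_step is a made-up helper name:

/* Sketch only: wait on a single kernel's event instead of draining the
 * whole queue with clFinish(), which some drivers turn into a CPU spin loop. */
#include <CL/cl.h>

cl_int run_kernel_step(cl_command_queue queue, cl_kernel kernel, size_t global_size)
{
    cl_event done;
    cl_int err;

    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                 &global_size, NULL, 0, NULL, &done);
    if (err != CL_SUCCESS)
        return err;

    /* Wait only for this kernel rather than the entire queue. */
    err = clWaitForEvents(1, &done);
    clReleaseEvent(done);
    return err;
}

Whether clWaitForEvents() actually sleeps rather than spins is still driver-dependent, so any real gain has to be measured on the hardware in question.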
OK, so I did some GPU shuffling between my machines to see what's going on. The machines are:
elmo: 2x L5630, PCIE x8
frazzle: 2x X5670, PCIE x16
kermit: 2x X5670, PCIE x8
Runtimes (FGRP 1.14 opencl-ati on R9 280x, in seconds):
elmo: 975, frazzle: 420, kermit: 437
Runtimes (FGRP 1.14 opencl-nvidia on GTX 580, in seconds):
elmo: 890, frazzle: 1100, kermit: 1130
Geekbench compute results using the R9s follow this trend too: elmo: 56129, frazzle: 113664, kermit: 113168.
So clearly the PCIE bandwidth is not the issue; frazzle and kermit are identical except for the x16/x8 slots, and kermit and elmo have literally identical motherboards. The only significant difference here, then, is the CPUs; can they really cause such large discrepancies?
TIA,
Kailee.
Are they all running the same memory at the same speed and channels?
A while back I was looking at the difference between several Xeons here, and there is an Intel PCM (Performance Counter Monitor) tool that helped identify the difference. HTH.
Kai Leibrandt wrote: The only significant difference here, then, is the CPUs; can they really cause such large discrepancies?
Both hosts have 24 GB RAM, but what is the RAM speed on them (MHz, single/dual/triple channel, CAS latencies)? I wonder if a difference there could be the reason... if one is perhaps running its RAM at a significantly different speed.
edit: AgentB had faster thoughts :)
http://ark.intel.com/compare/47920,47927 shows that the L5630 should be slower, all other things being equal.
I would like to take issue with Oliver Bock's comment on page 12 of this thread, where he states that utilising the GPU at 100% is their goal. BOINC states on its first page: "Use the idle time on your computer (Windows, Mac, Linux, or Android) to cure diseases, study global warming, discover pulsars, and do many other types of scientific research. It's safe, secure, and easy." Taking 100% of my computer's GPU and making it laggy and visually unusable is therefore not in the ethos of BOINC.

On the previous application I ran 2 simultaneous BRP4Gs and the GPU ran at about 95-98%. This left plenty for me to run web browsing and YouTube videos, and I never noticed Einstein crunching away in the background. I would be more than happy to continue in this way; with one of these 1.16 applications I cannot.

Someone else suggested running 'not as root user'; well, I don't know how to do this, since I think it already is a root user. Oliver also suggested TThrottle; well, I could only see how GPU use is throttled by temperature, not by % use, so that is a very blunt tool. I tried it and it seemed useless, as it hunted for long periods of time around the set temperature and still allowed 100% use of the GPU. The other option, using computing preferences, is poor for me, as Einstein would then hardly use the GPU at all for many hours of the day.

I also read that people with GTX 750s have this problem, so what of your concern for all the little people who contribute to your science project? My guess is that most will turn off the application or the GPU, as it makes their computer unusable.
Lastly, from an engineering point of view (and please correct me if I am wrong): I was happy that my GPU sat at 95-98% all day, as the temperature was stable from one hour to the next. Now the GPU goes at 100% for 4-5 seconds, then spikes to 0%, then back to 100% again, thermally cycling millions of electronic junctions by a degree or so. If I use computing preferences instead, the temperature will go through much bigger thermal cycles all day long. Is this good for the longevity of my card?
I agree with you that something like 98% could be a better goal than 100% (one of my hosts experienced the same kind of total lagging earlier with a test version... and I can see it's annoying to even try to reach the settings when the computer almost doesn't respond at all).
In the "Expert mode settings" on TThrottle... did you try the 'Laptop' mode for GPU throttling? Manual says: "Sometimes the GPU throttle is working too fast or too slow. Select "desktop" for slow regulating and "laptop" for fast regulation." I don't know if this would help get rid of the lagging, but what if you set even lower temp limit for the GPU... so that TThrottle would really start to limit the utilization?
Pete_28 wrote: Is this good for the longevity of my card?
I believe that if it's possible to run more than one task at a time, that would help to even out the thermal stress at least somewhat. If the tasks are running in different phases, there's a chance that GPU utilization doesn't keep jumping from 0% to something; instead, there will be some amount of load all the time.
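For anyone who wants to try running two tasks per GPU, the usual BOINC mechanism is an app_config.xml file in the Einstein@Home project directory; a minimal sketch is below. The <gpu_usage> value of 0.5 tells the client each task needs half a GPU, so two run concurrently, and <cpu_usage> reserves a CPU core per task. The app name shown is an assumption on my part; check the <name> entries in client_state.xml for the exact FGRPB1G app name on your host.

<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>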
@Pete: "Now the GPU goes at 100% for 4-5 seconds, then spikes to 0%, then back to 100% again."
That's the way the maths works: we need to go back to the CPU at some points, at least to have the BOINC status saved (checkpointing), so that you don't have to restart your job from the beginning.
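For context, this is the standard BOINC checkpoint pattern the comment above refers to; a minimal sketch assuming the stock BOINC API (boinc_time_to_checkpoint(), boinc_checkpoint_completed(), boinc_fraction_done()), with made-up helper names, not the actual FGRP source:

/* Sketch of BOINC checkpointing, not the actual FGRP code. */
#include "boinc_api.h"

/* hypothetical helpers, assumed to live elsewhere in the app */
void do_one_gpu_step(int step);
void read_back_partial_results(void);
void write_checkpoint(int step);

void main_loop(int total_steps)
{
    for (int step = 0; step < total_steps; step++) {
        do_one_gpu_step(step);

        if (boinc_time_to_checkpoint()) {
            /* Partial results must come back from the GPU before the state
               can be written to disk - one reason the app has to synchronize
               with the CPU periodically. */
            read_back_partial_results();
            write_checkpoint(step);
            boinc_checkpoint_completed();
        }
        boinc_fraction_done((double)step / (double)total_steps);
    }
}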
As for thermal cycling, do you see significant GPU temperature changes?
Mouse/scrolling lag looks like a more important issue to me. Is it only affecting Nvidia cards? From what I have read here, Nvidia cards that take more than 1000-2000 seconds per WU seem to be affected by the lagging.
Please report the inconvenience and your HW specs.
I can make a patch tomorrow to force a sleep in order to make the desktop more responsive, but I need to know on which hardware I have to do it (I don't want to slow down users who don't have any lag effects, like me).
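As an illustration of the kind of patch being described, and of how it could be limited to affected hardware only, here is a hedged sketch; the FGRP_GPU_SLEEP_US environment variable and the helper names are invented for this example and are not an existing option:

/* Sketch only: make a per-batch sleep opt-in, so unaffected users are
 * not slowed down. */
#include <stdlib.h>
#include <unistd.h>

static useconds_t batch_sleep_us(void)
{
    const char *s = getenv("FGRP_GPU_SLEEP_US");   /* hypothetical knob */
    return s ? (useconds_t)atoi(s) : 0;            /* 0 means no throttling */
}

static void yield_between_batches(void)
{
    useconds_t us = batch_sleep_us();
    if (us > 0)
        usleep(us);   /* brief pause so the desktop can use the GPU */
}

Calling yield_between_batches() after each kernel batch would give the display driver a window to schedule desktop rendering, at the cost of slightly longer task runtimes on hosts where the variable is set.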
Thanks,
Christophe
[Update:]
Mac/Hackintosh is FINALLY on the 1.14 FGRPB1G units. It has been crunching since 6 AM PST, and at some point in the last couple of hours the Mac has completed several 1.14 units.
Of the invalids that I have showing for the Mac, three are 1.12 units and two are 1.13 units. I still attribute these invalids to the same OpenCL bug that affects SETI OpenCL units on the Mac. I will keep monitoring and report any new invalids.
Both of my systems are still crunching TWO units at a time per GPU card. Three GPU cards crunching, for a total of 6 units at a time.
TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees