Gamma-ray pulsar binary search #1 on GPUs

Kailee71
Joined: 22 Nov 16
Posts: 35
Credit: 42623563
RAC: 0

Defender_2 wrote:
Since E@H is very bandwidth-hungry, I guess it's caused by the different PCIe lanes.

Could someone please confirm this? I would have thought as much data as possible would be transferred to GPU memory initially, where it would then be crunched... That would mean PCIe bandwidth wouldn't have such a large effect on performance.
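To illustrate what I mean, here is a minimal OpenCL sketch of that "upload once, crunch on the device" pattern. This is purely hypothetical; crunch_kernel and the buffer layout are made up for the sketch, not the actual E@H code:

#include <CL/cl.h>

/* Hypothetical sketch: one PCIe upload up front, then many kernel
 * launches that work entirely in GPU memory. */
void upload_once_and_crunch(cl_context ctx, cl_command_queue q,
                            cl_kernel crunch_kernel,
                            const float *host_data, size_t n)
{
    cl_int err;

    /* Single host-to-device transfer: the buffer is filled from host
     * memory at creation time and then stays resident on the GPU. */
    cl_mem dev_buf = clCreateBuffer(ctx,
                                    CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    n * sizeof(float),
                                    (void *)host_data, &err);

    clSetKernelArg(crunch_kernel, 0, sizeof(cl_mem), &dev_buf);

    /* All subsequent passes reuse the resident buffer; no further
     * PCIe traffic is needed until results are read back. */
    for (int pass = 0; pass < 1000; ++pass) {
        size_t gws = n;
        clEnqueueNDRangeKernel(q, crunch_kernel, 1, NULL, &gws, NULL,
                               0, NULL, NULL);
    }
    clFinish(q);
    clReleaseMemObject(dev_buf);
}

If the apps work like that, PCIe x8 vs x16 shouldn't matter much, hence my question.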

TIA,

Kailee.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250515772
RAC: 34216

I don't know the BRP code off the top of my head, but I believe the FGRP GPU app transfers less data between CPU and GPU memory than BRP does. I wouldn't expect PCIe bandwidth to be an issue here.

There is an indication, however, that the clFinish() calls in the current code cause a lot of CPU load via the driver. I'm really not sure how much data transfer this involves; it depends on the implementation in the driver. We'll work on getting rid of these clFinish() calls as much as possible, which should also reduce the CPU utilization.
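For illustration only, one common pattern (a sketch, not necessarily the change we will make in the app) is to enqueue a batch of kernels, flush the queue, and block on the last event once per batch instead of calling clFinish() after every launch. Whether the driver busy-waits in clFinish() or clWaitForEvents() is implementation-dependent, but fewer synchronization points generally means less CPU load:

#include <CL/cl.h>

/* Sketch: synchronize once per batch of kernel launches instead of
 * after every single one. */
void run_batch(cl_command_queue q, cl_kernel k, size_t gws, int batch)
{
    cl_event last = NULL;

    for (int i = 0; i < batch; ++i) {
        cl_event ev;
        clEnqueueNDRangeKernel(q, k, 1, NULL, &gws, NULL, 0, NULL, &ev);
        if (last)
            clReleaseEvent(last);   /* only the newest event is kept */
        last = ev;
    }

    clFlush(q);                     /* submit the batch without blocking */
    if (last) {
        clWaitForEvents(1, &last);  /* block once, at the end of the batch */
        clReleaseEvent(last);
    }
}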

BM

Kailee71
Joined: 22 Nov 16
Posts: 35
Credit: 42623563
RAC: 0

OK, so I did some GPU shuffling between my machines to see what's going on. The machines are:
elmo: 2x L5630, PCIe x8
frazzle: 2x X5670, PCIe x16
kermit: 2x X5670, PCIe x8

Runtimes (FGRP 1.14 opencl-ati on an R9 280X):
elmo: 975, frazzle: 420, kermit: 437
Runtimes (FGRP 1.14 opencl-nvidia on a GTX 580):
elmo: 890, frazzle: 1100, kermit: 1130

Geekbench compute results using the R9 280X follow this trend as well: elmo: 56129, frazzle: 113664, kermit: 113168.

So clearly PCIe bandwidth is not the issue; frazzle and kermit are identical except for the x16/x8 slots, and kermit and elmo have literally identical motherboards. The only significant difference, then, is the CPUs; can they really cause such large discrepancies?

TIA,

Kailee.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Are they all running the same memory at the same speed and channel configuration?

A while back I was looking at the difference between several Xeons here, and there is an Intel PCM tool. The Intel tool helped identify the difference. HTH.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Kai Leibrandt wrote:
So clearly PCIe bandwidth is not the issue; frazzle and kermit are identical except for the x16/x8 slots, and kermit and elmo have literally identical motherboards. The only significant difference, then, is the CPUs; can they really cause such large discrepancies?

Both hosts have 24 GB RAM, but what is the RAM speed on them (MHz, single/dual/triple channel, CAS latencies)? I wonder if a difference there could be the reason... if one of them is running its RAM at a significantly different speed.

edit: AgentB had faster thoughts :)

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

http://ark.intel.com/compare/47920,47927 shows the L5630 should be slower, all other things being equal.

Pete
Joined: 31 Jul 10
Posts: 14
Credit: 1020243718
RAC: 0

I would like to take issue with Oliver Bock's comment on page 12 of this thread, where he states that utilising the GPU at 100% is their goal. BOINC states on its front page: "Use the idle time on your computer (Windows, Mac, Linux, or Android) to cure diseases, study global warming, discover pulsars, and do many other types of scientific research. It's safe, secure, and easy." Taking 100% of my computer's GPU and leaving it laggy and visually unusable is therefore not in the ethos of BOINC.

On the previous application I ran two simultaneous BRP4G tasks and the GPU ran at about 95-98%. That left plenty of headroom for web browsing and YouTube videos, and I never noticed Einstein crunching away in the background. I would be more than happy to continue in this way; with one of these 1.16 applications I cannot.

Someone suggested running 'not as root user', but I don't know how to do this, since I think it already is a root user. Oliver also suggested TThrottle, but as far as I could see it throttles GPU use by temperature, not by percentage, so it is a very blunt tool. I tried it and it seemed useless: it hunted for long periods around the target temperature and still allowed 100% use of the GPU. The other option, using computing preferences, is poor for me, as Einstein would then hardly use the GPU at all for many hours of the day.

I also read that people with GTX 750s have this problem, so what of your concern about all the little people who contribute to your science project? My guess is that most will turn off the application or the GPU, as it makes their computer unusable.

Lastly, from an engineering point of view (and please correct me if I am wrong), I was happy that my GPU sat at 95-98% all day, as the temperature was stable from one hour to the next. Now the GPU runs at 100% for 4-5 seconds, then spikes to 0%, then goes back to 100% again, thermally cycling millions of electronic junctions by a degree or so. If I use computing preferences, the temperature will go through much bigger thermal cycles all day long. Is this good for the longevity of my card?

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

I agree with you that something like 98% could be a better goal than 100% (one of my hosts experienced the same kind of total lagging earlier with a test version... and I can see it's annoying to even try to reach the settings when the computer barely responds at all).

Pete_28 wrote:
I tried it and it seemed useless: it hunted for long periods around the target temperature and still allowed 100% use of the GPU.

In the "Expert mode settings" in TThrottle... did you try the 'Laptop' mode for GPU throttling? The manual says: "Sometimes the GPU throttle is working too fast or too slow. Select "desktop" for slow regulating and "laptop" for fast regulation." I don't know if this would help get rid of the lagging, but what if you set an even lower temperature limit for the GPU... so that TThrottle would really start to limit the utilization?

Pete_28 wrote:
Is this good for the longevity of my card?

I believe that if it's possible to run more than one task at a time, that would help even out the thermal stress at least somewhat. If the tasks run out of phase with each other, there's a chance that GPU utilization won't keep jumping down to 0%; instead, there will be some load all the time.

choks
Joined: 24 Feb 05
Posts: 16
Credit: 146112101
RAC: 66110

@Pete: "Now the GPU runs at 100% for 4-5 seconds, then spikes to 0%, then goes back to 100% again."

That's the way the maths works: we need to come back to the CPU at some points, at least so the BOINC checkpoint gets saved and you can restart your job from where it left off rather than from the beginning.
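Roughly, the crunching loop has to look something like this (a sketch with placeholder names; boinc_time_to_checkpoint() and boinc_checkpoint_completed() are the standard BOINC API calls, everything else is illustrative):

#include <CL/cl.h>
#include "boinc_api.h"

/* Sketch: the GPU goes idle whenever the host drains the queue to
 * read intermediate state back for a BOINC checkpoint. */
void crunch_with_checkpoints(cl_command_queue q, cl_kernel k,
                             cl_mem state_buf, float *host_state,
                             size_t state_bytes, size_t gws, int passes)
{
    for (int pass = 0; pass < passes; ++pass) {
        clEnqueueNDRangeKernel(q, k, 1, NULL, &gws, NULL, 0, NULL, NULL);

        if (boinc_time_to_checkpoint()) {
            /* Blocking read-back: this is where utilization dips to 0%. */
            clEnqueueReadBuffer(q, state_buf, CL_TRUE, 0, state_bytes,
                                host_state, 0, NULL, NULL);
            /* ...serialize host_state to the checkpoint file here... */
            boinc_checkpoint_completed();
        }
    }
    clFinish(q);
}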

As for thermal cycling, do you see significant GPU temperature changes?

Mouse/scrolling lag looks like a more important issue to me. Is it only affecting NVIDIA cards? From what I have read here, NVIDIA cards that take more than 1000-2000 seconds per WU seem to be affected by the lagging.

If you are affected, please report the problem together with your HW specs.

I can make a patch tomorrow that forces a sleep in order to make the desktop more responsive, but I need to know which hardware to apply it to (I don't want to slow down users who don't see any lag effects, like me :-)).
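It would be along these lines (the 1 ms value and the flag are illustrative only, not the final patch):

#include <CL/cl.h>
#include <unistd.h>   /* usleep(), POSIX */

/* Sketch: yield briefly between kernel launches so the display
 * driver gets time to render, at a small cost in throughput. */
void run_pass_with_yield(cl_command_queue q, cl_kernel k, size_t gws,
                         int desktop_lags)
{
    clEnqueueNDRangeKernel(q, k, 1, NULL, &gws, NULL, 0, NULL, NULL);
    clFlush(q);

    if (desktop_lags)
        usleep(1000);   /* ~1 ms pause between kernels */
}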

Thanks,

Christophe

TimeLord04
Joined: 8 Sep 06
Posts: 1442
Credit: 72378840
RAC: 0

[Update:]

Mac/Hackintosh is FINALLY on the 1.14 FGRPB1G Units. It has been crunching since 6 AM PST, and at some point in the last couple of hours the Mac has completed several 1.14 Units.

Of the Invalids that I have showing for the Mac, three are 1.12 Units and two are 1.13 Units. I still attribute these Invalids to the same OpenCL bug that affects SETI OpenCL Units on the Mac. I will keep monitoring and report any new Invalids.

Both of my systems are still crunching TWO Units at a time per GPU card. Three GPU cards are crunching, for a total of 6 Units at a time.

TL

TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees
