Gamma-ray pulsar binary search #1 on GPUs

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,310
Credit: 46,514,035,701
RAC: 34,076,120

I'm running 2x successfully on some 2GB AMD HD7850s (rather ancient fglrx drivers).  The two complete in around 700s with just one unloaded CPU core for support, controlled by app_config.xml.  I tried to run 3x on one machine. Nothing failed but performance dropped off a cliff.  After about 30 mins of that, I went back to 2x and the tasks completed at the former speed without apparent further problem.  I wondered why at the time but wasn't smart enough to think of the possibility that 3 tasks wouldn't fit in the 2GB.  I guess this means that 2x is out for any 1GB cards then?
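As a rough sanity check on that guess: if two tasks fit in 2GB but three do not, that puts bounds on the memory each task needs. The sketch below assumes every task needs the same amount and that nothing else occupies the card (both simplifications); the function names are made up for illustration.

```python
def per_task_memory_bounds(card_mb, max_fitting_tasks):
    """Bounds on per-task GPU memory implied by how many tasks fit.

    Assumption: all tasks are the same size and have the card to themselves.
    """
    upper = card_mb / max_fitting_tasks        # this many tasks do fit
    lower = card_mb / (max_fitting_tasks + 1)  # one more does not fit
    return lower, upper


def max_tasks(card_mb, per_task_mb):
    """How many equal-sized tasks fit on a card of the given size."""
    return int(card_mb // per_task_mb)


lower, upper = per_task_memory_bounds(2048, 2)
print(f"per-task memory: more than {lower:.0f} MB, at most {upper:.0f} MB")
# Check both ends of that range against a 1 GB card.
print("1 GB card, small tasks:", max_tasks(1024, lower + 1))
print("1 GB card, large tasks:", max_tasks(1024, upper))
```

Under those assumptions a 1GB card has room for one task at either end of the range, which is consistent with 2x being out on such cards.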

If using the GPU utilization factor, a core is reserved for each concurrent GPU task.  If using app_config.xml, you can set what you want.  I don't really have time to test at the moment but one core supporting 2 GPU tasks seems to not cause too much slow down of the GPU tasks - at a guess, maybe somewhere around the 5% mark.  I'm happy to wear that because I want to run GW tasks on as many cores as possible.
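For anyone who has not used app_config.xml, here is a minimal sketch that generates one for 2 tasks per GPU with one core shared between them. The app name hsgamma_FGRPB1G is an assumption (check the name in your own client_state.xml), and build_app_config is just an illustrative helper, not anything shipped with BOINC.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpus_per_task):
    """Return app_config.xml text for one GPU application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of a GPU each task claims: 0.5 means 2 per GPU.
    ET.SubElement(gpu, "gpu_usage").text = str(1 / tasks_per_gpu)
    # cpu_usage is the fraction of a core budgeted for each GPU task.
    ET.SubElement(gpu, "cpu_usage").text = str(cpus_per_task)
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or later
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed app name; two tasks per GPU, half a core each (one core per pair).
xml_text = build_app_config("hsgamma_FGRPB1G", tasks_per_gpu=2, cpus_per_task=0.5)
print(xml_text)
```

The output goes into app_config.xml in the project's directory under the BOINC data directory; the client picks it up after re-reading its config files or restarting.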

The biggest problem I see at the moment is the huge disparity between estimate (~1.5 hours) and reality (700s).  It means that unless I use a cache setting of less than 0.5 days, I get huge numbers of CPU tasks because of the dramatic drop in DCF.  I anticipated this problem and was prepared, but there may be other 'victims' who get quite a shock.

EDIT:  There will also be violent swings in DCF, as each long-running CPU task that finishes resets the value upwards: the classic DCF 'sawtooth' on steroids.  Because the disparity is so large, the true amount of work in the cache at any instant will be quite variable, which may annoy some people.  For me, a 0.5 day cache setting with just FGRPB1G tasks for the GPU seems to fetch lots of GPU tasks that nevertheless last less than 0.5 days, and perhaps more than 5 days of CPU tasks.  I haven't had time to work it out properly yet.
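To put rough numbers on it, here is a back-of-envelope sketch. It assumes the DCF settles at actual/estimated run time for the GPU tasks and that CPU task estimates would be accurate at a DCF of 1; both are simplifications of what the client really does, and the function names are invented for illustration.

```python
def duration_correction_factor(estimated_s, actual_s):
    """DCF the client would converge to if only these tasks were running."""
    return actual_s / estimated_s


def real_days_of_cpu_work(cache_days, dcf):
    """Real CPU work fetched when estimates are scaled down by the DCF.

    Assumption: CPU estimates would be accurate at a DCF of 1, so the
    client underestimates them by exactly the DCF.
    """
    return cache_days / dcf


dcf = duration_correction_factor(estimated_s=1.5 * 3600, actual_s=700)
days = real_days_of_cpu_work(cache_days=0.5, dcf=dcf)
print(f"DCF about {dcf:.2f}; a 0.5 day cache holds about {days:.1f} days of CPU work")
```

This simple model gives a little under 4 days, so it does not reproduce the "more than 5 days" guess exactly; if the CPU estimates were themselves on the low side, the real figure would be higher.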

 

Cheers,
Gary.

Jim1348
Joined: 19 Jan 06
Posts: 381
Credit: 201,998,644
RAC: 158

choks wrote:

@Jim: Looks like you have 2GB of GPU RAM, which should be enough to run 2 tasks at the same time. Does it occur if you run two BRP4G WUs? (Just quit the Beta to test it.)

What's clear is that the Nvidia driver went completely crazy after the first error, and computations were done outside normal GPU memory, which leads to the FP exception you get.

Did you have to reboot your machine to run GPU tasks successfully, or did just restarting BOINC do it?

Thx

Good suggestions, but I think I have run out of my daily quota and will have to try later.  One outside possibility is that it is somehow related to the fact that I was running Folding on those cards earlier.  I always allow the Folding work unit to finish, and then delete the Folding slot, which should clean everything out.  But maybe something is left in GPU memory?  They both use OpenCL, and there may be some common components or whatever.  I don't remember if I rebooted or not, but will give it a good test in a day or two.  Thanks for the input.

Trotador
Joined: 2 May 13
Posts: 58
Credit: 1,258,845,104
RAC: 178,613

The HD7950 has 3GB RAM so it is on the safe side. I did not test 4 units, I think because of the 1 CPU requirement since I'm pushing some personal mark in WCG. I was running 4 BRP4 units at once just before.

Also on an ancient Ubuntu distro with old fglrx drivers.

poppageek
Joined: 13 Aug 10
Posts: 257
Credit: 1,566,830,781
RAC: 792,424

deleted

Jim1348
Joined: 19 Jan 06
Posts: 381
Credit: 201,998,644
RAC: 158

Jim1348 wrote:
One outside possibility is that it is somehow related to the fact that I was running Folding on those cards earlier.  I always allow the Folding work unit to finish, and then delete the Folding slot, which should clean everything out. 

Success!  I uninstalled the Folding control program entirely and rebooted.  Now I can run two GTX 960s with two FGRPopenCl work units per card, or a total of four at once.  I don't know whether it was because Folding was leaving the leftovers of a finished work unit around, or because the Folding program itself uses GPU memory or what, but I won't try to use it with a BOINC GPU project again (Folding on a GPU works OK with BOINC CPU projects however).

However, running two work units on my GTX 960s takes about 1170 seconds, or 585 seconds per work unit.  That is really not worth the trouble, considering that it uses two CPU cores per card (one per work unit) at the default settings.  I could reduce it to one CPU core per card, but that would push the time up, and it is only 600 seconds when running a single work unit (this is with an i7-4790 on a Z97 motherboard).  So I think a single work unit pushes the memory bandwidth of the GTX 960 as far as it will go, and I will stick with one work unit per card, but at least I can now run two cards at once.  Thanks for the comments.
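For what it's worth, the comparison above works out like this. It is a minimal sketch using only the times quoted in this post; the helper names are made up.

```python
def seconds_per_task(batch_seconds, tasks_in_batch):
    """Effective time per work unit when several run concurrently."""
    return batch_seconds / tasks_in_batch


def throughput_gain(single_seconds, batch_seconds, tasks_in_batch):
    """Fractional gain in work units per hour versus running one at a time."""
    return single_seconds / seconds_per_task(batch_seconds, tasks_in_batch) - 1


per_task = seconds_per_task(1170, 2)
gain = throughput_gain(600, 1170, 2)
print(f"{per_task:.0f} s per work unit, {gain:.1%} more throughput than 1x")
```

That is a gain of only about 2.6%, in exchange for a second CPU core per card.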

choks
Joined: 24 Feb 05
Posts: 16
Credit: 63,120,555
RAC: 7

Running two WUs per GPU for FGRPopencl does not make much difference, because:

- WU setup is done on the CPU and only takes 10 seconds,

- only about 20 ms is spent on the CPU to prepare the next data set; that's only 0.720 seconds of CPU time for a whole WU,

- data resampling and sorting, the FFT, and thresholding of the results are performed on the GPU (no FFT data is transferred between CPU and GPU at all),

- most of the final stage is performed on the GPU (if it supports FP64).

The GPU workload is spread across resampling and FFT:

- FFT is about 2/3 of total time spent and only depends on GPU memory speed.

- data resampling is about the other 1/3, and mainly depends on raw floating-point throughput (GFLOPS).
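The split above can be turned into a crude run-time model. This is only a sketch: it assumes the FFT part scales purely with memory speed, the resampling part purely with compute throughput, and that the two parts do not overlap. The function and constant names are invented here.

```python
FFT_FRACTION = 2 / 3       # memory-speed bound, per the split above
RESAMPLE_FRACTION = 1 / 3  # compute bound, per the split above


def relative_runtime(memory_speedup=1.0, compute_speedup=1.0):
    """Run time relative to the baseline card (1.0 means unchanged)."""
    return FFT_FRACTION / memory_speedup + RESAMPLE_FRACTION / compute_speedup


# Twice the raw compute throughput, same memory speed.
print(f"2x compute: {relative_runtime(compute_speedup=2.0):.2f} of baseline run time")
# Twice the memory speed, same compute throughput.
print(f"2x memory:  {relative_runtime(memory_speedup=2.0):.2f} of baseline run time")
```

In this model doubling compute only brings the run time down to about 5/6 of the baseline, while doubling memory speed brings it down to about 2/3, which is why memory speed matters more for this app.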

 

One trick to reduce power consumption and increase power efficiency: I have an R9 Fury, and if I let it run at stock speed (1050 MHz) my machine uses 360 watts at the wall (and gets pretty hot and noisy).

If I downclock the GPU to 717 MHz, power consumption peaks at 220 W, with about 20% fewer BOINC credits a day. At 717 MHz my PC stays cool and extremely quiet.

For Linux users, as root (the redirections need root privileges, so a plain sudo in front of echo is not enough):

echo manual > /sys/class/drm/cardX/device/power_dpm_force_performance_level
echo 2 > /sys/class/drm/cardX/device/pp_dpm_sclk

where X is your card number. (For Nvidia users, check the names under devices - might be different)

Even if I run the GPU at 974 MHz instead of 1050 MHz, power decreases by 60 watts (~15 LEDs) without any noticeable drop in WU/day.
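In credits-per-watt terms, the figures above work out roughly as follows. This is a sketch only: the wattages are whole-machine readings at the wall, so the GPU-only gain would look different, and the helper name is made up.

```python
def credit_per_watt_gain(base_watts, new_watts, credit_ratio):
    """Fractional change in credits per watt relative to stock settings.

    credit_ratio is daily credit at the new clock divided by credit at stock.
    """
    return (credit_ratio / new_watts) / (1.0 / base_watts) - 1


# 717 MHz: 220 W instead of 360 W, with about 20% fewer credits per day.
low_clock = credit_per_watt_gain(360, 220, 0.80)
# 974 MHz: 60 W less than stock, with no noticeable drop in WU/day.
mid_clock = credit_per_watt_gain(360, 360 - 60, 1.00)
print(f"717 MHz: {low_clock:.0%} more credits per watt")
print(f"974 MHz: {mid_clock:.0%} more credits per watt")
```

So the 717 MHz setting gives roughly 31% more credits per watt at the wall, and the 974 MHz setting about 20% more with no loss of daily output.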

Kailee71
Joined: 22 Nov 16
Posts: 35
Credit: 42,623,563
RAC: 0

Wow great info - thank you for that!

Any idea when my Mac with an R9 280X might start getting GPU WUs? It's just been churning through CPU jobs for the past day or so, whereas another Mac with a GTX 580 is happily crunching GPU jobs...

Thanks again,
Kailee.

Edit: I thought something fishy was going on and indeed, checking with XRG rather than iStat Menus, my R9 280X is showing GPU activity (and probably has been active all this time). I'm getting runtimes of ~240 s with a single WU running, and 380 s for two parallel tasks (so ~190 s each). Does anyone have an idea why iStat Menus doesn't show AMD GPU activity properly, when it does for Nvidia?

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 738,423,020
RAC: 0

I installed two of my 7970 cards this evening. The run time is approximately 199 seconds per task. That is quite the improvement over running tasks via the FGRP CPU application.

I have not looked at the AMD Linux drivers in quite a while but I was surprised to see that the latest drivers supporting this GPU model are about a year old which in turn required going back to an older kernel and xorg version.

Petec888
Joined: 3 Oct 06
Posts: 33
Credit: 1,141,276,802
RAC: 459,605

Something weird is going on: I haven't gotten any Einstein GPU tasks on my Macs with AMD cards in a long time.  It might coincide with the move to macOS Sierra, but I'm not getting any GPU tasks on a Mac Pro running El Capitan with an R9 280X either.  I seem to remember getting a bunch of them a while back and having computation errors after only a few seconds.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513,211,304
RAC: 0

Jeroen_9 wrote:
I have not looked at the AMD Linux drivers in quite a while but I was surprised to see that the latest drivers supporting this GPU model are about a year old which in turn required going back to an older kernel and xorg version.

AMD have just released 16.50.362463 of AMDGPU (this week), and according to Phoronix:

GCN 1.0 / Southern Islands Support: With Linux 4.9 the AMDGPU DRM driver added experimental GCN 1.0 / Southern Islands support, which is disabled by default at the kernel's build time. With the AMDGPU-PRO 16.50 release, its DKMS kernel module ships with the GCN 1.0 support enabled along with a supported user-space stack...

Unfortunately, it wasn't stable at all. I was unable to get any GCN 1.0 benchmarks to share today as it was consistently hanging while running graphics tests, sometimes even when just launching Steam. But when popping in newer GCN hardware, those problems didn't appear atop this AMDGPU-PRO stack. So while the GCN 1.0 support is there for AMDGPU-PRO, your mileage may vary upon how usable it is.

I may wait a few days and give it a try on my RX-480 and update https://einsteinathome.org/content/ubuntu-1604-lts-deprecating-amds-fglrx-catalyst-replacing-amd-gpupro
