Gamma-ray pulsar binary search #1 on GPUs

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119694087181

RAC: 25381688

9 Dec 2016 20:38:00 UTC

Message 152618

I'm running 2x successfully on some 2GB AMD HD7850s (rather ancient fglrx drivers). The two complete in around 700s with just one unloaded CPU core for support, controlled by app_config.xml. I tried to run 3x on one machine. Nothing failed but performance dropped off a cliff. After about 30 mins of that, I went back to 2x and the tasks completed at the former speed without apparent further problem. I wondered why at the time but wasn't smart enough to think of the possibility that 3 tasks wouldn't fit in the 2GB. I guess this means that 2x is out for any 1GB cards then?
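A quick back-of-envelope check of the 1GB question. This is a sketch that assumes, as guessed above, that the 3x slowdown came from running out of video memory, and that the whole card memory is available to BOINC. In practice the driver and desktop use some of it, so the real margin is worse. If 3 tasks don't fit in 2048 MB, each one needs more than about 683 MB, so 2 tasks need more than about 1366 MB, which is more than a 1 GB card has.

```shell
#!/usr/bin/env bash
# Rough VRAM bound. ASSUMPTIONS: the 3x slowdown was a VRAM shortfall, and
# all of the card's memory is usable by BOINC tasks (optimistic).

# min_per_task_mb VRAM_MB N -> lower bound (MB) on one task's footprint,
# given that N concurrent tasks did NOT fit in VRAM_MB.
min_per_task_mb() {
    awk -v vram="$1" -v n="$2" 'BEGIN { printf "%.0f\n", vram / n }'
}

# fits_on_card VRAM_MB N PER_TASK_MB -> "yes" if N tasks fit, else "no"
fits_on_card() {
    awk -v vram="$1" -v n="$2" -v per="$3" \
        'BEGIN { print ((n * per <= vram) ? "yes" : "no") }'
}

per_task=$(min_per_task_mb 2048 3)   # 3x did not fit on a 2 GB HD7850
echo "each task needs more than ~${per_task} MB"
echo "2 tasks on a 1 GB card: $(fits_on_card 1024 2 "$per_task")"
```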

If using the GPU utilization factor, a core is reserved for each concurrent GPU task. If using app_config.xml, you can set what you want. I don't really have time to test at the moment but one core supporting 2 GPU tasks seems to not cause too much slow down of the GPU tasks - at a guess, maybe somewhere around the 5% mark. I'm happy to wear that because I want to run GW tasks on as many cores as possible.
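A minimal app_config.xml for "2 tasks per GPU, one CPU core shared between them" might look like the sketch below, written out by a small shell script. The application name hsgamma_FGRPB1G is an assumption, so check the <name> the GPU app uses in your client_state.xml. Note that cpu_usage only tells the BOINC scheduler how much CPU to budget per task; it does not cap what the application actually uses. Copy the file into the Einstein@Home project directory, then use Options → Read config files in a recent BOINC Manager (or restart the client).

```shell
#!/usr/bin/env bash
# Writes a minimal BOINC app_config.xml: 2 tasks per GPU, 0.5 CPU each.
# ASSUMPTION: the app name is a guess; override with APP_NAME=... if needed.
APP_NAME="${APP_NAME:-hsgamma_FGRPB1G}"

cat > app_config.xml <<EOF
<app_config>
  <app>
    <name>${APP_NAME}</name>
    <gpu_versions>
      <!-- fraction of a GPU per task: 0.5 means 2 tasks per GPU -->
      <gpu_usage>0.5</gpu_usage>
      <!-- CPU cores budgeted per task: 2 x 0.5 = 1 core in total -->
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
EOF
echo "wrote app_config.xml for ${APP_NAME}"
```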

The biggest problem I see at the moment is the huge disparity between the estimated runtime (~1.5 hours) and reality (700 s). It means that unless I use a cache setting of less than 0.5 days, I get huge numbers of CPU tasks because of the dramatic drop in DCF (duration correction factor). I anticipated this problem and was prepared, but there may be other 'victims' who get quite a shock.

EDIT: Also, there will be violent swings in DCF as the long-running CPU tasks reset the value upwards each time one finishes: the classic DCF 'sawtooth' on steroids. Because of the magnitude of the disparity, the true amount of work in the cache at any instant will be quite variable, which may annoy some people. For me, a 0.5 day cache setting with just FGRPB1G tasks for the GPU seems to hold lots of GPU tasks (though they last less than 0.5 days) and perhaps more than 5 days of CPU tasks. I haven't had time to work it out properly yet.
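To illustrate the sawtooth, here is a toy model. It uses a simplified update rule that approximates, but is not, the BOINC client's exact code: after each task, r = actual runtime / uncorrected estimate; if r is above DCF, DCF jumps straight to r, otherwise it moves 10% of the way towards r. The GPU ratio comes from the figures above (700 s vs ~5400 s). The CPU ratio of 1.0 and one CPU task finishing per 50 GPU tasks are hypothetical. The last column shows how many real days of CPU work a 0.5 day cache would hold if it were filled with CPU tasks alone, since the client believes each one takes estimate × DCF.

```shell
#!/usr/bin/env bash
# Toy per-project DCF model (SIMPLIFIED rule, not the real client code).
# simulate_dcf R_GPU R_CPU EVERY N CACHE_DAYS
simulate_dcf() {
    awk -v r_gpu="$1" -v r_cpu="$2" -v every="$3" -v n="$4" -v cache="$5" '
    BEGIN {
        r_gpu += 0; r_cpu += 0             # force numeric values
        dcf = 1.0
        for (i = 1; i <= n; i++) {
            # every Nth completion (N = every) is a CPU task, the rest are GPU
            r = (i % every == 0) ? r_cpu : r_gpu
            if (r > dcf) dcf = r                  # jump straight up
            else dcf = 0.9 * dcf + 0.1 * r        # creep down 10% of the way
            # print the low point before each CPU task and the value after it
            if (i % every == 0 || i % every == every - 1)
                printf "task %d dcf=%.3f cpu_days=%.2f\n", i, dcf, cache * r_cpu / dcf
        }
    }'
}

simulate_dcf "$(awk 'BEGIN { print 700 / 5400 }')" 1.0 50 100 0.5
```

Under these assumptions, DCF sinks to about 0.135 before each CPU task completes. At that point a 0.5 day setting would fetch roughly 3.7 real days of CPU work, and then DCF snaps back to 1.0.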

Cheers,
Gary.

Jim1348

Joined: 19 Jan 06

Posts: 463

Credit: 257957147

RAC: 0

9 Dec 2016 20:40:58 UTC

Message 152619 in response to message 152612

choks wrote:

@Jim: Looks like you have 2 GB of GPU RAM, which should be enough to run 2 tasks at the same time. Does it occur if you run two BRP4G WUs? (Just quit the Beta to test it.)

What's clear is that the Nvidia driver went completely crazy after the first error, and computations had to be done outside normal GPU memory, which leads to the FP exception you get.

Did you have to reboot your machine to run GPU tasks successfully, or did just restarting BOINC do it?

Thx

Good suggestions, but I think I have run out of my daily quota and will have to try later. One outside possibility is that it is somehow related to the fact that I was running Folding on those cards earlier. I always allow the Folding work unit to finish, and then delete the Folding slot, which should clean everything out. But maybe something is left in GPU memory? They both use OpenCL, and there may be some common components or whatever. I don't remember if I rebooted or not, but will give it a good test in a day or two. Thanks for the input.

Trotador

Joined: 2 May 13

Posts: 58

Credit: 2122645985

RAC: 3

9 Dec 2016 20:56:43 UTC

Message 152620

The HD7950 has 3GB RAM, so it is on the safe side. I did not test 4 units, I think because of the 1 CPU requirement, since I'm pushing for a personal milestone at WCG. I was running 4 BRP4 units at once just before.

This is also on an ancient Ubuntu distro with old fglrx drivers.

poppageek

Joined: 13 Aug 10

Posts: 259

Credit: 2473733872

RAC: 0

10 Dec 2016 14:38:02 UTC

Message 152633 in response to message 152618

deleted

Jim1348

Joined: 19 Jan 06

Posts: 463

Credit: 257957147

RAC: 0

10 Dec 2016 12:57:33 UTC

Message 152637 in response to message 152619

Jim1348 wrote:

One outside possibility is that it is somehow related to the fact that I was running Folding on those cards earlier. I always allow the Folding work unit to finish, and then delete the Folding slot, which should clean everything out.

Success! I uninstalled the Folding control program entirely and rebooted. Now I can run two GTX 960s with two FGRPopenCl work units per card, or a total of four at once. I don't know whether it was because Folding was leaving the leftovers of a finished work unit around, or because the Folding program itself uses GPU memory or what, but I won't try to use it with a BOINC GPU project again (Folding on a GPU works OK with BOINC CPU projects however).

However, running two work units on my GTX 960s takes about 1170 seconds, or 585 seconds per work unit. That is really not worth the trouble, considering that it uses two CPU cores per card (one per work unit) at the default settings. I could reduce it to one CPU core per card, but that would push the time up, and it is only 600 seconds when running a single work unit (this is with an i7-4790 on a Z97 motherboard). So I think a single work unit pushes the memory bandwidth of the GTX 960 as far as it will go, and I will stick with one work unit per card, but at least I can now run two cards at once. Thanks for the comments.
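The trade-off can be made concrete with a small throughput calculation. This sketch uses only the timings reported above (600 s for a single task, 1170 s for a concurrent pair) and compares tasks per hour at 1x and 2x.

```shell
#!/usr/bin/env bash
# concurrency_gain SINGLE_S PAIR_S
# Compares tasks/hour running one task at a time vs two at once.
concurrency_gain() {
    awk -v single="$1" -v pair="$2" 'BEGIN {
        one = 3600 / single        # tasks per hour, one at a time
        two = 2 * 3600 / pair      # tasks per hour, two at a time
        printf "1x: %.2f/h  2x: %.2f/h  gain: %.1f%%\n", one, two, (two / one - 1) * 100
    }'
}

concurrency_gain 600 1170
```

That works out to about 2.6% more work per hour, at the cost of a second CPU core per card at default settings.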

choks

Joined: 24 Feb 05

Posts: 16

Credit: 152149947

RAC: 134816

10 Dec 2016 18:22:48 UTC

Message 152643

Running two WUs per GPU for FGRPopencl does not make much difference, because:

- WU setup is done with the CPU and only takes 10 seconds,

- only about 20 ms is spent in the CPU to prepare the next data set. That's only 0.720 seconds of CPU time for a whole WU

- data resampling & sorting, FFT, threshold on results is performed by the GPU (no FFT data transfer at all from CPU <-> GPU)

- most of the final stage is performed on the GPU (if it supports FP64)

The GPU workload is spread across resampling and FFT:

- FFT is about 2/3 of total time spent and only depends on GPU memory speed.

- data resampling is the other 1/3 or so, and mainly depends on raw float performance (GFLOPS).

One trick to reduce power consumption and increase power efficiency: I have an R9 Fury, and if I let it run at stock speed (1050 MHz), my machine uses 360 W at the wall (and gets pretty hot and noisy).

If I downclock the GPU to 717 MHz, power consumption peaks at 220 W, with about 20% fewer BOINC credits a day. At 717 MHz my PC stays cool and extremely quiet.
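In credits per unit of energy, those figures favour the lower clock. This sketch assumes 360 W and 220 W can both be treated as the average whole-system draw at the wall, which is optimistic because 220 W is quoted as a peak. Under that assumption, 80% of the credit for 220/360 of the power works out to roughly 31% more credit per kWh.

```shell
#!/usr/bin/env bash
# relative_efficiency STOCK_W SLOW_W CREDIT_FRACTION
# Credit per watt at the lower clock, relative to stock (1.00 = same).
# ASSUMPTION: both wattages are treated as average wall draw.
relative_efficiency() {
    awk -v stock_w="$1" -v slow_w="$2" -v frac="$3" 'BEGIN {
        printf "%.2f\n", (frac / slow_w) / (1 / stock_w)
    }'
}

relative_efficiency 360 220 0.8
```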

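For Linux users, as root:

echo manual > /sys/class/drm/cardX/device/power_dpm_force_performance_level
echo 2 > /sys/class/drm/cardX/device/pp_dpm_sclk

where X is your card number and 2 is the index of the clock level you want, as listed by cat /sys/class/drm/cardX/device/pp_dpm_sclk. (A plain "sudo echo 2 > ..." won't work, because the redirection runs without root rights; use "echo 2 | sudo tee /sys/class/drm/cardX/device/pp_dpm_sclk" instead. For Nvidia users, check the names under devices: these files come from AMD's amdgpu driver, so they will be different or absent.)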
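The same two steps can be wrapped in small functions, with a way to hand clock control back to the driver afterwards. This is a sketch that assumes the amdgpu sysfs interface shown above and a root shell. The device path is passed in, so nothing is hard-coded to a particular card.

```shell
#!/usr/bin/env bash
# Pin an AMD GPU (amdgpu driver) to one core-clock DPM level. Run as root.
# DEVICE_DIR is e.g. /sys/class/drm/card0/device; LEVEL must be one of the
# indices listed by: cat "$DEVICE_DIR/pp_dpm_sclk"   ('*' marks the current one)
set_sclk_level() {
    local dev="$1" level="$2"
    echo manual > "$dev/power_dpm_force_performance_level" || return 1
    echo "$level" > "$dev/pp_dpm_sclk" || return 1
}

# Give clock selection back to the driver.
restore_auto() {
    echo auto > "$1/power_dpm_force_performance_level"
}

# Example (card index 0 is hypothetical):
#   set_sclk_level /sys/class/drm/card0/device 2
#   restore_auto   /sys/class/drm/card0/device
```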

Even running the GPU at 974 MHz instead of 1050 MHz cuts power by 60 W (~15 LED bulbs) without any noticeable drop in WU/day.
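The 2/3 FFT vs 1/3 resampling split given above suggests a simple runtime model for downclocking. The model assumes the FFT part depends only on memory speed, so it is unchanged when only the core clock is lowered. It also assumes the resampling part scales inversely with core clock: T(f) = T0 × (2/3 + (1/3) × f0/f). It predicts about 3% fewer WUs at 974 MHz and about 13% fewer at 717 MHz. The observed drop at 717 MHz was nearer 20%, so treat the split, and this sketch, as a rough guide only.

```shell
#!/usr/bin/env bash
# relative_throughput STOCK_MHZ NEW_MHZ
# ASSUMPTIONS: 2/3 of runtime is memory-bound FFT (unaffected by core clock),
# 1/3 is resampling that scales inversely with core clock.
relative_throughput() {
    awk -v f0="$1" -v f="$2" 'BEGIN {
        t = 2/3 + (1/3) * (f0 / f)   # runtime relative to stock
        printf "%.2f\n", 1 / t       # WUs/day relative to stock
    }'
}

for mhz in 974 717; do
    echo "$mhz MHz: $(relative_throughput 1050 "$mhz")"
done
```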

Kailee71

Joined: 22 Nov 16

Posts: 35

Credit: 42623563

RAC: 0

11 Dec 2016 11:47:21 UTC

Message 152648 in response to message 152643

Wow great info - thank you for that!

Any idea when my Mac with R9 280x might start getting GPU WUs? It's just been churning through CPU jobs for the past day or so, whereas another Mac with a GTX 580 is happily crunching GPU jobs...

Thanks again,

Kailee.

Edit: I thought something fishy was going on and, indeed, checking with XRG rather than istatmenu, my R9 280x is showing GPU activity (and probably has been active all this time). I'm getting runtimes of ~240 s for a single WU, and 380 s for two parallel tasks (so ~190 s each). Does anyone have an idea why istatmenu doesn't show AMD GPU activity properly when it does for Nvidia?

Jeroen

Joined: 25 Nov 05

Posts: 379

Credit: 740030628

RAC: 0

11 Dec 2016 3:20:50 UTC

Message 152649

I installed two of my 7970 cards this evening. The run time per task is approximately 199 seconds each. That is quite the improvement over running tasks via the FGRP CPU application.

I have not looked at the AMD Linux drivers in quite a while but I was surprised to see that the latest drivers supporting this GPU model are about a year old which in turn required going back to an older kernel and xorg version.

Petec888

Joined: 3 Oct 06

Posts: 33

Credit: 1978975321

RAC: 0

11 Dec 2016 3:06:22 UTC

Message 152650

Something weird is going on: I haven't gotten any Einstein GPU tasks on my Macs with AMD cards in a long time. It might coincide with upgrading to macOS Sierra, but I'm not getting any GPU tasks on a Mac Pro running El Capitan with an R9 280x either. I seem to remember getting a bunch of them a while back and having computation errors after only a few seconds.

AgentB

Joined: 17 Mar 12

Posts: 915

Credit: 513211304

RAC: 0

11 Dec 2016 9:06:59 UTC

Message 152652 in response to message 152649

Jeroen_9 wrote:

I have not looked at the AMD Linux drivers in quite a while but I was surprised to see that the latest drivers supporting this GPU model are about a year old which in turn required going back to an older kernel and xorg version.

AMD have just released AMDGPU-PRO 16.50.362463 this week, and according to Phoronix:

GCN 1.0 / Southern Islands Support: With Linux 4.9 the AMDGPU DRM driver added experimental GCN 1.0 / Southern Islands support, which is disabled by default at the kernel's build time. With the AMDGPU-PRO 16.50 release, its DKMS kernel module ships with the GCN 1.0 support enabled along with a supported user-space stack...

Unfortunately, it wasn't stable at all. I was unable to get any GCN 1.0 benchmarks to share today as it was consistently hanging while running graphics tests, sometimes even when just launching Steam. But when popping in newer GCN hardware, those problems didn't appear atop this AMDGPU-PRO stack. So while the GCN 1.0 support is there for AMDGPU-PRO, your mileage may vary upon how usable it is.

I may wait a few days and give it a try on my RX-480 and update https://einsteinathome.org/content/ubuntu-1604-lts-deprecating-amds-fglrx-catalyst-replacing-amd-gpupro

Forums › Technical News