Gravitational Wave Engineering run on LIGO O1 Open Data

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3232287015
RAC: 122892


Richie wrote:

A quick note about total wattage. I don't have a Kill A Watt-type meter, but three of my hosts get their AC power through a power conditioner, which shows amperes.

I don't see individual readings, but I know how much current these hosts draw in total while crunching nothing but 2x FGRPB1G per host (1 CPU + 0.5 GPU per task). One Nvidia and two AMD GPUs are involved.

The same three hosts crunching these GW GPU v0.11 tasks 4x per host (1 CPU + 0.25 GPU per task) draw practically the same amount of power. I would say the difference in wattage at the wall is at most 50 W in total, which can't be much per host. If there is a small difference in total load, I'd say the GW task scenario is at the lower end of that narrow range.

 

Richie, I believe it was this thread where someone else mentioned that running multiple tasks on Windows with AMD cards was really slow.

https://einsteinathome.org/content/rx-480-running-slow-why

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0


A fresh comparison with some new results. RX cards are able to run 5x!

RUN TIMES PER TASK (host #, GPU, then concurrency and seconds per task):

1. R9 390: 4x 1667 s, 3x 1899 s, 2x 2579 s
2. RX 580: 5x 1792 s, 4x 1775 s
3. RX 580: 4x 1586 s
4. RX 570: 5x 1510 s, 4x 1620 s
5. R9 270X: 5x 1693 s, 4x 1805 s

 

It looks like going from 4x to 5x can still give some additional output, but the gain is small. For the slower RX 580 host (#2) there's no boost at all, or it's even slightly negative. Weird machine.
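The 4x-to-5x comparison can be made explicit as a percentage change in the listed per-task time, treating lower as better, as the post does. A minimal sketch using the numbers from the table (host #3 is left out because it was only measured at 4x); the function names are illustrative, not from any tool:

```python
# Per-task run times (seconds) from the table above, keyed by concurrency.
RUN_TIMES = {
    "1. R9 390": {2: 2579, 3: 1899, 4: 1667},
    "2. RX 580": {4: 1775, 5: 1792},
    "4. RX 570": {4: 1620, 5: 1510},
    "5. R9 270X": {4: 1805, 5: 1693},
}


def pct_change(old, new):
    """Relative change from old to new, in percent (negative = faster)."""
    return (new - old) / old * 100


def step_changes(times):
    """Percent change between each pair of adjacent concurrency levels."""
    levels = sorted(times)
    return {
        (lo, hi): pct_change(times[lo], times[hi])
        for lo, hi in zip(levels, levels[1:])
    }


for host, times in RUN_TIMES.items():
    for (lo, hi), change in step_changes(times).items():
        print(f"{host}: {lo}x -> {hi}x  {change:+.1f} %")
```

On these numbers the 4x to 5x step shortens the per-task time by about 6-7 % on the RX 570 and R9 270X, and lengthens it by about 1 % on RX 580 host #2.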

Additional info about the hosts and GPU clock speeds behind those results:

1.
R9 390 (MSI Gaming 8GB), Xeon X56.. @ 4GHz, Windows 10 (18362)
GPU @ 960 MHz (stock 1040), mem 1500 MHz, no power limiting, driver 19.4.1
* GPU load about: 2x ... 50 % , 3x ... 65 % , 4x ... 70 %

2.
RX 580 (MSI Gaming X+ 8G), Xeon X56.. @ 4GHz, Windows 10 (17763)
GPU 1431 MHz, mem 2000 MHz, no power limiting, driver 19.4.1
* GPU load avg: 4x ... 53 % , 5x ... 53 %
* GPU mem controller load avg: 4x ... 8 % , 5x ... 8 %
* GPU only power draw avg: 4x ... 64 W , 5x ... 65 W

3.
RX 580 (MSI Gaming X 8G), Xeon X56.. @ 4.07 GHz, Windows 10 (18775)
GPU 1380 MHz, mem 2000 MHz, no power limiting, driver 19.4.1
* GPU load avg: 4x ... 55 %
* GPU mem controller load avg: 4x ... 9 %
* GPU only power draw avg: 4x ... 75 W

4.
RX 570 (Asus Expedition 4GB), Xeon X56.. @ 4GHz, Windows 10 (18362)
GPU 1228 MHz, mem 2088 MHz, no power limiting, vddc offset -12mV, driver 19.4.1
* GPU load avg: 4x ... 63 % , 5x ... 68 %
* GPU mem controller load avg: 4x ... 9 % , 5x ... 9 %
* GPU only power draw avg: 4x ... 74 W , 5x ... 76 W

5.
R9 270X (Gigabyte 4 GB), Xeon X56.. @ 4GHz, Windows 10 (18875)
GPU @ 1000 MHz (stock 1100), mem 1400 MHz, power limit -20 %, driver 19.4.1
* GPU load avg: 4x ... 69 % , 5x ... 74 %
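The GPU-only power figures can be combined with the run-time table to get a rough energy cost per task. A minimal sketch, assuming (the post doesn't say) that the listed run times are effective seconds per completed task; if they are raw wall-clock times of tasks running N at a time, divide by N first. It also ignores the CPU and the rest of the system, so it is not a wall-socket figure. The function names are hypothetical:

```python
def effective_seconds(wall_seconds, concurrency):
    """Wall-clock time of one task divided by how many ran side by side."""
    return wall_seconds / concurrency


def gpu_energy_wh(gpu_power_w, seconds_per_task):
    """GPU-only energy per completed task, in watt-hours.

    seconds_per_task must be the *effective* time per completed task;
    use effective_seconds() first if you only have raw wall-clock times.
    """
    return gpu_power_w * seconds_per_task / 3600


# ASSUMPTION: the table's run times are already effective per-task times.
samples = {
    "2. RX 580, 4x": (64, 1775),
    "2. RX 580, 5x": (65, 1792),
    "3. RX 580, 4x": (75, 1586),
    "4. RX 570, 4x": (74, 1620),
    "4. RX 570, 5x": (76, 1510),
}
for label, (watts, seconds) in samples.items():
    print(f"{label}: {gpu_energy_wh(watts, seconds):.1f} Wh per task (GPU only)")
```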

 

GPU temperatures remain low on every host and in every configuration.

That R9 270X is ridiculously fast! A cheap black sheep! It works like an atomic clock at 5x too. Looking at the monitoring software while it runs, the GPU load is a straight line. That's not the case at all with the other cards.

These tasks have a sort of slow start. After the first couple of minutes, progress resets to zero and then the main GPU part starts. Also, if tasks are suspended and crunching is then resumed, it takes quite some time until progress continues. There's also a final computation part that kicks in at 99 %. It takes a few minutes until progress jumps straight from 99 % to 100 %.

I wrote down some "run times" for those special phases. On these hosts @ 4 GHz, running 4x or 5x, the starting phase takes about 165 - 190 seconds. Progress is at 5-6 % when it resets. Times for this starting phase don't change much within a host.
The ending phase (99-100 %) takes about 280 - 303 seconds.

I like micromanaging, and that's why I've made it a rule for myself to leave at least 10 minutes between task starts. That way there will never be more than one task in these special phases at any given time. I believe this separation gives the GPU the most constant flow of work. I haven't tested this theory. It may well be that staggering tasks this way has no effect on task run times.
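The spacing rule can be checked on paper. A minimal sketch, assuming each task's start phase lasts up to 190 s and its 99-100 % phase up to 303 s (the upper ends of the ranges above), and using a hypothetical wall-clock time of 4 x 1586 s for one task at 4x, which itself assumes the run-time table lists effective per-task times. It only checks one batch of start times, not the refill as tasks finish; all names are illustrative:

```python
from itertools import combinations

START_PHASE_S = 190   # upper end of the 165-190 s start phase
END_PHASE_S = 303     # upper end of the 280-303 s 99-100 % phase
WALL_S = 4 * 1586     # HYPOTHETICAL wall-clock time of one task at 4x


def special_phases(start_s, wall_s):
    """(begin, end) intervals of a task's start phase and final phase."""
    return [
        (start_s, start_s + START_PHASE_S),
        (start_s + wall_s - END_PHASE_S, start_s + wall_s),
    ]


def overlapping(a, b):
    """True if two intervals overlap; touching end points don't count."""
    return a[0] < b[1] and b[0] < a[1]


def no_overlapping_special_phases(starts, wall_s):
    """True if no two tasks are ever in a special phase at the same time."""
    phases = [special_phases(s, wall_s) for s in starts]
    for p, q in combinations(phases, 2):
        if any(overlapping(x, y) for x in p for y in q):
            return False
    return True


print(no_overlapping_special_phases([0, 600, 1200, 1800], WALL_S))  # True
print(no_overlapping_special_phases([0, 60, 120, 180], WALL_S))     # False
```

With 600 s between starts, each 190 s start phase and 303 s end phase fits inside its own gap, so no two tasks share a special phase; starting them a minute apart makes the start phases overlap.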

 


I gave up on trying more than 4x on the MSI R9 390. I know it did run 6x once, but I'm not able to reproduce that. Since then even 5x hasn't worked. The last time I tried 5x, the tasks ran well for some time, but then, somewhere in the middle, three of the five tasks stalled and made no further progress, while the other two kept progressing steadily. I suspended work, restarted BOINC and resumed computation. All five tasks continued running, but one crashed right away with an error saying something about missing output files. I think at least two of the tasks did eventually complete, but their run times were very slow.
I decided that the earlier success with this host should be considered a weird anomaly and not taken into account, since I can't reproduce it. This host is now running 4x and is happily doing that (it's fine with 3x and 2x too).


 

mmonnin wrote:

Richie, I believe it was this thread where another mentioned issues with multiple tasks with Windows/AMD cards running really slow.

https://einsteinathome.org/content/rx-480-running-slow-why

Thanks again! I faintly remember reading about problems with RX cards running multiple tasks. That's why I avoided them for so long. The other reason is that they were too expensive back then.
There were interesting observations and suggestions in that thread. I tried to install the Intel Driver & Support Assistant. It wanted to scan the hardware (through the browser) but instantly gave an error saying roughly "Oops... something went wrong". I gave up on that, but I'm still working on getting this host to crunch these tasks properly. Maybe I'll post an update later if I succeed. I don't want to bother people too much with this R9 390 stuff anymore. This GPU tends to run hot and has a high TDP. I'm sure most of the 390s on this planet have already broken, and the remaining specimens will soon face extinction. So this GPU is becoming less and less interesting anyway.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0


There's this one little thing that should be taken into account while I rant about these cards.

Three computers are running Nvidia cards. Currently 54 % of the GW GPU results from those cards are validated and 0.000 % are invalid.

Five computers are running AMD GPUs. How is validation going for them? Well... 3 % validated, 3 % invalid, and all the rest are pending. It could be that these AMD GPUs are producing mostly garbage at this point, in which case these run time comparisons might not be very informative yet.

But I looked at the host that has the most invalids so far (host #3 in my last post) and got an idea. Maybe a lower CPU clock would help it produce fewer invalids. At the same time I could also test how much run times change if I change only the CPU clock speed and nothing else.

I entered the BIOS and changed the CPU clock multiplier from 22 to 16. I left everything else (bus, RAM speed, etc.) intact. That led to the CPU clock dropping from 4.07 GHz to 2.96 GHz.
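The two clock figures are consistent with the usual rule that core clock = base clock (BCLK) x multiplier, with the bus left alone. A quick check; the 185 MHz base clock is inferred from 4070 / 22 and is not stated in the post:

```python
def cpu_clock_mhz(bclk_mhz, multiplier):
    """Core clock = base clock (BCLK) x multiplier."""
    return bclk_mhz * multiplier


# Base clock implied by the original setting (an inference, not a reading).
bclk = 4070 / 22                      # 185.0 MHz
new_clock = cpu_clock_mhz(bclk, 16)   # 2960.0 MHz
# With BCLK unchanged, the clock drops by the same fraction as the multiplier.
reduction_pct = (1 - 16 / 22) * 100
print(bclk, new_clock, round(reduction_pct, 1))  # 185.0 2960.0 27.3
```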

 

 

Here are benchmarks for both configurations:

4070 MHz

BOINC benchmark (4/12 CPUs available to BOINC): 4947 / 12113

CPU-Z benchmark v17.01.64: single thread 363.2, multi thread (4/12 threads) 1463.1

2960 MHz

BOINC benchmark (4/12 CPUs available to BOINC): 3614 / 8817

CPU-Z benchmark v17.01.64: single thread 263.6, multi thread (4/12 threads) 1058.4
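All four benchmark scores drop almost exactly in proportion to the clock. A small sketch comparing each score ratio with the clock ratio 2960 / 4070; the labels for the two BOINC figures are neutral because the post doesn't name them:

```python
CLOCK_RATIO = 2960 / 4070  # about 0.727

# (score at 4070 MHz, score at 2960 MHz), from the benchmarks above
SCORES = {
    "BOINC, first figure": (4947, 3614),
    "BOINC, second figure": (12113, 8817),
    "CPU-Z single thread": (363.2, 263.6),
    "CPU-Z multi thread (4/12)": (1463.1, 1058.4),
}

for name, (fast, slow) in SCORES.items():
    ratio = slow / fast
    print(f"{name}: {ratio:.3f} (clock ratio {CLOCK_RATIO:.3f})")
```

Every ratio lands within about half a percent of the clock ratio, so the CPU benchmarks scale essentially linearly with the multiplier change.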

 

Run times per task... for both configurations:

4x (4070 MHz) : 1586 sec

4x (2960 MHz) : 1871 sec

The CPU clock speed was reduced by 27 %, but run time per task increased by only 18 %.
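The 27 % and 18 % figures follow from the numbers above. The sketch below also computes, as a reference point that is not in the post, the increase one would expect if run time were entirely CPU-bound and scaled as 1/clock. The measured 18 % is about half of that, which suggests that only part of each task is limited by CPU speed on this host:

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


clock_drop = -pct_change(4070, 2960)    # CPU clock reduction
runtime_rise = pct_change(1586, 1871)   # per-task run time increase

# Hypothetical reference: if run time scaled purely as 1/clock
# (fully CPU-bound), it would grow by a factor of 4070/2960.
cpu_bound_rise = (4070 / 2960 - 1) * 100

print(f"clock -{clock_drop:.1f} %, run time +{runtime_rise:.1f} %, "
      f"fully CPU-bound would be +{cpu_bound_rise:.1f} %")
# clock -27.3 %, run time +18.0 %, fully CPU-bound would be +37.5 %
```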

* Power draw at 4x is 0.1 A lower at 2960 MHz than at 4070 MHz. The 0.1 A step on the display is rounded in some way, but this is a 230 V grid, so it must mean something between 12 and 34 W.
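The 12 - 34 W estimate corresponds to treating the 0.1 A step as anything from 0.05 A to 0.15 A at 230 V. A minimal sketch; volts x amps is apparent power, so the power factor of 1 used here is an assumption, and the function name is illustrative:

```python
GRID_VOLTS = 230.0


def watts_from_amps(amps, volts=GRID_VOLTS, power_factor=1.0):
    """Approximate real power for a current reading.

    volts * amps is apparent power (VA); power_factor=1.0 is an
    ASSUMPTION, since the power factor of these PSUs isn't known.
    """
    return amps * volts * power_factor


nominal = watts_from_amps(0.10)   # the 0.1 A step as displayed
low = watts_from_amps(0.05)       # true change half a step smaller
high = watts_from_amps(0.15)      # true change half a step larger
print(f"{low:.1f} - {high:.1f} W (nominal {nominal:.1f} W)")
# 11.5 - 34.5 W (nominal 23.0 W)
```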

EDIT: I had written nonsense about a reduction in per-task time, but it's fixed now. Task times of course increase, not decrease.

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 109


Is there a way to opt out of the GPU tasks from the engineering run until they're able to perform better while still running CPU work from it?

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0


DanNeely wrote:
Is there a way to opt out of the GPU tasks from the engineering run until they're able to perform better while still running CPU work from it?

Hi! It should work if you set your project preferences like this:

'ON' for CPU and 'OFF' for all GPUs (AMD , Nvidia , Intel).
'YES' to Beta... "Run test applications".
Allow application "Gravitational Wave Engineering run on LIGO O1 Open Data".
'NO' to "Allow non-preferred apps". 

Then just match the chosen 'preference set' with your host 'location'.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110034483368
RAC: 22390440


Richie wrote:
DanNeely wrote:
Is there a way to opt out of the GPU tasks from the engineering run until they're able to perform better while still running CPU work from it?

'ON' for CPU and 'OFF' for all GPUs (AMD , Nvidia , Intel).

I suspect Dan would just want to exclude O1OD1E GPU tasks and not FGRPB1G tasks as well.  Your suggestion excludes all types of GPU crunching.  Off the top of my head (I've never tried it) a possible way would be to use the app_config.xml mechanism and use both the name and plan class tags to identify just the GPU version.  Perhaps setting the cpu_usage and gpu_usage (or maybe the max_concurrent) for that combination to zero might effectively exclude those tasks without affecting anything else.  It would be worth experimenting.
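Gary's idea, sketched in Python so the structure is concrete. In the app_config.xml format as documented for the BOINC client, a specific app version is targeted with an `<app_version>` block keyed by `<app_name>` and `<plan_class>`, and its resource use is set with `<avg_ncpus>` and `<ngpus>`. The `<cpu_usage>`/`<gpu_usage>` tags belong to the per-app `<gpu_versions>` block, which does not separate plan classes. The app name and plan class below are hypothetical placeholders; the real values for the O1OD1E GPU version would have to be looked up on the host (e.g. in client_state.xml). Whether zero values actually keep those tasks from running, without side effects, is exactly the open question Gary raises:

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL placeholders: replace with the real app name and GPU plan
# class of the O1OD1E app version (e.g. as listed in client_state.xml).
APP_NAME = "example_gw_app"
GPU_PLAN_CLASS = "example_gpu_plan_class"


def build_app_config(app_name, plan_class, avg_ncpus, ngpus):
    """Return app_config.xml text with one <app_version> override."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    ET.SubElement(version, "ngpus").text = str(ngpus)
    return ET.tostring(root, encoding="unicode")


# Untested idea from the post: zero resources for just the GPU version.
print(build_app_config(APP_NAME, GPU_PLAN_CLASS, avg_ncpus=0, ngpus=0))
```

The printed XML would be saved as app_config.xml in the project's folder under the BOINC data directory and loaded with Options > Read config files in BOINC Manager.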

 

Cheers,
Gary.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0


Gary Roberts wrote:
I suspect Dan would just want to exclude O1OD1E GPU tasks and not FGRPB1G tasks as well.

That may well be. I was thinking of the easiest scenario. Plan class stuff is something I've never dealt with...

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0


I was still wondering how strictly run times currently depend on CPU and GPU clock speeds.
Here are a couple more downclocking experiments (crunching 4x, app v0.11):

R9 390 host:
27 % reduction in CPU clock speed (from 4.00 to 2.91 GHz)
led to a 14 % increase in run time per task.

RX 580 host:
28 % reduction in CPU clock speed (from 4.00 to 2.86 GHz) together with
23 % reduction in GPU clock speed (from 1431 to 1100 MHz)
led to a 28 % increase in run time per task.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110034483368
RAC: 22390440


If you want to know how sensitive something is to a change in operating conditions, don't change more than one thing at a time :-).  Also, if you make a substantial change, don't just assume the effect will be linear.  You need a series of smaller changes to show the true relationship :-)  Slow and steady will save time (and perhaps erroneous conclusions) in the end :-).

Don't take this as any sort of criticism.  I'm extremely supportive and happy that you are willing to test your systems and report your findings like you are doing.  It's great to get the information.  Thank you for doing it.

 

Cheers,
Gary.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0


I've perhaps been a bit manic with these machines for the past couple of days, but I don't have the energy to collect enough test results for a proper graph at this point. Maybe later, if or when this app is announced to be close to the final version. I just wanted a couple of snapshots with a practical amount of change in clock speeds. Those first results showed me pretty much what I was looking for, for now.

But there was still room to set the CPU multiplier lower. I set that 390 host to 2.18 GHz (the minimum possible without touching anything else), which is 46 % less than at the start. That already gives me a curve... a rough one, but it will do.

EDIT: The 46 % lower CPU clock speed resulted in 38 % longer run times. The effect is already stronger.
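Gary's point about non-linearity can be checked against the three R9 390 points (4.00, 2.91 and 2.18 GHz). The sketch below uses a toy model, which is an assumption and not from the thread: a fixed fraction c of the baseline run time scales as 1/CPU clock and the rest stays constant. If that held, c would come out the same for both downclocks. Here it rises from roughly 0.37 to 0.46, in line with the stronger effect noted above. The inputs are rounded percentages, so treat the result as rough:

```python
def implied_cpu_fraction(base_ghz, new_ghz, runtime_increase_pct):
    """Fraction of baseline run time that would have to scale as 1/clock.

    Toy model (an ASSUMPTION): t(f) = t0 * (c * f0 / f + (1 - c)),
    so c = (t/t0 - 1) / (f0/f - 1). If the model held exactly,
    c would be the same at every clock speed.
    """
    return (runtime_increase_pct / 100) / (base_ghz / new_ghz - 1)


# R9 390 host, 4x, baseline 4.00 GHz (increases as reported, rounded)
for new_ghz, increase in [(2.91, 14), (2.18, 38)]:
    c = implied_cpu_fraction(4.00, new_ghz, increase)
    print(f"{new_ghz:.2f} GHz: c = {c:.2f}")
```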
