GPU crashes with multiple tasks

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2914898643
RAC: 2130107

Zalster wrote:

This will run 4 CPU work units and 1 GPU work unit. If you want to run more than 1 GPU work unit, then you need to change the value of 1 to 0.5 in the <gpu_usage>1</gpu_usage> section and reduce the number of CPU work units from 4 to 3 in the CPU section under the <max_concurrent>4</max_concurrent>.  

The project max concurrent will limit the total amount of any work units to 5

Ah, now I get it. Thanks for laying out how to use max_concurrent with gpu_usage!

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

cecht wrote:

Ah, now I get it. Thanks for laying out how to use max_concurrent with gpu_usage!

There is <max_concurrent>, which needs to sit between <app> and </app>. This limits how many tasks will run for that specific application, be it GPU or CPU.

Then there is <project_max_concurrent>, which sits outside <app></app> but before </app_config>. This limits the total number of tasks, CPU and GPU combined, running at any time.
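To make the layout concrete, here is a minimal sketch of an app_config.xml matching the numbers discussed above (2 GPU tasks at gpu_usage 0.5, 3 CPU tasks, 5 in total), wrapped in a short Python check that parses it. The app names example_gpu_app and example_cpu_app are placeholders, not real Einstein@Home app names, and the cpu_usage value is an assumption; substitute the names your own client reports.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical app_config.xml. The <name> values are placeholders and the
# cpu_usage value is an assumption; only the element layout is the point.
APP_CONFIG = """\
<app_config>
  <app>
    <name>example_gpu_app</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>example_cpu_app</name>
    <max_concurrent>3</max_concurrent>
  </app>
  <project_max_concurrent>5</project_max_concurrent>
</app_config>
"""


def summarize(xml_text):
    """Return (per-app limits, project-wide limit, tasks per GPU)."""
    root = ET.fromstring(xml_text)
    # <max_concurrent> lives inside each <app> block.
    per_app = {
        app.findtext("name"): int(app.findtext("max_concurrent"))
        for app in root.findall("app")
    }
    # <project_max_concurrent> sits outside <app>, directly under <app_config>.
    project_limit = int(root.findtext("project_max_concurrent"))
    # gpu_usage is the fraction of a GPU each task claims, so 0.5 means 2 tasks.
    gpu_usage = float(root.findtext("app/gpu_versions/gpu_usage"))
    return per_app, project_limit, math.floor(1 / gpu_usage)


if __name__ == "__main__":
    per_app, project_limit, tasks_per_gpu = summarize(APP_CONFIG)
    print(per_app, project_limit, tasks_per_gpu)
```

Only the XML inside the string belongs in the actual file; the Python around it is just a sanity check that the per-app limits add up to the project-wide limit.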

 

 

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2914898643
RAC: 2130107

Zalster wrote:

There is <max_concurrent>, which needs to sit between <app> and </app>. This limits how many tasks will run for that specific application, be it GPU or CPU.

Then there is <project_max_concurrent>, which sits outside <app></app> but before </app_config>. This limits the total number of tasks, CPU and GPU combined, running at any time.

Got it! Thanks much for the app_config crash course!

Ideas are not fixed, nor should they be; we live in model-dependent reality.

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2914898643
RAC: 2130107

Gary Roberts wrote:
You have to be a bit careful with the (essentially) three ways of changing things.  If you have ever used local preferences, your website preferences for compute stuff will be ignored.  If you want to revert to website prefs you have to open the local prefs window in BOINC Manager and at the top you will see a warning and a button to click to remove the local prefs and go back to website prefs.

Thanks much for the insight on web preferences vs. local preferences. I did finally manage to run experiments quickly with on-the-fly edits of app_config.xml. And yes, after going down a preferences rabbit hole, I've given up on web preferences for all my hosts.

Gary Roberts wrote:
I think your issue is something other than the number of CPU tasks, but it would be a useful data point to see if 2 GPU tasks will run with no CPU tasks running at the time.

You're right about CPU usage not being the issue. I got the same results whether running 0.4, 1, or 2(!) CPU per GPU task. And it didn't matter whether my 6 cores were running 0 or 4 CPU-only tasks (Continuous Gravitational Wave Search O2).

In every CPU usage configuration, whenever I got simultaneous GPU tasks running (app_config gpu_usage = 0.5), the GPU would crash and reset, and BOINC Manager would stop progressing the FGRPB1G tasks. BM would still be "running" the elapsed time for the tasks, but task progress froze. During task freezes, the GPU usage (load) would be at 100%, or occasionally 0%, but in either case the GPU and memory clocks were at resting rates according to GPU-Z. The GPU worked fine only with 1 task per GPU.

All these experimental runs were without MSI Afterburner running.

When running a single GPU task (gpu_usage = 1), my average measured GPU load over several minutes varies between 85% and 97%, perhaps depending on the stage of task computation. That doesn't seem to leave much room to cram in another task, in which case a GPU crash would be expected. Or does it not work that way? My card has 2GB memory, and ~700MB (frame rate) are used during a run, so card memory doesn't seem to be limiting, if that's even a factor for FGRPB1G tasks.

So I guess that leaves the problem with either the card, the driver, or Windows 10 Pro, or some combination thereof?  The host is a Dell Precision T3500. Is it a motherboard limitation?  I have 6 GB of system memory, so that's not limiting. What about using a different BIOS for the card?  Ack! Now I'm talking crazy talk.

But since BOINC Manager merrily tools along with tasks that the GPU has either choked on or given up on, it seems like a communication issue between the app and the card, no? Maybe?

I'll eventually be moving this AMD card into a host I have at home that's running an NVIDIA GTX 750 (woo-hoo! dual GPUs!), so I'm not going to put much more effort into squeezing more performance out of this current host.  I'm having no joy running simultaneous GPU tasks on the NVIDIA card at home either, but haven't yet tried my newfound app_config skills on it. My host at home is also a Windows machine, but I'm really tempted to switch it to a Linux system.

Again, thank you all for your knowledge and patience.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117777741970
RAC: 34749922

cecht wrote:
When running a single GPU task (gpu_usage = 1), my average measured GPU load over several minutes varies between 85% and 97%, perhaps depending on the stage of task computation. That doesn't seem to leave much room to cram in another task, in which case a GPU crash would be expected. Or does it not work that way? My card has 2GB memory, and ~700MB (frame rate) are used during a run, so card memory doesn't seem to be limiting, if that's even a factor for FGRPB1G tasks.

A couple of points to mention about this.  You're not really 'cramming in another task', nor does there seem to be any risk of this leading to a crash.  My (very imperfect) understanding is that you are just running one task at a time and rapidly switching between them to take advantage of lots of tiny snippets of time when the GPU would otherwise be idle with the current task.  Don't get hung up on GPU load.  Again my very imperfect understanding is that the 'granularity' of those measurements doesn't really allow you to see the true picture of what is really going on.  The one useful 'rule of thumb' is that you need to budget above 750MB of VRAM for each concurrent FGRPB1G GPU task.  For a 2GB card, 2 tasks is the limit.  This also explains why you won't be able to run more than 1 task on your 1GB GTX750.
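The rule of thumb above can be sketched as a quick calculation. This is only an illustration: it treats 750 MB as the per-task budget (the post says to budget above that, so the real requirement may be higher) and assumes the whole card is available to the tasks.

```python
def max_concurrent_tasks(vram_mb, per_task_mb=750):
    """Rough upper bound on concurrent GPU tasks for a given amount of VRAM.

    per_task_mb is the rule-of-thumb budget from the post (an assumed lower
    bound); anything else using VRAM on the card is ignored here.
    """
    return vram_mb // per_task_mb


if __name__ == "__main__":
    print(max_concurrent_tasks(2048))  # 2 GB card
    print(max_concurrent_tasks(1024))  # 1 GB card
```

With these assumptions a 2 GB card fits two tasks and a 1 GB card fits one, which agrees with the limits stated above.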

In my case with RX 460s, a single task with current data takes around 24 mins (from rough memory) and running 2x gives two tasks in about 40 mins.  That's somewhere around a 20% performance improvement for running 2x.  These are just rough numbers because there are lots of subtle (and not so subtle) variations in crunch time.  The subtle ones seem to be related to the pulsar spin frequency being tested and the not so subtle ones with the particular data file and/or command-line options being fed to the search app.
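For anyone wanting to check their own numbers, the 20% figure comes from comparing the time to run two tasks back to back at 1x with the time to finish the same two tasks at 2x. A minimal sketch, using the rough timings quoted above (the function name is just for illustration):

```python
def throughput_gain(single_task_min, batch_min, tasks_per_batch=2):
    """Fractional throughput gain from running several tasks concurrently.

    single_task_min: run time of one task at 1x.
    batch_min: time for tasks_per_batch concurrent tasks to all finish.
    """
    sequential_min = single_task_min * tasks_per_batch
    return sequential_min / batch_min - 1.0


if __name__ == "__main__":
    # 24 min per task at 1x, about 40 min for two tasks at 2x.
    print(f"{throughput_gain(24, 40):.0%}")
```

Two tasks at 1x would take 48 mins, so finishing them in 40 mins is 48/40 = 1.2 times the throughput.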

For your situation, either you just accept running 1x or you make some attempt to find another volunteer running on a Polaris GPU under Windows and see if they have the same problem if they switch to 2x.  Along those lines, there is a quite recent report that is for an RX 480 but the symptoms seem very similar.  That thread refers to a much earlier one (which I had quite forgotten about until I followed the link just now) also related to RX 480s that do run 2x but at a very slow rate.  Perhaps checking through those comments, where quite a few people contributed, might give some additional insight. I see there was also a comment there about the same sort of issue with Fiji series GPUs.

Quote:
So I guess that leaves the problem with either the card, the driver, or Windows 10 Pro, or some combination thereof?

I would add the app itself to that list and not so much suspect the card.  If the card is faulty, I imagine it wouldn't work correctly even with just one task.  I'm not a programmer, but it sort of just feels like a software issue more so than a hardware one.  If I had to guess, I'd suspect the driver.  You could try different driver versions or running under Windows 7 rather than 10 or (if you have Linux skills) you could try running it under Linux where I've never had a problem running your particular model at 2x.

Quote:
The host is a Dell Precision T3500. Is it a motherboard limitation?  I have 6 GB of system memory, so that's not limiting. What about using a different BIOS for the card?  Ack! Now I'm talking crazy talk.

Whilst I have flashed the motherboard BIOS on many boards (Asus, Gigabyte, Asrock - no experience with Dells) over the years, I have no knowledge about GPU firmware.  I guess it is possible to be a GPU firmware related problem.

 

Cheers,
Gary.

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2914898643
RAC: 2130107

Gary Roberts wrote:
The one useful 'rule of thumb' is that you need to budget above 750MB of VRAM for each concurrent FGRPB1G GPU task.  For a 2GB card, 2 tasks is the limit.  This also explains why you won't be able to run more than 1 task on your 1GB GTX750.

That's really good to know, as is the rest of your explanation about how GPUs handle multiple tasks. Now I know why I wasn't able to run x2 tasks on my 1GB GTX750 (when I tried, each new task that was added soon ended with a computation error.)

Given everything I've tried, and the information in the discussions you linked above about  others' RX4xx issues, I'm going to stick to running single tasks on this Windows host, because it really seems to be a Windows issue.  I'll wait and work on getting more out of my RX 460 once I take it home and convert that host to a Linux machine.  Something to look forward to!

'Till next time,

Craig

Ideas are not fixed, nor should they be; we live in model-dependent reality.

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2914898643
RAC: 2130107

I got my RX 460 running x2 tasks! But I had to move the card to my Windows 7 host to do it. Okay, so there are three changed variables (the host architecture, the operating system, and the AMD drivers), but in general it is possible to run multiple tasks with an RX 460 (RX4xx?) on "Windows".

I'll check on the average completion times tomorrow, but after a few runs it looks like about a 10% performance gain for dual vs single GPU tasks (again, not factoring in variables).

The most noticeable difference is with the card specs provided by GPU-Z; DirectCompute is not enabled on the Win7 system. Also, several of the Win7 card stats are missing (see pics below) and neither GPU-Z nor SVI64 can read the GPU or memory frequencies or GPU temperature. I'm flying blind there. Oh, and AMD Settings says it can't do anything with the card because a display is not plugged into it; that wasn't an issue with AMD Settings under Win10.

Card specs on Win10:

https://ibb.co/dJzSKy

Card specs on Win7:

https://ibb.co/jjg7Ky

I'm just happy that it's working, but any thoughts on what the critical factor might have been?

This Win7 fix is on a temporary host at work, so I'll have to see what happens when I eventually bring the card home and put it in my Win10 host there. 

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117777741970
RAC: 34749922

cecht wrote:
.... any thoughts on what the critical factor might have been?

I know nothing about Windows.  Maybe someone who does might chime in with advice.

From your images, the Win10 driver is newer (18.6.1) and "Beta" as opposed to the Win7 version (18.5.1) which is "WHQL".  It wouldn't be the first time that there was a regression in a newer driver.  I'd be trying to get 18.5.1 for Win10 or at least looking for something newer again where hopefully the regression (if that's what it is) has been fixed.

 

Cheers,
Gary.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3426116540
RAC: 3860884

I run my 580 in Win 7 with 2x tasks at once. 3x ends up making the run times skyrocket for E@H and other projects as well. That was reported here a long time ago.

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2914898643
RAC: 2130107

Update: I got the RX 460 card on the Win7 machine to report its sensor readings, report all stats, and connect with AMD Settings simply by briefly hooking it up to the display.  The card also now has DirectCompute 5.0 enabled, and is still ticking along with dual tasks (without the display plugged into it). Live and learn!

The average time (n=10) for single-run FGRP tasks was 1497 sec on the Win10 Dell and 1328 sec on the Win7 HP, for an 11% improvement in run time. I haven't tried overclocking.
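As a quick check of that percentage, here is a small sketch that computes the relative reduction in run time from the two averages above (the function name is only illustrative):

```python
def runtime_reduction(old_sec, new_sec):
    """Fractional reduction in run time going from old_sec to new_sec."""
    return (old_sec - new_sec) / old_sec


if __name__ == "__main__":
    # Win10 Dell average vs. Win7 HP average, single-task runs.
    print(f"{runtime_reduction(1497, 1328):.1%}")
```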

mmonnin wrote:
I run my 580 in Win 7 with 2x tasks at once. 3x ends up making the run times skyrocket for E@H and other projects as well. That was reported here a long time ago.

I had poked around in the Community posts, but clearly need to sharpen my search skills. Is there a page that lists E@H search operators and syntax? The simple searches I've done come up with a lot of irrelevant hits and my eyes go square.

Gary Roberts wrote:
I'd be trying to get 18.5.1 for Win10 or at least looking for something newer again where hopefully the regression (if that's what it is) has been fixed.

Will do!

 

Ideas are not fixed, nor should they be; we live in model-dependent reality.
