Observations on FGRBP1 1.18 for Windows

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

I know you all don't like using app XML files, but you can limit how many GPU or CPU work units a project runs at any one time: create an app_config.xml and place <max_concurrent> </max_concurrent> in the section for each type of work unit. That prevents the system from using all of the CPU or GPU for whichever projects you want to run.

For example, if you wanted project X to run 4 instead of 7 CPU work units, you could place <project_max_concurrent>4</project_max_concurrent> in an app_config.xml in that project's folder.
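
A minimal sketch of such a file, assuming it is saved as app_config.xml in the project's own folder under the BOINC data directory. As far as the BOINC documentation describes it, project_max_concurrent caps all running tasks of that project, GPU ones included, not only the CPU ones:

```xml
<app_config>
    <!-- Run at most 4 tasks from this project at the same time -->
    <project_max_concurrent>4</project_max_concurrent>
</app_config>
```

The client only picks the file up after you ask it to re-read the config files from the Manager, or after a restart.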

The same could be done for the project that uses the GPU

If you wanted, say, Einstein to use only 4 CPUs for 4 CPU work units and 2 CPUs for 2 GPU work units, then you could place <max_concurrent>4</max_concurrent> and <max_concurrent>2</max_concurrent> into the app_config.xml, each in the section for its type of work unit. TL4 showed an example of this in another thread, and it's a way to get the projects to follow what you want them to do.
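
Roughly, the per-application version could look like the sketch below. The two names are placeholders, not real Einstein@Home application names; the actual short names would have to be looked up, for example in client_state.xml:

```xml
<app_config>
    <!-- "cpu_app_name" and "gpu_app_name" are hypothetical placeholders -->
    <app>
        <name>cpu_app_name</name>
        <!-- at most 4 CPU work units of this type at once -->
        <max_concurrent>4</max_concurrent>
    </app>
    <app>
        <name>gpu_app_name</name>
        <!-- at most 2 GPU work units of this type at once -->
        <max_concurrent>2</max_concurrent>
    </app>
</app_config>
```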

I've used something like that in the past while running GW on the CPU and BRP on the GPUs.

my 2 cents
Zalster

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023494931
RAC: 1808886

Zalster wrote:
you can limit how many GPU or CPU work units are run at any time by the project if you create app_config.xml and place <max_concurrent> </max_concurrent> in each of the different type of work units.

We've noticed that while the current Windows Nvidia 1.18 application nominally keeps nearly a full core "busy", a big range in actual CPU capability makes little difference to performance.  People have suggested that nearly all the time the CPU task is actually just "spin-waiting", a scheme which reduces latency in response to service requests but does not do much computing.  Yet the project distributes these as tasks requiring a full CPU core, which causes the crowding-out effects quantum mechanic and others have complained about, and also means that some of us are probably running at 2X when we might more effectively run at 3X or even higher.

So a question for you Zalster, or for others:

Might it be that one could usefully instruct the scheduler to start more tasks than the project-distributed characteristics allow by using the app_config.xml mechanism to allow an abnormally high max_concurrent?  Or would that just get ignored?

Just to use a concrete example: I am typing on a machine which has a modern, fast, 4-physical core non-HT CPU.  It also has two Pascal GPU cards, a 1070 and a 1060.  I run them on 1.18 at 2X.  I believe if I requested 3X I would not get it, as the scheduler would consider that it had been instructed that would require 6 cores.  Is there a way to use the app_config.xml mechanism to persuade the scheduler to start 3 tasks for each GPU in this case?

Possible is not the same question as wise or fruitful.  However the modest dependence of GPU task elapsed time on CPU capability leaves me supposing this might not work so very badly.

A related possible example on the same system would be whether one could use the mechanism to convince the scheduler that support of four GPU tasks (two cards at 2X) would only burn up two cores, and thus leave it willing to start one or two CPU tasks.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

In short, no.

The scheduler is looking at how many cores you have. Now if you did hyperthread your chip, then it would allow you to start up a 3rd GPU work unit on each of those cards as it would now see 8 cores.

I run 3 work units on each of my GPUs as I have the chips in all my computers hyperthreaded. 

Getting the application to use less CPU per work unit would require a more efficient OpenCL app. Yes, you can specify only 0.5 in app_config.xml, app_info.xml, or the website settings, but consider those GUIDELINES.  You will still see the application use almost a full CPU per GPU work unit. This is because of how the app is designed.

We reference Seti's OpenCL MB app a lot because it too was doing the same thing. However, we are fortunate that the creator of the OpenCL app at Seti has great knowledge of OpenCL and was able to tweak it for better performance and lower CPU usage (full core down to 18%, time to complete down from 1 hour to 14 minutes). Without that tweaking, I don't believe it's possible to reduce CPU consumption by the app.  A lot of it has to do with how the app allocates calls to the kernel. I know some of the terminology but we are getting above my ability to explain it. I can reference the designer's notes if you want to review them. Basically he increased the chunks of data per cycle, decreased the kernel calls, and did more on the GPU before calling on the CPU for final processing.

Zalster

 

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 1580

I have 2 vs 3 at a time results in for my W10 i7-4970k + GTX980 system.  For whatever reason (slower GPU, less extraneous CPU load due to not running substantial usermode apps, something else?) going from 2 to 3 at a time only gained 2% more credit, vs 6% on my 4790k/1080 box.  Under the circumstances I saw no reason to do a 4x test on this box.

 


Tasks   Time/task   Credit     CPU load
1x      1109s       269k/day   81%
2x      1520s       394k/day   98%
3x      2225s       403k/day   98%

 

 

Darrell
Joined: 3 Apr 05
Posts: 12
Credit: 379479578
RAC: 769552

Actually, you -can- tell the scheduler to treat your system as having more CPUs than it actually has.  See the Client configuration description at http://boinc.berkeley.edu/wiki/Client_configuration and look for <ncpus>N</ncpus> for details.

Thus with a 4-core non-hyperthreaded system, setting it to 6 would allow 3 WUs per card, supported by 2 physical cores per card.  Each task in a "spin loop" would burn its timeslice, then the next task would burn its timeslice, and so on until a GPU needed support, whereupon the task would do the work to support the GPU during its next timeslice.  This -assumes- all the tasks have the same priority.
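
For the 4-core example, a minimal cc_config.xml sketch might look like this, assuming the file sits in the BOINC data directory and that 6 is the value you want to try:

```xml
<cc_config>
    <options>
        <!-- Tell the client to act as if 6 CPUs were present (the machine really has 4) -->
        <ncpus>6</ncpus>
    </options>
</cc_config>
```

This only changes what the client believes it has to schedule with; the physical cores are still shared between the tasks as described above.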

 

Darrell
Joined: 3 Apr 05
Posts: 12
Credit: 379479578
RAC: 769552

Another means to achieve 3 CPU-intensive tasks on a graphics card is to tell the scheduler that each WU uses only 0.66 CPU and 0.33 GPU, even though they actually will use [waste in a spin loop] whatever CPU is available to each.  Then the scheduler will start 6 tasks using 6*0.66=4 CPUs and 6*0.33=2 graphics cards [computations rounded up].
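
As a sketch only, those two numbers would go into the gpu_versions section of an app_config.xml in the project's folder. The application name below is a placeholder, not the real Einstein@Home short name:

```xml
<app_config>
    <app>
        <!-- "gpu_app_name" is a hypothetical placeholder -->
        <name>gpu_app_name</name>
        <gpu_versions>
            <!-- 3 * 0.33 = 0.99, so three tasks fit on one card -->
            <gpu_usage>0.33</gpu_usage>
            <!-- 6 * 0.66 = 3.96, so six tasks are budgeted just under 4 cores -->
            <cpu_usage>0.66</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```

These values only change what the scheduler budgets for each task; they do not make the application itself use any less CPU.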

 

denjoR
Joined: 9 Apr 08
Posts: 4
Credit: 139110089
RAC: 0

hello @ all

i have a problem running more than 2 wus on my rx 480 :(

with 3 or 4 wus gpu-z shows 100% gpu load but the wus never finish because the crunching progress is extremely slow. the card stays @ 100% gpu load but nothing happens :/

RX480 @ 1300 MHz / 2100MHz 

2x   960s    623k/day    88% GPU load

CPU load extremely low

so if anyone knows a solution to crunch more than 2 wus on the amd card tell me how ^^

 

 
archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023494931
RAC: 1808886

I had the idea that I could quickly and temporarily explore the behavior when the support CPU applications were "crowded" into core sharing by using an affinity adjustment.  Rather than doing this by Process Lasso, which would give a durable change applying to later newly started instances of the same application name, I intervened directly in Process Explorer.

I quickly terminated the experiment without a useful answer.  First, I'd need to disable a current setting for that application in Process Lasso, else the change specified in Process Explorer does not last.  But more disturbingly, I got my first computation error in days on this host.

Above and beyond the possibly greater sensitivity to some clock rates on some cards on some hosts we think we have seen, this application currently seems quite fragile with respect to such interventions as task suspend.  While I am quite ignorant of the code involved, the behavior has me wondering if there might be one or more portions of the code which need to be protected as a critical section but are not so protected.

Even if that is true, it may well be that our best hope for improvement in this and some other things is the possible prospect of a CUDA build.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109392086751
RAC: 35877905

denjoR wrote:
i have problem to run more then 2 wus on my rx 480 :(

You should be able to crunch 3 or 4 tasks concurrently.

If you browse through your tasks list on the website, using the various categories (pending, valid, invalid, error) you will see other issues as well.  You already have a large number of validate errors which suggests you are trying to operate the card beyond safe clock speeds.  You should be aware that crunching puts considerable stress on your GPU and you need to be very careful with overclocking.  If you return the card to stock speeds, you may find the problems go away and you can run higher numbers of concurrent tasks.  There is probably little gain (and even potential loss) in going beyond x3.

The other 'problem' is the number of error tasks --  >200 on a brief check.  Most (perhaps all - I didn't check) seem to be aborted CPU tasks.  If you don't want to crunch CPU tasks (a perfectly natural course of action) why don't you just use your preferences to disable them in the first place?

I don't know if it's just clock speed that's preventing higher than x2.  I'm running a HD7950 at x3 with no issues so it's certainly possible.  I'm using Linux and the fglrx proprietary driver so it's a different situation to yours.  Please be aware that most of the benefit of concurrent tasks comes from just x2.  After that, it's usually a case of rapidly diminishing (or even negative) returns and the possibility of higher error rates.

Good luck with getting your card working reliably :-).

 

Cheers,
Gary.

denjoR
Joined: 9 Apr 08
Posts: 4
Credit: 139110089
RAC: 0

yeah, tested 1330 MHz @ 1.080v and 2150 MHz vram, which caused memory failures, and i was not at home for 12 hours ;) so i switched back to 1300 MHz @ 1.05v and 2133 MHz, but that has nothing to do with the other problem ^^

 
