FGRPB1G: CPU usage factor

Mr Anderson

Joined: 28 Oct 17

Posts: 45

Credit: 154618533

RAC: 44565

25 Nov 2018 21:49:26 UTC

Topic 217027


The FGRPB1G jobs that run on my machine show up in BOINC as "1 CPU + 1 AMD/ATI GPU" in the status column and it seems that the CPU usage limits in BOINC treat these tasks as if they are really using 100% of 1 CPU. However from looking in the Windows Task Manager and at the CPU vs Run time stats in my tasks list it is more like 10% (in my case at least) so I was wondering why. Is it a GPU/CPU architecture/OS thing?

An example of such a task is WU 379523608 (interestingly the other computer that computed this task did need much more CPU time).

I might be wrong but I also suspect this skews the resource sharing when crunching for other projects which don't have any GPU computing. In my case I'm also running SixTrack tasks for LHC but it seems to me that they are running more often than expected (thus blocking the Einstein CPU only tasks more than expected) and perhaps that is because the FGRPB1G jobs don't stop running and these are counted as full CPU tasks?

mikey

Joined: 22 Jan 05

Posts: 12945

Credit: 1884483953

RAC: 27552

Mr Anderson wrote:The FGRPB1G

27 Nov 2018 11:51:38 UTC

Message 167915


Mr Anderson wrote:

The FGRPB1G jobs that run on my machine show up in BOINC as "1 CPU + 1 AMD/ATI GPU" in the status column and it seems that the CPU usage limits in BOINC treat these tasks as if they are really using 100% of 1 CPU. However from looking in the Windows Task Manager and at the CPU vs Run time stats in my tasks list it is more like 10% (in my case at least) so I was wondering why. Is it a GPU/CPU architecture/OS thing?

An example of such a task is WU 379523608 (interestingly the other computer that computed this task did need much more CPU time).

I might be wrong but I also suspect this skews the resource sharing when crunching for other projects which don't have any GPU computing. In my case I'm also running SixTrack tasks for LHC but it seems to me that they are running more often than expected (thus blocking the Einstein CPU only tasks more than expected) and perhaps that is because the FGRPB1G jobs don't stop running and these are counted as full CPU tasks?

Yes, a while back Einstein recoded their software to set aside one CPU core by default just to keep the GPU fed. It's a pain in the neck when some of us have already done that and then Einstein does it again, meaning 2 CPU cores are reserved for the GPU!

WB8ILI

Joined: 20 Feb 05

Posts: 45

Credit: 1229721696

RAC: 1373059

My experience on this project

27 Nov 2018 13:36:52 UTC

Message 167917


My experience on this project and Milkyway is that the CPU and GPU requirements are just ESTIMATES.

Sometimes a task may use less than 50% of a GPU, in which case you could run two or more tasks simultaneously.

Also note that a task may use the CPU and/or GPU heavily for a while and then barely at all for a while.

GPU-Z (Windows) is nice for tracking GPU usage (has a graph). I don't know how to easily track CPU usage for a single Windows task except by watching the Task Manager.

Mr Anderson

Joined: 28 Oct 17

Posts: 45

Credit: 154618533

RAC: 44565

The tasks track the net CPU

27 Nov 2018 20:10:26 UTC

Message 167925 in response to message 167917


The tasks track the net CPU usage themselves. If you look at the task list on the web site, you will see the CPU time as well as the run time, so divide the former by the latter, multiply by 100, and you have the CPU percentage. In my case the "Gamma-ray pulsar binary search #1 on GPUs" tasks typically show CPU time at around 220 and run time at about 2500, which is less than 10%, meaning that the estimate is more than 10 times too large. Other systems may of course be different, so if it were possible to base the CPU requirements on these data, the scheduling would be better suited to the particular computer.
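As a quick sketch of that arithmetic (the function name is made up for illustration, and the figures are the approximate ones quoted in this post, assumed to be in seconds):

```python
def cpu_percent(cpu_time_s: float, run_time_s: float) -> float:
    """Return CPU time as a percentage of the elapsed run time."""
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return 100.0 * cpu_time_s / run_time_s


# Approximate figures from the post: ~220 s CPU time over ~2500 s run time.
pct = cpu_percent(220, 2500)
print(f"{pct:.1f}% of one CPU")  # prints "8.8% of one CPU"
```

Run against a handful of tasks from the web task list, this gives a rough idea of how far the declared "1 CPU" is from what a particular host actually uses.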

Holmis

Joined: 4 Jan 05

Posts: 1118

Credit: 1055935564

RAC: 0

Einstein@home and Boinc are

27 Nov 2018 21:48:57 UTC

Message 167926


Einstein@home and Boinc are designed to try and work on all kinds of systems and thus won't run optimally on most systems. There are a few options in Boinc that the user can try to optimize for a particular system.
One of them is using a file called app_config.xml to fine tune the CPU and GPU usage of an app.
Do note that the app in question will always use the resources that it needs, no more and no less, regardless of what the project or the user sets. The settings are only there to tell Boinc how many tasks to start and how to manage the cache of work.

Here's a link to the documentation for using an app_config.xml.

And here's an example that Mr Anderson could try if he feels inclined:

<app_config>
   <app>
      <name>hsgamma_FGRPB1G</name>
      <gpu_versions>
         <gpu_usage>1.0</gpu_usage>
         <cpu_usage>0.1</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

The above code should be placed in a plain text file and saved as app_config.xml (be sure there's no .txt appended) in {BOINC Data directory}\projects\einstein.phys.uwm.edu. The Boinc data directory is listed in the startup messages in the event log. Open Boinc Manager and look under the menu item "Options" to find it.

What the code above would do is tell Boinc that the FGRPB1G tasks will use 1 GPU and 1/10th of a CPU. Boinc will then use this information together with the same kind of info for other tasks to schedule (start) enough tasks to use all available resources, both CPU and GPU.
What's not obvious is that the task might need more or less resources than the settings are telling Boinc and thus the tasks might not run optimally. The community wisdom at Einstein says that Nvidia GPU tasks need to have a CPU thread reserved as support or the performance of the task is impaired, but AMD GPU tasks can make do with less.
For any given system one needs to experiment to find the most productive state. What this entails is trying different settings and recording the results, then comparing them and choosing the one that yields the best results.

Do take note that once an app_config.xml has been used, if one wants to back out and use the project supplied settings one has to reset the project and thus lose any tasks in the cache. Setting "No new tasks" and running down the cache first would be the preferred way.
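For anyone who would rather generate the file than type it by hand while experimenting with different values, here is a minimal sketch using Python's standard library. The helper name build_app_config is hypothetical, and the sketch only builds the text; saving it under the projects directory described above is left to the reader.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return app_config.xml text for a single GPU app (hypothetical helper)."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    # Pretty-print where available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root, space="   ")
    return ET.tostring(root, encoding="unicode")


# Same values as the example in this post: 1 GPU and 0.1 CPU per task.
xml_text = build_app_config("hsgamma_FGRPB1G", 1.0, 0.1)
print(xml_text)
```

Changing the last argument (say 0.1, 0.5, 1.0) and recording the resulting run times is one way to do the experimenting described above.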

Mr Anderson

Joined: 28 Oct 17

Posts: 45

Credit: 154618533

RAC: 44565

That worked perfectly, thanks

28 Nov 2018 0:02:31 UTC

Message 167929 in response to message 167926


That worked perfectly, thanks for the info.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119687160561

RAC: 25320496

Mr Anderson wrote:That worked

28 Nov 2018 8:33:28 UTC

Message 167931 in response to message 167929


Mr Anderson wrote:

That worked perfectly, thanks for the info.

You might think it's working perfectly but it looks like it's far from optimal.

You should read again what Holmis wrote and think about the need to experiment. I had a look at some of the results from your host with the HD7700 series GPU. I saw results from a day or two ago where the elapsed time varied between 2400 secs for one task and up to 3600 secs for some others. I then looked at the most recently returned results which would be after your installation of the app_config.xml. The crunch times are now varying between 4000 secs and 4600 secs. I feel fairly confident that would be from the 0.1 CPU setting. I imagine BOINC is now allowing you to run CPU tasks on all your CPU cores which means the poor discrete GPU task has to fight for service (even though it only needs a tiny bit of service).

You really need to understand that setting 0.1 for ncpus does NOT guarantee that the GPU task will get what it needs and (particularly) when it needs it, even though its needs may be quite small. What it DOES guarantee is that BOINC is not going to be restricted in any way from allocating CPU work to all the cores. And when it does that, your GPU task is highly likely to run in a sub-optimal manner. The project defaults seem to give you a much better result than what you have currently chosen.

I'm guessing your GPU may be a 7750 or 7770. If you would like to know what a 7770 GPU should be capable of, take a look at the results for this host. It's a 2009 vintage Phenom II and I installed the 7770 in it when I first started GPU crunching here - probably around 2010-2011. It has run 24/7 ever since. Please notice how relatively uniform the crunch times are. Classic indicators of sub-optimal crunching are crunch times that show considerable variation like yours do.

If you are prepared to experiment a little to improve performance, you can make a simple adjustment to your local preferences through BOINC Manager and leave the app_config.xml file as it is. That adjustment would be to change the % of cores that BOINC is allowed to use from 100% (if that is what you have it set at) to 87.5%. You will have 1 less CPU task running but the GPU task should run much faster again.

There is a further potential issue you need to think about. You have an active internal (Intel) GPU that would also need some CPU support. I have never used one of these so have no suggestions to offer about how to properly support it along with the discrete GPU. I don't even know if there is a default for how much CPU support BOINC will budget for when Intel GPU tasks are running. The CPU time component is quite small so perhaps nothing is budgeted.

In any case, it actually seems to be pretty stable at the moment so perhaps it will continue that way. The crunch times for it might improve a bit if you make the change I've suggested. One experiment you should consider is to temporarily suspend all the Intel GPU tasks for long enough to see what it does to the discrete GPU crunch times and how it affects the number of CPU cores that are running other CPU only tasks.

I'd certainly be interested to hear how you go if you try some of these experiments.

Cheers,
Gary.

Mr Anderson

Joined: 28 Oct 17

Posts: 45

Credit: 154618533

RAC: 44565

Thanks Gary for the heads-up.

29 Nov 2018 6:30:54 UTC

Message 167951 in response to message 167931


Thanks Gary for the heads-up. I was experimenting with running the BRP4 tasks to give the on-board Intel GPU something to do but I hadn't noticed that when these tasks were running the run times of the FGRPB1G jobs were so much worse. Today I let the BRP4 tasks that I had run out but didn't accept any more, and the FGRPB1G run times appear to have gone back to normal. If the credit awarded is a reasonable reflection of their value, the BRP4 jobs just aren't worth running if they have such a negative impact on the throughput of the FGRPB1G jobs.

I actually have the CPU usage limit set fairly low (usually 30%) because this is my work PC and the fan just gets too noisy when the CPU gets busier, so the GPU tasks should have had all the CPU time available that they would have required. I suspect there is something funny about the Intel GPU/CPU architecture which maybe limits the CPU performance when the Intel GPU is working but I'm just guessing and it doesn't really matter since I'm not bothering with it any further. I will monitor the run times of the FGRPB1G jobs from the times when the PC is running and I'm not at work to see if the run time variations are perhaps because of me actually using the PC (the only problem is that when I remote desktop in to have a look, then that stops the FGRPB1G jobs - but that's another story).

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6596

Credit: 340168118

RAC: 264521

Holmis wrote:The community

29 Nov 2018 7:44:00 UTC

Message 167952 in response to message 167926


Holmis wrote:

The community wisdom at Einstein says that Nvidia GPU tasks need to have a CPU thread reserved as support or the performance of the task is impaired, but AMD GPU tasks can make do with less.

I've had a long/hard look at the OpenCL standard; naturally it only specifies an interface, not the implementation. One can choose to invoke some API calls as blocking, or non-blocking where one then relies on querying an event object to synchronise activity. That is : OpenCL has no concept of threads and one needs a CPU thread to launch kernels on the GPU via enqueued requests. What is certain is that NVidia implements OpenCL using CUDA, and therein lies both lesser efficiency and probably the need for other thread(s). The detail is within the driver, which then must determine the true CPU usage fraction ( as opposed to some declared ratio ) from its main thread and whatever others it may create. For the brave : profilers do exist ....

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119687160561

RAC: 25320496

OK, your computer has a total

29 Nov 2018 8:38:43 UTC

Message 167953


OK, your computer has a total of 8 threads. If you usually use only 30% for the allowed CPU usage (30% of 8 is 2.4) that means that BOINC will restrict crunching so that only 3 CPU threads can be running. With the default setting of 1 CPU thread to support an FGRPB1G task, that 3 would have been reduced to just 2 for any CPU only tasks (2.4-1.0=1.4 which rounds up to 2 threads).

By using 0.1 in app_config.xml, BOINC would see the allowed threads as 2.4-0.1=2.3 and since that is still greater than 2, 3 CPU threads would be allowed.

For simplicity, you could have achieved the same result (without using app_config.xml to change the default) just by setting the allowed cores to 50%. BOINC would be allowed to use 4 of your cores but since the FGRPB1G task would 'reserve' one of them, there would just be 3 allowed CPU threads.
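The three cases above can be put side by side in a small sketch. It only mirrors the round-up arithmetic as described in this post; the function name is hypothetical and this is not taken from BOINC's scheduler code, which may round differently.

```python
import math


def cpu_task_slots(total_threads: int, pct_allowed: float, gpu_cpu_usage: float) -> int:
    """CPU-only task slots under the rounding model described in the post.

    Assumption: budget = threads * pct / 100, minus the CPU fraction declared
    for the running GPU task, rounded up to a whole number of threads.
    """
    budget = total_threads * pct_allowed / 100.0 - gpu_cpu_usage
    return max(0, math.ceil(budget))


print(cpu_task_slots(8, 30, 1.0))  # project default: 2.4 - 1.0 = 1.4 -> 2
print(cpu_task_slots(8, 30, 0.1))  # cpu_usage 0.1:   2.4 - 0.1 = 2.3 -> 3
print(cpu_task_slots(8, 50, 1.0))  # 50% of cores:    4.0 - 1.0 = 3.0 -> 3
```

Under this model the last two settings give the same number of CPU tasks, which is the point being made: the cpu_usage value is just a number in a calculation.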

From the tone of your earlier comments, I thought you were trying to get all (or most) cores crunching. If you are comfortable with adjusting the %cores setting to achieve the desired outcome, there's really no benefit to changing the cpu_usage parameter at all. It doesn't change what a GPU task consumes and would make no difference to temperature/fan speed. It's just a number in a calculation that specifies what BOINC is allowed to run.

I didn't try to work out when and for how long you had been running Intel GPU tasks. Yes, you are quite correct in thinking that the 'work content' is very small. Some years ago BRP4 tasks were 16 times larger and the credit was set at 1000. That is why the current 16 times smaller tasks only generate 62 credits. They are really designed for low power mobile type devices. With your 30% cores setting, there should have been no problem for CPU support for FGRPB1G tasks. In that case, the longer and variable run times you experienced very likely were caused by simultaneously using the Intel GPU. That's something I have no direct experience with but I recall others saying things along these lines in the past.

Without those tasks running it will be interesting to see if the FGRPB1G task times stabilise.

Cheers,
Gary.

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6596

Credit: 340168118

RAC: 264521

Further research indicates

29 Nov 2018 9:32:00 UTC

Message 167956


Further research indicates that the 1 CPU allocation per 1 GPU is an Nvidia OpenCL implementation 'bug' present since 2011. It's a busy wait scheme, i.e. the CPU thread spins while waiting for a kernel to finish, and doesn't yield otherwise. How inefficient this is depends upon the duration of the kernels : quicker kernels will engage the CPU more as they complete. An alleged reason for this design choice is that busy wait ( "Are we there yet ?" ) also equals instant response upon completion. However the only known mechanism to turn this behaviour off is to use CUDA ( the cudaDeviceScheduleYield option and CudaUseBlockingSync platform property ). IMHO it is probably an intended 'feature' rather than an error.
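To make the busy wait versus blocking distinction concrete, here is a CPU-only analogy in Python. It is not Nvidia's driver code and uses no OpenCL or CUDA calls; a timer merely stands in for a GPU kernel finishing, and all names are made up for illustration.

```python
import threading
import time


def cpu_seconds_while_waiting(wait, delay_s: float = 0.2) -> float:
    """Return process CPU time consumed while `wait` runs until an event fires."""
    done = threading.Event()
    timer = threading.Timer(delay_s, done.set)  # stands in for a kernel finishing
    start = time.process_time()
    timer.start()
    wait(done)
    used = time.process_time() - start
    timer.join()
    return used


def busy_wait(done: threading.Event) -> None:
    # Spin: keep asking "are we there yet?" without giving up the core.
    while not done.is_set():
        pass


def blocking_wait(done: threading.Event) -> None:
    # Sleep until signalled; the OS is free to run something else meanwhile.
    done.wait()


spin = cpu_seconds_while_waiting(busy_wait)
block = cpu_seconds_while_waiting(blocking_wait)
print(f"busy wait: {spin:.3f} s CPU, blocking wait: {block:.3f} s CPU")
```

Both waits take about the same wall-clock time, but the spinning version should burn CPU for most of it, which is roughly why a spinning support thread shows up as a fully used core.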

Since we're still talking about this in late 2018, I suspect pigs-may-fly/hell-freezes-over/cows-come-home before NVidia alters this.

Cheers, Mike.

( edit ) Presumably then, the extra thread needed for NVidia cards is replete with non-blocking CUDA calls.

( edit ) FWIW : OpenCL kernels are not equivalent to CPU threads. A kernel is ( typically/probably ) instantiated onto distinct GPU compute units, whereas a CPU thread is a matter of time slicing a given CPU core. One is a spatial division of labour the other is temporal.

