Support for (integrated) Intel GPUs (Ivy Bridge and later)

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250377385
RAC: 35103

Quote:
Since the BOINC server scheduler code update for intel_gpu was contributed by Oliver Bock from this project

Would be news to me. Support for intel_gpu was added mainly by D.A., I backported this to the older server (scheduler) code we use on E@H.

There was a bug in our scheduler customization code that had to do with anonymous platform and BRP4 WUs, but that was independent of the intel_gpu support. This bug had been fixed on Sep 12, but maybe there's another one. To debug this the sched_request file would be of most help.

BM


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2954923256
RAC: 714765

Quote:
Quote:
Since the BOINC server scheduler code update for intel_gpu was contributed by Oliver Bock from this project

Would be news to me. Support for intel_gpu was added mainly by D.A., I backported this to the older server (scheduler) code we use on E@H.

There was a bug in our scheduler customization code that had to do with anonymous platform and BRP4 WUs, but that was independent of the intel_gpu support. This bug had been fixed on Sep 12, but maybe there's another one. To debug this the sched_request file would be of most help.

BM


Sorry, it turns out I slightly misled you with the reference to Oliver. His name was on the current git checkin, but that was part of the confusion from SVN to git, first as BOINC, then as BOINC-v2. This code was in a batch that got lost the first time round, and Oliver tidied up manually.

I had a visitation from another old friend yesterday - the client that goes on asking, and asking, and asking: thank goodness for a daily quota of 384 tasks. Those are mainly CUDA, on a machine which also has an intel_gpu. I think there might be a small scheduler problem with machines/venues which allow both - request 0 sec cuda, many secs intel_gpu, no tasks sent, no reference to intel_gpu in the server log.

Digging out files - logs and sched_request - to illustrate all of these is on my ToDo list.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250377385
RAC: 35103

Quote:

There are two active mechanisms for reporting server preferences back to the client.

Old:
1
1
0
1

New:
CPU
ATI
intel_gpu

Einstein is sending the 'old' version properly, including the intel_gpu flag, but the client is only recognising cpu/ati/cuda. With your old server code, it isn't sending the 'new' format at all. (NB: small bug - it's actually duplicating the old format and sending it twice.)
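The forum's HTML rendering has stripped the XML tag names from the 'Old' and 'New' lists above, leaving only the values. As a purely illustrative sketch - the element names below are hypothetical stand-ins, not taken from the actual BOINC scheduler source - the two styles in a sched_reply would look something like:

```xml
<!-- 'Old' style: one boolean flag per resource type
     (tag names are hypothetical; the values 1 1 0 1 are from the post above) -->
<no_cpu>1</no_cpu>
<no_cuda>1</no_cuda>
<no_ati>0</no_ati>
<no_intel_gpu>1</no_intel_gpu>

<!-- 'New' style: one element per excluded resource, identified by its name -->
<no_rsc_apps>CPU</no_rsc_apps>
<no_rsc_apps>ATI</no_rsc_apps>
<no_rsc_apps>intel_gpu</no_rsc_apps>
```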

It would be helpful for me to get a complete sched_reply file from a project that replies with a correct 'new' format.

BM


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2954923256
RAC: 714765

I'll add that to the list.

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171376
RAC: 43

Hi Richard,

Quote:
Quote:
Quote:
Since the BOINC server scheduler code update for intel_gpu was contributed by Oliver Bock from this project

Would be news to me. Support for intel_gpu was added mainly by D.A., I backported this to the older server (scheduler) code we use on E@H.

There was a bug in our scheduler customization code that had to do with anonymous platform and BRP4 WUs, but that was independent of the intel_gpu support. This bug had been fixed on Sep 12, but maybe there's another one. To debug this the sched_request file would be of most help.

BM


Sorry, it turns out I slightly misled you with the reference to Oliver. His name was on the current git checkin, but that was part of the confusion from SVN to git, first as BOINC, then as BOINC-v2. This code was in a batch that got lost the first time round, and Oliver tidied up manually.

Just for clarification: which commit are we talking about? Is it the one you mentioned in another post: this one?

I just want to have a look at how boinc-v2.git still conveys that I'm the author, if at all.

Thanks,
Oliver

Einstein@Home Project

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171376
RAC: 43

Richard wrote:


2) I'm intrigued by the difference in CPU usage between the two applications.

All tasks for computer 8864187 Einstein - CPU used for 3% of runtime.
All tasks for computer 67008 SETI Beta - CPU used for 99% of runtime.

Good for us :-)

To make the numbers more meaningful it would be good to measure the GPU utilisation of both apps though. Not sure if there are tools out there by now to get at those figures.

Also, keep in mind the CPU usage is relative to the GPU's power: the more powerful the GPU, the shorter the relative runtime of the GPU tasks, and thus the more often the CPU is used per time interval (which determines the "usage").
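Oliver's point about CPU usage scaling with GPU speed can be put into a toy formula. A minimal sketch (all numbers invented for illustration): if each task needs a fixed amount of CPU "feeding" time plus a GPU compute time, the reported CPU usage is simply the feeding share of the total runtime.

```python
def cpu_usage_fraction(t_cpu_feed, t_gpu_compute):
    """Fraction of wall-clock time the CPU spends feeding the GPU."""
    return t_cpu_feed / (t_cpu_feed + t_gpu_compute)

# Same app, same CPU feeding cost per task, two GPUs of different speed:
slow_gpu = cpu_usage_fraction(1.0, 99.0)  # slow GPU: CPU usage ~1%
fast_gpu = cpu_usage_fraction(1.0, 9.0)   # 10x faster GPU: CPU usage 10%
```

So a higher CPU-usage percentage can mean a faster GPU rather than a less efficient app, which is why comparing the raw percentages across hosts needs the GPU utilisation figures too.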

Quote:

Even though the Einstein app uses very little CPU time, I did find it needed to have a CPU core free to use: these tasks were run (both projects) with BOINC set to use 75% of processors, so three instances of SIMAP were running alongside and Einstein tasks were taking 11 minutes. With four SIMAP running (100% usage), the Einstein app had barely passed half way after 45 minutes, a huge difference.

Right, that's the case for all GPU apps. GPUs are co-processors, not additional resources. They have to be "fed" by the CPU. If the CPU isn't available, it will slow down the GPU, in particular when the CPU parts are executed serially (not in parallel to the GPU parts), which is the case for our apps. This makes the feeding dependency even stronger.

Quote:

I also noted a significant difference in power consumption. With Einstein running on the iGPU, total system draw is 88W: with SETI Beta, power draw rises by 10%, to ~98W.

I'm not sure I understand this correctly: do you mean by adding SETI, or when running solely SETI, you see a 10% increase?

Quote:

I'm in discussion with the SETI app developer (Raistmer) as to why this might be the case - he speculates that it might be a difference in kernel lengths. But it seems unlikely that OpenCL spin-wait loops would use so much power.

I don't think OpenCL spins but rather yields control to the OS. That's why our app gets as low as 3% CPU usage; it would be close to 100% if it spun. CUDA offers a setting to define this, OpenCL doesn't. Fortunately the default choice is the right one for the BOINC use case.
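The spin-versus-yield difference Oliver describes can be sketched in a few lines. This is a toy model, not how OpenCL or CUDA actually synchronise: a timer standing in for kernel completion, with one waiter that busy-polls and one that blocks.

```python
import threading

def wait_spin(done):
    # Busy-wait: the CPU polls the flag flat out until the "kernel" finishes,
    # which shows up as ~100% CPU usage for the app.
    iterations = 0
    while not done.is_set():
        iterations += 1
    return iterations

def wait_yield(done):
    # Blocking wait: the OS parks the thread until completion, so almost no
    # CPU time is charged to the app (the ~3% case).
    done.wait()
    return 0

def simulate(waiter, kernel_time=0.05):
    done = threading.Event()
    threading.Timer(kernel_time, done.set).start()  # fake kernel completion
    return waiter(done)
```

Run with `wait_spin`, the iteration count is huge (the CPU was busy the whole time); with `wait_yield`, the thread simply sleeps until the event fires.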

Anyhow, in terms of power this might indicate that the GPU is simply more power-efficient than the CPU - something that doesn't really surprise me.

HTH,
Oliver

Einstein@Home Project

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2954923256
RAC: 714765

Thanks for coming back to this.

To make it clear: I'm talking about host 8864187 as a comparison tool. That's a Haswell i5 with the Intel HD 4600 activated, but no other coprocessor or GPU.

Quote:
Richard wrote:

2) I'm intrigued by the difference in CPU usage between the two applications.

All tasks for computer 8864187 Einstein - CPU used for 3% of runtime.
All tasks for computer 67008 SETI Beta - CPU used for 99% of runtime.

Good for us :-)

To make the numbers more meaningful it would be good to measure the GPU utilisation of both apps though. Not sure if there are tools out there by now to get at those figures.

Also, keep in mind the CPU usage is relative to the GPU's power: the more powerful the GPU, the shorter the relative runtime of the GPU tasks, and thus the more often the CPU is used per time interval (which determines the "usage").


Seems like GPU-Z has enough to get us started.

On the left, idle. On the right, running BRP (Arecibo) v1.34 for intel_gpu.

I'm trying to run a back-to-back comparison with SETI's (Beta) AstroPulse v6.05, also for intel_gpu - so straight comparison, same hardware, same running conditions. Unfortunately, they went into weekly maintenance before I could get a power/utilisation grab for that app - I'll do one later.

Quote:
Quote:
Even though the Einstein app uses very little CPU time, I did find it needed to have a CPU core free to use: these tasks were run (both projects) with BOINC set to use 75% of processors, so three instances of SIMAP were running alongside and Einstein tasks were taking 11 minutes. With four SIMAP running (100% usage), the Einstein app had barely passed half way after 45 minutes, a huge difference.

Right, that's the case for all GPU apps. GPUs are co-processors, not additional resources. They have to be "fed" by the CPU. If the CPU isn't available, it will slow down the GPU, in particular when the CPU parts are executed serially (not in parallel to the GPU parts), which is the case for our apps. This makes the feeding dependency even stronger.


I'm most familiar with that from the CUDA case. Those apps seem to be able to use the vestigial time-slices a multi-tasking OS (like Windows) makes available, even when the CPU is also running a full set of CPU-demanding BOINC science tasks - plus Windows itself, this browser, and a great deal of housekeeping besides.

The 'loaded' GPU-Z shot above was taken with three copies of the BOINC SIMAP application running. With four copies running, it looks like

I get nothing like that reduction in co-proc usage when I run a fully-loaded CPU plus a CUDA app - any apps, any project.

Quote:
Quote:
I also noted a significant difference in power consumption. With Einstein running on the iGPU, total system draw is 88W: with SETI Beta, power draw rises by 10%, to ~98W.

I'm not sure I understand this correctly: do you mean by adding SETI, or when running solely SETI, you see a 10% increase?


No - I was keeping CPU usage constant (three copies of SIMAP), and comparing power usage if either SETI or Einstein was running on the Intel co-processor. If SETI replaces Einstein, the power draw from the wall socket increases (good for Einstein again :-)).

Quote:
Quote:
I'm in discussion with the SETI app developer (Raistmer) as to why this might be the case - he speculates that it might be a difference in kernel lengths. But it seems unlikely that OpenCL spin-wait loops would use so much power.

I don't think OpenCL spins but rather yields control to the OS. That's why our app gets as low as 3% CPU usage; it would be close to 100% if it spun. CUDA offers a setting to define this, OpenCL doesn't. Fortunately the default choice is the right one for the BOINC use case.

Anyhow, in terms of power this might indicate that the GPU is simply more power-efficient than the CPU - something that doesn't really surprise me.

HTH,
Oliver


Again, it's a comparison between two different OpenCL apps (both running on the same co-processor hardware) that I'm trying to tease out.

It doesn't surprise me that OpenCL doesn't offer a setting to control yielding to CPU - I'd guessed as much, but that's the first time anyone has put it in writing. Raistmer is getting his 99% CPU usage with OpenCL 1.1 - you are getting 3% with OpenCL 1.2: could that be significant?

Alternatively, if it isn't down to the OpenCL version in use, Raistmer was wondering if you could supply any figures for the typical and minimal length of a single kernel call in Einstein? I think his theory is that the longer the kernels run, the less often the app has to yield to the CPU, and the less demand is placed on the CPU. Or something like that - I'm just carrying messages to and fro here.

I'll invite Raistmer to read what we've discussed so far, and pass back any messages (I don't know whether he has enough RAC to post directly on this forum - I suspect not).

Thanks,
Richard

P.S. The Einstein task took 40 minutes to reach 50% with four SIMAP running - normally, they take 11 minutes to complete with three SIMAP. Reverting...

P.P.S. I'll come back and sort out those git commits later.

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171376
RAC: 43

Quote:


I'm most familiar with that from the CUDA case. Those apps seem to be able to use the vestigial time-slices a multi-tasking OS (like Windows) makes available, even when the CPU is also running a full set of CPU-demanding BOINC science tasks - plus Windows itself, this browser, and a great deal of housekeeping besides.

The 'loaded' GPU-Z shot above was taken with three copies of the BOINC SIMAP application running. With four copies running, it looks like

[ ... ]

I get nothing like that reduction in co-proc usage when I run a fully-loaded CPU plus a CUDA app - any apps, any project.

Ok, but keep in mind that:

a) when we started out with CUDA, volunteers wondered why we reserved a full CPU core for those apps. It was exactly because of this slowdown. I think we could only change that when BOINC raised the default process priority of GPU apps (idle -> normal). There should be threads about all this in this forum.

b) you might be comparing apples to oranges here, as we don't know how CUDA implements its CPU/GPU threading model and how that compares to OpenCL's implementation. OpenCL might simply be more demanding in terms of its feeding requirements or context/thread switching respectively. The individual drivers play a crucial role here and, frankly, let's not discuss those...

Quote:

Raistmer is getting his 99% CPU usage with OpenCL 1.1 - you are getting 3% with OpenCL 1.2: could that be significant?

I doubt it. As far as I'm aware we don't use any OpenCL 1.2-specific features. In fact I'm surprised that we're actually using 1.2 to build our apps. Are you sure about that? Where did you see this?

Quote:

Alternatively, if it isn't down to the OpenCL version in use, Raistmer was wondering if you could supply any figures for the typical and minimal size of single kernel call in Einstein? I think his theory is that the longer the kernels run, the less often the app has to yield to the CPU, and the less demand is placed on the CPU. Or something like that - I'm just carrying messages to-and-fro here.

Hm, it's not trivial to profile individual kernels in terms of absolute runtime with the tools at our disposal. Also, it's not just a matter of the individual kernels but rather the workgroup size you use and the total number of work items as both parameters define the "length" of the actual kernel call. One also needs to take into account that each GPU series can have different limits for the workgroup size so you have to probe and adjust them for optimal performance (hint!). This concerns AMD GPUs much more than NVIDIA GPUs, not sure about the Intel GPUs...
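The "probe and adjust" approach Oliver hints at can be sketched generically: time the same workload at several candidate workgroup sizes and keep the fastest. This is a toy harness; `run_kernel` is a hypothetical stand-in for enqueueing the real OpenCL kernel and waiting for it to finish.

```python
import time

def probe_workgroup_sizes(run_kernel, candidates):
    """Return the candidate workgroup size with the lowest measured runtime."""
    best_size, best_time = None, float("inf")
    for wg in candidates:
        start = time.perf_counter()
        run_kernel(wg)               # stand-in for a real kernel launch + wait
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_size, best_time = wg, elapsed
    return best_size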

Before we go deeper I'd be interested to see the GPU utilisation of the SETI app to put the 99% CPU usage into perspective.

Best,
Oliver

Einstein@Home Project

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2954923256
RAC: 714765

Quote:

Just for clarification: which commit are we talking about? Is it the one you mentioned in another post: this one?

I just want to have a look at how boinc-v2.git still conveys that I'm the author, if at all.

Thanks,
Oliver


I think the confusion arose when I looked at the changelog for cs_account.cpp - the file which Claggy posted in full.

There are a couple of lines:

Quote:
@1174b00 8 months oliver.bock - client/manager: tweaks to Intel GPU code
@ce87ec9 8 months oliver.bock OpenCL: First pass at adding support for Intel Ivy Bridge GPUs


and things like http://boinc.berkeley.edu/trac/changeset/ce87ec9848643a094337f67f78a1d5077cf7f772/boinc-v2/client/cs_account.cpp - all arising from the clean-up in March - which perhaps still lead to the blame being wrongly put on Oliver....

With regard to the underlying issue - the XML code controlling intel_gpu preferences - I think we've got to the nub of the problem via email, and I gather from Rom that David is planning to have another look at it this week.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2954923256
RAC: 714765

Quote:

As far as I'm aware we don't use any OpenCL 1.2-specific features. In fact I'm surprised that we're actually using 1.2 to build our apps. Are you sure about that? Where did you see this?

Best,
Oliver


It crops up periodically in threads like this morning's 'does Einstein have WUs for an INTEL GPU?' :P
