BRP4 Intel GPU app feedback thread

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 536577663
RAC: 187954


Well, everything depends strongly on the code being run. Collatz is a best-case scenario for the iGPU as it hardly needs any main memory bandwidth at all. Hence it doesn't disturb other tasks, in contrast to running Einstein or SETI on that GPU. POEM also behaved well, but it never made it past beta due to seemingly random incorrect results (random in the sense that it wasn't possible to reproduce them, so they couldn't be fixed).

MrS

Scanning for our furry friends since Jan 2002

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 536577663
RAC: 187954


Christian, thank you for trying to resolve this long-standing issue! However, I'm not convinced that MADs are the cause, for the following reasons:

- Skylake doesn't seem to be any faster than prior Intel GPUs per clock and shader. Fusing two operations into one should have had an effect, shouldn't it?

- current AMD and nVidia GPUs also support MAD, or more precisely, need it to reach their peak performance. One would think their compilers use it as well and would run into the same rounding issue.

Or was the example simply oversimplified?

MrS

Scanning for our furry friends since Jan 2002

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 177636077
RAC: 220514


Christian Beer wrote:

This goes down to the level of assembler code that is executed on the GPU. Here is the most basic explanation I got from Intel:

Say you have the following:

Answer_mul = float0 * float1;
Answer_add = Answer_mul + float2;

This gets converted to the following in assembly:

  Mul %answer_mul, %float0, %float1
  Add %answer_add, %answer_mul, %float2

The value in the register "answer_mul" is rounded before the addition is done.
In the Intel case (and on AArch64 too) these two instructions get fused into a "mad" instruction:

  Mad %answer_mad, %float0, %float1, %float2

The result of the mad instruction is more precise because it does not round after the multiply.

And because we do a lot of summing of multiplications, the seemingly small rounding errors turn out to be significant in the end. No random numbers are involved.
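
To make this concrete, here is a minimal sketch in plain C (illustrative values only, not the actual BRP4 code) showing how a separately rounded multiply-then-add and a fused multiply-add can disagree once the large parts of the result cancel:

  /* fma() performs a*b + c with a single rounding; the separate multiply
     and add round twice.  The inputs are chosen so the difference shows
     up after the big terms cancel (default round-to-nearest assumed). */
  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      float float0 = 1.0f + 0x1p-12f;         /* exactly representable */
      float float1 = 1.0f + 0x1p-12f;
      float float2 = -(1.0f + 0x1p-11f);      /* cancels the high bits of the product */

      float answer_mul = float0 * float1;     /* rounded here ...            */
      float answer_add = answer_mul + float2; /* ... and again: gives 0      */

      float answer_mad = fmaf(float0, float1, float2); /* one rounding: gives 2^-24 */

      printf("mul then add: %g\n", answer_add);
      printf("fused mad   : %g\n", answer_mad);
      return 0;
  }

A pipeline summing many such terms accumulates these per-operation differences, which is why two apps that are each "correct" can still drift far enough apart to fail validation.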

 

If this was a conversation on an Intel forum rather than a direct mail exchange, could you please post a link to that Intel forum thread?

We at SETI have similar precision issues with the OpenCL iGPU app on some iGPU models.

 

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 118584222
RAC: 111209


It was a direct mail exchange with an Intel developer; that's where I got the explanation.

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 177636077
RAC: 220514


On SETI we were able to improve result precision to an acceptable level by:

1) disabling -cl-unsafe-math-optimizations

2) adding #pragma FP_CONTRACT OFF at the beginning of the kernel file (see the sketch below).

Hope this will help.
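
For reference, a minimal sketch of what those two changes can look like (the kernel name is made up for illustration, not SETI's actual source). Note that in OpenCL C the standard spelling of the pragma uses the OPENCL prefix:

  /* 1) Host side: build without the unsafe-math option, i.e. do not pass
        -cl-unsafe-math-optimizations (or -cl-fast-relaxed-math, which
        implies it) in the options string given to clBuildProgram(). */

  /* 2) Kernel side: switch off contraction at the top of the kernel file. */
  #pragma OPENCL FP_CONTRACT OFF

  __kernel void mul_add_example(__global const float *a,
                                __global const float *b,
                                __global const float *c,
                                __global float *out)
  {
      int i = get_global_id(0);
      /* With FP_CONTRACT OFF the compiler is not allowed to fuse the
         multiply and the add into a single mad, so every device rounds
         the intermediate product and the results stay comparable. */
      out[i] = a[i] * b[i] + c[i];
  }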

 

Jim Potter
Joined: 20 Mar 05
Posts: 8
Credit: 2144984841
RAC: 699261


I have a brand new i7-7700 PC with an NVIDIA 1050 GPU and an Intel HD Graphics 630 GPU. I've got 32GB of system memory and 4GB of NVIDIA graphics memory. None of my Einstein tasks for the Intel GPU ever get dispatched. They all say

"Postponed: Not enough free CPU/GPU memory available! Delaying next attempt for at least 15 minutes...".

The Intel Graphics Settings app tells me I have about 16GB of memory (1/2 system memory) available for the Intel GPU. There are no applicable settings in BIOS.

Any thoughts?

 

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 31192054
RAC: 415


I got a new laptop about 10 days ago with an Intel(R) HD Graphics 620 (3218MB) GPU and it's having trouble with Binary Radio Pulsar Search tasks.  Tasks are finishing OK, but a large (and growing) number are getting marked invalid.  See Invalid Tasks for Computer 12571639.  FYI: About a week ago, I finished installing the latest Windows updates, BIOS, and drivers, as well as BOINC 7.8.2.  And I have double-checked that my GPU driver is up to date.

When I initially reported this problem here 2 days ago, the invalids total was 8. It is now 18.  If you read my exchange with Gary Roberts, you'll see that we discussed the possibility of a problem with my GPU driver and/or a problem with v1.34 (opencl-intel_gpu-Beta) windows_x86_64.

As I told Gary, I changed my project preferences to stop running Beta apps, to see if changing to a stock app would change things - i.e. to rule out the GPU driver as the problem. But things didn't go as I expected.  Instead of getting a stock app downloaded, I got a BOINC message saying "see scheduler log messages on https://einsteinathome.org/host/12571639/log", and, in short, that long log essentially says that there are no non-Beta apps that will run on my GPU. (I plan to reset the project and try again as soon as the 3 "In Progress" Continuous Gravitational Wave tasks on this computer have finished.)

Just wondering if there are any Einstein set-up parameters I might have overlooked and need to tweak?

If not, and you suspect that my invalids problem is a programming issue with the Beta app, I am offering my host to help out.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109393216740
RAC: 35844566


In my previous answer, I said,

Quote:
At one stage the standard app was only allowed to get work if a known good driver was present.  As hardware and driver versions changed, this became too hard to keep updated so a beta app without driver restrictions was produced.

I'm sorry that I didn't make things a bit clearer, so I'll try to rectify that now.

The problem at the time tended to be that newer drivers were giving invalid results.  The driver restrictions mentioned above were to prevent the standard BRP4 app from getting work if the driver being used wasn't one of the older 'known good' versions.  As time marched on, people wanted to test newer versions to see if there was any improvement.  So the Devs released a beta app (the exact same app I believe - just relabeled as beta) without restrictions on the driver version so that such tests could be performed.  Caveat Emptor conditions applied, i.e. no guarantees whatsoever :-).  The assumption was that people willing to run beta test apps would be responsible enough to observe and bail out if the results were invalid - and then perhaps rinse and repeat with a different driver.

There was no talk about there being a fixable problem with the app.  It was all about driver problems.  I had no interest in using an iGPU so I just skimmed what was being posted at the time.  Chances are I've completely forgotten crucial details.  I think most of the stuff was about 4xxx series iGPUs but there were some reports about later ones as well.  If you're keen to pursue this, it would be worthwhile to have a good look at what has already been reported.  I'm pretty sure there were several reports about Skylake series iGPUs.

 

Cheers,
Gary.

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 31192054
RAC: 415


Gary,

All I can do is confess to being a bit dense and to letting my preconceived ideas get in the way of understanding what I read.  I have to say that Einstein's approach to app/driver mismatches is very different from what I have seen at other projects - in particular, the non-traditional use of the term Beta in an app's name.  And, I am more used to seeing computation errors in mismatch cases than completed but invalid results.  I am sure those differences got in the way of my understanding what you wrote.  But I understand now (I think).  And I will monitor my computer for driver updates and try again here whenever that happens.

Although we haven't solved anything here, this little episode has, at least, brought back some fond memories from years ago.  And for that, I am happy.

Stick

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109393216740
RAC: 35844566


Stick wrote:
... Einstein's approach to app/driver mismatches is very different from what I have seen at other projects - in particular, the non-traditional use of the term Beta in an app's name.  And, I am more used to seeing computation errors in mismatch cases than completed but invalid results.

Just to add a bit more explanation for the benefit of any others lurking here.  In the project preferences there is a sub-heading entitled "Beta Settings".  There is a single Y/N question there, "Run test applications?".  The default answer is 'No'.  People have to make a conscious choice to participate in these tests and appropriate warnings are given.

So the word 'beta' has become synonymous not so much with quality as with something being tested.  It could be brand new code of unknown quality, or it could be well tested code where something other than the code itself is being tested.  Whenever any type of test is being performed, it is usual, even for mature code, to have the word 'beta' added to the app name to draw attention to that fact.

This mechanism is used to test all sorts of things and has a number of advantages.  It means that the standard 'set and forget' operation can continue undisturbed whilst a variety of possible changes/enhancements are being tested by just a sub-set of willing participants.  I believe that you will get test app tasks in preference to standard app tasks if both are available and you have opted in.

One of the things being tested can be the validator itself.  Only one task in a quorum is allowed to be a test app task.  This prevents test results that are actually incorrect from agreeing with each other and being accepted.  Small batches of test tasks can be used with shorter deadlines, and when these are gone, a participating host can revert to the standard tasks.  This way, proposed changes can be quickly verified or backed out if something goes wrong, without any interruption to standard operations.

With regard to your comment about invalid results being associated with computation errors rather than with successfully completed results that fail validation, the reverse is often true here.  There are lots of examples where the problem is how closely the two individual results in the quorum match, not that the computation itself failed.

If you read back through this thread, you will find discussion about the possible causes of the invalid results with iGPUs; it came down to results not agreeing closely enough.  The suggested reason was a loss of precision, with rounding errors compounding over large numbers of multiply + add operations, whether done as separate instructions or combined into a single instruction.  It is possible that particular driver versions were more affected than others.  There was talk of relaxing the validator a little, but I don't know whether there was a beneficial outcome.
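
For anyone wondering what "agreeing closely enough" looks like in code, here is a hypothetical sketch in plain C of a relative-tolerance comparison.  It is not the actual Einstein@Home validator, only an illustration of the kind of check involved; "relaxing the validator" would amount to accepting a larger tolerance:

  #include <math.h>
  #include <stddef.h>

  /* Returns 1 if every pair of values matches within the relative
     tolerance 'tol', 0 otherwise.  Hypothetical helper, not project code. */
  int results_agree(const float *r1, const float *r2, size_t n, float tol)
  {
      for (size_t i = 0; i < n; ++i) {
          float diff  = fabsf(r1[i] - r2[i]);
          float scale = fmaxf(fabsf(r1[i]), fabsf(r2[i]));
          /* relative check, with an absolute floor of 'tol' for values near zero */
          if (diff > tol * fmaxf(scale, 1.0f))
              return 0;   /* differ by more than the allowed margin */
      }
      return 1;           /* close enough to be treated as matching */
  }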

 

Cheers,
Gary.
