BRP4 Intel GPU app feedback thread

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752710280
RAC: 1461291

Quote:

Yes, it's a shame Intel doesn't fix this.

MrS


All we know at this stage is that the newer Intel drivers, and the older Einstein apps, are not compatible with each other. I haven't seen anyone with technical/development expertise - on this or any other project - investigate what's going wrong, and identify who has to fix what.

disturber
Joined: 26 Oct 14
Posts: 30
Credit: 57155818
RAC: 0

After rollback, I no longer produce any invalid intel_cl workunits. So I am happy.

I had to search the forum to find out about this problem. Is there a sticky that warns people not to use the newer Intel drivers? It would be a shame if people who don't know about this kept producing a high percentage of invalid workunits because the information is hard to find.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 536534330
RAC: 186841

Quote:
All we know at this stage is that the newer Intel drivers, and the older Einstein apps, are not compatible with each other. I haven't seen anyone with technical/development expertise - on this or any other project - investigate what's going wrong, and identify who has to fix what.


You're right with this.. but it did work using the older drivers and it's still the same executable. Does this leave any realistic wiggle room that it was not Intel who broke something?

MrS

Scanning for our furry friends since Jan 2002

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023354931
RAC: 1816661

Quote:

Does this leave any realistic wiggle room that it was not Intel who broke something?

MrS


Yes.

While that type of "swap and see" debugging dominates modern practice, it is because true diagnosis is so difficult, and often requires information not available, not because it always gets the right answer.

disturber
Joined: 26 Oct 14
Posts: 30
Credit: 57155818
RAC: 0

Does anyone have an idea why I still get some invalid workunits on the 4600 GPU and none on the 4000 GPU, when both use the same Intel driver version?
The error rate is something like 2 out of 200, so not a lot.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752710280
RAC: 1461291

Quote:
Quote:
All we know at this stage is that the newer Intel drivers, and the older Einstein apps, are not compatible with each other. I haven't seen anyone with technical/development expertise - on this or any other project - investigate what's going wrong, and identify who has to fix what.

You're right with this.. but it did work using the older drivers and it's still the same executable. Does this leave any realistic wiggle room that it was not Intel who broke something?

MrS


Yes. For a worked example, take the launch of GPGPUs into the Distributed Computing world with NVidia's donation of a SETI@Home application in 2008. All went well (give or take a few bugfixes) until the release of the Fermi GPU architecture in 2010. It turned out that NVidia had taken a few 'optimisation' shortcuts in their own application, and when Fermi came along, the rules were tightened, and the application didn't work any more.

OK, so that's card firmware rather than a driver, but the principle is still close: something which a developer assumed (with supporting experience) would work in an early release runtime, may not be supported in the same way with a more mature release.

boinc127
Joined: 17 Mar 11
Posts: 23
Credit: 4003975
RAC: 1

I've noticed what looks like another spike in invalid BRP4 workunits. I've got 180 valid and 14 invalid (mostly since the 25th of November), plus 104 pending workunits, 30 of them inconclusive. As far as I'm concerned that is an unacceptable ratio, one that would normally make me stop crunching and check my computer for problems. Either I've got a problem with my computer or my wingmen are using the affected Intel GPU drivers again. I've kept my Intel driver at 10.18.10.3621 (just verified the version while writing this) because, as far as I know, it's the most recent version that works with HD 4600s and Einstein BRP4 workunits.

So is this another problem caused by automatic updates in Windows or something? Windows 8.1 automatically updates drivers unless you specifically tell it not to. My Intel driver has been upgraded without my knowledge, with no indication that anything had changed except for the customary screen flicker.

If enough results get validated by 2 affected wingmen, won't that affect the data?

Maximilian Mieth
Joined: 4 Oct 12
Posts: 128
Credit: 9885011
RAC: 2193

I have also noticed a problem. I usually have no invalids at all. After reading the last posts I checked my task list and found five invalid BRP4s from my HD4000. In all cases my wingmen were running an HD4600. Also in all cases, the third task generated for the workunit was allocated to another wingman with an HD4600 and validated without any problems. This seems very strange ...
I am running driver 10.18.10.3621.

boinc127
Joined: 17 Mar 11
Posts: 23
Credit: 4003975
RAC: 1

Same here. If I look at my most recent invalid task, from November 30, the two wingmen whose results validated each other and invalidated mine are in bad shape themselves: one host has 184 invalid tasks and validate errors, and the other has 249. It seems the wingmen are using a bad Intel driver, and both have Haswell GPUs. Hopefully not too many invalid workunits are being validated. If too many invalid results get validated, won't that contaminate the data being collected?

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 536534330
RAC: 186841

In case it was not clear by now, we have a real problem here: the affected Haswell hosts, which presumably run the newer driver, have an average success rate of only 42.7% +/- 9.3%.

Here's what I did to get this value:
- I collected data from all 18 invalid WUs currently in the log of my main PC (success rate 97.3%)
- in all cases two Haswell hosts validated each other, against my result
- I collected the data only for "Binary Radio Pulsar Search (Arecibo)"
- and noted the number of valid and invalid tasks
- overall those 36 cases involved 31 distinct hosts (I didn't count any twice)
- in sum I counted 2976 valid and 4000 invalid WUs, which leads to an average success rate of 42.7%
- the standard deviation of the individual data is 9.3%
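For anyone who wants to check the arithmetic, here is a minimal Python sketch. It only uses the two totals quoted above (2976 valid, 4000 invalid). The per-host counts are not listed in this post, so the 9.3% spread cannot be reproduced here; `per_host_summary` is a hypothetical helper showing how it could be computed from (valid, invalid) pairs, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def success_rate(valid, invalid):
    """Return the fraction of valid tasks among all counted tasks."""
    total = valid + invalid
    if total == 0:
        raise ValueError("no tasks counted")
    return valid / total


def per_host_summary(counts):
    """Hypothetical helper: mean and population std of per-host success rates.

    counts is an iterable of (valid, invalid) pairs, one pair per host.
    Whether the original figure used population or sample std is not stated.
    """
    rates = [success_rate(v, i) for v, i in counts]
    return mean(rates), pstdev(rates)


# Pooled rate from the totals quoted in the post.
pooled = success_rate(2976, 4000)
print(f"pooled success rate: {pooled:.1%}")
```

Note that the pooled rate weights busy hosts more heavily than a plain average of the per-host rates would, so the two numbers need not match exactly.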

Apart from wasted crunching time and frustrated users, from my point of view there are a few possible consequences of this:

1. If the results from the Haswells (which presumably run the newer driver) are incorrect, Einstein is collecting more and more of them as the share of Haswell iGPUs and newer drivers increases. In this case either the error should be fixed, maybe together with Intel, or the new driver should be excluded from being used at Einstein.

2. If the old results are incorrect, using the new drivers should be encouraged or even mandatory.

3. If both results are correct enough the validator should be adapted to accept both.

In cases 1 and 2 the users should be warned and informed about this using the BOINC message system. Case 3 would obviously be the best for us.

Edit: thanks Richard and archae, I'll tame myself better in the future :)

MrS

Scanning for our furry friends since Jan 2002
