Widespread BRP4 validation errors

boinc127
Joined: 17 Mar 11
Posts: 23
Credit: 4003975
RAC: 0
Topic 197717

There appears to be a problem with some of the workunits using the Intel GPU. I have been carefully tracking my BRP4 Intel GPU workunits ever since I upgraded my Intel driver and then downgraded it again after it produced serious validate errors and invalids. I just had a workunit from 09/13/2014 invalidated by 2 other HD 4600 Intel GPUs. From what I can tell, those hosts' GPUs are also producing validate errors and invalids, possibly because of the upgraded driver. The problem I had was with driver version 10.18.10.3907.

I had a hundred or so validate errors and invalids with my BRP4 workunits until I noticed them and downgraded my driver. Since then my BRP4 Intel workunits have validated correctly, and I hadn't had an invalid task since 09/07/2014 until I spotted one today. It was invalidated by 2 other Intel GPU hosts (HD 4600) that I suspect are running the problem driver as well.

Workunit http://einsteinathome.org/workunit/198965460

If you look at the task history of the 2 other hosts that validated against mine (BRP4 Intel GPU), they also have quite a few invalids and validate errors.

http://einsteinathome.org/host/10443395/tasks
http://einsteinathome.org/host/10027292/tasks

According to those lists, host 10443395 has had 178 invalids and validate errors, while host 10027292 has had 272.

Although one or two workunits being invalidated by other hosts isn't a big deal to me, it seems the real problem lies with the drivers. From what I can tell, the old driver (10.18.10.3621) is producing different results for the same workunit than the newer driver (10.18.10.3907). I don't know the odds of 3 HD 4600 Intel GPUs validating against each other, but if the problematic driver is the cause of this, can't the data become contaminated if this happens often enough?

P.S.
Please see message http://einsteinathome.org/node/197052&nowrap=true#133451 for my original problem with the Intel GPU driver ver. 10.18.10.3907. I've probably run tens of thousands of BRP4 workunits and have never had errors until I upgraded to the 10.18.10.3907 driver.

boinc127
Joined: 17 Mar 11
Posts: 23
Credit: 4003975
RAC: 0

The problem may be worse than I thought. The hosts that validated against my computer are themselves being validated and invalidated by other hosts with the same kind of issues: many of their BRP4 workunits are validated by other HD 4600 Intel GPUs, but hundreds of others are invalid or have validate errors. I fear that if not enough of these workunits are errored out by the max errors limit, there could be some big problems down the line.

I hope the project administrators see this before it grows into a huge problem later on.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2774324793
RAC: 852310

Quote:
There appears to be a problem with some of the workunits using the Intel GPU. I have been carefully tracking my BRP4 Intel GPU workunits ever since I upgraded my Intel driver and then downgraded it again after it produced serious validate errors and invalids. I just had a workunit from 09/13/2014 invalidated by 2 other HD 4600 Intel GPUs. From what I can tell, those hosts' GPUs are also producing validate errors and invalids, possibly because of the upgraded driver. The problem I had was with driver version 10.18.10.3907.


I am currently using Intel(R) Graphics Driver: 10.18.10.3621, released 21 May 2014, download name Win64_153322.exe/zip. Host 5744895 is only showing 5 error/invalid and over 800 completed tasks, so I think we can recommend that driver version.

It is currently showing as 'latest' on the Intel driver download pages, so perhaps the 3907 variant has been withdrawn at source.
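For a rough sense of scale, the host 5744895 figures above can be turned into a failure fraction. This is only a minimal sketch: the function name is made up for illustration, and because "over 800" is a lower bound on the completed tasks, the result is an approximate upper bound rather than an exact rate.

```python
def invalid_fraction(bad_tasks: int, good_tasks: int) -> float:
    """Fraction of finished tasks that ended as error/invalid.

    Hypothetical helper for illustration only; the counts are read off
    the host's task list by hand.
    """
    total = bad_tasks + good_tasks
    if total <= 0:
        raise ValueError("need at least one finished task")
    return bad_tasks / total


# Host 5744895 on driver 10.18.10.3621: 5 error/invalid, at least 800 completed.
rate = invalid_fraction(5, 800)
print(f"at most about {rate:.2%} of tasks failed")
```

Anything well under 1% is in line with a healthy host; the hundreds of invalids reported for the affected hosts cannot be compared the same way without their completed-task totals.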

Maximilian Mieth
Joined: 4 Oct 12
Posts: 128
Credit: 9911906
RAC: 1283

I'm also using the .3621 Intel driver and do not have any invalids or errors. The new driver seems to have caused some trouble. See also this post.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 540314127
RAC: 130774

Running driver 10.18.10.3621 on 2 hosts with HD4000. I have been seeing a slightly increased rate of invalids for about a week. It used to be approximately one WU every 2 weeks or so, whereas now it's about one per day. It may well be due to other hosts running the new questionable driver.

MrS

Scanning for our furry friends since Jan 2002

boinc127
Joined: 17 Mar 11
Posts: 23
Credit: 4003975
RAC: 0

I think there may actually be a connection between the bad driver (3907) and the GPU hardware itself. From what I can tell, it is the hosts with the HD 4600 GPU that have the problem. Maybe there is a bug in the Haswell processor that the new Intel driver uncovered; one issue has already been found with the TSX instruction set. I'm still getting validation inconclusive for some workunits, a few more than I noticed before (though perhaps only because I've been watching BRP4 more closely).

The three latest validation inconclusives I've gotten have come from 3 different hosts, each with an HD 4600, and each with what I would consider a high number of invalids for their BRP4 Intel GPU workunits. If I noticed that many invalid workunits from my computer, I would definitely investigate what the problem was. They probably don't even realize there might be an issue. BRP4 is such a reliable program, and workunits rarely have issues like this.

The problem is that if 2 affected hosts validate each other, they are potentially validating invalid results. If that happens often enough, it could become problematic for the Einstein project, especially considering that Windows 8.1 can now upgrade drivers automatically through Windows Update. I had my Intel driver upgraded once without even knowing about it, until the computer said it would restart itself in one day.
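To put a rough number on that worry, here is a minimal sketch of the odds that every host in a quorum runs the affected driver. It assumes hosts are matched independently and that a known fraction of Intel GPU hosts is affected. Both are assumptions for illustration: the real scheduler may not pair hosts at random, and the affected fraction is not known from this thread.

```python
def all_affected_probability(affected_fraction: float, quorum: int = 2) -> float:
    """Chance that every host in a quorum runs the affected driver.

    Assumes independent, random matching of hosts (an assumption, not a
    description of how the project actually assigns tasks).
    """
    if not 0.0 <= affected_fraction <= 1.0:
        raise ValueError("affected_fraction must be between 0 and 1")
    if quorum < 1:
        raise ValueError("quorum must be at least 1")
    return affected_fraction ** quorum


# Hypothetical fractions, chosen only to show how quickly the risk grows.
for fraction in (0.05, 0.10, 0.25):
    chance = all_affected_probability(fraction)
    print(f"{fraction:.0%} of hosts affected -> {chance:.2%} of pairs all-affected")
```

Even when both hosts in a pair are affected, a bad result only gets through if their outputs also agree with each other, so this is an upper bound on the contamination risk under these assumptions.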
