AMD 14.7 RC3 Driver Performance Loss vs 14.4

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,852
Credit: 111,038,927,360
RAC: 34,837,835

You have lots of validate errors, both for 2x and 3x.

I would guess that either you have a heat problem or a clocking problem.

EDIT: You could investigate these and see if they spread over both GPUs or are confined to just one of them. The stderr.txt output on the website should tell you which GPU(s) the errors are coming from.

EDIT2: Sorry, scratch the last bit. Stderr.txt doesn't seem to list the particular GPU. I'm sure there used to be a way of seeing that - GPU0 or GPU1. I don't have any dual GPU setups so it's not something I'm familiar with.

Cheers,
Gary.

Titan
Joined: 29 Aug 13
Posts: 19
Credit: 25,868,802
RAC: 0

Gary, sorry about the validation errors. I noticed them a few hours into the run testing 13.12; unfortunately it took me a few hours to get home and shut down the program. I noted the task numbers that each of the cards was working on, and the problem seemed to come from the second card in my dual configuration, which was failing on most of its tasks.
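For anyone wanting to repeat this kind of per-card check, here is a minimal sketch of the bookkeeping. It assumes you have recorded by hand which GPU each task ran on and how it ended up; the task names, GPU indices and outcomes below are made-up placeholders, not real results from this host.

```python
from collections import Counter

# Hypothetical hand-recorded notes: (task name, GPU index, outcome).
# None of these are real tasks; they only illustrate the tally.
notes = [
    ("task_a", 0, "valid"),
    ("task_b", 0, "valid"),
    ("task_c", 0, "valid"),
    ("task_d", 0, "valid"),
    ("task_e", 1, "validate_error"),
    ("task_f", 1, "valid"),
    ("task_g", 1, "validate_error"),
    ("task_h", 1, "validate_error"),
]

def error_rate_by_gpu(records):
    """Return the fraction of non-valid tasks for each GPU index."""
    totals = Counter()
    errors = Counter()
    for _task, gpu, outcome in records:
        totals[gpu] += 1
        if outcome != "valid":
            errors[gpu] += 1
    return {gpu: errors[gpu] / totals[gpu] for gpu in sorted(totals)}

for gpu, rate in error_rate_by_gpu(notes).items():
    print(f"GPU {gpu}: {rate:.0%} of recorded tasks failed")
```

If the failures cluster on one index, as they did here on the second card, that points at the card or its slot rather than at the driver alone.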

I pulled both cards, placed the second in the first PCIe slot and tested it solo. I put it under load using the MSI Kombustor utility; the temperatures on the card were normal and no errors showed up in the card's rendered output. So I reverted the drivers to 14.7 RC3, removed my system's 2% BCLK overclock and ran the second card solo with x3 tasks. It produced no invalid work units under these conditions, so I reinstalled the other card and let it run as well; again, no invalid work units popped up. Temperatures logged during testing were 70C on the top card and 60C on the lower card.

Best I could tell, it was the result of either a software issue with the driver installation or the overclocking of the PCIe bus via the BCLK.

Not one to leave well enough alone, as the 14.7 RC3 driver x3 tasks were being produced at a rate of 8800-8900, I reinstalled the 13.12 drivers to give it another go. So far, running during the first 10 hours of 9/6, the 13.12 drivers haven't produced any invalid work units. So I upped the BCLK to see if the PCIe overclocking would reproduce the issue. I'll let it kick out a few work units over the next few hours and see how it goes.

Titan
Joined: 29 Aug 13
Posts: 19
Credit: 25,868,802
RAC: 0

It looks like the GPU 1 card is not as tolerant of increases to the BCLK while using the 13.12 driver: it produced a few invalid work units during the last few hours under a 2% increase. It is strange that the driver has this effect of lowering the system's tolerance to overclocking. However, 13.12 produces faster task completion times than 14.4 and 14.7 RC3, on the order of about 8-9%, even without the PCIe overclocking.

mountkidd
Joined: 14 Jun 12
Posts: 175
Credit: 11,167,953,251
RAC: 4,373,724

Hi Gary,

Quote:
I would guess that either you have a heat problem or a clocking problem.


Could you explain 'clocking problem' wrt Validation Errors?

Gord

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,852
Credit: 111,038,927,360
RAC: 34,837,835

A validate error implies the returned result was so broken that the validator could recognise it as such immediately without bothering to compare it to its companion. I consider it a bit more serious than an eventually 'invalid' result which the validator thought was 'good enough' initially to be put to the comparison.

My gut feeling is that really broken hardware tends to cause hosts and/or apps to crash, whilst otherwise good hardware operating just beyond its limits doesn't necessarily cause an immediate crash. I tend to see rubbish answers, with the degree of 'rubbishness' probably being related to how far beyond the limits we are. This can be followed by a crash or lock-up. So if a system suddenly crashes or locks up for no apparent reason, I reboot (if possible) and run hardware checks to look for hardware problems. If the hardware checks out OK, I will then check for environmental issues.

If I see a system apparently running OK but giving validate errors or invalids, I usually suspect the environment first. I check heat sinks and fans and often, this gives an immediate fix. If the problem continues, I will replace the thermal grease and make sure the fan is completely free-running. I re-lubricate fans that show any apparent 'stiffness' in rotation. This fixes even more of these issues. If the issue continues, I consider clock speed, particularly if anything has been overclocked. I have a batch of long running (5 years+) Q8400 quad core hosts (2.66GHz stock) clocked to around 3.2+GHz. For several years, they were quite stable. In recent times, I've had to cut back some of them a bit to around 3.1GHz to avoid problems. Possibly the ambient is a bit warmer than earlier on, but I'm guessing the CPUs probably have 'deteriorated' a bit with age. The PSUs are of the same vintage so perhaps the power isn't as clean and stable as it should be. Whatever the true cause, reducing the clock speed a little has certainly worked.

At the time I wrote the response to the OP, I was thinking more of core and mem clocks on the GPU. Nothing had been said about these but with all the GPU clocking tools around, I'm sure people can easily push things too far. Whilst I love overclocking when appropriate, I don't run the PCIe bus at anything above 100. With the subsequent information posted, it looks like this particular clock speed might be causing the problem the OP is having.

Cheers,
Gary.

mountkidd
Joined: 14 Jun 12
Posts: 175
Credit: 11,167,953,251
RAC: 4,373,724

Hi Gary,

Thanks for the great response - this is the most info I have seen to date on Validation Errors.

I am sorting through a collection of Invalids as my 818 host may be the current 'King of the Invalid Hill'. Not all wu generate errors and the errors are scattered throughout each day. Resolving the issues is complicated by the delays in reported results as the effects of a tweak today may not show for a week or so.

In your last response to Titan you mentioned stderr.txt. Is there anything specific to look for? Stderr.txt for Valids appears to be the same as for Invalids and wu's from both groups do go through to normal completion.

Gord

Khangollo
Joined: 17 Feb 11
Posts: 42
Credit: 928,047,659
RAC: 0

Catalyst 14.9 for Linux is available for over a week now.
Anyone (brave enough) tried it yet?

mikey
Joined: 22 Jan 05
Posts: 12,068
Credit: 1,834,324,605
RAC: 17,298

Quote:
Catalyst 14.9 for Linux is available for over a week now.
Anyone (brave enough) tried it yet?

344.16 for Windows was to fix a 970/980 problem, I wonder if this is the same thing? By fix a 970/980 problem I mean the report was it was NOT suitable for other gpu's, JUST the 970/980 gpu's.

Here's the post at GpuGrid where I saw it "Looks like 344.16 is a special release for GTX 980/970 only. This will be a problem for people trying to mix new and older cards in the same system, until NVidia can re-unify their driver packages."

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,142
Credit: 2,793,954,890
RAC: 699,654

Quote:
Quote:
Catalyst 14.9 for Linux is available for over a week now.
Anyone (brave enough) tried it yet?

344.16 for Windows was to fix a 970/980 problem, I wonder if this is the same thing? By fix a 970/980 problem I mean the report was it was NOT suitable for other gpu's, JUST the 970/980 gpu's.

Here's the post at GpuGrid where I saw it "Looks like 344.16 is a special release for GTX 980/970 only. This will be a problem for people trying to mix new and older cards in the same system, until NVidia can re-unify their driver packages."


Catalyst (AMD/ATI) drivers are rushed out for a different reason - 14.9 had to be out by the end of September (month 9) 2014. Nobody's mentioned any actual changes to support new applications or hardware, so far as I'm aware. Pure calendar pressure, so we can see that they're still alive, as far as I can tell.

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740,030,628
RAC: 2

Quote:

Catalyst 14.9 for Linux is available for over a week now.
Anyone (brave enough) tried it yet?

I built driver 14.9 for kernel 3.12.28 yesterday. This new driver is approximately 4.2% slower than the previous 14.6 Beta driver I had installed on this system.

4x BRP5 tasks on an AMD 7970

14.6 Beta: 7765.6 s Avg (148331 RAC)
14.9: 8109.7 s Avg (142037 RAC)
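
As a quick check on the arithmetic, the two figures above can be compared in two ways: by how much longer each task takes, and by how much the RAC figure drops. The sketch below only uses the numbers quoted in this post; the helper name `percent_change` is just an illustrative choice.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0

# Figures quoted in the post (4x BRP5 tasks on an AMD 7970).
runtime_14_6 = 7765.6   # seconds, average, 14.6 Beta
runtime_14_9 = 8109.7   # seconds, average, 14.9
rac_14_6 = 148331
rac_14_9 = 142037

# Longer runtime is a positive change; lower RAC is a negative one,
# so flip the sign to report it as a decrease.
runtime_increase = percent_change(runtime_14_6, runtime_14_9)
rac_decrease = -percent_change(rac_14_6, rac_14_9)

print(f"Runtime increase: {runtime_increase:.2f}%")
print(f"RAC decrease:     {rac_decrease:.2f}%")
```

The runtime per task comes out roughly 4.4% longer, while the RAC figure drops by roughly 4.2%, so the "4.2% slower" figure appears to match the RAC drop rather than the per-task runtime.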

Jeroen
