All tasks reporting invalid

Ryan
Joined: 25 Nov 14
Posts: 36
Credit: 51,795,000
RAC: 506
Topic 198393

As the title says, it seems I haven't been able to submit a valid unit for a while.

https://einsteinathome.org/host/11700141/tasks&offset=0&show_names=0&state=5&appid=

Anyone able to shed any light?

I run SETI, MilkyWay and Collatz on this GPU and they all come out fine.

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1,714,254,072
RAC: 281,652

Can you share some information about the GPU? Model, maker, any overclocking etc?
What does your CPU run if anything? Free cores etc?

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 1,826,197,549
RAC: 0

How many work units are you running in parallel ?
Fiji GPUs (and some other AMD GPUs) seem to have problems when running more than one WU at once.

Ryan
Joined: 25 Nov 14
Posts: 36
Credit: 51,795,000
RAC: 506

OK, it's a Sapphire R9 Fury. It has had half of its fused-off cores re-enabled. I thought this might be an issue at first, but other projects run fine, Furmark runs fine overnight, and it has never crashed, so I'm ruling that out.

I'm only running the one unit on it; I can't be bothered messing around to get all my projects to run multiple units on the card.

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 1,826,197,549
RAC: 0

Unlocked shaders might indeed be the issue. There's (sometimes) a reason why they are disabled and only AMD knows exactly why - whether they really didn't pass all validation tests, or they are just artificially disabled.
The problem might not have manifested in other projects - it might not cause a crash, just a wrong computing result, which might not be visible in tests like Furmark.
If you want to be sure, try disabling those shaders again and running here again.

Ryan
Joined: 25 Nov 14
Posts: 36
Credit: 51,795,000
RAC: 506

I'm confident it's not the shaders, but I'll flip it to the backup BIOS for a bit to check. Is there anything useful in the details for the units? There is a lot of detail in there and I'm not sure what I'm looking at.

Ryan
Joined: 25 Nov 14
Posts: 36
Credit: 51,795,000
RAC: 506

Can anyone recommend anything that will stress the card from a compute point of view to test stability? Something that will either crash the machine if it's unstable or output "failed"?

Christian Beer
Moderator
Joined: 9 Feb 05
Posts: 595
Credit: 96,904,763
RAC: 0

I just looked at the logfiles of the validation. Your host is producing results that differ from everyone else's, and not only by a small margin. Just a little example: we accept a difference of less than 0.00005 (5e-5), but your GPU produces values that differ by at least 0.00010 (1e-4) and up to 0.05 in most of the data. So they are rejected by the validator.
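
A minimal sketch of the kind of tolerance check described above. It assumes each result is a flat list of numbers and that every value must agree with the reference within 5e-5; the real Einstein@Home validator's logic and data format are not shown in this thread, and the function names and sample values here are hypothetical.

```python
# Hypothetical sketch, NOT the actual Einstein@Home validator.
# Assumption: a result is a flat list of floats, and two results "agree"
# only if every pair of corresponding values differs by less than TOLERANCE.

TOLERANCE = 5e-5  # threshold quoted in the post above


def max_abs_difference(a, b):
    """Largest absolute element-wise difference between two results."""
    if len(a) != len(b):
        raise ValueError("results have different lengths")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def results_agree(a, b, tol=TOLERANCE):
    """True if every value in a is within tol of its counterpart in b."""
    return max_abs_difference(a, b) < tol


reference = [0.12345, 0.50000, 0.99999]
good_host = [0.12347, 0.49998, 0.99999]  # differences of about 2e-5
bad_host = [0.12355, 0.45000, 0.99999]   # differences of about 1e-4 and 0.05

print(results_agree(reference, good_host))  # True
print(results_agree(reference, bad_host))   # False
```

With differences of 1e-4 up to 0.05, the suspect host fails the check by a factor of 2 to 1000, which is why a small numerical wobble is not a plausible explanation.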

Other projects may use different instructions on the GPU, or they could do a less strict validation of the results. I don't think that stress testing the card will help. Depending on how the stress test is executed, the goal may only be to produce stress and not to test whether the calculations are done correctly.

Ryan
Joined: 25 Nov 14
Posts: 36
Credit: 51,795,000
RAC: 506

OK thanks, I'll try swapping the BIOS back to the stock number of shaders tonight and see if that helps :)

Does it sound like something that could be caused by a slightly faulty cluster of cores?

archae86
Joined: 6 Dec 05
Posts: 2,809
Credit: 3,193,794,083
RAC: 2,586,113

Quote:
Does it sound like something that could be caused by a slightly faulty cluster of cores?


I'll chime in here with some perspective from someone whose career was spent almost entirely on the design, testing, manufacture, and reliability of microprocessors.

Some home truths:
1. There is no such thing as a complete, full-coverage test.
2. There is an incredibly diverse set of possible defects. Amazingly enough, every unit shipped contains many locations which, if you viewed them carefully, you would consider defective, but most of these happen not to harm correct logical or speed operation at any condition of interest, so shipping those is OK. The primary containment of these is testing, and as the test is not perfect, defects that matter do ship, regularly.
3. The popular notion that running some "highly stressful" test constitutes a complete test, that any system that passes it is perfect, and that any malfunction must therefore come from something other than the system, is nonsense. Such a universal "perfect test" is likely far less comprehensive in coverage than the manufacturer's final test, and that test is certainly far from complete in coverage; escapees reach the wild at an appreciable rate.

Getting back to the specifics of your case, I think it is clear your system is getting wrong answers, if, for the purpose of this discussion, we define "right" as the answer that would be given by a preponderance of systems with identical installed hardware and software.
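
The "preponderance of systems" definition of a right answer can be sketched as a simple consensus check. This is a hypothetical illustration, not BOINC's actual quorum or validator code: it assumes each host's result is a list of numbers, reuses the 5e-5 tolerance quoted earlier in the thread, and treats as canonical the result that agrees with a strict majority of hosts.

```python
# Hypothetical consensus sketch, NOT BOINC's real quorum/validator logic.
# Assumption: a result is a list of floats; two results agree if every
# corresponding value differs by less than tol.


def agree(a, b, tol=5e-5):
    """True if a and b have the same length and all values are within tol."""
    return len(a) == len(b) and all(abs(x - y) < tol for x, y in zip(a, b))


def consensus(results, tol=5e-5):
    """Pick the result that agrees with a strict majority of hosts.

    Returns (canonical_index, outlier_indices), or (None, all indices)
    when no result has majority support.
    """
    # Count how many results (including itself) each result agrees with.
    support = [sum(agree(r, other, tol) for other in results) for r in results]
    best = max(range(len(results)), key=lambda i: support[i])
    if support[best] <= len(results) // 2:
        return None, list(range(len(results)))
    outliers = [i for i, r in enumerate(results) if not agree(r, results[best], tol)]
    return best, outliers


hosts = [
    [1.00000, 2.00000],  # host A
    [1.00001, 2.00002],  # host B
    [1.00100, 1.95000],  # host C, e.g. a card with a hardware fault
]
print(consensus(hosts))  # (0, [2])
```

Tolerance-based agreement is not transitive, so a real validator needs a more careful rule; this sketch simply picks the result with the most support and flags everything that disagrees with it.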

This might simply be due to one or more defects in hardware that the manufacturer marked out of service and that you somehow put to use. Or it could be a defect elsewhere. Or...

My 2 cents.

In your shoes, the first two things I'd do would be to turn off the revivified shaders and, if that made no difference, turn down the speeds on any accessible clocks (assuming I'd already checked that the fans were turning as intended and the dust bunnies were well under control).

Good luck

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,196
Credit: 41,687,200,142
RAC: 45,293,876

Quote:
... the first two things I'd do would be to turn off the revivified shaders ....


I think he took that first option soon after he posted and seems to have had all good results from that point onward.

The interesting thing is that the good results and the bad results both have about the same elapsed times, on a very rough inspection. In other words, even if there was no downside (bad results) from unlocking the extra shaders, there was no performance upside either.

Cheers,
Gary.
