Why does BRP4 produce many more errors than S6LV1?

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 802589562

RAC: 1237785

RE: So two more failed to

20 May 2012 18:26:10 UTC

Message 109116 in response to message 109115

Quote:

So two more failed to validate, both against CUDA PCs.
Normally my main system is very reliable; nothing is overclocked.
It's interesting: until now, all failing WUs came from the newer HD6950, not from the older HD5850. Are there known issues?

Yes, we saw hints of this trend (weaker validation with 69xx cards than with 5xxx) during tests on Albert@Home but needed more data. I'm confident that even the somewhat reduced precision from the 6900s is sufficient to make scientifically valid results, so it might be enough to adjust the validator. Still we need to understand the cause of this trend.

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 521116812

RAC: 247647

RE: Still we need to

21 May 2012 8:04:36 UTC

Message 109117 in response to message 109116

Quote:

Still we need to understand the cause of this trend.

http://einsteinathome.org/workunit/123571937
http://einsteinathome.org/workunit/123557679
http://einsteinathome.org/workunit/123477212

29 still waiting for validation; I'll keep you informed.

Alexander

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 521116812

RAC: 247647

RE: I'm confident that

21 May 2012 8:37:03 UTC

Message 109118 in response to message 109116

Quote:

I'm confident that even the somewhat reduced precision from the 6900s is sufficient to make scientifically valid results

Back to the slide-rule?
This corrupts my understanding of computation. I've seen comments about FPUs producing different results and about problems with GPU programming. But about precision? OK, there is single and double, but my understanding was that single-precision results should all be equal. Is there a discussion thread or a deeper explanation of this anywhere? That would be very interesting!

Alexander
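
On the question of whether single-precision results should all be equal, here is a general illustration. It is not a diagnosis of the HD 6900 behavior, whose cause is still open in this thread. Floating-point addition is not associative, so the same single-precision inputs can give different results when the operations are grouped in a different order, as can happen when different hardware or compilers schedule a sum differently. The helper names `to_f32` and `f32_add` are made up for this sketch; it emulates float32 arithmetic with the standard `struct` module.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f32_add(a: float, b: float) -> float:
    """Emulate single-precision addition: add, then round the sum to float32."""
    return to_f32(to_f32(a) + to_f32(b))


big, small = 1.0e8, 1.0

# Near 1e8, adjacent float32 values are 8 apart, so adding 1.0 is lost to rounding.
left_to_right = f32_add(f32_add(big, small), -big)  # (big + small) - big

# Cancelling the large terms first keeps the small term intact.
regrouped = f32_add(f32_add(big, -big), small)      # (big - big) + small

print(left_to_right, regrouped)  # prints: 0.0 1.0
```

Both expressions are mathematically equal to 1, yet one grouping returns 0.0. A validator that compares results from different devices therefore usually has to accept small differences rather than demand bit-identical output.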

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 521116812

RAC: 247647

RE: RE: Still we need to

21 May 2012 11:34:34 UTC

Message 109119 in response to message 109117

Quote:

Quote:
Still we need to understand the cause of this trend.

http://einsteinathome.org/workunit/123571937
http://einsteinathome.org/workunit/123557679
http://einsteinathome.org/workunit/123477212

29 still waiting for validation; I'll keep you informed.

Alexander

Add this one: http://einsteinathome.org/workunit/123475517

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 521116812

RAC: 247647

Until now 14 wu's from the

23 May 2012 6:50:08 UTC

Message 109120

So far, 14 WUs from the HD6950 have been marked as invalid, with some more pending.
I think I'll stop crunching here until I see a message that this issue is solved.

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 802589562

RAC: 1237785

Hi all! We are currently

23 May 2012 13:50:30 UTC

Message 109121

Hi all!

We are currently investigating this, and we think we have an idea of what's going on, but this needs further tests and some work on a fix.

What we can say confidently now is this:

* all HD6900 series cards should be affected by this
* the validation rate to expect for this type of card in the long run is roughly 50%
* no other ATI/AMD card has yet been found to show this behavior
* improvement of validation rate for the HD 6900 will require a new app version and will involve a performance penalty (just how severe will have to be seen)

Stay tuned, we'll let you know when we have news. Those HD 6900 owners who prefer to stop crunching ATI/AMD apps for BRP4 for now can do so in a couple of ways:

* Deselect ATI apps (for all projects!) altogether in the global preferences (for a certain "venue"), or
* deselect the BRP4 app altogether in the project-specific settings (also affecting any CUDA cards for hosts in that venue), or
* deselect GPU processing on a particular host in the BOINC Manager local settings (also affecting projects other than E@H), or
* if you are familiar with editing the client configuration file cc_config.xml (see http://boinc.berkeley.edu/wiki/Client_configuration), there is a setting (search for "") which probably does exactly what is needed for this problem, on a per-host basis.

Cheers
HB

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 521116812

RAC: 247647

RE: Hi all! We are

23 May 2012 20:02:54 UTC

Message 109122 in response to message 109121

Quote:

Hi all!

We are currently investigating this, and we think we have an idea of what's going on, but this needs further tests and some work on a fix.

If you need a tester contact me.

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 802589562

RAC: 1237785

Hi! Thanks! Actually we

24 May 2012 17:53:46 UTC

Message 109123 in response to message 109122

Hi!

Thanks! Actually we have just now put a new app version on the test project at albert.phys.uwm.edu (requires BOINC 7.0.27) that should improve the HD 6900 validation rate, although at a performance cost. Our initial tests indicate that the penalty on HD 6900s is well below the ca. 50% validation failure rate (on average) with the current version, so we thought it was a good idea to test this quick fix for the HD 6900 problem. We will probably come up with something that has a smaller performance penalty later.

So anyone with an HD 6900 who wants to help test the fix is invited to join the "Albert" test project at http://albert.phys.uwm.edu

Cheers
HB

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 521116812

RAC: 247647

RE: Actually we have just

24 May 2012 17:59:15 UTC

Message 109124 in response to message 109123

Quote:

Actually we have just now put a new app version on the test project at albert.phys.uwm.edu (requires BOINC 7.0.27) that should improve the HD 6900 validation rate, although at a performance cost.

... crunching @ Albert

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 521116812

RAC: 247647

Speed loss: 1:45 @ Einstein :

24 May 2012 20:06:41 UTC

Message 109125

Speed loss:
1:45 @ Einstein : 2:09 @ Albert
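
To put these two timings next to the roughly 50% validation rate mentioned earlier in the thread, here is a rough back-of-the-envelope sketch. It assumes both times use the same unit (hours:minutes or minutes:seconds, which the post does not say; the ratios are the same either way) and that a task that fails validation is simply wasted run time. The function name `to_units` is made up for this sketch, and the new app's actual validation rate is not reported here, so only a break-even figure is computed.

```python
def to_units(clock: str) -> int:
    """Convert 'major:minor' (60 minor units per major unit) to minor units."""
    major, minor = clock.split(":")
    return int(major) * 60 + int(minor)


einstein = to_units("1:45")  # current app, 105 units per task
albert = to_units("2:09")    # test app, 129 units per task

# Relative run-time increase of the test app.
slowdown = albert / einstein

# With roughly half of the results invalid, each valid result from the
# current app costs about twice its run time.
old_valid_rate = 0.5  # "roughly 50%" as stated in the thread
cost_per_valid_old = einstein / old_valid_rate

# Validation rate the test app must exceed to beat the current app
# in valid results per unit of run time.
break_even_rate = old_valid_rate * albert / einstein

print(f"slowdown: {slowdown:.2f}x")
print(f"run time per valid result, current app: {cost_per_valid_old:.0f}")
print(f"break-even validation rate, test app: {break_even_rate:.1%}")
```

Under these assumptions the test app takes about 23% longer per task, and it comes out ahead as long as clearly more than about 61% of its results validate.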
