Pascal again available, Turing may be coming soon

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7223814931

RAC: 1000079

24 Oct 2018 21:18:55 UTC

Message 167440

We have a new report of an Einstein contributor who just swapped in a 2080 Ti for a 1080 and is seeing fast failures on high-pay WUs. To my eye these look like the same failure syndrome that SYBIE and I have reported on two different 2080 cards. The new card only appears in reported results from late on 23 October UTC onward, so no new low-pay WUs are currently available to it to test the "high-pay bad, low-pay good" observation.

archae86

25 Oct 2018 2:29:45 UTC

Message 167446 in response to message 167440

And now there are four Turing systems sharing the high-pay WU fast fail syndrome on Einstein GRP work. Two each of 2080 and of 2080 Ti.

bcavnaugh has tried a host with a 2080 Ti on Einstein GRP high-pay WUs and failed five out of five tries, with symptoms that look to me like those of the other three.

archae86

25 Oct 2018 20:58:25 UTC

Message 167458 in response to message 167446

And now there are five.

The SETI user who goes by the user name Vyper (with supplemental punctuation marks) ran my portable test case on his 2080 Ti system and got the same failure syndrome that every Turing user here at Einstein has seen on high-pay WUs to date.

One piece of very good news is that the test case I hoped would be "portable" (built using Juha's methods) was portable enough to run on his system, and that my instructions sufficed. So real soon now I must gird my loins and attempt to submit a trouble report to Nvidia. Of course (as Vyper pointed out), it remains possible that the real trouble is in the Einstein application, and that some difference in timing, etc. causes it to manifest consistently this way on Turing cards.

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6588

Credit: 317357366

RAC: 368575

26 Oct 2018 0:22:00 UTC

Message 167460 in response to message 167458

Good work! :-)

The discriminator would be whether any non-Turing E@H hosts fail in the same way, i.e. if one could be sure that only Turings were involved, then one would have a smoking gun for an Nvidia problem.

Cheers, Mike.

(edit) Pardon my ignorance, but are these apps written using OpenCL or CUDA?

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3562358667

RAC: 0

26 Oct 2018 1:09:58 UTC

Message 167462

IIRC OpenCL. Nvidia's OpenCL implementation is responsible for the 100%-of-a-CPU-core issue; they only make CPU-light, efficient work possible in CUDA.

archae86

26 Oct 2018 2:43:00 UTC

Message 167463 in response to message 167460

Just to be clear: while Vyper is a SETI user, the test case I provided him, which failed, is Einstein code. Turings work at SETI (and most other places). They even work here on a subset of the work types running under that same application.

mmonnin

Joined: 29 May 16

Posts: 291

Credit: 3404736540

RAC: 3166323

26 Oct 2018 18:47:09 UTC

Message 167474

SETI also has some CUDA apps, so it could also be that CUDA is OK but OpenCL isn't.

Betreger

Joined: 25 Feb 05

Posts: 992

Credit: 1590792368

RAC: 772794

26 Oct 2018 19:39:22 UTC

Message 167476 in response to message 167474

AFAIK the best app for newer cards at SETI is the SoG OpenCL app.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117638969504

RAC: 35188920

27 Oct 2018 0:21:20 UTC

Message 167477 in response to message 167460

Mike Hewson wrote:

... are these units written using OpenCL or CUDA ?

The last apps that were written using CUDA were the BRP4/5/6 radio pulsar apps. Gamma-ray pulsar searches using GPUs have always used OpenCL only.

Bernd did make an attempt (low priority) to port the code to CUDA. From memory, his last report about that (quite a while ago) was to the effect that he had something that sort of worked (maybe) but that it was rather poor/inefficient, so poor that it wasn't worth the effort of trying to test it, or something along those lines. Whilst he didn't exactly say that he wouldn't keep trying, I got the impression that he was putting it on the very back, back burner :-).

I'd been hoping for better success because my first foray into GPU crunching in the old BRP days was via a GTX550Ti and a much larger group of GTX650 cards. I had tested a single GTX650 against an AMD HD7770, which was similarly priced (the GTX650 was slightly more expensive, so I had wanted the 7770 to 'win' :-) ). The result was that the GTX650 worked 'out of the box' whilst the 7770 needed jumping through hoops to get it to crunch. I was prepared to accept that, but in a 'head-to-head' at the time the 650 had at least 10% better performance, so I ended up with a bunch of 650s.

It's quite ironic that in less than 12 months from making that decision, AMD got their driver act together and, for me, that became an out-of-the-box install as well, with a significant performance improvement to boot. In my case, the 10+% deficit became around a 15% 'win' for the 7770.

That 7770 is still crunching today. It's completing the current 'fast' tasks in about 26 mins. The last time I tried a 650 it was taking around 2.5 - 3 hours. That was probably more than a year ago, when the standard work was equivalent to the current 'slow' work of a week or so ago. Whilst the 'hi-pay' and 'lo-pay' terminology has become quite sexy and fashionable to use :-), it's a bit of a misnomer, since all GPU tasks 'pay' the same 3465 credits (5x693, where 693 is the standard CPU task offering, since GPU tasks are supposedly 5x the work content of a standard CPU task). I'm not at all complaining about the terminology; I'm just wondering when one of these 'hi-pay' tasks is gonna pay me more than 3465 :-).

So, every now and then I'll take a look at that box full of 650s gathering dust on the shelf and wonder if they'll ever get fired up again. Even if a decent CUDA app suddenly appeared, I no longer have any spare PCIe slots to put them in. Apart from 2x750Tis (which perform at very little over half of what the old 7770 does), all the slots they used to occupy, plus all the slots from previously retired CPU-only crunchers, are filled with AMD Polaris GPUs. I guess it would be interesting to see how a 750Ti would do on a CUDA app compared to OpenCL.

Cheers,
Gary.

Zalster

Joined: 26 Nov 13

Posts: 3117

Credit: 4050672230

RAC: 0

27 Oct 2018 0:32:38 UTC

Message 167478 in response to message 167476

Betreger wrote:

AFAIK the best app on newer cards on Seti use the SOG openCL app.

Actually, the best app at SETI is a CUDA 9 app called the CUDA special. There's a CUDA 10 version out, but it's still buggy. Just to give you an idea of the time difference: the best OpenCL SoG app takes 2 min 25 s to complete a task, while the CUDA 9 special takes 82 seconds. All of these times are on an overclocked 1080 Ti. But these are apps that have been developed over the past 3 years, with continued refinement and testing by the users at SETI.

Forums › Cruncher's Corner