Pascal again available, Turing may be coming soon

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023244931
RAC: 1830275

We have a new report of an Einstein contributor who just swapped in a 2080 Ti for a 1080 and is seeing fast failures on high-pay WUs, which look to my eye like the same failure syndrome that I and SYBIE have reported on two different 2080 cards.  The card swap only shows up in reporting after late on 23 October UTC, so no new low-pay WUs are currently available to that host to test the "high-pay bad, low-pay good" observation.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023244931
RAC: 1830275

And now there are four Turing systems sharing the high-pay WU fast-fail syndrome on Einstein GRP work.  Two each of the 2080 and the 2080 Ti.

bcavnaugh has tried a host with a 2080 Ti on Einstein GRP high-pay WUs and failed five out of five tries, with symptoms that look to me like the other three.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023244931
RAC: 1830275

And now there are five.

The SETI user working with the user name Vyper (with supplemental punctuation marks) used my portable test case on his 2080 Ti system, and got the same failure syndrome seen on all high-pay WUs by all Turing users here at Einstein to date.

One piece of very good news is that the hoped-to-be "portable" test case (using Juha's methods) was portable enough to run on his system, and that my instructions sufficed.  So real soon now I must gird my loins and attempt to submit a trouble report to Nvidia.  Of course (as Vyper pointed out), an open possibility is that the real trouble is in the Einstein application, and that some difference in timing, etc. causes it to consistently manifest this way with Turing cards.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6534
Credit: 284700169
RAC: 113726

Good work ! :-)

The discriminator would be whether any non-Turing E@H hosts are failing in the same way, i.e. if one could be sure that only Turings were involved, then one would have a smoking gun for an Nvidia problem.

Cheers, Mike.

( edit ) Pardon my ignorance, but are these units written using OpenCL or CUDA ?

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 1580

IIRC OpenCL.  NVidia's OpenCL implementation is responsible for the 100% CPU core issue; they only make CPU-light and efficient work possible in CUDA.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023244931
RAC: 1830275

Just to be clear--while Vyper is a SETI user, the test case I provided him, which failed, is Einstein code.  Turings work at SETI (and most other places).  They even work here on a subset of the work types running under that same application.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3229263984
RAC: 1137427

SETI also has some CUDA apps, so maybe it's a case of CUDA being OK but OpenCL not.

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1421527908
RAC: 810826

AFAIK the best app for newer cards on Seti is the SoG OpenCL app.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109383756145
RAC: 35948744

Mike Hewson wrote:
... are these units written using OpenCL or CUDA ?

The last apps that were written using CUDA were the BRP4/5/6 radio pulsar apps.  Gamma-ray pulsar searches using GPUs have always used OpenCL only.

Bernd did make an attempt (low priority) to port the code to CUDA.  From memory, his last report about that (quite a while ago) was to the effect that he had something that sort of worked (maybe) but that it was rather poor/inefficient - so poor that it wasn't worth the effort of trying to test it - or something along those lines.  Whilst he didn't exactly say that he wouldn't keep trying, I got the impression that he was putting it on the very back, back burner :-).

I'd been hoping for better success because my first foray into GPU crunching in the old BRP days was via a GTX550Ti and a much larger group of GTX650 cards.  I had tested a single GTX650 against an AMD HD7770, which were similarly priced (the GTX650 slightly more expensive, so I had wanted the 7770 to 'win' :-) ).  The result was that the GTX650 worked 'out of the box' whilst the 7770 required jumping through hoops to get it to crunch.  I was prepared to accept that, but in a 'head-to-head' at the time the 650 had at least 10% better performance, so I ended up with a bunch of 650s.

It's quite ironic that less than 12 months after making that decision, AMD got their driver act together and, for me, that became an out-of-the-box install as well, with a significant performance improvement to boot.  In my case, the 10+% deficit became around a 15% 'win' for the 7770.

That 7770 is still crunching today.  It's completing the current 'fast' tasks in about 26 mins.  The last time I tried a 650 it was taking around 2.5 - 3 hours.  That was probably more than a year ago, when the standard work was equivalent to the current 'slow' work of a week or so ago.  Whilst the 'hi-pay' and 'lo-pay' terminology has become quite sexy and fashionable to use :-), it's a bit of a misnomer since all GPU tasks 'pay' the same 3465 credits (5x693 - the standard CPU task offering - since they are supposedly 5x the work content of a standard CPU task).  I'm not at all complaining about the terminology - I'm just wondering when one of these 'hi-pay' tasks is gonna pay me more than 3465 :-).
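The credit arithmetic above can be sketched as follows (the variable names are mine, purely illustrative, not anything from the project's code):

```python
# Every GPU task "pays" the same credit, whether labelled 'hi-pay' or 'lo-pay',
# per the figures quoted in the post above.
CPU_TASK_CREDIT = 693   # credit for a standard CPU task
WORK_MULTIPLIER = 5     # GPU tasks supposedly carry 5x the work content

gpu_task_credit = CPU_TASK_CREDIT * WORK_MULTIPLIER
print(gpu_task_credit)  # 3465 -- identical for every GPU task
```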

So, every now and then I'll take a look at that box full of 650s gathering dust on the shelf and wonder if they'll ever get fired up again.  Even if a decent CUDA app suddenly appeared, I no longer have any spare PCIe slots to put them in.  Apart from 2x750Tis (which perform at very little over half of what the old 7770 does) all the slots they used to occupy, plus all the slots from previously retired CPU only crunchers, are filled with AMD Polaris GPUs.  I guess it would be interesting to see how a 750Ti would go on a CUDA app compared to OpenCL.


Cheers,
Gary.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Betreger wrote:
AFAIK the best app on newer cards on Seti use the SOG openCL app.

Actually the best app at Seti uses Cuda 9 and is called the 'Cuda special'.  There's a Cuda 10 version out but it's still buggy.  Just to give you an idea of the time difference: the best OpenCL SoG takes 2 min 25 secs to complete, while the Cuda 9 special takes 82 seconds.  All of these on an OC'ed 1080 Ti.  But these are apps that have been developed over the past 3 years with continued refinement and testing by the users at Seti.
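For a rough sense of the gap those two times imply, here is a quick back-of-the-envelope calculation using only the figures quoted above (both runs on the same overclocked 1080 Ti):

```python
# Speedup of the Cuda 9 "special" app over the best OpenCL SoG app,
# from the per-task times quoted in the post above.
sog_seconds = 2 * 60 + 25   # 2 min 25 s for the OpenCL SoG app
cuda9_seconds = 82          # 82 s for the Cuda 9 special

speedup = sog_seconds / cuda9_seconds
print(f"{speedup:.2f}x")    # about 1.77x faster
```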
