Pascal again available, Turing may be coming soon

archae86
Joined: 6 Dec 05
Posts: 3,064
Credit: 5,778,334,057
RAC: 3,916,365

As Gary posted a few hours ago, work issue just switched from the low-pay units of the 1039L group to high-pay 104X units. I just ran a 104X task, which failed in the expected way on my Turing card.

Possibly this may surface additional Turing users.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,541
Credit: 76,489,393,202
RAC: 65,105,752

archae86 wrote:
Possibly this may surface additional Turing users.

Yep, sure has! :-).

Cheers,
Gary.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,281
Credit: 150,061,754
RAC: 128,289

In case this is not generally known, I will quote an admission by NVidia:

"Limited test escapes from early boards caused the issues some customers have experienced with RTX 2080 Ti Founders Edition.

We stand ready to help any customers who are experiencing problems.

Please visit www.nvidia.com/support to chat live with the NVIDIA tech support team (or to send us an email) and we’ll take care of it."

I presume 'test escapes' refers to quality control failure.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

DanNeely
Joined: 4 Sep 05
Posts: 1,358
Credit: 2,808,475,201
RAC: 2,666,070

Mike Hewson wrote:

In case this is not generally known, I will quote an admission by NVidia:

"Limited test escapes from early boards caused the issues some customers have experienced with RTX 2080 Ti Founders Edition.

We stand ready to help any customers who are experiencing problems.

Please visit www.nvidia.com/support to chat live with the NVIDIA tech support team (or to send us an email) and we’ll take care of it."

I presume 'test escapes' refers to quality control failure.

Cheers, Mike.

If you look at any gaming site, the issues NVIDIA is referring to are cards dying completely after a few hours of use.

archae86

Mike Hewson wrote:
I presume 'test escapes' refers to quality control failure.

To belabor the obvious for a moment, a manufacturing process to produce gigantic logic chips makes lots of defective chips. It is the job of testing to detect and discard the ones that are different from the "good" ones. No test is perfect in accomplishing this goal, so there are always test escapes. Nvidia seems to be saying they had rather more of these than usual.

But the difficulty Turing cards have in successfully running Einstein high-pay work units under Windows appears to affect all cards uniformly (I think we have about eight reported so far, across all three shipping models of card), so it is not plausibly a test escape issue.

Keith Myers
Joined: 11 Feb 11
Posts: 2,117
Credit: 5,304,670,416
RAC: 19,194,194

Gary Roberts wrote:
Keith Myers wrote:
Yep, I am going to give it a try, as all reports of incompatibility with both projects have come from Windows hosts.

Hi Keith,
I'm sure we'd all love to know if a task based on this latest data file will fail or not on your new card running under Linux. Are you still able to give it a try, thanks?

I just downloaded a hundred of those 104X units, so when their turn comes to run on the 2080 I will find out whether they run correctly on my card under Linux.

Keith Myers

I just suspended all my other projects to force the Einstein 104X units to run. I don't remember the failure mode offhand without going back through the thread.

One is currently running on the 2080 and is showing very poor progress compared to a 104X task running on the 1080Ti right now. It is around 3 minutes in so far and showing 11 minutes to completion.

The 1080Ti task finished in 5 minutes and has already started on another.

The 1080Ti did the 105X task in 300 seconds. The 1st task started on the 2080 is not running at all well: it is currently showing 31% completion after 20 minutes of elapsed time, with CPU support at 99.9%. The card does not seem to run these new fast tasks at all well.

archae86

Keith Myers wrote:
The 1st task started on the 2080 is not running at all well. Currently showing 31% completion and currently 20 minutes elapsed time. CPU support is at 99.9%. The card does not seem to run these new fast tasks at all well.

While not satisfactory, that is actually very, very different from how this type of task behaves under Windows on any of the Turing cards (2080Ti, 2080, and 2070) reported here. Those all failed in under half a minute of run time, and only kept the GPU active for well under 5 seconds.

So the fact that your Linux case kept running at all for 20 minutes of clock time is a massive difference. Since your hardware is most likely like ours, and your work unit most likely like ours, that leaves the Linux driver you are using and the Linux Einstein application you are running as the probable points of difference.

Sounds to me like it is malfunctioning, but with a different set of symptoms.  I hope you let at least one run to completion, as the information may help in addressing this situation.

Thanks for the report.

Jim1348
Joined: 19 Jan 06
Posts: 453
Credit: 233,070,989
RAC: 121
Keith Myers

archae86 wrote:
Keith Myers wrote:
The 1st task started on the 2080 is not running at all well. Currently showing 31% completion and currently 20 minutes elapsed time. CPU support is at 99.9%. The card does not seem to run these new fast tasks at all well.

While not satisfactory, that is actually very, very different from how this type of task behaves under Windows on any of the Turing cards (2080Ti, 2080, and 2070) reported here. Those all failed in under half a minute of run time, and only kept the GPU active for well under 5 seconds.

So the fact that your Linux case kept running at all for 20 minutes of clock time is a massive difference. Since your hardware is most likely like ours, and your work unit most likely like ours, that leaves the Linux driver you are using and the Linux Einstein application you are running as the probable points of difference.

Sounds to me like it is malfunctioning, but with a different set of symptoms.  I hope you let at least one run to completion, as the information may help in addressing this situation.

Thanks for the report.

Yes. Though I hate wasting the power on that task and card, I will let it run to completion to see whether it finishes and validates. It is currently at 61% completion at the 50-minute mark, showing 10 seconds remaining. That part is obviously wrong.