As Gary posted a few hours ago, the work being issued just switched from low-pay units of the 1039L group to high-pay 104X units. I just now ran a 104X, which failed in the expected way on my Turing.
Possibly this may surface additional Turing users.
archae86 wrote: Possibly this may surface additional Turing users.
Yep, sure has! :-).
Cheers,
Gary.
In case this is not generally known, I will quote an admission by NVidia:
"Limited test escapes from early boards caused the issues some customers have experienced with RTX 2080 Ti Founders Edition.
We stand ready to help any customers who are experiencing problems.
Please visit www.nvidia.com/support to chat live with the NVIDIA tech support team (or to send us an email) and we’ll take care of it."
I presume 'test escapes' refers to quality control failure.
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Mike Hewson wrote: In case this is not generally known, I will quote an admission by NVidia ...
If you look at any gaming site, the failures being reported are cards dying completely after a few hours of use.
Mike Hewson wrote: I presume 'test escapes' refers to quality control failure.
To belabor the obvious for a moment, a manufacturing process to produce gigantic logic chips makes lots of defective chips. It is the job of testing to detect and discard the ones that are different from the "good" ones. No test is perfect in accomplishing this goal, so there are always test escapes. Nvidia seems to be saying they had rather more of these than usual.
But the difficulty Turing cards have in successfully running Einstein high-pay work units under Windows appears to affect all cards uniformly (I think we have about eight reported so far, across all three shipping models of card), and is thus not plausibly a test-escape issue.
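To put toy numbers on the escape-rate point (the defect and coverage figures below are pure assumptions for illustration, not anything Nvidia has published):

# Back-of-the-envelope test-escape model. Both inputs are assumed,
# illustrative values, not Nvidia data.
d = 0.10   # fraction of manufactured chips that are defective (assumed)
c = 0.99   # fraction of defects the production test catches (assumed)

shipped_good = 1 - d        # good chips all pass (assuming no false failures)
shipped_bad = d * (1 - c)   # defective chips the test missed: the "escapes"
escape_rate = shipped_bad / (shipped_good + shipped_bad)

print(f"{escape_rate:.3%} of shipped cards are test escapes")
# ~0.111% with these inputs: rare, and scattered over random individual
# cards. A failure that shows up uniformly on every Turing card is the
# opposite of that pattern.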
Gary Roberts wrote: Hi Keith, I'm sure we'd all love to know if a task based on this latest data file will fail or not on your new card running under Linux. Are you still able to give it a try, thanks?
Yep, I am going to give it a try as all posts of incompatibility with both projects have been with Windows hosts.
I just downloaded a hundred of those 104X units. So when it comes their turn to run on the 2080 I will find out if they run correctly on my card and under Linux.
I just suspended all my other projects to force the Einstein 104X units to run. Don't remember the failure mode right now without going back through the thread.
One is currently running on the 2080 and is showing very poor progress compared to a 104X task running on the 1080Ti right now. Around 3 minutes in so far and showing 11 minutes to completion.
The 1080Ti task finished in 5 minutes and has already started on another.
The 1080Ti did the 104X task in 300 seconds. The 1st task started on the 2080 is not running at all well. Currently showing 31% completion at 20 minutes elapsed time. CPU support is at 99.9%. The card does not seem to run these new fast tasks at all well.
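For a rough sense of how far off the 2080 is, here is a naive linear extrapolation from the figures just quoted (progress on these tasks isn't perfectly linear, so treat this as a sketch only):

# Naive linear extrapolation from the numbers reported above.
elapsed_s = 20 * 60    # 2080: 20 minutes elapsed
progress = 0.31        # 2080: 31% complete
ref_s = 300            # 1080Ti: same task type finished in 300 seconds

est_total_s = elapsed_s / progress   # assumes progress stays linear
print(f"2080 estimated total: {est_total_s / 60:.0f} min, "
      f"about {est_total_s / ref_s:.0f}x slower than the 1080Ti")
# ~65 min vs 5 min, roughly a 13x slowdown.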
Keith Myers wrote: The 1st task started on the 2080 is not running at all well. ...
While not satisfactory, that is actually very, very different from this type of task running under Windows on any of the Turing cards (2080Ti, 2080, and 2070) reported here. Those all failed in under half a minute of run time, and only kept the GPU active for well under 5 seconds.
So the fact that your Linux case even kept running for 20 minutes of clock time is massively different. Since your hardware is most likely the same as ours, and your work unit is most likely the same as ours, that leaves the Linux driver you are using and the Linux Einstein application you are running as the probable points of difference.
Sounds to me like it is malfunctioning, but with a different set of symptoms. I hope you let at least one run to completion, as the information may help in addressing this situation.
Thanks for the report.
You might as well wait for the AMD Navis too.
https://wccftech.com/amd-rx-3080-3070-3060-navi-gpu-specs-prices-leaked-rtx-2070-gtx-1070-gtx-1060-challengers-at-249-199-129/
archae86 wrote: I hope you let at least one run to completion, as the information may help in addressing this situation.
Yes, though I hate wasting the power on that task and card, I will let it run to completion to see if it finally finishes and validates. Currently at 61% completion at the 50-minute mark, with 10 seconds remaining. That part is obviously wrong.
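A quick sanity check on that remaining-time estimate, using the same naive linear extrapolation as above (a sketch, assuming progress continues at the observed rate):

# Why "10 seconds remaining" at 61% / 50 minutes is obviously wrong.
elapsed_s = 50 * 60   # 50 minutes elapsed
progress = 0.61       # 61% complete

remaining_s = elapsed_s * (1 - progress) / progress   # linear estimate
print(f"roughly {remaining_s / 60:.0f} min remaining, not 10 seconds")
# ~32 min with these numbers; the client's ETA display has clearly lost
# track of reality for this task.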