Pascal again available, Turing may be coming soon

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1435928571
RAC: 538446

Slightly off topic, but here is some data about Turing over at SETI:

https://setiathome.berkeley.edu/forum_thread.php?id=81962&postid=1970134
And here is the whole thread:

https://setiathome.berkeley.edu/forum_thread.php?id=81962

Gandolph1
Joined: 20 Feb 05
Posts: 180
Credit: 389404264
RAC: 8191

I've had to suspend Einstein@home because the driver fails any time the GPU works on its tasks. SETI continues to run fine, but I guess I will have to wait either for Nvidia to update their driver to support this project or for someone here to update the project to properly support the RTX 2080 Ti FE.

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110025522834
RAC: 22504014

Unfortunately, there's no indication of a speedy resolution to the problem for Turing GPUs.

While you wait, it would be appreciated if you could abort and return the outstanding problem tasks so that they can be recycled immediately rather than having to wait for a timeout, thanks.
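For anyone who would rather script this than click through BOINC Manager, here is a minimal sketch that builds the `boinccmd` commands to abort a list of tasks and then trigger a project update so the aborted tasks get reported. The project URL and the task name below are assumptions for illustration only: use the URL exactly as `boinccmd --get_project_status` shows it on your host, and the task names from `boinccmd --get_tasks`. The script only prints the commands, so nothing is aborted until you run them yourself.

```python
# Assumption: replace with the master URL your client reports for Einstein@Home.
PROJECT_URL = "https://einsteinathome.org/"


def abort_commands(project_url, task_names):
    """Build one 'boinccmd --task URL NAME abort' command per task,
    followed by a single project update to report the aborted tasks."""
    cmds = [["boinccmd", "--task", project_url, name, "abort"] for name in task_names]
    cmds.append(["boinccmd", "--project", project_url, "update"])
    return cmds


if __name__ == "__main__":
    # Hypothetical placeholder; real names come from `boinccmd --get_tasks`.
    tasks = ["EXAMPLE_TASK_NAME_0"]
    for cmd in abort_commands(PROJECT_URL, tasks):
        # Dry run: print each command instead of executing it.
        print(" ".join(cmd))
```

Run the printed commands from a shell on the affected host once the task names look right.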

 

Cheers,
Gary.

Gandolph1
Joined: 20 Feb 05
Posts: 180
Credit: 389404264
RAC: 8191

Just aborted them...

 

Gandolph1
Joined: 20 Feb 05
Posts: 180
Credit: 389404264
RAC: 8191

FYI - I have escalated this problem with Nvidia directly:

Hello,

Your case is being escalated to our Level 2 Tech Support group.  The Level 2 agents will review the case notes and may attempt to recreate your issue or find a workaround solution if possible.  As this process may take some time we ask that you be patient and a Level 2 tech will contact you as soon as they can to help resolve your issue.

Best Regards,

NVIDIA Customer Care

 


Customer By CSS Web (xxxx xxxxx) (12/16/2018 07:53 AM)


I have attached the application dump files the system generates when trying to run the Einstein@home task. It generated 9 of these in the very short space of time I allowed it to run. If you need the other 5 let me know.

 

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7059824931
RAC: 1136022

gandolph1 wrote:
...The Level 2 agents will review the case notes and may attempt to recreate your issue or find a workaround solution if possible.  As this process may take some time we ask that you be patient and a Level 2 tech will contact you as soon as they can to help resolve your issue.

As mentioned, I have created a portable test case--a zip file with instructions and all the materials needed to demonstrate the problem on a Windows machine with a Turing (and to not have the problem on the same machine with, e.g., a Pascal card).

The file may be found and downloaded at:

pastoll.info/BOINC

It may also be useful to advise them that this issue has already been reported to Nvidia through their feedback mechanism, and that they had assigned it bug number 2434391 by November 1, 2018.

Gandolph1
Joined: 20 Feb 05
Posts: 180
Credit: 389404264
RAC: 8191

archae86 wrote:
gandolph1 wrote:
...The Level 2 agents will review the case notes and may attempt to recreate your issue or find a workaround solution if possible.  As this process may take some time we ask that you be patient and a Level 2 tech will contact you as soon as they can to help resolve your issue.

As mentioned, I have created a portable test case--a zip file with instructions and all the materials needed to demonstrate the problem on a Windows machine with a Turing (and to not have the problem on the same machine with, e.g., a Pascal card).

The file may be found and downloaded at:

pastoll.info/BOINC

It may also be useful to advise them that this issue has already been reported to Nvidia through their feedback mechanism, and that they had assigned it bug number 2434391 by November 1, 2018.

Peter,

If I am lucky enough to hear from the Level 2 tech, I will make them aware of your information.  I am using the fact that Nvidia has had a LOT of 2080 Ti failures (my first card included) to question the integrity of the GPU in this particular case.  If they cannot identify a software-related issue, then I will be pushing them for another replacement GPU.

I have already proven to them that my 1080 GPU had none of these problems, so the problem lies either with the 2080 Ti GPU itself or with its driver/software.  If I can get them to at least acknowledge that the issue exists, that will make me a little more comfortable with my $1,200 purchase.

To be honest, with my second card this is the only remaining issue I have found.  But like you I have been running SETI and Einstein for a LONG time and I consider this a significant issue.

 

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3232287015
RAC: 122892

Is E@H looking at the data file and saying what's causing the error? I'd figure that as end users we would more likely get an answer that way before NV responds. Is it a precision thing or something, like using a float calculation on an int? I could easily see NV saying that it works on one data set but not another, so it must be the data set, even if it works on all other cards. Unless it's clearly determined to be command x on a given variable, I could see NV completely blowing this off.

Gandolph1
Joined: 20 Feb 05
Posts: 180
Credit: 389404264
RAC: 8191

mmonnin wrote:
Is E@H looking at the data file and saying what's causing the error? I'd figure as end users we would more likely get an answer that way before NV responds. Is it a precision thing or something? Using float calculation on an int or something. I could easily see NV saying that it works on one data set but not another so it must be the data set. Even if it works on all other cards. Unless it's clearly determined to be x command on a given variable I could see NV completely blowing this off.

 

The reason NV shouldn't blow this off is that it is causing the video driver to crash as well.  Even if there is a problem with the data set, the video driver should be able to handle the exception without crashing.

Most people don't realize that when their screen freezes with these errors, it's because Windows is having to restart the video driver.  This can easily be seen in the Windows Event Viewer.  I have also had the bad experience of the video driver crashing and taking Windows with it, causing the whole system to restart.  One of these crashes caused Windows corruption (due to the write-behind caching used on my SSD) and required me to do a complete restore from backup.
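For those who prefer a command line to clicking through Event Viewer, here is a minimal sketch that builds a `wevtutil` query for recent driver-reset entries in the System log. It assumes the resets on your machine are logged under Event ID 4101 (source "Display", the usual "display driver stopped responding and has successfully recovered" entry); that ID is an assumption about your system, so check one entry in Event Viewer first and adjust if yours differ. The function name and the default of ten events are just illustrative choices.

```python
import subprocess
import sys


def tdr_query_command(max_events=10, event_id=4101):
    """Build a wevtutil command listing the newest System-log events
    with the given event ID (assumed here to mark display driver resets)."""
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",        # XPath filter on the event ID
        f"/c:{max_events}",   # maximum number of events to return
        "/rd:true",           # newest events first
        "/f:text",            # human-readable text output
    ]


if __name__ == "__main__":
    cmd = tdr_query_command()
    if sys.platform == "win32":
        # On Windows, run the query and show whatever the log returns.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
    else:
        # Elsewhere, just show the command that would be run.
        print(" ".join(cmd))
```

If the timestamps of the returned entries line up with the moments an Einstein task started on the GPU, that is reasonable evidence the freezes are driver resets rather than something else.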

 

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7059824931
RAC: 1136022

With a new report today from user rjs5, I have fourteen Turing hosts on my list of those which appear to suffer the common early termination failure on either of two groups of Einstein Gamma-Ray Pulsar tasks.  Probably there are others which have not come to my attention, as nearly all of these volunteered (or complained of) their trouble on these forums, and I suspect some users are not inclined to post.
