Pascal again available, Turing may be coming soon

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6537
Credit: 286446952
RAC: 96712

th3tricky wrote:
So would rolling back one driver version have any effect? 

Worth a try. It may answer a question about driver dependency of the error.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

I still think it's the app. At SETI we have a coder who bought a Turing card and tweaked the code while adapting it to CUDA 9; the latest version is for CUDA 10 only, under Linux. A second coder then tweaked it further to make it backward compatible with older cards. I believe that is why they are having success where others are failing.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6537
Credit: 286446952
RAC: 96712

That may also be true, as NVidia implements OpenCL using CUDA. In any case there may be several problems co-existing. The surprise would be if NVidia coded their CUDA wrong, LOL !

Cheers, Mike

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2769773735
RAC: 938942

Mike, remember that this thread started with task failures: tasks from one set of data files ran successfully on Turing cards, while tasks from a different set of data files crashed (under Windows) or spun their wheels without making progress (under Linux).

That doesn't feel (to me) like a gross driver compatibility problem. It's more subtle than that. The nearest equivalent I can think of is the launch of NVidia Fermi cards at SETI in 2010. Ultimate bottom line: the developers of the previous application (NVidia themselves!) had used a simplified assumption as an optimisation, but the specification was tightened up for Fermi and the assumption was no longer valid. Again, some task types succeeded, but most failed. I'll see if I can dig up a reference.

I really think somebody has to look at the application and the OpenCL components together, and see why the app triggers the crash under defined circumstances - at least we have the error messages for Windows.

Edit - found it.

http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/Fermi_Compatibility_Guide.pdf

Search for the keyword 'volatile'.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6537
Credit: 286446952
RAC: 96712

Thanks for the comments, Richard. The Fermi compiler optimisation that caused data incoherency amongst threads within a warp is a very interesting case indeed. Threads of differing indices could not read logically correct/intended intermediate values from global memory, because the compiler kept them away from it ! One has to add the 'volatile' keyword to force write-backs and re-reads. Subtle indeed, as this 'optimisation' would not apply for thread numbers more than a warp's worth, i.e. 32. That, in turn, depends upon the problem setup - the kernel environment as it were.
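
To make the stale-value effect concrete, here is a rough toy model in Python. It is not CUDA and not the Einstein or SETI application code. It only simulates 32 lock-step threads summing 64 numbers, under the assumption that without 'volatile' each thread keeps its running total in a register and writes it back only at the end. The names `warp_reduce`, `mem`, `regs` and `use_volatile` are made up for this sketch.

```python
WARP = 32  # threads in one warp


def warp_reduce(values, use_volatile):
    """Toy lock-step model of a 32-thread sum over 64 values."""
    mem = list(values)   # memory that every thread can see
    regs = mem[:WARP]    # each thread's private register copy of mem[tid]
    stride = WARP
    while stride >= 1:
        # All threads read their partner's slot from memory in lock-step.
        partner = [mem[tid + stride] for tid in range(WARP)]
        for tid in range(WARP):
            regs[tid] += partner[tid]
            if use_volatile:
                # 'volatile' modelled as a forced write-back after every step.
                mem[tid] = regs[tid]
        stride //= 2
    if not use_volatile:
        # Without it, the write-back happens only once, at the very end.
        mem[:WARP] = regs
    return mem[0]


values = list(range(1, 65))
correct = warp_reduce(values, use_volatile=True)
stale = warp_reduce(values, use_volatile=False)
print("with write-backs:", correct, "expected:", sum(values))
print("without write-backs:", stale)
```

In this model the run without write-backs only ever adds the original, never-updated partner values, so thread 0 ends up with a partial sum. Whether a real compiler does this depends on the code and the architecture, which is what made the Fermi case so hard to spot.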

Cheers, Mike.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7055954931
RAC: 1607321

There is another new report of a Turing user seeing the fast-fail problem.  I took the opportunity to compile a list of hosts/users reporting problems running Einstein GRP on Turing cards.  I spotted eleven from forum posts and may have missed some (added two more reported by Yeti after initial post).  One of these did not report to these forums and was accidentally discovered, and there quite likely are others.

Account User        Host     RTX     OS          Latest Driver
139940  Archae86    12260865 2080    Windows 10  417.17
130556  bcavnaugh   12707326 2080 Ti Windows 10  417.01
24633   CElliott    12591228 2070    Windows 8.1 416.34
87612   Keith Myers 12291110 2080    Linux       410.78
143397  Penguin     12614077 2080    Windows 10  417.22
Unknown Anonymous   12711628 2080 Ti Windows 10  417.22
31398   th3tricky   12735904 2070    Windows 10  417.01
215546  Ouiche      12735193 2080    Windows 10  417.22
232598  Sybie       12578265 2080    Windows 10  417.22
30420   gandolph1   11869044 2080 Ti Windows 10  416.94
77248   Dougga      12747881 2080    Windows 10  417.01
9428    Yeti        12662252 2080    Windows 10  416.34
2690    csbyseti    712484   2070    Windows 10  416.34
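
For anyone wanting to slice the list above, a small Python sketch like the following tallies the reports by driver, OS and card. The rows are copied from the table; the tuple layout and variable names are just choices made for this example.

```python
from collections import Counter

# (user, RTX card, OS, latest driver), copied from the table above
reports = [
    ("Archae86", "2080", "Windows 10", "417.17"),
    ("bcavnaugh", "2080 Ti", "Windows 10", "417.01"),
    ("CElliott", "2070", "Windows 8.1", "416.34"),
    ("Keith Myers", "2080", "Linux", "410.78"),
    ("Penguin", "2080", "Windows 10", "417.22"),
    ("Anonymous", "2080 Ti", "Windows 10", "417.22"),
    ("th3tricky", "2070", "Windows 10", "417.01"),
    ("Ouiche", "2080", "Windows 10", "417.22"),
    ("Sybie", "2080", "Windows 10", "417.22"),
    ("gandolph1", "2080 Ti", "Windows 10", "416.94"),
    ("Dougga", "2080", "Windows 10", "417.01"),
    ("Yeti", "2080", "Windows 10", "416.34"),
    ("csbyseti", "2070", "Windows 10", "416.34"),
]

by_card = Counter(card for _, card, _, _ in reports)
by_os = Counter(os_name for _, _, os_name, _ in reports)
by_driver = Counter(driver for _, _, _, driver in reports)

for label, counts in (("card", by_card), ("OS", by_os), ("driver", by_driver)):
    print(label, dict(counts))
```

The failures are spread over several driver versions, all three card models and both Windows and Linux, so no single driver release stands out from this list.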
bluestang
Joined: 13 Apr 15
Posts: 34
Credit: 2492970228
RAC: 2903

Has anyone tried a Windows 7 and RTX combo?  Just a thought.

Yeti
Joined: 17 Nov 04
Posts: 59
Credit: 1273245412
RAC: 1724319

You overlooked my machine: RTX 2080 https://einsteinathome.org/de/host/12662252

csbyseti: RTX 2070: https://einsteinathome.org/de/host/712484

Supporting BOINC, a great concept !

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6537
Credit: 286446952
RAC: 96712

Well, so much for any driver dependency .... :-(

Cheers, Mike.

csbyseti
Joined: 18 May 06
Posts: 8
Credit: 654478900
RAC: 6994

Yes, my computer with RTX2070 failed also.

It worked for some weeks without problems, then started to fail at WU startup.

All other computers (GTX 1080, GTX 970, ATI RX 570 and Vega 56) worked without problems.

So it's a problem with the new sort of WUs and Turing cards.

 

 
