As the title says, starting today I began getting computation errors just as tasks start on my RTX 2080 GPU. PrimeGrid and Seti are working fine. This project was working fine until I restarted it later today. It had been working for the past week on both a 1070 and a 2080. Today it will only work on the 1070, not the 2080.
I saw something similar happen with Asteroids@home and GPUGrid when I first got the 2080, so I set BOINC not to use the 2080 on those projects. As of today I have had to set Einstein@Home to do the same. But before today there was never a problem; things ran smoothly without error.
So, what changed today?
Copyright © 2024 Einstein@Home. All rights reserved.
The type of data set for those GPU tasks changed. That happens here now and then; it's normal. But RTX 2080s haven't been able to run this type of task successfully so far, only the other type.
There are plenty of messages pointing out that problem:
https://einsteinathome.org/content/pascal-again-available-turing-may-be-coming-soon
https://einsteinathome.org/content/latest-data-file-fgrpb1g-gpu-tasks
Yes, I see this in the data file thread:
Unfortunately it's this new slew of tasks, but fortunately it's a known issue and is limited to the new data set. There are similar issues with the Turing cards on a few other sites. There it has to do with the way the programs are compiled on those projects; I'm not sure what it could be here.
Do these projects perchance use OpenCL, to your knowledge?
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Keith was testing his new card on Seti, GPUGrid, and Einstein under Linux to see if that made a difference. Both Seti and GPUGrid use CUDA. Seti works and GPUGrid errored out; he hasn't said anything yet about Einstein under Linux.
I bumped into an anonymous host with an RTX 2080 Ti. It seems the work cache size might have been increased quite a bit recently. I noticed a whole lot of new GPU tasks a couple of days ago, all fetched at much the same time and all for the previous data file. There are no tasks for the new file yet. I hope the owner has a look at the boards and notices the potential problem before it hits.
The CPU is an AMD Threadripper 2950X, so 32 threads. There are also *lots* of CPU tasks, so maybe the client has already gone into panic mode. That might explain why there are no recent GPU tasks. The average turnaround time is listed as 11.7 days, so lots of future grief for that host, even without the new data file problem.
Cheers,
Gary.
Asteroids@home's app says (cuda55) after its name; it fails on the 2080 but is fine on a 1070.
At PrimeGrid all subprojects work on both the 1070 and the 2080, both CUDA and OpenCL.
Seti@home seems to use OpenCL; the app is called opencl_nvidia_SoG. All tasks work on the 1070 and the 2080.
GPUGrid only works on the 1070; their GPU apps say (cuda80), so I guess CUDA.
And Milkyway@home works on both the 1070 and the 2080, I think. I can't get any new tasks right now to double-check that, and I'm not sure if they use CUDA or OpenCL.
So perhaps OpenCL apps are OK, and CUDA apps are giving problems with the RTX series?
I don't remember where, possibly at Seti, but there was a post saying the apps needed to be compiled using the latest CUDA versions... I really don't know what that means, where I saw it, or whether I'm repeating it correctly. I just remember seeing something about it on one of the BOINC project sites.
I understand the current Einstein Windows application for Gamma-ray Pulsar search to be OpenCL and not CUDA.
One clue is that the executable has the string opencl in the file name:
hsgamma_FGRPB1G_1.20_windows_x86_64__FGRPopencl1K-nvidia.exe
Other clues are that stderr does not include the string CUDA, while it does contain these lines:
OK, so that disproves the idea that CUDA apps alone are the issue, then...
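For anyone who wants to repeat the file-name check on their own hosts, here is a minimal sketch in Python. It only looks for the substrings "opencl" and "cuda" in an app or plan-class name, which is the same rough clue used above and not an official BOINC method; the function name `guess_api` is made up for this illustration, and the sample names are the ones quoted in this thread.

```python
def guess_api(app_name: str) -> str:
    """Guess the GPU API from an app or plan-class name by substring match.

    This is only a heuristic: a name without either substring tells us nothing.
    """
    name = app_name.lower()
    if "opencl" in name:
        return "OpenCL"
    if "cuda" in name:
        return "CUDA"
    return "unknown"


# App and plan-class names quoted in this thread.
names = [
    "cuda55",                # Asteroids@home
    "cuda80",                # GPUGrid
    "opencl_nvidia_SoG",     # Seti@home
    "hsgamma_FGRPB1G_1.20_windows_x86_64__FGRPopencl1K-nvidia.exe",  # Einstein@Home
]

for n in names:
    print(f"{n}: {guess_api(n)}")
```

A name match is only a hint. The stderr output of a finished task is the better place to confirm which API an app really used.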
I have an RTX 2070 and am experiencing this problem. Every time a WU is aborted this message appears in the Event Log: "Display driver nvlddmkm stopped responding and has successfully recovered." A similar message pops up in the system tray.
When a similar problem occurred in early November, I contacted NVidia tech support, which seemed only too eager to help solve the problem. However, how is NVidia to lay hold of a faulty WU with which to reproduce the problem?
So, my question is, is anyone at Einstein@Home working with NVidia to get them to fix whatever is ailing the display driver WRT the RTX 2070 and 2080?
archae86 at https://einsteinathome.org/content/pascal-again-available-turing-may-be-coming-soon?page=6#comment-167615 has a portable test case and an NVidia bug number from Nvidia driver feedback. You might consider joining forces with him.