Hello,
As the title says, I have recently started to get computation errors for one of my hosts:
Do you have any idea why this is happening, or is it just my old PC taking its last breath?
Thank you!
Copyright © 2024 Einstein@Home. All rights reserved.
_________________________________________________________________________
I can't say for sure, but it may be that your 1GB video card just doesn't have enough VRAM for the new tasks.
I'm blacklisted from the new tasks due to my architecture, but maybe someone else can confirm VRAM use on the new 3001L00 tasks to see if they might be using more than 1GB.
_________________________________________________________________________
Your problem is associated with attempts to process the new flavor of GPU GRP tasks which have task names starting with LATeah3001. Your system was previously successful with task names starting with LATeah2049L.
There are several threads here, in more than one forum, prompted by the widespread observation that all modern Nvidia cards (Volta, Turing, Ampere...) fail on this new series of tasks. Delivery of GRP GPU work to these "modern hosts" has been disabled. The threshold for "modern", as I have termed it, is a Compute Capability of 7.0 or higher.
However, your GT 710 is not a "modern" card in these terms at all. Wikipedia lists it as a member of the Kepler generation with a Compute Capability of 3.5.
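To make that threshold concrete, here is a minimal Python sketch of the "modern" test as described above. The constant and function names, and representing Compute Capability as a (major, minor) tuple, are my own illustration and not anything the project's scheduler exposes.

```python
from typing import Tuple

# "Modern" as used in this thread: Compute Capability (CC) 7.0 or higher
# (Volta, Turing, Ampere, ...). Names here are illustrative only.
MODERN_CC_THRESHOLD: Tuple[int, int] = (7, 0)


def is_modern(cc: Tuple[int, int]) -> bool:
    """True if a (major, minor) Compute Capability is at or above 7.0."""
    # Tuples compare element by element: (7, 5) >= (7, 0), but (3, 5) < (7, 0).
    return cc >= MODERN_CC_THRESHOLD


if __name__ == "__main__":
    for name, cc in [("GT 710 (Kepler)", (3, 5)), ("Volta", (7, 0)), ("Turing", (7, 5))]:
        print(f"{name}: CC {cc[0]}.{cc[1]} -> modern={is_modern(cc)}")
```

By this rule the GT 710 (CC 3.5) falls well outside the group of cards that had GRP GPU work withdrawn.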
Meanwhile, my advice to you is to use your Project Preferences settings to opt out of GPU Gamma-Ray Pulsar work here at Einstein. You can monitor the forums to see whether some progress is made in issuing new applications or in ceasing the distribution of tasks which trigger this problem.
Thanks for your report.
_________________________________________________________________________
I threw a 1060 6GB on my Ubuntu testbench, and these 3001L00 tasks do use a bit more VRAM than the older files needed, at about 785MB. With the desktop environment running on the same card, it's using about 1GB total.
But this is on Linux. If the Windows app needs slightly more, or the Windows desktop needs slightly more than my Linux desktop does, I could see you bumping into that 1GB limit.
Just spitballing, though.
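To put rough numbers on that, here is a back-of-the-envelope Python sketch. Only the 785MB figure comes from the measurement above; the 215MB desktop share (reading "about 1GB total" as 1000MB), the 1024MB card size and the 50MB extra overhead are assumptions chosen purely for illustration.

```python
def vram_headroom_mb(card_mb: int, task_mb: int, desktop_mb: int, extra_mb: int = 0) -> int:
    """Estimated VRAM (MB) left over; a negative result means the card is over budget.

    extra_mb is a hypothetical allowance for anything the Linux test didn't
    capture, such as a heavier Windows desktop or a larger Windows app.
    """
    return card_mb - (task_mb + desktop_mb + extra_mb)


if __name__ == "__main__":
    card = 1024     # assumed: a "1GB" card counted as 1024 MB
    task = 785      # measured above for a 3001L00 task on the 1060
    desktop = 215   # inferred: "about 1GB total" (taken as 1000 MB) minus 785 MB
    print(vram_headroom_mb(card, task, desktop))               # 24
    print(vram_headroom_mb(card, task, desktop, extra_mb=50))  # -26
```

The point is only that the margin on a 1GB card is thin, so a modest extra overhead could plausibly tip it over.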
_________________________________________________________________________
Hello,
Thank you all for your support!
One more question on my side: if I were bumping into the 1GB limit, wouldn't the tasks have different running times? I see that they all fail somewhere between 22,000 and 23,000 seconds. Looking at the reported stderr, it looks like every task completes the main analysis and then crashes at the very last step (at least that's how I interpreted it, please correct me if I'm wrong).
Thank you!
_________________________________________________________________________
Others have noticed that the final phase of computation (89.999%-100%) does seem to take longer on these new tasks. I think it's doing some recalculation in double precision during this time. The best clue is that you get error code -36 in your stderr.txt file, but a dev would have to decode what that error code refers to.
A quick check on the GT 710 1GB does reveal that it is capable of double precision (DP), and even though several slightly different models of GT 710 were released, your system at least identifies it as having CC 3.5 (via your last scheduler request log).
It's really hard to say whether this is just yet another issue with these new tasks, manifesting in a new way due to marginally capable hardware, or a VRAM limit, or something else entirely. Your GPU definitely doesn't like these tasks, though.
If you feel like playing around, you could try newer or even older drivers to see if it makes any difference, but I don't have high hopes that you'd see different results.
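On the -36 code: the error line in these stderr logs points at an OpenCL/clFFT source file (bridge_fft_clfft.c), and in the standard OpenCL header CL/cl.h the status -36 is defined as CL_INVALID_COMMAND_QUEUE. That is only the generic name. It doesn't say why the queue ended up in that state, so a dev would still need to dig in. Below is a minimal Python sketch that turns such a line into the symbolic name. It assumes the "<call> failed. status=<n>" wording seen in these logs and lists only a small subset of the codes. The function and table names are hypothetical.

```python
import re
from typing import Optional, Tuple

# A small subset of the status codes defined in the standard OpenCL header CL/cl.h.
OPENCL_STATUS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

# Assumed wording of the error line, e.g. "... clFinish failed. status=-36"
STATUS_RE = re.compile(r"(\w+) failed\. status=(-?\d+)")


def decode_status_line(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (OpenCL call, status code, symbolic name), or None if no match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    code = int(m.group(2))
    return m.group(1), code, OPENCL_STATUS.get(code, "unlisted code")


if __name__ == "__main__":
    line = ("ERROR: /home/bema/source/fermilat/src/bridge_fft_clfft.c:1073: "
            "clFinish failed. status=-36")
    print(decode_status_line(line))  # ('clFinish', -36, 'CL_INVALID_COMMAND_QUEUE')
```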
_________________________________________________________________________
In addition to what the others have mentioned, I took a look at the stderr output returned to the project for one of your failed tasks and compared it to what I get for one of mine that doesn't have the problem. My GPU is old too, but is rather more capable than yours. It's an AMD HD7850 that I bought in 2013 and it still runs very well.
Firstly, here is the last checkpoint written on your failed task, right near the bottom of the log. Up to that point you can see all the previous checkpoints. The final number (in this case 920) is the total number of 'skypoints' processed to that point. You can see this number steadily increasing by about 27 for each checkpoint written. This is all perfectly normal.
% C 0 920
Following this last checkpoint the error message is
ERROR: /home/bema/source/fermilat/src/bridge_fft_clfft.c:1073: clFinish failed. status=-36
For comparison, my GPU's last checkpoint was
% C 0 937
which was then followed by the perfectly normal output line you get when there isn't any problem.
FPU status flags: PRECISION
Because the total numbers of skypoints in the two examples are very similar, my guess is that your task had successfully completed processing the data and that the issue was caused by the transition to the 'followup' stage, where the 'toplist' of candidate signals is recalculated in double precision. This is quite different from the issue the modern nvidia GPUs are having.
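The comparison above can be put into numbers with a short Python sketch. It assumes each checkpoint line has the form "% C 0 N", with N the running skypoint count, as in the excerpts quoted here. The "within one checkpoint step (~27 skypoints)" rule of thumb is my own hypothetical way of expressing "very similar", not something the app reports, and the 893 checkpoint in the sample log is an illustrative value (920 minus 27).

```python
import re
from typing import List, Optional

# Assumed checkpoint line format, based on the log excerpts above: "% C 0 <skypoints>"
CHECKPOINT_RE = re.compile(r"^% C \d+ (\d+)$")


def checkpoint_counts(stderr_text: str) -> List[int]:
    """Skypoint counts from every checkpoint line, in the order they were written."""
    counts = []
    for line in stderr_text.splitlines():
        m = CHECKPOINT_RE.match(line.strip())
        if m:
            counts.append(int(m.group(1)))
    return counts


def last_checkpoint(stderr_text: str) -> Optional[int]:
    """Skypoint count at the final checkpoint, or None if there are no checkpoints."""
    counts = checkpoint_counts(stderr_text)
    return counts[-1] if counts else None


def near_end_of_main_stage(failed_last: int, reference_last: int, step: int = 27) -> bool:
    """Hypothetical rule of thumb: the failed task stopped within one checkpoint
    step (~27 skypoints) of where a comparable successful task finished."""
    return abs(reference_last - failed_last) <= step


if __name__ == "__main__":
    # Shortened tails of the two logs quoted above (893 is illustrative).
    failed_log = (
        "% C 0 893\n"
        "% C 0 920\n"
        "ERROR: /home/bema/source/fermilat/src/bridge_fft_clfft.c:1073: clFinish failed. status=-36\n"
    )
    good_log = "% C 0 937\nFPU status flags: PRECISION\n"

    failed, good = last_checkpoint(failed_log), last_checkpoint(good_log)
    print(failed, good, f"{failed / good:.1%}")   # 920 937 98.2%
    print(near_end_of_main_stage(failed, good))   # True
```

Since the two tasks need not have exactly the same number of skypoints, this only supports the guess that the failed task got through essentially all of its main-stage work.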
I know none of this helps you resolve your problem but you can take consolation from the fact that you have probably given the Devs something else to ponder while they try to sort out the more pressing modern nvidia GPU problem :-).
My guess is that if there isn't a quick resolution as to why LATeah3001L based tasks are having problems for some, there might be a fairly prompt reversion to the earlier LATeah2nnnn based tasks to buy some time in sorting it all out. If that happens, there should be a note (maybe in Technical News) to alert you to try again.
I'm actually quite surprised that a GT 710 was able to do the tasks in the first place. I don't know how you had the patience to wait nearly 9 hours to see a result though :-).
Cheers,
Gary.
_________________________________________________________________________
Thank you all again for the answers!
I had already tried other drivers before posting here. Sadly, as you said, it didn't help at all.
Giving the Devs something else to ponder wasn't my intention at all :D. All I wanted was to find out if the problem is my PC, which it kind of is, because it's old.
Given that I interact with that PC once, maybe twice a month (to start it up after a power outage, or for driver updates, but nothing more than that), long running times are not that big of a problem for me, as long as tasks are running normally. It sits nicely in a corner, forgotten by the world and aging silently. Well, as silent as an overheated CPU can be :).
_________________________________________________________________________
I am also starting to get computation errors on my computer with a 1080 Ti - https://einsteinathome.org/de/host/12819241/tasks/6/0?page=18
As far as I can tell from looking through a few error messages, they all seem to fail with the same error.
Maybe it can help the devs figure out what's going wrong with the new set...
_________________________________________________________________________
This is a different problem from the one being discussed here. I don't own any of these recent model nvidia GPUs, but I imagine you will find your problem is the same one that has been discussed in another thread over the last few days. You could add your information there if you wish.
The Devs are already working on a solution.
Cheers,
Gary.