AMD Ryzen 9 3900X 12-core processor
AMD Radeon RX 5600 XT
32 gigs of memory
Running the latest Windows and Radeon updates
I randomly have Computation Errors. The logs don't say much.
11/8/2020 4:31:11 PM | Einstein@Home | Aborting task h1_0390.20_O2C02Cl4In0__O2MDFS2_Spotlight_390.45Hz_408_1: exceeded elapsed time limit 9292.52 (2880000.00G/309.93G)
The UI shows "Computation Error (0.9 CPUs + 1 AMD/ATI GPU)". It looks like the task freezes and runs for over 24 hours.
The problem seems to happen more now, so I'm investigating. I don't know the pattern.
I normally run WCG, Rosetta, and Einstein.
Any recommendations on how to fix?
Copyright © 2024 Einstein@Home. All rights reserved.
People might be better able to give you advice if you unhide your computers.
Web site account|Preferences|Privacy
Adjust: Should Einstein@Home show your computers on its website?:
Then Save Changes
Until you unhide your computer it's hard to help you.
I unhid my computers.
https://einsteinathome.org/host/12818634 has the problem.
Thanks that helps!!
Do you run one task at a time or several at once?
I have a hard time figuring out how to find the errors in that file but did see this that may or may not be important:
I remember reading something about the 'Hawaii' gpu's but don't remember the context so can't speculate if it's important in this case.
Thanks for the unhiding.
The host you described in your original post had a 5600 XT GPU, but the host ID number you mentioned here points to a computer with AMD Radeon (TM) R9 390 Series (8192MB).
Sorry, I copied and pasted the wrong computer. https://einsteinathome.org/host/12842478
CPU type: AuthenticAMD AMD Ryzen 9 3900X 12-Core Processor [Family 23 Model 113 Stepping 0]
Number of processors: 24
Coprocessors: AMD AMD Radeon RX 5600 XT (6128MB)
Operating system: Microsoft Windows 10 Core x64 Edition, (10.00.19041.00)
I run one GPU task at a time. The rest are CPU tasks.
How many CPU tasks run at the same time? How much work is running in total?
It could be useful to let all the currently running CPU tasks complete but not let any new CPU tasks start. Then you could see how those GPU tasks run on their own while nothing else is happening at the same time.
Actually, both of your computers have issues, albeit quite different ones.
Firstly, the above one that you originally linked (with the hawaii GPU) has quite a few validate errors. All tasks were completed and returned without visible error. The problem arose when basic sanity checks were performed on the data returned. As of now, 12 results have been rejected because of 'validate' errors. In simple terms, this means that the data returned was recognisably rubbish - so much so that they were rejected outright and no comparison was performed against a duplicate result from a second computer. These validate errors were shared between both GRP and GW types of GPU tasks.
Validate errors usually point to hardware issues on the host machine. Classic causes are things like overheating (check the cooling system) or faulty power - eg., too much ripple or unstable voltages. The fact that both types of GPU searches were being affected does point to a hardware issue rather than to a particular science app causing the problem.
For the other machine with the RX 5600 XT, all results are terminating with a particular exit status - 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED. You can see this for yourself by clicking on the TaskID link for any of your failed tasks on the website. Every task has a time limit that should be a lot more than needed for crunching to be completed. If you look at your results, you can see that time limit is 9,294 secs. Your GPU should take a lot less than that.
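As a rough check on where that limit comes from, the log line in the original post can be pulled apart. This is only a sketch: reading the two figures in parentheses as the task's operation bound (in giga-operations) and the client's speed estimate for the app (in GFLOPS) is an assumption, based on the fact that dividing one by the other lands very close to the reported limit. The helper name `parse_time_limit` is made up for this example.

```python
import re

# The message exactly as posted at the top of this thread.
LOG_LINE = (
    "11/8/2020 4:31:11 PM | Einstein@Home | Aborting task "
    "h1_0390.20_O2C02Cl4In0__O2MDFS2_Spotlight_390.45Hz_408_1: "
    "exceeded elapsed time limit 9292.52 (2880000.00G/309.93G)"
)

PATTERN = re.compile(
    r"exceeded elapsed time limit ([\d.]+) \(([\d.]+)G/([\d.]+)G\)"
)


def parse_time_limit(line):
    """Return (limit_seconds, first_figure, second_figure) from the message.

    Assumption: first_figure is an operation bound in giga-operations and
    second_figure is an estimated speed in GFLOPS.
    """
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not an 'exceeded elapsed time limit' message")
    limit, bound, speed = (float(g) for g in match.groups())
    return limit, bound, speed


limit, bound, speed = parse_time_limit(LOG_LINE)
implied = bound / speed  # seconds, if the assumption above holds
print(f"reported limit: {limit:.2f} s, bound / speed: {implied:.2f} s")
```

The two values agree to within a fraction of a second (the printed speed is rounded), which is consistent with the limit being a generous allowance derived from the expected GPU speed. A task that hits it is running far slower than expected rather than doing more work.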
I can think of two possibilities for this sort of problem - either you don't have the correct driver/OpenCL libraries installed, or you have loaded up all available CPU threads with CPU tasks and are therefore starving the GPU of the necessary CPU support to allow the GPU tasks to run at the correct speed.
As Richie has suggested, temporarily suspend your CPU tasks and see if that allows the GPU to run at the proper speed. If so, it should give you a dramatic speedup for GPU tasks, with no further tasks hitting the time limit. In that case, you would just need to put a limit (say at least 2 less than currently running) on future CPU tasks that can run concurrently. The easiest way to do that is to restrict the % of CPU cores BOINC is allowed to use.
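To turn "at least 2 fewer" into a percentage for that setting, a small calculation helps. This is a minimal sketch with a hypothetical helper, `cpu_percent`; it assumes the client multiplies the thread count by the percentage and truncates to a whole number, so the result is rounded up to two decimals to avoid losing an extra thread.

```python
def cpu_percent(total_threads, threads_to_free):
    """Percentage of CPUs to allow so that `threads_to_free` stay idle.

    Hypothetical helper for illustration. Uses integer arithmetic and
    rounds up to two decimals, on the assumption that the client
    truncates threads * percent / 100 to a whole number.
    """
    if not 0 <= threads_to_free < total_threads:
        raise ValueError("threads_to_free must be in [0, total_threads)")
    usable = total_threads - threads_to_free
    hundredths = -(-usable * 10000 // total_threads)  # integer ceiling
    return hundredths / 100


# The host in this thread reports 24 processors; keep 2 free for GPU support.
pct = cpu_percent(24, 2)
print(f"Allow at most {pct}% of the CPUs")
print("Threads left for CPU tasks:", int(24 * pct / 100))
```

For the 24-thread host discussed here this gives 91.67%, which leaves 22 threads for CPU tasks. Whether 2 free threads is enough is something to confirm by watching GPU task run times afterwards.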
If freeing up the CPU threads doesn't solve the problem, you should investigate driver issues. I don't run Windows at all so can't offer any advice about that.
Cheers,
Gary.
https://einsteinathome.org/host/12818634/tasks/5/0 This is my older system. I cleaned out the dust because it was overheating, and I haven't had problems for months now. Should I remove it from BOINC? Is it causing problems for Einstein@Home and the other projects? I haven't noticed long-running tasks like on my other computer. Should I free up CPU threads on this system if Einstein tasks are completing?
https://einsteinathome.org/host/12842478 I freed up CPU threads and tasks are completing now.