Computation Error

magdaddy
Joined: 29 Mar 20
Posts: 5
Credit: 83008041
RAC: 0
Topic 223938

AMD Ryzen 9 3900X 12-core processor

AMD Radeon RX 5600 XT

32 gigs of memory

Running the latest Windows and Radeon updates 

I randomly have Computation Errors.  The logs don't say much.

11/8/2020 4:31:11 PM | Einstein@Home | Aborting task h1_0390.20_O2C02Cl4In0__O2MDFS2_Spotlight_390.45Hz_408_1: exceeded elapsed time limit 9292.52 (2880000.00G/309.93G)

The UI shows "Computation Error (0.9 CPUs + 1 AMD/ATI GPU)".  It looks like the task is frozen and runs for over 24 hours.

The problem seems to happen more now, so I'm investigating.  I don't know the pattern.

I normally run WCG, Rosetta, and Einstein.

Any recommendations on how to fix?

 

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7226494928
RAC: 1072585

People might be better able to give you advice if you unhide your computers.

Web site account|Preferences|Privacy

Adjust: Should Einstein@Home show your computers on its website?:

Then Save Changes

mikey
Joined: 22 Jan 05
Posts: 12694
Credit: 1839100099
RAC: 3704

magdaddy wrote:

AMD Ryzen 9 3900X 12-core processor

AMD Radeon RX 5600 XT

32 gigs of memory

Any recommendations on how to fix? 

Until you unhide your computer it's hard to help you.

magdaddy
Joined: 29 Mar 20
Posts: 5
Credit: 83008041
RAC: 0

I unhid my computers.

https://einsteinathome.org/host/12818634 has the problem.

 

mikey
Joined: 22 Jan 05
Posts: 12694
Credit: 1839100099
RAC: 3704

magdaddy wrote:

I unhide my computers.  

https://einsteinathome.org/host/12818634 has the problem. 

Thanks that helps!!

Do you run one task at a time or several at once?

I had a hard time finding the errors in that file, but I did see this, which may or may not be important:

2020-11-01 07:22:11.6935 (4576) [normal]: OpenCL Device used for Search/Recalc and/or semi coherent step: 'Hawaii (Platform: AMD Accelerated Parallel Processing, global memory: 8192 MiB)'

I remember reading something about the 'Hawaii' GPUs but don't remember the context, so I can't speculate on whether it's important in this case.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7226494928
RAC: 1072585

Thanks for the unhiding.

The host you described in your original post had a 5600 XT GPU, but the host ID number you mentioned here points to a computer with AMD Radeon (TM) R9 390 Series (8192MB).

 

 

magdaddy
Joined: 29 Mar 20
Posts: 5
Credit: 83008041
RAC: 0

Sorry, I copied and pasted the wrong computer.  The correct one is https://einsteinathome.org/host/12842478

 

CPU type: AuthenticAMD AMD Ryzen 9 3900X 12-Core Processor [Family 23 Model 113 Stepping 0]

Number of processors: 24

Coprocessors: AMD AMD Radeon RX 5600 XT (6128MB)

Operating system: Microsoft Windows 10 Core x64 Edition, (10.00.19041.00)

magdaddy
Joined: 29 Mar 20
Posts: 5
Credit: 83008041
RAC: 0

I run one GPU task at a time.  The rest are CPU tasks.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

magdaddy wrote:
I run one GPU task at a time.  The rest are CPU tasks.

How many CPU tasks run at the same time? How much work is running in total?

It could be useful to let all the currently running CPU tasks complete but not let any new CPU tasks start. Then you could see how the GPU tasks run on their own while nothing else is happening at the same time.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117704305757
RAC: 35067726

magdaddy wrote:
https://einsteinathome.org/host/12818634 has the problem.

Actually, both of your computers have issues, albeit quite different ones.

Firstly, the above one that you originally linked (with the hawaii GPU) has quite a few validate errors.  All tasks were completed and returned without visible error.  The problem arose when basic sanity checks were performed on the data returned.  As of now, 12 results have been rejected because of 'validate' errors.  In simple terms, this means that the data returned was recognisably rubbish - so much so that they were rejected outright and no comparison was performed against a duplicate result from a second computer.  These validate errors were shared between both GRP and GW types of GPU tasks.

Validate errors usually point to hardware issues on the host machine.  Classic causes are things like overheating (check the cooling system) or faulty power, e.g. too much ripple or unstable voltages.  The fact that both types of GPU searches were being affected does point to a hardware issue rather than to a particular science app causing the problem.

For the other machine with the RX 5600 XT, all results are terminating with a particular exit status - 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED.  You can see this for yourself by clicking on the TaskID link for any of your failed tasks on the website.  Every task has a time limit that should be a lot more than needed for crunching to be completed.  If you look at your results, you can see that time limit is 9,294 secs. Your GPU should take a lot less than that.
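As a rough illustration of where that limit comes from: the two figures in parentheses in the abort message from the first post appear to be a work bound and a speed estimate (both in units of 10^9), and dividing one by the other lands very close to the reported limit. The sketch below assumes that interpretation and only parses the one log line quoted in this thread; `PATTERN` and `parse_time_limit` are hypothetical names, not part of BOINC.

```python
import re

# The abort message quoted in the first post of this thread.
LOG_LINE = (
    "11/8/2020 4:31:11 PM | Einstein@Home | Aborting task "
    "h1_0390.20_O2C02Cl4In0__O2MDFS2_Spotlight_390.45Hz_408_1: "
    "exceeded elapsed time limit 9292.52 (2880000.00G/309.93G)"
)

# Hypothetical pattern for this one message format.
PATTERN = re.compile(
    r"exceeded elapsed time limit (?P<limit>[\d.]+) "
    r"\((?P<bound>[\d.]+)G/(?P<speed>[\d.]+)G\)"
)


def parse_time_limit(line):
    """Return the reported limit and the bound/speed ratio, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    limit = float(match.group("limit"))
    bound = float(match.group("bound"))  # assumed: work bound, in 1e9 operations
    speed = float(match.group("speed"))  # assumed: speed estimate, in 1e9 operations/s
    return {"limit_s": limit, "ratio_s": bound / speed}


if __name__ == "__main__":
    info = parse_time_limit(LOG_LINE)
    print(f"reported limit: {info['limit_s']:.2f} s")
    print(f"bound / speed:  {info['ratio_s']:.2f} s")
```

For the line above, the ratio comes out within a fraction of a second of the 9292.52 s in the message, which is roughly 2.6 hours. The small difference is consistent with the printed figures being rounded.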

I can think of two possibilities for this sort of problem - either you don't have the correct driver/OpenCL libraries installed, or you have loaded up all available CPU threads with CPU tasks and are therefore starving the GPU of the necessary CPU support to allow the GPU tasks to run at the correct speed.

As Richie has suggested, temporarily suspend your CPU tasks and see if that allows the GPU to run at the proper speed.  If so, it should give you a dramatic speedup for GPU tasks, with no further tasks hitting the time limit.  In that case, you would just need to put a limit (say at least 2 less than currently running) on future CPU tasks that can run concurrently.  The easiest way to do that is to restrict the % of CPU cores BOINC is allowed to use.
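To make the percentage concrete: this host reports 24 processors, so leaving 2 threads free means 22 usable threads. A minimal sketch, assuming BOINC rounds the allowed thread count down when it applies the percentage (that rounding is an assumption, not something confirmed in this thread); the function names are hypothetical.

```python
def threads_at(total_threads, percent):
    """Threads usable at a given whole-number percentage, assuming round-down."""
    return max(1, (total_threads * percent) // 100)


def percent_for(total_threads, threads_to_free):
    """Smallest whole-number percentage that still allows the wanted thread count."""
    usable = total_threads - threads_to_free
    if usable < 1:
        raise ValueError("must leave at least one usable thread")
    # Integer ceiling of 100 * usable / total_threads.
    return -(-100 * usable // total_threads)


if __name__ == "__main__":
    pct = percent_for(24, 2)
    print(pct, threads_at(24, pct))
```

Under that assumption, a 24-thread host needs a setting of 92% to keep 22 threads busy; 91% would drop it to 21. Either value would leave the GPU task some CPU support, so the exact figure matters less than checking afterwards that GPU run times have come down.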

If freeing up the CPU threads doesn't solve the problem, you should investigate driver issues.  I don't run Windows at all so can't offer any advice about that.

Cheers,
Gary.

magdaddy
Joined: 29 Mar 20
Posts: 5
Credit: 83008041
RAC: 0

https://einsteinathome.org/host/12818634/tasks/5/0 This is my older system.  I did clean out the dust, because it was overheating.  I haven't had problems for months now.  Should I remove it from BOINC?  Is it causing problems for Einstein@Home and the other projects?  I haven't noticed long-running tasks like on my other computer.  Should I free up CPU threads on this system if Einstein tasks are completing?

https://einsteinathome.org/host/12842478 I freed up CPU threads and tasks are completing now.
