Hello,
After many years of number crunching on Einstein and many other projects, I have a problem that I cannot figure out. My computer crashes and, from what I can tell, it happens on any GPU task. I am posting this in the Einstein forum because that project was the only one where I was able to get a screenshot of the error message under the Tasks tab. All other failures just restarted the PC.
What I have done: I recently got this workstation back online after it sat in the basement due to life, and I wanted to have it start crunching numbers again. It sat for several years due to becoming a parent and such, so I do not remember where I left off with it, other than that it was used as my backup workstation for home stuff and crunching numbers. When I got it back online I added all the projects that I have always been crunching for as well as some new ones. Off the top of my head the projects are: einstein, LHC, milkyway, world community, mind modeling, climateprediction, number fields, rosetta, RNA, universe. The new ones I added were ithena, ibercivis and BOINC@TACC. The new projects really didn't have any tasks for me when I first got this rig back up and going (approximately 5 days ago). I just got a bunch of tasks from my regulars - einstein, world community, mind modeling, and rosetta.
The first day or two everything ran fine, then the next morning I came into my office and the computer had restarted. I saw some updates had been applied and assumed that was it. I changed the Windows Update settings so it would not install and restart on its own. I thought I was good. After that I would come into my office and always find my workstation at the login screen. I checked Event Viewer and saw errors, but nothing that made sense to me or turned up in any of the basic online searching I did. I then started removing projects from BOINC, trying to pinpoint whether a certain project was giving me problems. I got down to only having einstein on it. It ran overnight with no problems. I then added world community and it ran for an extended period of time and completed tasks with no problems. When I added milkyway it crashed on the first task. I removed it and tried Rosetta.
At this point I was more closely watching what tasks were running when I would leave my computer each time. I had several CPU tasks running and one GPU task set to run. When I am using my PC it doesn't run GPU tasks until screen saver starts and I am not using it. It crashed and this time it was an einstein GPU task.
Recap: CPU tasks work fine and GPU tasks crash. The only real information I have on the GPU crash is from the einstein GPU task, where it said "computation error 0.9 CPU + 1 NVIDIA GPU".
Average workload: I usually have 2-3 CPU tasks running and 1 GPU task. This is what I always used to run on this workstation for years.
System information: Asus Crosshair IV Formula, AMD Phenom II X6 1035T, 16GB DDR3-1333, GIGABYTE GeForce GTX 750 Ti 2GB, Windows 10 Pro
I used hardware info logging software when I started having problems to see if anything was overheating. The CPU and motherboard saw a highest temperature of 34°C and the GPU saw 41°C, which are all well under their max or even hot range. Voltages to the CPU and GPU are good.
I have my BOINC settings at stock, except that I gave it more hard drive space than the standard 10GB. So CPU is like 50% max and such.
The only thing I can point to is that I did switch to a new graphics card sometime before I stopped using this workstation previously. Not sure if I was crunching numbers but the MOBO, CPU, CPU cooler, etc. have all been crunching numbers on and off for 4-6 years.
Thank you
Copyright © 2024 Einstein@Home. All rights reserved.
Hi!
Those GW GPU tasks need more memory than what your GTX 750 Ti with 2GB has. Currently a fail safe solution with 2GB GPUs would be deselecting 'Gravitational Wave search O2 Multi-Directional GPU' app on the project preferences. Your GPU is able to run tasks from 'Gamma-ray pulsar binary search #1 on GPUs' though.
Thank you for that insight.
I have a new-to-me (used) XFX AMD Radeon RX580 GTS 8GB GDDR5 on the way. I got it cheap and it will be installed in the near future. Maybe this newer graphics card can handle the GPU tasks that I am failing.
I did have another failure last night in which my PC restarted. Upon logging in and opening BOINC I noticed there were no failed GPU tasks or even any GPU tasks in the queue. This kind of throws me for a loop now.
Yep, it should be able to run GW GPU tasks extremely happily as well.
Perhaps the PSU in it has seen its best days even if the voltages seem good. Maybe it can't quite stand even moderate stress anymore and shows instability, which is then enough to restart the computer. If you could get a fresh PSU for temporary testing, that could reveal whether there's a problem with the old unit.
Another thing to try is to stress test the machine 'outside Boinc' and also test the RAM. Here are a few ideas for software:
https://www.gearprimer.com/technology/best-tools-stress-test-pc-cpu-ram-gpu/
https://www.tomshardware.com/reviews/stress-test-cpu-pc-guide,5461.html
Memtest86+, Prime95 and OCCT are free.
3DMark is currently on sale.
That particular GPU should give you no problems with GW GPU tasks. The 750Ti can't do them but could do the gamma-ray pulsar tasks. However, the RX580 would give 5 to 10 times the output on the GRP tasks so you are much better off replacing the 750Ti, whichever type of task you choose to run.
My only concern with 'used' GPUs is whether the previous owner has given it a hard life or done BIOS mods - eg for crypto-currency mining, etc. Also check the condition of fans when it first arrives. Some people dispose of such GPUs at the first hint of fans starting to deteriorate. Pay particular attention to how long each fan takes to stop after removing power. Check for any sign of 'wobble' in the bearings. It's usually a quite short time from the first signs of a problem to complete fan failure.
From your descriptions of your current setup, you have some sort of hardware issue when under load. The most likely culprits are the motherboard or the PSU. The Crosshair IV Formula was first released in 2010, so is quite old. My first thought was capacitors but a bit of research seems to show the board has all polymer caps so the old problem with electrolytics that 'dry out' and swell (and fail) shouldn't be the cause.
Can you please state the make, model and 12V current rating for your PSU? If it's an older type that had high current ratings on 3.3V and 5V, it's quite likely that the PSU is the problem. PSUs like that often have electrolytic caps that swell and fail. I've repaired many, many older PSUs with the 'capacitor plague'. Please note that an RX 580 will use quite a bit more power than a 750Ti so you really need to have a decent PSU.
Please realise that the current CPU (AMD Phenom II X6 1035T) is quite power hungry on its own so when you add the RX 580, you should have a PSU that is able to deliver at least 500 - 550W on the 12V rail (eg. 45 amps current rating), and with low ratings on the 3.3V and 5V rails. It's highly desirable to have at least an 80+ rated PSU - Bronze or better would be good to have. The GPU needs an 8-pin PCIe power connector. Does your current PSU meet those requirements?
Cheers,
Gary.
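As a quick check of the 12V rail figure in the post above, the arithmetic is just watts = volts x amps. The sketch below is only illustrative: the helper names `rail_watts` and `amps_needed` are made up for this example, and the only inputs are the 45 amp rating and the 500 - 550W target quoted in the post.

```python
def rail_watts(volts: float, amps: float) -> float:
    """Power a rail can deliver, in watts (P = V * I)."""
    return volts * amps


def amps_needed(watts: float, volts: float = 12.0) -> float:
    """Current rating a rail needs in order to deliver the given wattage."""
    return watts / volts


# The example rating from the post: 45 A on the 12V rail.
example_watts = rail_watts(12.0, 45.0)
print(f"45 A on 12V delivers {example_watts:.0f} W")

# The 500 - 550 W target from the post, expressed as a 12V current rating.
for target in (500.0, 550.0):
    print(f"{target:.0f} W on 12V needs about {amps_needed(target):.1f} A")
```

So a 45 amp rating sits inside the 500 - 550W window. To compare a real PSU, read the 12V current rating off its label and plug it in.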
Richie,
Thank you for those software options as an external choice for stress testing. I will look into those.
Gary,
Thank you for the response. My PSU does have an available 8 pin for the new graphics card. I will check over the XFX when it arrives. Those are good items to look at.
The PSU is a Thermaltake 1000W Toughpower. p/n W0155RU. This build was before modular PSUs were readily available so I have a lot of extra lines off the PSU that I am not using.
After the problems started I did update the BIOS to the newest stable (non-beta) edition to see if it fixed my issue, and it didn't. The graphics card has the newest update.
The CPU fan is a huge CoolerMaster that is a dual fan push/pull set up.
Outside of any additional thoughts I will just continue to crunch CPU tasks until the new graphics card shows up and then I can try the GPU tasks and update.
I kind of forgot how old this workstation was until I looked up the specs on the motherboard and realized it was PCIe 2.0 with no M.2 slots.
Thank you
The specs for that PSU are more than adequate for running an RX 580 - no issue there.
However, this 2007 review for your exact model, whilst being quite favourable, shows just how old the unit potentially is. Since the machine had some years stored away, the electrolytic capacitors in the PSU will tend to have suffered from lack of use and may well have dried out and be causing the problem when loaded. If the filter capacitors aren't filtering, you may damage other components.
I've seen the behaviour you describe happen in PSUs from old servers that have been out of service for a while. They fire up and run OK at idle but then tend to shut down or cause a reboot when pushed under load. If this continues when you install the RX 580, the best test would be to try a known good PSU if you can beg/borrow/steal one for that purpose.
Cheers,
Gary.
Gary,
That is great information to know.
Thank you
New graphics card is in and running GPU based projects without any issues. It ran overnight without any problems, whereas the previous graphics card would run into issues within 30-90 minutes. The graphics card showed up in great condition.
I'm very glad the new 8GB RX 580 solved the issues for you. Well done!
I had a look at your GW tasks list - no errors or invalid results since the changeover. The completed tasks from 9 Dec 2020 have a couple at around 36 mins but the time then reduces to a fairly stable value of around 18 minutes. This looks about right for running tasks singly so perhaps you were running multiple concurrent tasks initially when the first couple gave the much longer times.
I'm running some machines with a basic quad core CPU and an 8GB RX570 using the same GW search. There are no CPU tasks being run so all cores are available for GPU support. My experience is that a 580 and a 570 seem to give much the same crunch time behaviour, despite the difference in spec.
Here are the GW GPU results for a machine with a basic Athlon 3000G processor. All tasks have been crunched at a multiplicity of 4. You can see (at the time of writing) a bunch of validated results with no errors or invalids. The 8GB VRAM seems (at the moment) to be sufficient to run x4 quite safely. There is about 1.8 times the throughput by using x4 instead of x1. 4 tasks are completed in under 38 mins on average - ie. 9.5 mins for each task. Task times when run singly are around 17 mins. The approximate 'per task' run times when crunched at multiplicities of x1, x2, x3 and x4 are 17min, 12min, 10.3min and 9.5min respectively. You can see that going to x2 gives the biggest benefit but that the throughput keeps increasing right up to x4. That final step is still around 8% better than x3.
The memory requirements will probably keep increasing in the future as the frequency terms in the task name get larger. From previous experience, the multiplicity might need to be lowered at some point so I intend to monitor for any signs of compute problems. All seems good at the moment.
Cheers,
Gary.
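The throughput figures in the post above can be reproduced from the four 'per task' times alone. This is a minimal sketch using only the numbers quoted there (17, 12, 10.3 and 9.5 minutes); the names `per_task_minutes`, `tasks_per_hour` and `speedup` are made up for the example and are not anything BOINC or Einstein@Home provides.

```python
# Approximate per-task run times (minutes) at each multiplicity, from the post.
per_task_minutes = {1: 17.0, 2: 12.0, 3: 10.3, 4: 9.5}


def tasks_per_hour(multiplicity: int) -> float:
    """Completed tasks per hour when running at the given multiplicity."""
    return 60.0 / per_task_minutes[multiplicity]


def speedup(mult_from: int, mult_to: int) -> float:
    """Throughput ratio when moving from one multiplicity to another."""
    return per_task_minutes[mult_from] / per_task_minutes[mult_to]


for m in sorted(per_task_minutes):
    print(f"x{m}: {tasks_per_hour(m):.2f} tasks/hour, "
          f"{speedup(1, m):.2f} times the x1 throughput")

# Gain from each extra concurrent task, as a percentage over the previous step.
for m in (2, 3, 4):
    gain = (speedup(m - 1, m) - 1.0) * 100.0
    print(f"x{m - 1} -> x{m}: about {gain:.0f}% more throughput")
```

The step-by-step gains shrink as the multiplicity goes up, which matches the observation that x2 gives the biggest benefit. These times are for one particular machine and set of tasks, so the ratios on other hardware may differ.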
Gary,
Great information! Happy to be crunching at full scale again.