OK, I am not super techy, but I had this error code on host 6018738, an old Dell GX620. Over the last day or two it has restarted on its own. I then restart BOINC, and apparently it has done this a few times. I just leave this computer unattended. Can anyone point me in the right direction? Thanks in advance. No overclock. My new system is down with overheating problems; I have a new CPU cooler on the way and hope to have it back up by Wed.
Copyright © 2024 Einstein@Home. All rights reserved.
ERROR -1073741819 (0xffffffffc0000005)
I've tried a refresh of the OS; GPU temp was steady at 50-53. It is a GX620, and the new host # is 7602725. The computer does fine until I start crunching, then it runs about 10 minutes, says it ran into a problem, and restarts. This computer has been running 24/7 for about 6 months, and now this.
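For reference, the two numbers in the thread title are the same 32-bit value written two ways. A minimal Python sketch (the helper name `to_ntstatus_hex` is made up for illustration) masks the signed exit code to 32 bits, which gives 0xC0000005, the Windows status code for an access violation (STATUS_ACCESS_VIOLATION). The leading "ffffffff" in the title is just the sign extension to 64 bits.

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Return the unsigned 32-bit hex form of a signed Windows exit code."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus_hex(-1073741819))  # 0xC0000005
```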
This could have to do with memory. You can install "WhoCrashed". It is a free download and often, though not always, shows the cause of a blue screen or reboot.
Greetings from
TJ
Thanks. That gave me some good info. I ended up going back to an older driver for the GPU. I detached Einstein, then removed BOINC, reinstalled, and reattached. Will see if that takes care of it. Thanks again.
Robert
Well, that did not take care of the problems. I thought maybe it was the GPU, so I switched it out for a GT630 I had sitting around. The GPU runs about 2-3 minutes, then memory use, memory controller load, and GPU load all drop to 0 for about 1-2 seconds, then it resumes. The elapsed time on the WU resets to 0 and starts over... repeat. The computer did not restart during this. I suspended GPU tasks for now. CPU tasks run normally. Any thoughts would be appreciated. WhoCrashed pointed to the GPU driver as a maybe, and also ntoskrnl.exe (nt+0x78040), but the OS boots and runs fine. My limited expertise is about used up. Thanks.
Robert
Quick & dirty diagnosis:
Install the free app 7-Zip. It's as good as WinRAR or WinZip. There's no problem having it installed alongside any other archive application; just don't let it hijack the file associations. That way your archive application of choice will remain the default launcher when you click on archive file types.
On the menu bar, click 'Tools' and then 'Benchmark'.
If it abends during decompression, there are probably issues with one or more memory modules (I'd say better than 9-in-10 odds). That's what 0xC0000005 is typically indicative of.
You can also use the tried-and-true Memtest86. Tests 5 & 8 will home in on the problem.
The one thing you want to ensure is that all the memory has similar timings.
I recently ran into a problem with my mobo being finicky: it didn't want to run 6.5ns (2-2-2-5) & 7ns (2-2-2-6) memory sticks concurrently (even though they tested OK individually).
Thank you for your input, Ray. RAM checks out OK using the 7-Zip benchmark test. Also OK with the Win8 RAM test. They are matched sticks and the timings check the same. I may run Memtest later Sunday. Going to hit the hay, and I have a wake to go to Sunday afternoon. I will check this thread after. A thought, though: could this be a symptom of a power supply that is starting to wane? I don't think it's CPU overheating, because it's been crunching CPU tasks for several hours now at about 98%. Also, it has not restarted on its own again.
If the PC runs normally for a while and then shuts down or reboots, it could be due to overheating. The CPU has a built-in protection system.
You could download "Core Temp" to see the temperature of the CPU.
If it reboots when both the CPU and the GPU are running, first try running CPU only for a few hours and watch the temperatures.
If that is okay, then run GPU only and see what happens. You can use "GPU-Z" to see the temperature of the GPU, its load, memory use and more information.
If WhoCrashed came up with a kernel message, this in many cases has to do with a driver issue. Not the GPU driver, but an unfamiliar one. I once got a blue screen after 8 minutes of running BOINC. Without BOINC the rig stayed up for hours. It turned out to be a Roxio driver.
Greetings from
TJ
That's your Win8 box that's doing this? I don't see that host ID. I see three boxes - the WinXP box hasn't been heard from since Dec 2012 - and the Win 7 box seems to be a champ. Where are you seeing the 0xC0000005 error? None of your hosts show WU abends.
The 7-Zip benchmark should be run for several iterations with the largest dictionary size that it'll allocate. Let that run for about 30 minutes. That should be a pretty good test of the RAM. If it passes that, I'm unconvinced there's a problem with the RAM modules.
Quite frankly, if it passes the 7-Zip benchmark, I'd run Prime95 for a few hours. That's going to stress test the CPU & the memory sub-system. If it passes 7-Zip and fails Prime95, the problem is either the CPU or the mobo.
Check the air intake on the CPU cooler. Make sure it's not clogged with dust. Vacuum that crap out while using a trim paintbrush to get into the fins of the heat sink. A little bit of dust isn't going to be a problem. It's the mondo-to-grimace-proportion, doorway-darkening wooly-boogers that are highly suspect as gremlins here. If it really is caked on, then removing the fan from the heat sink may be what needs to be done to get in between the fins. Or shoot compressed air into it and spew dust all over; at least it won't be in your CPU cooler.
SpeedFan is a decent free utility for CPU / GFX / mobo temp monitoring and fan control. Your CPU should be at < 60 deg. C at full load. But unless temps are in excess of 100, the CPU should just throttle down. Thermal trips don't happen with newer CPUs like they did in the pterodactyl days.
If the fan on the CPU cooler isn't spinning at all (or is spinning very slowly), SpeedFan will tell you so, and it'll tell you that the CPU is pretty warm too. Or you could just watch the CPU fan and make sure it's spinning; your call.
It's possible the PSU is at issue. They can get wonky when they get older. How old is that one? 5 years is pretty much when you can begin to see flaky problems. Cheaper ones can go wonky in as little as two years. How many watts is it rated at? Ideally one wants to size PSUs at 200% of requirements. That's especially true for the el-cheapo units. This is more for longevity than anything else. Heat kills electronic components, and the more stressed a PSU is, the hotter it'll run (and by extension the shorter its lifespan).
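As a rough sketch of that 200% rule of thumb: the 150 W load below is a made-up example, not a measurement from this thread, and both helper names are hypothetical. The 305 W rating is the figure given later in the thread for the stock Dell unit.

```python
def suggested_psu_watts(estimated_load_watts: float, headroom: float = 2.0) -> float:
    """PSU rating suggested by the rule of thumb: load times a headroom factor."""
    return estimated_load_watts * headroom


def load_fraction(estimated_load_watts: float, psu_rating_watts: float) -> float:
    """Fraction of the PSU rating that the estimated load uses."""
    return estimated_load_watts / psu_rating_watts


# Hypothetical system drawing 150 W at full load.
print(suggested_psu_watts(150))            # 300.0
print(round(load_fraction(150, 305), 2))   # 0.49
```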
SpeedFan has monitoring functionality for the different power supply rails of your PSU. You should be within 8% of nominal. If you observe spikes outside that range, a metaphysicist would be a good consultant. They're into the effect of electrical spikes and sags on the feng shui and chi of your PC.
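A minimal sketch of that rail check, using the 8% figure from the post above. The readings are invented for illustration and `rail_within_tolerance` is a hypothetical helper; in practice the values would be read off a monitoring tool by hand.

```python
def rail_within_tolerance(measured: float, nominal: float, tolerance: float = 0.08) -> bool:
    """True if the measured voltage is within the tolerance fraction of nominal."""
    return abs(measured - nominal) <= nominal * tolerance


# Hypothetical readings keyed by nominal rail voltage.
readings = {12.0: 11.4, 5.0: 5.1, 3.3: 2.9}
for nominal, measured in readings.items():
    status = "ok" if rail_within_tolerance(measured, nominal) else "out of range"
    print(f"{nominal} V rail reads {measured} V: {status}")
```

With these made-up numbers the 12 V and 5 V rails pass and the 3.3 V rail is flagged, since 2.9 V is about 12% below nominal.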
Ideally you want to condition power to the PSU by using a battery backup unit that has smart-boost and smart-trim functionality. The better units do that without tripping onto battery. One advantage of battery backups is that the better ones do power conditioning and filter out noise on the A/C mains. Noise over time can play havoc with the D/C circuitry of the mobo.
Although standby power supplies have surge protection, you want to have external surge protection as well. Ideally you want to discard surge protectors every 5 years. There's no way to discern how stressed a surge protector has been; it's death by a thousand cuts. Your secondary line of defense is the standby power supply's internal surge protection. It's cheaper to replace the external surge protector than the SPS. You want to spend $50+ on a surge protector with the highest joule rating and the quickest response time. Once a surge protector's joule rating is exceeded, it's toast. If it sees 100 surges at 1% of its max rating, the surge protector is toast.
There could be an issue with power quality on the mains, i.e., sags, spikes, noise, etc. A PSU that's getting long in the tooth may not be able to handle transients as well as when it was younger. If you're using that 4-pole Molex for the CPU, power supply quality tolerances are much tighter than for older CPUs.
The other problem child could be HDD issues. Bad spots on the HDD, either hard or soft, could cause 0xC0000005 errors. Soft errors can be fixed with an HDD diagnostic / repair utility. Hard errors intimate imminent HDD failure.
Spontaneous reboot suggests a mobo problem (if it's not related to the PSU). Check that the system isn't configured to reboot automatically on a BSOD. That'll be in the Startup and Recovery settings. That way you can establish whether the system is hitting a BSOD before it reboots. Check the event logs too. There may be a clue in either the application or system event logs.
Thanks Ray, starting the Prime95 test now. This host is now # 7611027. The only other true errors on this box were two timeouts from the first runs of the Gamma-ray pulsar search #2 v1.09. Then later came the two that happened with this reboot thing. Did the cleaning again to be sure. CPU fan is OK. I have an Intel 160 GB SSD; it checks OK. The PSU is the stock Dell unit rated at 305 watts and is at least 7 years old. The GT 640 card I had in the PCIe x16 slot drew 65 watts. (The 640 OEM w/ 1 GB GDDR5 is the largest or best card that works in this box because it only uses power from the slot. It needs no external PCIe power plug, which this box doesn't have.) That's why I am leaning PSU. This box was running at least 4 months 24/7 with that card. I think the PSU is just starting to fade. I will let you know how the Prime95 goes. REALLY appreciate the assist. Also just found this in the event logs...
Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 6/29/2013 2:01:24 AM
Event ID: 41
Task Category: (63)
Level: Critical
Keywords: (2)
User: SYSTEM
Computer: RLCESARZ
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
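Event Xml:
EventID: 41
Version: 3
Level: 1
Task: 63
Opcode: 0
Keywords: 0x8000000000000002
EventRecordID: 415
Channel: System
Computer: RLCESARZ
BugcheckCode: 307
BugcheckParameter1: 0x1
BugcheckParameter2: 0x784
BugcheckParameter3: 0x0
BugcheckParameter4: 0x0
SleepInProgress: 0
PowerButtonTimestamp: 0
BootAppStatus: 0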
I don't know if that confirms a PSU or CPU problem.
Also, 2 hrs so far on Prime95 and not a flutter; all passing... so far.
Rob
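One way to read the event data above, taking the value 307 as the event's BugcheckCode field (an assumption about the field order in the pasted XML): converting it to hex gives the stop code in the form it is usually quoted. A nonzero bugcheck code in a Kernel-Power 41 event generally means Windows recorded a stop error (blue screen) before the reboot, rather than a plain loss of power. A minimal sketch, with a hypothetical helper name:

```python
def stop_code_hex(bugcheck_code: int) -> str:
    """Format a decimal bugcheck code as the hex stop code usually quoted."""
    return f"0x{bugcheck_code:X}"


bugcheck = 307  # assumed to be the BugcheckCode value from the event above
print(stop_code_hex(bugcheck))  # 0x133
print("stop error recorded" if bugcheck != 0 else "no stop error recorded")
```

Stop code 0x133 is documented as DPC_WATCHDOG_VIOLATION, which often points at a driver, so if the assumption holds it may be worth following up with the driver check suggested earlier in the thread.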
And thanks, TJ. I have been doing those things too and will still have to go through the drivers next. Rob