Very unusual non-responding XP64 system, no BSOD, no other messages.

Fred J. Verster
Joined: 27 Apr 08
Posts: 118
Credit: 22,451,438
RAC: 0
Topic 195366

When I came home at about 21:00 (UTC+2), I saw that BOINC had stopped; the mouse didn't respond either, and a hard reset was necessary.
BOINC displayed a message for an Einstein Global Correlations S5 search #1 3.02xxxxxxxxxxxxxxxx task on the progress page: waiting for GPU memory?
It was still visible, but when I went to take a screenshot, it was gone.
A moment later this WU was gone too; one had been running all the time and a second one started.
Both on the CPU; I never saw them running on a GPU!?
I have never noticed such an unexplainable halt before. Heat isn't the reason: the AC was on, and it's still 19C outside!
(Crazy)

Two other projects run on this rig, GPUgrid and SETI@home; both, especially the latter, use the GPU
at 98%. I can hear the fan spin up fast; the 470 blows all the hot air out the rear, but the 480, which has 4 big heat-pipes, also blows straight up.
That's why I still run them without a case!

Hard to track down what has happened.....

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,579
Credit: 307,046,605
RAC: 153,726

You have two GPUs on, say, PCI Express slots? Is there a fan on the Northbridge?

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Fred J. Verster
Joined: 27 Apr 08
Posts: 118
Credit: 22,451,438
RAC: 0

Message 99933 in response to message 99932

Sorry for my late reply (it happens too often), but no, there is no fan on the Northbridge. And yes, PCI-E x16 (2.0), but it uses version 1.0.
It's an ASUS P5E mobo with a QX9650 CPU and a GTX 470 & 480; PSU = 850 Watt, 4x +12V rails, 17A.
And this host has NO case, so no heat problem, though.
The host described below does get hot, CPU sometimes 99C, but it keeps going until it reaches >105C and throttles back; that's all, no errors or reboots.

The problem appears when 2 NVIDIA GPUs are used on this mobo; maybe XP64 also has something to do with these 'random stops'.

I have a 2nd ASUS P5E with 2 ATI cards (EAH4850 & 5870) that runs WIN XP 32-bit Pro,
with a 650 Watt PSU and a Q6600 CPU, all stock. It has never given out. I used it for testing
ATI OpenCL/BROOK/CAL in SETI Beta testing of Astropulse WUs.

This also happened with a GTX8900+ and GTS250.
It could be the PSU, as these Fermi cards are known amp eaters :)
I'm going to try another mobo, looking around..., as I still have no clue why one mobo with NVIDIA cards behaves oddly!
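
A rough back-of-the-envelope check of the PSU idea, as a minimal sketch only. It assumes each of the four +12V rails is limited to 17A, and it uses NVIDIA's rated board power for the two cards (250 W for the GTX 480, 215 W for the GTX 470) rather than measured draw. How this PSU maps its rails onto the PCIe power connectors is unknown, so nothing here shows that a rail was actually overloaded.

```python
# Rough +12V rail budget check.
# Assumptions (not from the post): each of the four rails is limited to 17 A,
# and the card figures are rated board power, not measured draw.
RAIL_VOLTAGE_V = 12.0
RAIL_LIMIT_A = 17.0
RATED_BOARD_POWER_W = {"GTX 480": 250.0, "GTX 470": 215.0}


def rail_capacity_w(voltage_v: float = RAIL_VOLTAGE_V,
                    limit_a: float = RAIL_LIMIT_A) -> float:
    """Maximum power one rail can deliver before reaching its current limit."""
    return voltage_v * limit_a


def amps_at_12v(power_w: float, voltage_v: float = RAIL_VOLTAGE_V) -> float:
    """Current needed on the +12V side to deliver the given power."""
    return power_w / voltage_v


if __name__ == "__main__":
    print(f"one rail: {rail_capacity_w():.0f} W")
    for card, watts in RATED_BOARD_POWER_W.items():
        print(f"{card}: {amps_at_12v(watts):.1f} A at rated board power")
```

Under those assumptions one rail tops out at 204 W, which is below the rated board power of either card. In practice a card's draw is split between the slot and its PCIe power connectors, so the real question is which connectors share a rail.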

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,579
Credit: 307,046,605
RAC: 153,726

Message 99934 in response to message 99933

Quote:
And this host has NO case, so no heat problem, though.
The host described below does get hot, CPU sometimes 99C, but it keeps going until it reaches >105C and throttles back; that's all, no errors or reboots.


So these are CPU temps? This is way too hot: marginal for immediate damage and long-term endurance! Plus, I'm not sure one could reliably diagnose any other issue in the presence of this. My view is that thermal throttling is a last-resort measure, not the preferred normal state, and one ought to arrange matters to radically cool that CPU. My preference is for heavy lumps of sculpted copper [ HLOSC :-) ]. Gary Roberts admits to lapping the bases of these with fine abrasive paste for maximum CPU contact. My Northbridge comment refers to the GPU/CPU traffic going via that chip. Try simply touching the face of it with a fingertip; if that's uncomfortable, then it needs a cooling solution as well.

Cheers, Mike.

mikey
Joined: 22 Jan 05
Posts: 12,555
Credit: 1,838,836,975
RAC: 23,916

Message 99935 in response to message 99933

Quote:

Sorry for my late reply (it happens too often), but no, there is no fan on the Northbridge. And yes, PCI-E x16 (2.0), but it uses version 1.0.
It's an ASUS P5E mobo with a QX9650 CPU and a GTX 470 & 480; PSU = 850 Watt, 4x +12V rails, 17A.
And this host has NO case, so no heat problem, though.
The host described below does get hot, CPU sometimes 99C, but it keeps going until it reaches >105C and throttles back; that's all, no errors or reboots.

The problem appears when 2 NVIDIA GPUs are used on this mobo; maybe XP64 also has something to do with these 'random stops'.

I have a 2nd ASUS P5E with 2 ATI cards (EAH4850 & 5870) that runs WIN XP 32-bit Pro,
with a 650 Watt PSU and a Q6600 CPU, all stock. It has never given out. I used it for testing
ATI OpenCL/BROOK/CAL in SETI Beta testing of Astropulse WUs.

This also happened with a GTX8900+ and GTS250.
It could be the PSU, as these Fermi cards are known amp eaters :)
I'm going to try another mobo, looking around..., as I still have no clue why one mobo with NVIDIA cards behaves oddly!

You run mismatched GPUs in the same machine? Doesn't that cause BOINC to run at the slower of the 2 speeds and therefore waste the higher performance of the better one? Now, this could make your heat problem WORSE, but it seems to me you are wasting some money here. Try taking out the slower of the 2 cards and see if the heat problem dissipates a little and whether you crunch the units a bit faster. Do you also crunch CPU units on those machines? I am guessing you do, and that could also be contributing to your heat issues. We ALL crunch CPU and GPU units, but in your case it seems to be causing a heat-related issue. Try crunching with only 3 of the 4 CPU cores for a little while and see if it cools down a lot; if so, you have found your problem, and a more efficient CPU cooler may be in your future.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1,079
Credit: 341,280
RAC: 0

Message 99936 in response to message 99935

Quote:
You run mismatched GPUs in the same machine? Doesn't that cause BOINC to run at the slower of the 2 speeds and therefore waste the higher performance of the better one?


BOINC has no control over the speed of the GPUs.

The only drawback with mismatched GPUs is that BOINC will never succeed in calculating the correct number of tasks to download, because the Duration Correction Factor keeps changing.
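
To illustrate why a single correction factor cannot settle when it is fed by two cards of different speed, here is a toy model. The update rule (jump up at once when a task overruns its estimate, drift down slowly otherwise) and the runtime ratios are assumptions made up for this sketch; it is not BOINC's actual code.

```python
# Toy model of one per-project correction factor fed by two unequal GPUs.
# The update rule is a simplified assumption for illustration only,
# not BOINC's actual implementation.

def update_dcf(dcf: float, ratio: float, decay: float = 0.1) -> float:
    """Jump up at once if a task overran the estimate, else drift down slowly."""
    if ratio > dcf:
        return ratio
    return dcf + decay * (ratio - dcf)


def simulate(ratios: list[float], dcf: float = 1.0) -> list[float]:
    """Return the correction factor after each finished task."""
    history = []
    for ratio in ratios:
        dcf = update_dcf(dcf, ratio)
        history.append(dcf)
    return history


if __name__ == "__main__":
    # Hypothetical actual/estimated runtime ratios: the fast card needs half
    # the estimated time (0.5), the slow card 1.5x the estimate.
    alternating = [0.5, 1.5] * 4
    for step, value in enumerate(simulate(alternating), start=1):
        print(f"task {step}: dcf = {value:.3f}")
```

With alternating results the factor keeps bouncing between two values instead of converging, so any work estimate built on it is wrong for at least one of the cards.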

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

mikey
Joined: 22 Jan 05
Posts: 12,555
Credit: 1,838,836,975
RAC: 23,916

Message 99937 in response to message 99936

Quote:
Quote:
You run mismatched GPUs in the same machine? Doesn't that cause BOINC to run at the slower of the 2 speeds and therefore waste the higher performance of the better one?

BOINC has no control over the speed of the GPUs.

The only drawback with mismatched GPUs is that BOINC will never succeed in calculating the correct number of tasks to download, because the Duration Correction Factor keeps changing.

Regards,
Gundolf

From the Dnetc boards:
"Problem with mixing cards is your speeds will default to whichever card has the lower speed, or at least that's what I've found to be the case ... "
http://www.dnetc.net/forum_thread.php?id=269 This is from a man who has a RAC of OVER 7.25 MILLION credits!!! This has been brought up by MANY, MANY people on both Dnetc and Collatz MANY times; that is why I asked.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1,079
Credit: 341,280
RAC: 0

Message 99938 in response to message 99937

Quote:
Quote:
Quote:
You run mismatched GPUs in the same machine? Doesn't that cause BOINC to run at the slower of the 2 speeds and therefore waste the higher performance of the better one?

BOINC has no control over the speed of the GPUs.

The only drawback with mismatched GPUs is that BOINC will never succeed in calculating the correct number of tasks to download, because the Duration Correction Factor keeps changing.

Regards,
Gundolf

From the Dnetc boards:
"Problem with mixing cards is your speeds will default to whichever card has the lower speed, or at least that's what I've found to be the case ... "
http://www.dnetc.net/forum_thread.php?id=269 This is from a man who has a RAC of OVER 7.25 MILLION credits!!! This has been brought up by MANY, MANY people on both Dnetc and Collatz MANY times; that is why I asked.


But that still doesn't mean BOINC can do anything about that. As I understand it, that would be a matter for the drivers/libraries or, at most, for the project application.

Regards,
Gundolf

mikey
Joined: 22 Jan 05
Posts: 12,555
Credit: 1,838,836,975
RAC: 23,916

Message 99939 in response to message 99938

Quote:
Quote:
Quote:
Quote:
You run mismatched GPUs in the same machine? Doesn't that cause BOINC to run at the slower of the 2 speeds and therefore waste the higher performance of the better one?

BOINC has no control over the speed of the GPUs.

The only drawback with mismatched GPUs is that BOINC will never succeed in calculating the correct number of tasks to download, because the Duration Correction Factor keeps changing.

Regards,
Gundolf

From the Dnetc boards:
"Problem with mixing cards is your speeds will default to whichever card has the lower speed, or at least that's what I've found to be the case ... "
http://www.dnetc.net/forum_thread.php?id=269 This is from a man who has a RAC of OVER 7.25 MILLION credits!!! This has been brought up by MANY, MANY people on both Dnetc and Collatz MANY times; that is why I asked.

But that still doesn't mean BOINC can do anything about that. As I understand it, that would be a matter for the drivers/libraries or, at most, for the project application.

Regards,
Gundolf

You are correct; at some point BOINC is powerless to keep fixing things we users cause, but my point was to ask the OP if he was aware of it and if he had noticed it. The easy fix is to put only matched cards in a single PC, but that assumes each person has an unlimited number of PCs to crunch with. Not everyone does, of course, and hopefully we crunch with what we can for as long as we can.

Fred J. Verster
Joined: 27 Apr 08
Posts: 118
Credit: 22,451,438
RAC: 0

Message 99940 in response to message 99938

Clearly, this isn't always the case when mixing cards:

Why does a host with 2 ATI cards, a 4850 and a 5870, give NO problems? Both cards use version 2.0 PCI-E x16 and the difference in speed is clear: on Collatz C. and M.W. the 5870 is 3x faster than the 4850.
Of course, no CrossFire. (And I did put a BIG cooler on the Q6600; temps are ~70C.)
Power draw with both cards crunching is 430-485 Watt. (PSU=650 Watt)

The same mobo, but with 2 different NVIDIA cards, a 470 & 480, gives a lot of trouble?
I took out the 470, and all is well. For some reason it's a bad mix in this case. Maybe the X38 chipset 'likes' ATI over NVIDIA...........? (ASUS P5E)
Or the PSU; power draw is 385 Watt with a GTX480 at 90% load and the CPU at 100%.
(PSU=850 Watt)
Both hosts run WINDOWS XP Pro, but the first-mentioned one uses the 32-bit version and the latter XP64.

Temps on the Northbridge are normal, ~45-50C max.

Fred J. Verster
Joined: 27 Apr 08
Posts: 118
Credit: 22,451,438
RAC: 0

Message 99941 in response to message 99940

OFF topic
I signed up for DNETC today, and to my surprise, they use CPU WUs and WUs that use 0.05 CPU + 2 (ATI) GPUs?! (Meaning the 4850 & 5870 are working together.)

On topic, again............
