Need help with compute errors

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686041283
RAC: 602978

Hi! The message you quoted

Message 89879 in response to message 89878

Hi!

The message you quoted is just a self-diagnostic message noting that a value that should be positive is in fact negative, so "something went wrong". This doesn't help us at all, because it could be caused by either a bug or a hardware malfunction.
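For illustration only, a self-diagnostic of this kind can be sketched as below. This is not the code from HoughMap.c; the function name, the `source` parameter and the exact message format are assumptions modelled on the warning text quoted in this thread. The point is that "fixing" here means replacing an out-of-range value in memory with a safe one and logging that it happened.

```python
import sys

def clamp_lower_bound(y_lower, source="sketch.py"):
    """Return y_lower unchanged if it is non-negative, else warn and return 0."""
    if y_lower < 0:
        # Hypothetical message format, modelled on the warning quoted in this thread.
        print(f"WARNING: Fixing yLower ({y_lower} -> 0) [{source}]", file=sys.stderr)
        return 0
    return y_lower

if __name__ == "__main__":
    print(clamp_lower_bound(12))      # in range, returned unchanged
    print(clamp_lower_bound(-29538))  # out of range, clamped to 0 with a warning
```

A check like this says that the value was bad when it arrived, not why it was bad, which is why the message alone cannot distinguish a bug from a hardware fault.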

If your wingman had similar problems I'd be more inclined to assume a bug in the software. We have seen an equally strange sporadic error under Linux which turned out to be caused by a bug in some versions of the Linux operating system, so nothing can be ruled out.

CU
Bikeman

Nothing But Idle Time
Joined: 24 Aug 05
Posts: 158
Credit: 289204
RAC: 0

RE: The message you quoted

Message 89880 in response to message 89879

Quote:
The message you quoted is just a self-diagnostic message noting that a value that should be positive is in fact negative, so "something went wrong". This doesn't help us at all, because it could be caused by either a bug or a hardware malfunction.
CU
Bikeman

WARNING: Fixing yLower (-29538 -> 0) [HoughMap.c 771]

I focus on the word "FIXING" in the message; what is it fixing? Is there an attempt to update something on disk or in memory that is causing an access violation? If so, what is it about this process that leads to the violation? If I can prove that my computer and not the application is the source of my woes then I can concentrate (provide evidence to my wife) on getting a new computer.

As posted previously, I've run memtest86, prime95 and chkdsk, and I can't uncover anything that points to my computer. If, OTOH, the application is slightly buggy, then let's quit sticking our heads in the sand and get to the bottom of the application problem. I don't accept that the application is OK just because my wingman doesn't get an access violation. Frankly, I don't care what or who is the source of the problem; I just want to rectify it. For now I have no choice but to disconnect.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: WARNING: Fixing yLower

Message 89881 in response to message 89880

Quote:

WARNING: Fixing yLower (-29538 -> 0) [HoughMap.c 771]

I focus on the word "FIXING" in the message; what is it fixing? Is there an attempt to update something on disk or in memory that is causing an access violation?


Just guessing here, but I think the warning is just a symptom, not the cause of the access violation. The negative value gets caught, but whatever caused the wrong value has probably caused something else as well...
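To make the "symptom, not cause" idea concrete, here is a minimal sketch. Everything in it is hypothetical (the function names, the table, the way the bad value is produced); it only assumes that one wrong value reaches two places, one that checks its input and one that does not. Python raises an IndexError where a C program would read outside the array and possibly trigger an access violation.

```python
def compute_bounds(corrupted):
    """Hypothetical upstream step whose result feeds two consumers."""
    # Assumption: a hardware glitch or a bug is modelled as a flag that
    # turns an otherwise valid bound into a large negative number.
    return -29538 if corrupted else 5

def guarded_consumer(y_lower):
    # This consumer checks its input, so the bad value only yields a warning.
    if y_lower < 0:
        print(f"WARNING: Fixing yLower ({y_lower} -> 0)")
        return 0
    return y_lower

def unguarded_consumer(table, y_lower):
    # This consumer trusts its input. In C an index like this would read
    # outside the array; Python raises IndexError instead.
    return table[y_lower]

def run(corrupted):
    table = list(range(10))
    y = compute_bounds(corrupted)
    guarded_consumer(y)
    try:
        return unguarded_consumer(table, y)
    except IndexError:
        return "crash"
```

With `run(False)` both consumers are happy; with `run(True)` the guarded one prints the warning and the unguarded one fails, even though the warning itself did nothing harmful.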

Is there a possibility that (the contents of) the data file(s) is corrupted (download error or whatever)? Though the checksum should have dealt with that... just thinking.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Dagorath
Joined: 22 Apr 06
Posts: 146
Credit: 226423
RAC: 0

RE: If I can prove that my

Message 89882 in response to message 89880

Quote:
If I can prove that my computer and not the application is the source of my woes then I can concentrate (provide evidence to my wife) on getting a new computer.

Well why didn't you just say so? Here's what you do...

Grab a small pack of assorted transistors from Radio Shack for under $5. Take a few of them and hook them up to a 12 volt battery from a car or motorcycle. Disconnect the battery from the vehicle first so as not to damage the electronics in the vehicle. You can hook up transistors without resistors only so many ways and most ways will cause them to explode and stink. Wear gloves and safety glasses for protection from the heat and flying bits. Blow the transistors in the computer room so the smoke lingers a little. Do it just before she comes home. Make sure you keep 1 good one to show her how they look before they fry.

Now shut down the computer, open the case, disconnect the power supply from the drives and motherboard. Toss the blown up transistors onto the floor of the case. Turn off the breaker that feeds that room. When she comes home you turn the computer on but of course it won't boot. Then discover the "tripped" breaker and give her the first "Uh oh, this doesn't look good." Now try powering up the computer again. Now a tear should trickle down your cheek as you moan "All I ever wanted to do was help mankind through research." Then open the case and "discover" the transistors that "blew right off the motherboard". Man, it must have been one huge power spike!!! Now your bottom lip should quiver as you begin to sob and thank god she wasn't touching the computer when the spike hit, else you'd be all alone and unable to carry on, there could never be another one like her.

A month or 2 after you have the replacement machine bought and paid for, one of your cruncher buddies at Einstein tells you how to fix the old machine. Well, now that you have 2 machines you of course need a router, a KVM switch and a top dollar spike protector. Buy a 4 port KVM, even though you need only a 2 port at this time, because next year the insidious vBLOtran virus is gonna cause a "virtual power spike" that overrides the spike protector and blows the transistors clean off the mobo in your new machine. The wizards will tell you there is no way your new machine can be fixed, but a month after the 3rd machine is broken in and not returnable, Fairchild Electronics invents the Universal Transistor that can replace ANY blown transistor on ANY mobo.

Of course if she's blonde you can just squirt some dirty oil in the case and tell her the seals blew, the oil leaked out and then the pistons in all the charge pumps seized.

Nothing But Idle Time
Joined: 24 Aug 05
Posts: 158
Credit: 289204
RAC: 0

RE: Well why didn't you

Message 89883 in response to message 89882

Quote:
Well why didn't you just say so? Here's what you do...

That's diabolical and intricate...have you done this before? Unfortunately my dumb blonde of a wife has a PhD and I don't think she is likely to fall for the quivering lip approach (damn!) Hell, if I fell off the roof she would mutter to herself "Damn, he's still alive!"

Dagorath
Joined: 22 Apr 06
Posts: 146
Credit: 226423
RAC: 0

RE: RE: Well why didn't

Message 89884 in response to message 89883

Quote:
Quote:
Well why didn't you just say so? Here's what you do...

That's diabolical and intricate...have you done this before? Unfortunately my dumb blonde of a wife has a PhD and I don't think she is likely to fall for the quivering lip approach (damn!) Hell, if I fell off the roof she would mutter to herself "Damn, he's still alive!"

Diabolical for sure but not nearly as diabolical as some of Honey's tricks :)

I've never done that but it worked for a friend a while back. If I did it Honey would say I had too many crunch boxes anyway and now I just have 1 less which is still too many in her opinion.

I usually don't go up on the roof with Honey but if I have to I don't turn my back on her ;)

Nothing But Idle Time
Joined: 24 Aug 05
Posts: 158
Credit: 289204
RAC: 0

RE: ...Just guessing here,

Message 89885 in response to message 89881

Quote:
...Just guessing here, but I think the warning is just a symptom, not the cause of the access violation. The negative value gets caught, but whatever caused the wrong value has probably caused something else as well...
Regards,
Gundolf


Sorry for being slow to respond. Your intuition falls in line with my own but I don't know what to do about it. It implies (I think) that my fpu/cpu is flakey. However, I thought access violations result generally from bad pointers and I don't know what causes that -- application or hardware?

I ran prime95 for 24 hours testing memory and chips and found nothing wrong. However, I get access violations with some Rosetta tasks also. I understand that Einstein tasks are floating point intensive while Rosetta tasks are more integer intensive which is partly why I run these tasks opposite each other on my hyper-threaded machine. All this speculation is getting me nowhere. At least at Rosetta I have seen people there report getting access violations, too. I'm not the only one, so it's hard to blame my computer if others are experiencing the same. At Rosetta the programming itself has been flakey for a very long time due to the complexity of what they try to do.

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

RE: I ran prime95 for 24

Message 89886 in response to message 89885

Quote:
I ran prime95 for 24 hours testing memory and chips and found nothing wrong.


Did you run Prime95 on both cores (you need to run it twice)? This would then test both your cores. One core could be fine, one could be bad.

Nothing But Idle Time
Joined: 24 Aug 05
Posts: 158
Credit: 289204
RAC: 0

RE: RE: I ran prime95 for

Message 89887 in response to message 89886

Quote:
Quote:
I ran prime95 for 24 hours testing memory and chips and found nothing wrong.

Did you run Prime95 on both cores (you need to run it twice)? This would then test both your cores. One core could be fine, one could be bad.


You can select a stress test as well as select one thread or two; I ran two threads simultaneously.

Dagorath
Joined: 22 Apr 06
Posts: 146
Credit: 226423
RAC: 0

RE: RE: ...Just guessing

Message 89888 in response to message 89885

Quote:
Quote:
...Just guessing here, but I think the warning is just a symptom, not the cause of the access violation. The negative value gets caught, but whatever caused the wrong value has probably caused something else as well...
Regards,
Gundolf

Sorry for being slow to respond. Your intuition falls in line with my own but I don't know what to do about it. It implies (I think) that my fpu/cpu is flakey. However, I thought access violations result generally from bad pointers and I don't know what causes that -- application or hardware?

There is no single cause. It could be either or a combination of both. The available evidence does not favor one over the other. Gather more evidence, if you have the time, and the true cause will become more apparent.

Quote:
I ran prime95 for 24 hours testing memory and chips and found nothing wrong.

Yah but all that indicates is that your system is "pretty good". It doesn't mean it's rock solid. Run 2 instances of prime95 for 6 days and see if you don't get an error.
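A rough way to see why a longer test tells you more: assume, purely for illustration, that the machine produces errors independently at a constant average rate (a Poisson model). The rate of one error every four days is an invented number, not a measurement from anyone's machine.

```python
import math

def prob_clean_run(errors_per_day, days):
    """Probability of seeing zero errors in `days`, assuming errors occur
    independently at a constant average rate (a Poisson model)."""
    return math.exp(-errors_per_day * days)

rate = 0.25  # assumed: one error every four days on average
for days in (1, 6):
    print(f"{days} day(s): {prob_clean_run(rate, days):.0%} chance of a clean run")
```

Under that assumption a 24 hour test comes back clean about 78% of the time even though the machine is faulty, while a 6 day test comes back clean only about 22% of the time.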

Quote:
All this speculation is getting me nowhere.

Without more evidence you have only speculation. One way to get more evidence is to run the apps under a debugger. Or swap out components one at a time to see which one, if any, causes the problem.
