Computation error???

crazyrabbit1
Joined: 23 Sep 06
Posts: 34
Credit: 4207137
RAC: 0
Topic 194081

Since last week I have had a strange problem. Last week on Wednesday, at one point all WUs finished with a computation error. The same thing happened on Wednesday this week; in between, everything worked fine. I made no changes to the system, neither hardware nor software. The system has also been running overclocked to 3 GHz for months, and I have no problems at SETI.
Yesterday and right now it is the same: all WUs went to computation error. But this time I saw something I did not understand.
One WU finished and uploaded ... OK.
A second one had just finished and was trying to upload ... upload error. At that moment all the other WUs went to computation error, missing output file??? Is this normal? And now the funny thing: seconds later the upload finished. I have no idea where to look for the problem.
Thanks for any ideas, all are welcome. For now I have set the project to no new work.

System q6600@3
Boinc 6.2.19
Einstein 6.05 or 6.09, trying both versions

Byron S Goodgame
Joined: 16 Jan 06
Posts: 187
Credit: 56581
RAC: 0

Quote:

Since last week I have had a strange problem. Last week on Wednesday, at one point all WUs finished with a computation error. The same thing happened on Wednesday this week; in between, everything worked fine. I made no changes to the system, neither hardware nor software. The system has also been running overclocked to 3 GHz for months, and I have no problems at SETI.
Yesterday and right now it is the same: all WUs went to computation error. But this time I saw something I did not understand.
One WU finished and uploaded ... OK.
A second one had just finished and was trying to upload ... upload error. At that moment all the other WUs went to computation error, missing output file??? Is this normal? And now the funny thing: seconds later the upload finished. I have no idea where to look for the problem.
Thanks for any ideas, all are welcome. For now I have set the project to no new work.

System q6600@3
Boinc 6.2.19
Einstein 6.05 or 6.09, trying both versions

You're getting several different types of errors. The first ones seem to be 119 MD5 check failed, which could have something to do with the app, or you may have to do a project reset because your public key may have been corrupted.

It could also be the OC; you may want to consider bringing it down to stock, running some more tasks, and seeing whether you still get the error.

Also, one of the other errors I saw could be because in your BOINC Manager settings, under processor usage, you may have the CPU time set to something other than 100%.

crazyrabbit1
Joined: 23 Sep 06
Posts: 34
Credit: 4207137
RAC: 0

Thanks so far.

I also thought about a project reset, but I want to hold off on it so there is a chance to track down a problem with the app, IF there is one. I really want to find out what the problem is, because it happens with different batches of WUs, 618.30 and 618.35, I think.

Why do I get all these different error messages within seconds, and not always the same one?
Why do I have no errors at SETI with the OC?
FYI, my CPU time is at 100%.

Thanks to everyone.

Novasen169
Joined: 14 May 06
Posts: 43
Credit: 2767204
RAC: 0

Message 88947 in response to message 88945

Quote:
Also, one of the other errors I saw could be because in your BOINC Manager settings, under processor usage, you may have the CPU time set to something other than 100%.


People are always saying this causes errors, but I use it from time to time (I set it to 80%) and I've never even gotten one error because of that =/

jedirock
Joined: 11 Jun 06
Posts: 23
Credit: 1517411
RAC: 0

Message 88948 in response to message 88947

Quote:
Quote:
Also, one of the other errors I saw could be because in your BOINC Manager settings, under processor usage, you may have the CPU time set to something other than 100%.

People are always saying this causes errors, but I use it from time to time (I set it to 80%) and I've never even gotten one error because of that =/


It doesn't cause errors with all projects. I don't know how Einstein handles it, but ABC@Home is finicky about it. What the 80% setting does is start and stop applications very quickly, I believe every second. So it runs the application for 0.8 seconds, then stops it for 0.2 seconds. If the interval isn't one second, then it's some other unit of time, but the principle is the same. If a project's application doesn't checkpoint often enough, or at all, this can cause errors.
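
To make the run/pause idea above concrete, here is a rough Python sketch. It is not BOINC code: the function names, the one-second period, and the worst-case assumption that every pause restarts the task from its last checkpoint are all illustrative assumptions, not confirmed client behaviour.

```python
def duty_cycle(cpu_percent, period_s=1.0):
    """Split one throttle period into (run_seconds, pause_seconds).

    The 1-second period is an assumption taken from the post above.
    """
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in (0, 100]")
    run_s = period_s * cpu_percent / 100.0
    return run_s, period_s - run_s


def progress_after(periods, run_ticks, checkpoint_every):
    """Hypothetical worst case: each pause restarts the task from its
    last checkpoint, so work done since that checkpoint is lost.

    All arguments are whole ticks (for example tenths of a second) so
    the arithmetic stays exact. Returns the amount of work that survives.
    """
    saved = 0  # work already covered by a checkpoint
    for _ in range(periods):
        done = saved + run_ticks                 # work done in this run window
        saved = done - done % checkpoint_every   # roll back to last checkpoint
    return saved


run_s, pause_s = duty_cycle(80)
print(f"run {run_s:.1f}s, pause {pause_s:.1f}s")

# Ten periods of 8 ticks each, with frequent and with rare checkpoints.
print(progress_after(periods=10, run_ticks=8, checkpoint_every=2))
print(progress_after(periods=10, run_ticks=8, checkpoint_every=60))
```

In this toy model the task that checkpoints every 2 ticks keeps all 80 ticks of work, while the one that checkpoints every 60 ticks never gets past zero, which is the kind of trouble the post describes for applications that checkpoint rarely or not at all.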

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Message 88949 in response to message 88947

Quote:
People are always saying this causes errors, but I use it from time to time (I set it to 80%) and I've never even gotten one error because of that =/


It does cause frequent restarts of tasks on some machines (single and multi core, though often stated otherwise), at Einstein too.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 753898814
RAC: 1168427

Returning to crazyrabbit's problem ....

Quote:
Since last week I have had a strange problem. Last week on Wednesday, at one point all WUs finished with a computation error. The same thing happened on Wednesday this week; in between, everything worked fine. I made no changes to the system, neither hardware nor software. The system has also been running overclocked to 3 GHz for months, and I have no problems at SETI.
Yesterday and right now it is the same: all WUs went to computation error. But this time I saw something I did not understand.
One WU finished and uploaded ... OK.

The majority of the errors are related to failing MD5 checksum checks, which suggests a problem with your PC or maybe with your internet connection. If there is a weekly pattern in the occurrence of the problem, it might be worth looking at antivirus scans or other periodic jobs that might interfere with BOINC.

CU
Bikeman
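
For readers who want to see what an MD5 checksum check amounts to, here is a minimal Python sketch using the standard `hashlib` module. It is not how BOINC itself performs the check: the helper names `md5_of_file` and `file_matches`, the file names, and the payload are made up for illustration.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches(path, expected_md5):
    """True if the file's digest equals the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()


payload = b"example work unit data"          # stand-in for a downloaded file
expected = hashlib.md5(payload).hexdigest()  # stand-in for the advertised checksum

with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "good.dat")
    bad = os.path.join(folder, "bad.dat")
    with open(good, "wb") as fh:
        fh.write(payload)
    with open(bad, "wb") as fh:
        fh.write(payload[:-1] + b"X")        # same data with one byte changed
    print(file_matches(good, expected))      # intact copy
    print(file_matches(bad, expected))       # damaged copy
```

A single changed byte is enough to make the check fail, so a failed check only says that the bytes that were hashed differ from what was expected. It does not say whether the download, the disk, or the machine doing the hashing was at fault.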

crazyrabbit1
Joined: 23 Sep 06
Posts: 34
Credit: 4207137
RAC: 0

Thanks.

I just checked the disk: looks OK.
Ran a defrag.
No virus found.
No periodic tasks, only a daily update of the antivirus software, but yesterday that did not run at the same time as the computation errors.
I just reset the project and deleted all files in the Einstein folder.

Downloaded new work, and now I'll wait and see what happens.

Is there any tool to check the internet connection? I haven't noticed any problems so far.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118173116898
RAC: 23798291

Quote:

... at one point all WUs finished with a computation error.
....

System q6600@3
Boinc 6.2.19
Einstein 6.05 or 6.09, trying both versions

I have six Q6600s, all overclocked to near 3.0GHz and all with the stock heat sink and fan. The combination of overclocking and 100% CPU load creates a lot of heat and it's not surprising that problems could arise, particularly if there is a relatively high ambient temperature - perhaps around 35-40C in my case. I have seen pretty close to exactly the same sort of behaviour that you describe a couple of times previously. I'm pretty confident this was due to an overclock that was too ambitious for the stock cooler to cope with.

I cured this by backing off a little on the overclock. I probably could have also cured it by using a better cooler and/or by upping the Vcore until full stability was achieved. The easiest option for me was to lower the frequency so I chose that. The drop in performance was quite small.

It's a very long story but I was able to recover virtually the full cache of tasks (on several occasions) that had been trashed without having to reset the project or replace any data or program files that had previously been downloaded. Even though there were supposed to be files with bad MD5 checksums, I never found any evidence of file corruption. I believe that problems were possibly being caused by overclock induced failures in zipping/unzipping operations.

Cheers,
Gary.
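
One rough way to tell a genuinely corrupt file from a machine that sometimes computes wrong results is to hash the same file several times and see whether the digests agree. The Python sketch below illustrates that idea only; it is not a proper stability test, and the helper names and the random demo file are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def md5_hex(path):
    """Hex MD5 digest of a whole file (fine for small demo files)."""
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


def repeated_digests(path, rounds=5):
    """Hash the same file several times; return the set of distinct digests.

    One digest: the result is at least repeatable on this machine.
    More than one: the file did not change between runs, so the
    computation (CPU or memory) is the more likely suspect.
    """
    return {md5_hex(path) for _ in range(rounds)}


# Demo on a throwaway 1 MiB file of random bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1 << 20))
    demo_path = tmp.name
try:
    digests = repeated_digests(demo_path)
    if len(digests) == 1:
        print("stable")
    else:
        print(f"unstable: {len(digests)} different digests")
finally:
    os.remove(demo_path)
```

A stable result here proves little, since a marginal overclock may only fail under full load on all cores, but an unstable one would point at the hardware rather than at the files.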

crazyrabbit1
Joined: 23 Sep 06
Posts: 34
Credit: 4207137
RAC: 0

Thanks again,

I just finished the first new WU, no problems. The next one is due in half an hour, we'll see.

Heat should not be the problem: CPU 40 °C (104 °F), peak 43 °C (109 °F); cores 60 °C (140 °F), peak 65 °C (149 °F). I have seen higher values during the summer.

Problems with the OC after months without problems, hmmm, then something must be degrading.

If the problem comes up again, I'll try stepping the OC down a little, or raising the memory voltage a little. It has been running undervolted since the beginning, who knows.

mikey
Joined: 22 Jan 05
Posts: 12764
Credit: 1848332354
RAC: 715193

Message 88954 in response to message 88953

Quote:

Thanks again,

I just finished the first new WU, no problems. The next one is due in half an hour, we'll see.

Heat should not be the problem: CPU 40 °C (104 °F), peak 43 °C (109 °F); cores 60 °C (140 °F), peak 65 °C (149 °F). I have seen higher values during the summer.

Problems with the OC after months without problems, hmmm, then something must be degrading.

If the problem comes up again, I'll try stepping the OC down a little, or raising the memory voltage a little. It has been running undervolted since the beginning, who knows.

Is this happening on only one PC? If so, try going back to stock, with no overclocking, and see if it still happens. If not, then you have your answer; if it does keep happening, then you have an answer too. OC'ing can be a very touchy thing: one month it works fine, the next everything is down because this or that part was not capable of handling it long term.
