Validate errors

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0
Topic 194254

I have a lot of validate errors (checked, but no consensus yet) on all 3 of my hosts. This simply cannot be a hardware problem. All these WUs were crunched over the weekend during the server crash.

Anybody with the same experience?

cu,
Michael

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686042976
RAC: 586832

Validate errors

Quote:

I have a lot of validate errors (checked, but no consensus yet) on all 3 of my hosts. This simply cannot be a hardware problem. All these WUs were crunched over the weekend during the server crash.

Anybody with the same experience?

cu,
Michael

Hi!

This must be due to file upload problems when the server came alive again. I passed the info to the devs.

CU
Bikeman

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 31192054
RAC: 415

RE: I have a lot of

Quote:

I have a lot of validate errors (checked, but no consensus yet) on all 3 of my hosts. This simply cannot be a hardware problem. All these WUs were crunched over the weekend during the server crash.

Anybody with the same experience?

cu,
Michael

I have this one. It was completed while E@H was down and I am pretty sure it uploaded and reported before everything was "back to normal". Like yours, my Workunit details also indicate a "Validate error", but the result says "Checked, but no consensus yet". So, maybe there's still some hope for them.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686042976
RAC: 586832

RE: I have this one. It

Message 90986 in response to message 90985

Quote:

I have this one. It was completed while E@H was down and I am pretty sure it uploaded and reported before everything was "back to normal".

The result reports:

Received	30 Mar 2009 19:19:07 UTC

That was when the upload handler was ok again but the DB and scheduler were still not completely back again.

CU
Bikeman

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: I have a lot of

Quote:

I have a lot of validate errors (checked, but no consensus yet) on all 3 of my hosts. This simply cannot be a hardware problem. All these WUs were crunched over the weekend during the server crash.

Anybody with the same experience?

cu,
Michael

Yeah, I lost several to the same problem.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023324931
RAC: 1821328

RE: RE: Anybody with the

Message 90988 in response to message 90987

Quote:
Quote:
Anybody with the same experience?

Yeah, I lost several to the same problem.


Me too:
about 24 on my Q9550, reported at 19:25 UTC on March 30
about 14 on my Q6600, reported at 19:24
none on my E6600, despite doing reports at 19:23. However, it had delayed its uploads until just before this event, while the others had uploaded many hours earlier.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109387336802
RAC: 35923576

The problem occurs only for

The problem occurs only for people using BOINC clients before version 6, i.e. 5.10.x or earlier. If you look back through your messages tab you will find messages to the effect of "Giving up on upload of ...... : file not found".

At that point it is already too late, as the output file (originally hough.out but renamed to something more descriptive, e.g. h1_0462.75_S5R4__112_S5R5a_1_0) has disappeared from the E@H project directory and the BOINC client cannot find it any more. I've lost more than a hundred results to this BOINC bug and there was simply no way to have avoided it (for people using pre V6 clients) other than suspending network access for the duration of the outage. I've managed to save quite a few results by doing just that. I was also lucky enough to have recently converted a few hosts to 6.2.29 and they didn't lose the output files.
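For anyone who wants to count how many results were hit, here is a minimal Python sketch that scans a saved copy of the messages tab for the give-up lines. The exact wording is an assumption (the messages are only described as being "to the effect" of this text), and `lost_uploads` and the sample text are hypothetical names made up for illustration, so adjust the pattern to whatever your client actually prints.

```python
import re

# Assumed wording of the client message; the real text may differ slightly,
# so tweak this pattern to match your own message log.
GIVE_UP = re.compile(
    r"Giving up on upload of\s+(\S+)\s*:\s*file not found",
    re.IGNORECASE,
)


def lost_uploads(log_text):
    """Return the file names found in lines matching the assumed message."""
    return [match.group(1) for match in GIVE_UP.finditer(log_text)]


# Illustrative sample only, not a real log excerpt.
sample = (
    "Giving up on upload of h1_0462.75_S5R4__112_S5R5a_1_0: file not found\n"
    "Started upload of some_other_file\n"
)

print(lost_uploads(sample))
```

Paste the contents of your own messages tab into a string (or read it from a file) and pass it to `lost_uploads` to get the list of affected files.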

The bug was well known to the BOINC devs (I presume) - see this page which explains why this is happening with pre V6 core clients. I was amused by the implication of the final comment that the problem could be caused by project staff not shutting down the file upload handler by the approved method :-). I imagine it's a bit hard for staff to follow "approved procedures" if all is violently crashing around your ears :-).

To my way of thinking, people should have been warned (loud and often) when it was realised that a project outage like this could lead to the loss of all work completed until the project came back on line. It's too late now but the message needs to go out about the possible loss of results during outages if you don't upgrade beyond BOINC 5.10.x.

EDIT: Small correction - the version 6.2.29 mentioned above should of course be 6.2.19 which was the final Windows version. The Linux version was 6.2.15 and since over 80% of my machines run Linux it was that version that I had been experimenting with lately. With Linux the upgrade from 5.10.45 to 6.2.15 is so easy - just stop BOINC, copy the new boinc executables over the top of the existing ones and then restart BOINC.

Cheers,
Gary.

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: The problem occurs only

Message 90990 in response to message 90989

Quote:

The problem occurs only for people using BOINC clients before version 6, i.e. 5.10.x or earlier. If you look back through your messages tab you will find messages to the effect of "Giving up on upload of ...... : file not found".

At that point it is already too late, as the output file (originally hough.out but renamed to something more descriptive, e.g. h1_0462.75_S5R4__112_S5R5a_1_0) has disappeared from the E@H project directory and the BOINC client cannot find it any more. I've lost more than a hundred results to this BOINC bug and there was simply no way to have avoided it (for people using pre V6 clients) other than suspending network access for the duration of the outage. I've managed to save quite a few results by doing just that. I was also lucky enough to have recently converted a few hosts to 6.2.29 and they didn't lose the output files.

The bug was well known to the BOINC devs (I presume) - see this page which explains why this is happening with pre V6 core clients. I was amused by the implication of the final comment that the problem could be caused by project staff not shutting down the file upload handler by the approved method :-). I imagine it's a bit hard for staff to follow "approved procedures" if all is violently crashing around your ears :-).

To my way of thinking, people should have been warned (loud and often) when it was realised that a project outage like this could lead to the loss of all work completed until the project came back on line. It's too late now but the message needs to go out about the possible loss of results during outages if you don't upgrade beyond BOINC 5.10.x.

Okay. . .

Sounds like I'll be doing some BOINC upgrades.

Conan
Joined: 19 Jun 05
Posts: 172
Credit: 7099171
RAC: 2415

That does not seem fair to us

That does not seem fair to us volunteers if the programme that has been given to us, just 'gives up' on trying to communicate with a project.

The work we have done is still valid, or would be if BOINC Manager didn't dump a file and then make your valid work unit invalid.

I tried to update to 6.x but had to go back to 5.10.21 (Linux) due to the dismal benchmark scores I then received (well over 40% drop in Whetstone and Dhrystone).

As I still do some projects that rely on benchmarking, I need the best score I can get on my now older computers.
Therefore 6.2 and 6.4 have been disappointing for me.

I have seen excellent benchmarks on some recent Intel machines and some recent Windows based AMD machines but not on my AMD Linux machines.

In future, if it seems that a project will be off air for 3 days or more, then I may as well abort all remaining work for that project, as I won't get any credit anyway, so why waste CPU time and power?

Donald A. Tevault
Joined: 17 Feb 06
Posts: 439
Credit: 73516529
RAC: 0

RE: That does not seem fair

Message 90992 in response to message 90991

Quote:

That does not seem fair to us volunteers if the programme that has been given to us, just 'gives up' on trying to communicate with a project.

The work we have done is still valid, or would be if BOINC Manager didn't dump a file and then make your valid work unit invalid.

I tried to update to 6.x but had to go back to 5.10.21 (Linux) due to the dismal benchmark scores I then received (well over 40% drop in Whetstone and Dhrystone).

As I still do some projects that rely on benchmarking, I need the best score I can get on my now older computers.
Therefore 6.2 and 6.4 have been disappointing for me.

I have seen excellent benchmarks on some recent Intel machines and some recent Windows based AMD machines but not on my AMD Linux machines.

In future, if it seems that a project will be off air for 3 days or more, then I may as well abort all remaining work for that project, as I won't get any credit anyway, so why waste CPU time and power?

There's another consideration as well. . .

Some Linux distros, such as Debian Etch, don't have new enough libraries to support the newer BOINC clients. On my two Etch machines, I can't run anything newer than BOINC 5.8.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109387336802
RAC: 35923576

RE: In future if it seems

Message 90993 in response to message 90991

Quote:
In future, if it seems that a project will be off air for 3 days or more, then I may as well abort all remaining work for that project, as I won't get any credit anyway, so why waste CPU time and power?


You don't really have to abort if you maintain a cache sufficient to ride out the outage. You could simply suspend comms for the duration and allow the tasks to be crunched but not uploaded. After the outage you would just re-enable comms and allow the work to be uploaded and reported.

In my case, I actually had enough work to (mostly) outlast the outage. I just didn't realise early enough what was happening and that I needed to suspend comms in order to save the completed work. When I first saw the reason for the problem I started upgrading BOINC but quickly gave that away when I realised it was easier to suspend comms and save the results that way.
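If you have many hosts, suspending and re-enabling comms could be scripted rather than clicked through in the manager. The following is only a sketch under assumptions: it supposes the BOINC command-line tool is installed as `boinccmd` and accepts `--set_network_mode` with `never`, `auto` or `always`. Older 5.x installs may name the tool differently, so check your own install first. The function names are hypothetical.

```python
import subprocess

VALID_MODES = ("always", "auto", "never")


def network_mode_command(mode, tool="boinccmd"):
    """Build the command line; 'tool' is an assumed binary name."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return [tool, "--set_network_mode", mode]


def set_network_mode(mode, tool="boinccmd"):
    """Run the command; raises if the tool is missing or returns an error."""
    subprocess.run(network_mode_command(mode, tool), check=True)


# Show what would be run, without touching a live client:
print(network_mode_command("never"))  # before or during the outage
print(network_mode_command("auto"))   # once the project is healthy again
```

Calling `set_network_mode("never")` at the start of an outage and `set_network_mode("auto")` afterwards would mirror the manual suspend and re-enable steps described above.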

Cheers,
Gary.
