I have a lot of validate errors (checked, but no consensus yet) on all three of my hosts. From a hardware standpoint this is simply impossible. All of these WUs were crunched over the weekend, during the server crash.
Anybody with the same experience?
cu,
Michael
Copyright © 2024 Einstein@Home. All rights reserved.
Validate errors
Hi!
This must be due to file upload problems when the server came back up. I passed the info on to the devs.
CU
Bikeman
RE: I have a lot of
I have this one. It was completed while E@H was down, and I am pretty sure it was uploaded and reported before everything was "back to normal". Like yours, my workunit details page shows a "Validate error", but the result itself says "Checked, but no consensus yet". So maybe there's still some hope for them.
RE: I have this one. It
The result reports:
That was when the upload handler was ok again but the DB and scheduler were still not completely back again.
CU
Bikeman
RE: I have a lot of
Yeah, I lost several to the same problem.
RE: RE: Anybody with the
Me too:
about 24 on my Q9550, reported at 19:25 UTC on March 30
about 14 on my Q6600, reported at 19:24
none on my E6600, despite reporting at 19:23; however, it had delayed its uploads until just before this event, while the others had uploaded many hours earlier.
The problem occurs only for
The problem occurs only for people using BOINC clients before version 6, i.e. 5.10.x or earlier. If you look back through your Messages tab you will find messages to the effect of "Giving up on upload of ...... : file not found".
At that point it is already too late, as the output file (originally hough.out but renamed to a more descriptive name, e.g. h1_0462.75_S5R4__112_S5R5a_1_0) has disappeared from the E@H project directory and the BOINC client cannot find it any more. I've lost more than a hundred results to this BOINC bug, and for people using pre-V6 clients there was simply no way to avoid it other than suspending network access for the duration of the outage. I've managed to save quite a few results by doing just that. I was also lucky enough to have recently converted a few hosts to 6.2.29, and they didn't lose the output files.
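For anyone wanting to count how many results were hit, here is a rough Python sketch. It assumes the client's messages have been saved to a text file (the core client normally writes them to stdoutdae.txt in the BOINC data directory) and that each affected line looks like the "Giving up on upload of ... : file not found" message quoted above. The regular expression and the name `lost_uploads` are illustrative assumptions, not anything BOINC provides, and the exact wording may differ between client versions.

```python
import re
import sys
from typing import List

# Assumed shape of the message quoted above; real wording may vary by client version.
GIVE_UP = re.compile(r"Giving up on upload of\s+(\S+?)\s*:\s*file not found")


def lost_uploads(log_text: str) -> List[str]:
    """Return the output-file names the client says it gave up uploading."""
    return GIVE_UP.findall(log_text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is supplied by the user; no default location is assumed.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for name in lost_uploads(f.read()):
            print(name)
```

Run as `python lost_uploads.py <path to saved log>`, it prints one file name per line. An empty result only means the assumed pattern matched nothing, not that no results were lost.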
The bug was well known to the BOINC devs (I presume) - see this page which explains why this is happening with pre V6 core clients. I was amused by the implication of the final comment that the problem could be caused by project staff not shutting down the file upload handler by the approved method :-). I imagine it's a bit hard for staff to follow "approved procedures" if all is violently crashing around your ears :-).
To my way of thinking, people should have been warned (loudly and often) as soon as it was realised that a project outage like this could lead to the loss of all work completed until the project came back online. It's too late now, but the message needs to go out that results can be lost during outages if you don't upgrade beyond BOINC 5.10.x.
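The rule of thumb above (clients before version 6, i.e. 5.10.x and earlier, are exposed) boils down to a simple version comparison. This is a minimal sketch that assumes plain numeric dotted version strings such as 5.10.45 or 6.2.15; the helper names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version such as '5.10.45' into integers: (5, 10, 45)."""
    return tuple(int(part) for part in version.split("."))


def client_at_risk(version: str) -> bool:
    """True for clients older than 6.0, i.e. 5.10.x and earlier."""
    # Tuples compare element by element, so (5, 10, 45) < (6,) and (6, 2, 15) > (6,).
    return parse_version(version) < (6,)


if __name__ == "__main__":
    for v in ("5.10.21", "5.10.45", "6.2.15", "6.2.19"):
        print(v, "at risk" if client_at_risk(v) else "ok")
```

Comparing integer tuples rather than raw strings matters here: as strings, "5.10.45" would sort before "5.8" even though it is the newer release.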
EDIT: Small correction - the version 6.2.29 mentioned above should of course be 6.2.19 which was the final Windows version. The Linux version was 6.2.15 and since over 80% of my machines run Linux it was that version that I had been experimenting with lately. With Linux the upgrade from 5.10.45 to 6.2.15 is so easy - just stop BOINC, copy the new boinc executables over the top of the existing ones and then restart BOINC.
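The copy step of the stop/copy/restart upgrade described above can be sketched as follows. This is a hypothetical helper, not part of BOINC. It assumes BOINC has already been stopped, that the new executables were unpacked into one directory, and that the caller supplies the file names (these vary between packages, so none are assumed here). It checks that every file exists before copying anything, so a missing file does not leave a half-upgraded install.

```python
import os
import shutil
from typing import Iterable, List


def overwrite_binaries(new_dir: str, install_dir: str, names: Iterable[str]) -> List[str]:
    """Copy each named file from new_dir over the same name in install_dir.

    Assumes the BOINC client is already stopped. Every source file is checked
    before anything is copied, so a missing file leaves the install untouched.
    shutil.copy2 also copies permission bits, so the executable bit carries over.
    """
    names = list(names)
    missing = [n for n in names if not os.path.isfile(os.path.join(new_dir, n))]
    if missing:
        raise FileNotFoundError(f"not found in {new_dir}: {', '.join(missing)}")
    for name in names:
        shutil.copy2(os.path.join(new_dir, name), os.path.join(install_dir, name))
    return names
```

Restarting BOINC afterwards is left to whatever start method the host already uses.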
Cheers,
Gary.
RE: The problem occurs only
Okay. . .
Sounds like I'll be doing some BOINC upgrades.
That does not seem fair to us
It does not seem fair to us volunteers that the programme we have been given just 'gives up' on trying to communicate with a project.
The work we have done is still valid, or would be if the BOINC Manager didn't dump a file and thereby turn a valid work unit into an invalid one.
I tried updating to 6.x but had to go back to 5.10.21 (Linux) because of the dismal benchmark scores I then got (a drop of well over 40% in both Whetstone and Dhrystone).
Since I still run some projects that use the benchmarks, I need the best score I can get on my now older computers.
Therefore 6.2 and 6.4 have been disappointing for me.
I have seen excellent benchmarks on some recent Intel machines and some recent Windows based AMD machines but not on my AMD Linux machines.
In future, if it seems that a project will be off the air for 3 days or more, I may as well abort all remaining work for that project: I won't get any credit anyway, so why waste CPU time and power?
RE: That does not seem fair
There's another consideration as well. . .
Some Linux distros, such as Debian Etch, don't have new enough libraries to support the newer BOINC clients. On my two Etch machines, I can't run anything newer than BOINC 5.8.
RE: In future if it seems
You don't really have to abort if you maintain a cache sufficient to ride out the outage. You could simply suspend comms for the duration and allow the tasks to be crunched but not uploaded. After the outage you would just re-enable comms and allow the work to be uploaded and reported.
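Whether a cache is "sufficient to ride out the outage" is simple arithmetic: queued tasks times hours per task, divided by the number of cores, compared with the expected outage length. Below is a rough sketch with a hypothetical function name. It ignores deadlines, other projects sharing the CPU, and runtime variation between tasks.

```python
def cache_covers_outage(tasks_queued: int, hours_per_task: float,
                        cores: int, outage_hours: float) -> bool:
    """Estimate whether queued work keeps every core busy for the whole outage."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    hours_of_work = tasks_queued * hours_per_task / cores
    return hours_of_work >= outage_hours


if __name__ == "__main__":
    # Hypothetical quad-core examples against a 72-hour outage.
    print(cache_covers_outage(40, 7.0, 4, 72.0))  # False (70 hours of work)
    print(cache_covers_outage(48, 7.0, 4, 72.0))  # True (84 hours of work)
```

For example, a hypothetical quad-core holding 40 tasks of about 7 hours each has roughly 70 hours of work, just short of a 3-day (72-hour) outage, whereas 48 such tasks would give about 84 hours.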
In my case, I actually had enough work to (mostly) outlast the outage. I just didn't realise early enough what was happening and that I needed to suspend comms in order to save the completed work. When I first saw the reason for the problem I started upgrading BOINC but quickly gave that away when I realised it was easier to suspend comms and save the results that way.
Cheers,
Gary.