Two different hosts got a validate error on the same WU. The workunit is being sent out again. I have no idea what happened; it just shows again why I really don't like these long workunits.
cu,
Michael
Copyright © 2024 Einstein@Home. All rights reserved.
Validate Error - Why?
Thanks for reporting, this looks a bit suspicious. Usually when two hosts fail to validate, the results are in state "no consensus yet" and the third one will match one of the submitted results. But in this case both results are invalid, which must mean that both failed individually. Possible, but somewhat unlikely. I forwarded this to the team so they can take a close look at this one.
The units are near the far end of the frequency range, so I would not rule out a yet undiscovered bug in the validator.
CU
Bikeman
"Validate error" != "Checked, but no consensus yet"
"Validate error" usually means that no result data file is found on the server for the validator to do its work on - we see it at SETI, often when the BOINC 'report' stage follows too quickly after the result 'upload' stage. But to lose two result files this way, on two different days, is indeed unusual and unfortunate.
Hmmm...
Agreed, this looks like the backend lost the output files from both hosts. If it was a CBNC, they should still be in the 'Pending' state.
Alinator
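The distinction drawn above between "validate error" and "checked, but no consensus yet" can be sketched as a toy model. This is not BOINC's or Einstein@Home's actual validator code: the names `Outcome` and `classify`, the quorum of 2, and the idea of comparing one float per result within a tolerance are all assumptions made purely for illustration.

```python
from enum import Enum
from typing import List, Optional, Sequence


class Outcome(Enum):
    # Hypothetical labels, named after the states discussed in this thread.
    VALIDATE_ERROR = "validate error"
    NO_CONSENSUS = "checked, but no consensus yet"
    VALID = "valid"
    INVALID = "invalid"


def classify(results: Sequence[Optional[float]],
             quorum: int = 2,
             tol: float = 1e-6) -> List[Outcome]:
    """Classify each returned result of one workunit.

    A value of None stands for "no readable output file on the server",
    which is the assumed cause of a validate error in this sketch.
    """
    # Only results with a readable file can be compared at all.
    readable = [(i, r) for i, r in enumerate(results) if r is not None]

    # Find the largest group of readable results that agree with one
    # reference result to within the tolerance.
    best: List[int] = []
    for _, ref in readable:
        group = [i for i, r in readable if abs(r - ref) <= tol]
        if len(group) > len(best):
            best = group

    outcomes = []
    for i, r in enumerate(results):
        if r is None:
            outcomes.append(Outcome.VALIDATE_ERROR)   # nothing to check
        elif len(best) < quorum:
            outcomes.append(Outcome.NO_CONSENSUS)     # checked, still pending
        elif i in best:
            outcomes.append(Outcome.VALID)
        else:
            outcomes.append(Outcome.INVALID)
    return outcomes
```

In this model, `classify([1.0, 2.0])` leaves both results pending with no consensus, `classify([1.0, 2.0, 1.0])` lets the third result settle the quorum, and `classify([None, None])` gives two validate errors without any comparison having taken place.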
I suspect some problem with the unzipping code. I brought this to Bernd's attention. Stay tuned, and again, thanks for the report!!
CU
Bikeman
Yep, that makes sense. I noticed there was some backend 'gagging' happening off and on with the project a few days ago or so, but fortunately for me none of my hosts had to upload or report any work then. ;-)
Obviously, other folks weren't so lucky. :-(
Just an observation about backend failures and other issues like this wrt EAH: my records show that EAH's overall reliability and task failure rate for all reasons is at least an order of magnitude better than any other project I run, regardless of task runtime length.
Alinator
I had 4 or 5 of these right after the changeover last month. I think the difference was that it validated on 2 other computers (sent out to a 3rd one after mine came back as invalid). Looking at the WU details, it exited as it was supposed to. I mentioned something about this last month when others were reporting the same thing (there was a whole thread asking everyone to report any of these). I don't know what ever happened with them, as they seem to have disappeared during the last week. I don't know if I was ever given credit for them or not, but that's the way it is sometimes.
Just remembered there were a couple that were invalid with all 3 hosts as well.
Thanks for your explanations. My wingman's host got 3 more errors at the same time.
The time between uploading and reporting must have been more than long enough.
My host is a root server which only reports when requesting new work.
So I also guess there was some kind of server trouble.
Too bad,
Michael
[edit] upload«»report
The word from Bernd is that the problem was caused by a new validator which was installed and run for about a minute. During that time most results that it handled were marked as invalid. Needless to say, the old validator was put back into service immediately until the problem with the new one can be diagnosed.
Bernd says that he will cause the trashed results to be fixed (re-validated) as soon as he gets a chance.
Cheers,
Gary.
The 61 workunits (and their results) affected have been marked for validation again. Re-checking should happen in a few minutes in most cases; in a few cases it will have to wait until the extra tasks already sent out have come back.
BM
I see you've assigned ATLAS to re-validate that original WU, Bernd! Now that's what I call service.