Invalid Results-way too many

Jayargh
Joined: 9 Feb 05
Posts: 64
Credit: 1,205,159
RAC: 0
Topic 189859

I'm going to start this by re-posting from our 2CPU.com message boards:
While LHC was out of work I had a host running some Einstein, and damn if I didn't get an invalid result: http://einsteinathome.org/workunit/1821847
This is part of the reason I stopped doing Einstein before, and it keeps me from doing it on a regular basis, as this is the only BOINC project I get invalid results from. This WU was screwy: 1 other host was also invalid, and they granted credit after only 2 results were received. Is anyone else running Einstein having a problem with invalid results?

The response from one of my teammates:
Host ID: Work Unit IDs
400941: 1891407
400952: 1916105, 1923846
400949: 1885401, 1918831
400962: 1946403
400957: 1901577, 1900657
404935: 1899696
400897: 1900599, 1900612
All these with errors. Now it would seem to me that there is some kind of validation problem, and has been since I joined way back in February. Perhaps if you folks look into this and solve it you might get a few more crunchers back on board... like I said, this is the ONLY project that gets invalid results, and ever has, on my machines, and I have over 300k BOINC credits... I like the project but will not up my resources until y'all can fix this COMPLETELY. Hope I can get an admin response to this.

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 2,891,373
RAC: 7,806

I, too, was getting invalid results for a while. Mine started right after I upgraded to BOINC 4.45 (from 4.19). I saw a post on the "problems and bugs" message board that suggested trying the Einstein Beta Application. I did and I haven't had an invalid result since. Might be worth a try.

Jayargh
Joined: 9 Feb 05
Posts: 64
Credit: 1,205,159
RAC: 0

Stick- My invalids started with BOINC 4.19... and continued... 4.25... 4.45... and now 4.72.
It seems platform made little difference... perhaps the application IS the key. Can you link me (and others, obviously) to the Beta application?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,162
Credit: 38,564,978,412
RAC: 43,809,290

Quote:
I'm going to start this by re-posting from our 2CPU.com message boards:
While LHC was out of work I had a host running some Einstein, and damn if I didn't get an invalid result: http://einsteinathome.org/workunit/1821847
This is part of the reason I stopped doing Einstein before, and it keeps me from doing it on a regular basis, as this is the only BOINC project I get invalid results from. This WU was screwy: 1 other host was also invalid, and they granted credit after only 2 results were received. Is anyone else running Einstein having a problem with invalid results?

I'm sorry, but I've never had problems with validation at EAH. The WUID you quote seems to be no longer in the online database so it's a bit difficult to investigate it. However, at the time of the validation process there must have been at least three successfully completed results for the validator to be called. If two agree but one doesn't, then that one will be marked invalid whilst the other two will be marked valid and credit will be issued. EAH is a bit different from other projects in this regard, I think. I believe it's a project settable policy. I believe other projects will wait for a third agreeing result before validation occurs.
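
As a rough illustration of the rule described above (a sketch only: this is not the actual EAH or BOINC validator, and the function name, the `min_quorum` parameter and the shape of the data are made up for the example), the validator waits for three completed results and then marks the odd one out as invalid:

```python
from collections import Counter

def validate_workunit(results, min_quorum=3):
    """Hypothetical quorum check.

    results maps a result id to its (already canonicalised) output.
    Returns None while the validator cannot decide yet, otherwise a
    dict mapping each result id to "valid" or "invalid".
    """
    if len(results) < min_quorum:
        return None  # too few completed results: validator is not called
    # Find the output that the largest number of hosts agree on.
    output, votes = Counter(results.values()).most_common(1)[0]
    if votes < 2:
        return None  # nobody agrees with anybody: wait for more results
    return {rid: ("valid" if out == output else "invalid")
            for rid, out in results.items()}

print(validate_workunit({"host_a": "X", "host_b": "X", "host_c": "Y"}))
```

With two agreeing results and one that differs, the two are marked valid and get credit while the third is marked invalid, which is the situation described for this WU.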

Quote:
The response from one of my teammates:
Host ID: Work Unit IDs
400941: 1891407
400952: 1916105, 1923846
400949: 1885401, 1918831
400962: 1946403
400957: 1901577, 1900657
404935: 1899696
400897: 1900599, 1900612
All these with errors. Now it would seem to me that there is some kind of validation problem, and has been since I joined way back in February. Perhaps if you folks look into this and solve it you might get a few more crunchers back on board... like I said, this is the ONLY project that gets invalid results, and ever has, on my machines, and I have over 300k BOINC credits... I like the project but will not up my resources until y'all can fix this COMPLETELY. Hope I can get an admin response to this.

I've looked at about 6 of the above results and none so far have anything to do with validation. The hosts mentioned are part of a group of 88 machines which would seem to indicate some sort of large computer lab or centre. Many were initially fired up at about the same time and some were single cpu and some had two. Interestingly, the problems were client errors and were with the initial result crunched (or two results if it were a dual). After that all subsequent results were valid (or so it seems, as I gave up when I saw the trend).

In each of the ones I looked at, the science app terminated abnormally with an exit code of -185 and a message about "One or more missing files ... ". This is happening way before validation and has nothing to do with validation. It would seem to be something to do with how the boxes were set up, or operator/user interference of some sort. Perhaps the normal users of some of the boxes noticed a 99% process and decided to try deleting a few files ... :).

In any case, whatever the cause, it's a bit tough to blame EAH validation for something that would seem to have nothing to do with EAH validation.

Cheers,
Gary.

Jayargh
Joined: 9 Feb 05
Posts: 64
Credit: 1,205,159
RAC: 0

OK Gary Roberts, I still stand by my result 1821847: one of the "validated" results had an extremely short compute time and the validator still validated it. That raised my eyebrows, especially when the 4th result had a computation time similar to mine and to the 1 normal-length result that validated. Isn't it odd that of 3 similar computation times only 1 validates, and a short one also validates? Too bad that WU is no longer available, because the short one was only about 2-3k seconds... seemingly impossible... as well as 2 different hosts with "normal" CPU times getting invalids at the same time... still smells like a validation problem to me... perhaps with 800k+ credits you are not bothering to check logs enough to know, like the lil guy :)... please explain all that.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,162
Credit: 38,564,978,412
RAC: 43,809,290

Message 16393 in response to message 16392

Quote:
OK Gary Roberts, I still stand by my result 1821847: one of the "validated" results had an extremely short compute time and the validator still validated it. That raised my eyebrows, especially when the 4th result had a computation time similar to mine and to the 1 normal-length result. Isn't it odd that of 3 similar computation times only 1 validates, and a short one also validates?... please explain that

Actually, as I explained in my message, the web server tells me that that particular WUID is "unavailable". Your initial description didn't mention anything about an "extremely short compute time". Am I to presume that your complaint is about a result being declared valid where the cpu time seems to be impossibly short?

If so, then there is a very easy explanation. There are circumstances where communication is lost between the science app and BOINC because one or other was terminated for some reason. BOINC notices this and it restarts the science app from the last saved checkpoint. However the visible elapsed time is zeroed even though the result is resuming perhaps even when already 95% complete. When the science app finishes normally, the result is still good and will validate but the visible elapsed time can be impossibly short. I've personally seen several cases like this in the last month or so. No problem for the validator and no problem for the science. The result really did take the correct amount of time.
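
To make that concrete, here is a toy model of the effect (purely illustrative: the `Task` class and all the numbers are invented, it is not how the BOINC client is written, and it assumes the checkpoint holds all the work done up to the restart):

```python
class Task:
    """Toy model of a result whose visible CPU time resets on restart."""

    def __init__(self, total_seconds):
        self.total_seconds = total_seconds  # CPU seconds the result really needs
        self.progress = 0                   # seconds of work saved in the checkpoint
        self.reported_cpu = 0               # CPU time that ends up being reported

    def run(self, seconds):
        # Crunch for up to `seconds`, stopping when the result is complete.
        step = min(seconds, self.total_seconds - self.progress)
        self.progress += step
        self.reported_cpu += step

    def restart_from_checkpoint(self):
        # The saved work is kept, but the visible timer starts again from zero.
        self.reported_cpu = 0

    @property
    def finished(self):
        return self.progress >= self.total_seconds


task = Task(total_seconds=40_000)
task.run(38_000)                 # 95% done, then the app or BOINC is terminated
task.restart_from_checkpoint()   # resumes at 95% with the timer zeroed
task.run(5_000)                  # only the last 2,000 seconds are needed
print(task.finished, task.progress, task.reported_cpu)  # True 40000 2000
```

The result is complete and all 40,000 seconds of work went into it, yet only 2,000 seconds are reported.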

Cheers,
Gary.

Jayargh
Joined: 9 Feb 05
Posts: 64
Credit: 1,205,159
RAC: 0

OK then Gary, is it NOT a validator problem if 2 subsequent results (and mine had a 0 exit status) are now invalid with "normal" computation times, and 0 credit to show for it after 10-13 hours of computation time? What are the odds? I still fail to understand your reasoning that this is explainable and NOT a problem.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,162
Credit: 38,564,978,412
RAC: 43,809,290

Message 16395 in response to message 16390

Quote:
... My invalids started with BOINC 4.19... and continued... 4.25... 4.45... and now 4.72 ....

I just had a look through your results list to see if there were any current examples of invalid results. I didn't find any currently but I'm not doubting that it's a real problem for you. Perhaps you might be prepared to restart limited crunching on your two boxes that currently have no work and post a message as soon as an invalid turns up. I'm sure there would be many people far more knowledgeable than me who would be only too willing to look at this in some detail to try to isolate exactly what is causing the problem.

Best of luck with it.

Cheers,
Gary.

Jayargh
Joined: 9 Feb 05
Posts: 64
Credit: 1,205,159
RAC: 0

Quote:
I'm sure there would be many people far more knowledgeable than me who would be only too willing to look at this in some detail to try to isolate exactly what is causing the problem.
Best of luck with it.

Thank you Gary for your input; it has been appreciated. If I could get a link to the new application it may help. These problems were also on the other hosts... many times with 3 validating and me not, again with a 0 exit status. But "shorts" had always worked into this. I'm not trying to create a bitch session (though other recent instances noticed by the readership could help)... I'm trying to get an admin to look into and solve the problem somehow... I just want to stop the invalids so I can contribute more (they seem to be less than 5%, but even 1% is significant)... maybe it is the new application?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,162
Credit: 38,564,978,412
RAC: 43,809,290

Message 16397 in response to message 16394

Quote:
OK then Gary, is it NOT a validator problem if 2 subsequent results (and mine had a 0 exit status) are now invalid with "normal" computation times, and 0 credit to show for it after 10-13 hours of computation time? What are the odds? I still fail to understand your reasoning that this is explainable and NOT a problem.

Without being able to look at the results in question, who can say anything with certainty? All I did was provide a possible explanation of why a "short time" result may be in fact quite valid. This is something that has been seen quite a few times lately. Do a search of the message boards and you will find plenty of examples. Before I was aware of the reason, I complained about "impossibly short" results being declared valid too. However I'm now more than happy to accept the explanation for this. On the other hand, I've not seen any other examples of people complaining about validation. On the law of averages, which explanation would you choose??

Quite some time ago there was some issue with validation when there was a mix of windows boxes and linux boxes doing the crunching. If two windows boxes got in first then the linux result was declared invalid but if two linux boxes got in first then the windows result was invalidated. I believe this was a problem attributable to the different precisions of the two maths libraries. I believe the problem was solved by making the validation less strict in some way. Whatever happened, the complaints ceased and it's a long time since I've seen that particular issue. I don't suppose you made a note of the OS's used in all the results for WUID=1821847 by any chance?
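
I don't know what the actual fix was, but as an illustration of what "less strict" could mean (a sketch only: the function name, the tolerance value and the sample numbers are all made up), compare results within a small relative tolerance instead of requiring them to match exactly:

```python
import math

def results_agree(a, b, rel_tol=1e-5):
    """Hypothetical comparison: True if two result vectors match within rel_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

# Invented values standing in for the same WU crunched with two maths libraries.
windows_result = [1.0000001, 2.0]
linux_result = [1.0000002, 2.0]

print(windows_result == linux_result)                           # strict comparison
print(results_agree(windows_result, linux_result))              # tolerant comparison
print(results_agree(windows_result, linux_result, rel_tol=0.0)) # back to strict
```

A strict comparison rejects whichever platform reports last, whereas the tolerant one accepts both as the same answer.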

Cheers,
Gary.

Jayargh
Joined: 9 Feb 05
Posts: 64
Credit: 1,205,159
RAC: 0

Message 16398 in response to message 16397

Quote:
I don't suppose you made a note of the OS's used in all the results for WUID=1821847 by any chance?

No (sheepishly), I only remember the other invalid was Windows XP like me. I'm hoping others will respond like Stick did, at least somewhat saying the same as me. Otherwise I have a stand-alone problem. I would still like to see this addressed by an admin, because THEY can still look at the WU (the thing is, there was that other invalid on the WU, so I think I'm not alone, and hence I started this thread).
