Credit granted with only two valid results for WU 1906702
No change. The proper quorum of at least 3 results was successfully returned. Two of the results were found to be accurate enough to pass validation and have credit awarded. The other two evidently were just far enough out of bounds to be called invalid.
I do not understand David
I do not understand David Knittle's reply.
For WU 1906702, 4 results are listed, namely
. ID 8040829 invalid (app 4.82)
. ID 8040830 invalid (app 4.82, my computer)
. ID 8040831 valid (app 4.79)
. ID 8040832 valid (app 4.79)
and credit has been granted for the 2 computers with valid results.
The rule, as I understand it, states that a minimum of 3 valid results is required before the result is used for further calculations and credit is granted. What worries me is not the issue of credit, but the fact that what I have noticed may reflect a bug in the validation procedure. By the way, I have not seen the same problem for any other WU that I was curious enough to examine (at least 50 of them).
Please, BA, BM or one of their colleagues, let me know that you have seen this message. Thanks, JJ
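For illustration only, the rule as the poster understands it (at least 3 valid results) can be written as a simple count. This is a sketch, not the project's validator: the function name `quorum_met` and the parameter `min_quorum` are hypothetical, and only the result IDs and outcomes come from the list for WU 1906702.

```python
# Hypothetical sketch: NOT the project's validator code.
# Result IDs and outcomes are the ones listed for WU 1906702.
results = {
    8040829: "invalid",
    8040830: "invalid",
    8040831: "valid",
    8040832: "valid",
}


def quorum_met(outcomes, min_quorum=3):
    """Return True if at least min_quorum results are marked valid."""
    valid = sum(1 for outcome in outcomes.values() if outcome == "valid")
    return valid >= min_quorum


valid_count = sum(1 for outcome in results.values() if outcome == "valid")
print(valid_count, quorum_met(results))
```

Under this reading the WU has only 2 valid results, so the check fails. The quoted reply counts returned results instead (4 of them), which would pass a quorum of 3.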
Did you notice that one of
Did you notice that one of the work units finished in about 700 seconds, and was given credit? One of the administrators should definitely check up on that; there is obviously something seriously wrong there.
One more thing: I notice your turnaround time is 3 days, but you have 8 work units outstanding. It will surely take 24 days to finish those work units, but the deadline is 14 days. You should at least halve your 'connect to network' time.
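The arithmetic behind that estimate, taken at face value, is sketched below. The figures are the ones quoted in the post; treating each work unit as costing one full turnaround, processed one after another, is an assumption of this sketch rather than something the thread confirms.

```python
# Figures quoted in the post. The model (one work unit per turnaround
# period, processed strictly in sequence) is an assumption.
turnaround_days = 3
outstanding_wus = 8
deadline_days = 14

days_needed = turnaround_days * outstanding_wus
# Largest number of work units that would still finish before the deadline.
wus_that_fit = deadline_days // turnaround_days

print(days_needed, days_needed > deadline_days, wus_that_fit)
```

On these numbers only 4 work units (12 days) fit inside the 14-day deadline, which is where the advice to at least halve the cache comes from.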
RE: Did you notice that one
?
Verty, are you looking at the right work unit Jean is talking about?
http://einsteinathome.org/workunit/1906702
RE: RE: Did you notice
Verty was looking at another of the work units in the results list of the original poster, just 3 places away from the invalid result. The WUID is 1932119. If you check the result ID for this WUID, you will see a result whose crunch time was only 657 seconds but it was validated and awarded full credit, far in excess of what was actually claimed. How can such a result be valid??? Surely something must be seriously wrong??
The main reason I decided to post a reply in this thread is that I've recently observed similar behaviour in another machine using the Darwin OS. Have a look at CPUID 939 and you will notice several validated results which have taken far less than the normal time, including one that took just 1422 seconds instead of around 34,000 seconds.
Can any of the Devs explain what is going on with these extremely short but supposedly valid results? My only concern is for the validity of the science.
Cheers,
Gary.
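A rough way to spot results like these is to compare the reported CPU time with what is typical for the host. The sketch below is only illustrative: the helper name `is_suspiciously_short` and the 10% threshold are assumptions, and the 33,500-second entry is a made-up "normal" result for contrast. Only the 1422 and 34,000 second figures come from the post.

```python
# Hypothetical helper; the 10% threshold is an arbitrary assumption.
def is_suspiciously_short(cpu_seconds, typical_seconds, fraction=0.10):
    """Flag a result whose CPU time is below a fraction of the typical time."""
    return cpu_seconds < fraction * typical_seconds


typical = 34_000  # "around 34,000 seconds", per the post

# 1422 is the reported short result; 33_500 is a made-up normal one.
for cpu in (1422, 33_500):
    print(cpu, is_suspiciously_short(cpu, typical))
```

A flag from a check like this says nothing about whether the science is wrong; it only marks a result whose reported time deserves a second look.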
RE: Verty was looking at
There is something wrong, but nothing serious. This is the infamous "No heartbeat from core client" bug. Sometimes more than one science app runs per CPU core. One of them exits with the message above. If it is restarted it will resume normally, but its CPU time starts from zero.
So the science is absolutely valid.
Michael
Team Linux Users Everywhere
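A toy model of that behaviour, assuming the CPU-time counter restarts from zero each time the science app is restarted while the computation itself resumes where it left off. This is not BOINC code; the function names are hypothetical and the 33,000-second first segment is made up for the example (657 seconds is the reported figure from the thread).

```python
# Toy model, NOT BOINC code. Each list entry is the CPU time (seconds)
# spent in one run of the science app between restarts.
def reported_cpu_time(segments):
    """CPU time as reported if the counter restarts from zero on each restart."""
    return segments[-1] if segments else 0


def actual_cpu_time(segments):
    """CPU time really spent across all runs."""
    return sum(segments)


# 33_000 is a made-up figure; 657 is the reported time from the thread.
segments = [33_000, 657]
print(reported_cpu_time(segments), actual_cpu_time(segments))
```

In this model the work done is the full sum, so the result can validate normally even though the reported time looks far too short.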
RE: There is something
OK, I see that error message as part of the error output when you examine the details of the appropriate result ID. So I went and checked out CPUID 939, where there were three quite short-running but otherwise valid results. No such error messages with those three.
Is this the same bug, somehow without the error message??
Cheers,
Gary.
RE: RE: There is
Probably a different bug and not Einstein... BOINC Manager 4.43 (for Mac) sometimes got "confused" when a new WU for a different project is downloaded and immediately starts running; the WU that was running and is now paused gets the CPU time and the Progress reset to 0. Whenever it gets to run again, the Progress value is corrected, but the CPU time is not. I've been 6-7 hours into a WU and had this happen and it soon shows 90% complete with 3 minutes of CPU time, estimated completion in a few seconds - and it sits this way, very slowly changing, for that last half-hour of processing. When the result is finally sent, every indication (including requested credit) is that I did that WU in a half hour, when it really took closer to eight.
Mac version 4.72 so far seems to not have the same problem.
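To put a number on the example above: if the credit request scales linearly with reported CPU time for a given host (an assumption of this sketch, not something stated in the thread), a half-hour report for an eight-hour job asks for only a small fraction of the credit. The function name is hypothetical.

```python
# Assumes the credit request is proportional to reported CPU time
# for a fixed host; the function name is hypothetical.
def claimed_fraction(reported_hours, actual_hours):
    """Fraction of the 'deserved' credit request that is actually claimed."""
    return reported_hours / actual_hours


# Half an hour reported, closer to eight hours really spent (per the post).
print(claimed_fraction(0.5, 8.0))
```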