Credit granted with only two valid results for WU 1906702

Jean Jeener
Joined: 3 Jun 05
Posts: 37
Credit: 5291636
RAC: 0
Topic 189795

Does this reflect a change in the validation rules, a bug in the validation procedure, or my limited knowledge of these things?
JJ

Divide Overflow
Joined: 9 Feb 05
Posts: 91
Credit: 183220
RAC: 0

Credit granted with only two valid results for WU 1906702

No change. The proper quorum of at least 3 results was successfully returned. Two of the results were found to be accurate enough to pass validation and have credit awarded. The other two evidently were just far enough out of bounds to be called invalid.

Jean Jeener
Joined: 3 Jun 05
Posts: 37
Credit: 5291636
RAC: 0

I do not understand David Knittle's reply.

For WU 1906702, 4 results are listed, namely
. ID 8040829 invalid (app 4.82)
. ID 8040830 invalid (app 4.82, my computer)
. ID 8040831 valid (app 4.79)
. ID 8040832 valid (app 4.79)
and credit has been granted for the 2 computers with valid results.

The rule, as I understand it, states that a minimum of 3 valid results is required before the result is used for further calculations and credit is granted. What worries me is not the issue of credit, but the fact that what I have noticed may reflect a bug in the validation procedure. By the way, I have not seen the same problem for any other WU that I was curious enough to examine (at least 50 of them).

Please, BA, BM or one of their colleagues, let me know that you have seen this message. Thanks, JJ
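The disagreement above comes down to what the quorum of 3 is counted against. A minimal sketch of the two readings follows, using the four result IDs listed for WU 1906702. The function names and the quorum constant are illustrative assumptions, not the project's actual validator code.

```python
MIN_QUORUM = 3  # figure quoted in this thread, assumed here for illustration

# Result ID -> did it pass validation? (as listed for WU 1906702)
results = {
    8040829: False,
    8040830: False,
    8040831: True,
    8040832: True,
}

def credited_if_quorum_counts_returned(results, min_quorum=MIN_QUORUM):
    """Reading 1: the quorum counts results that were returned.
    Once enough have come back, every result that validates gets credit."""
    if len(results) < min_quorum:
        return []
    return sorted(rid for rid, valid in results.items() if valid)

def credited_if_quorum_counts_valid(results, min_quorum=MIN_QUORUM):
    """Reading 2: the quorum counts results that are valid.
    No credit is granted until at least min_quorum results validate."""
    valid = sorted(rid for rid, ok in results.items() if ok)
    return valid if len(valid) >= min_quorum else []

print(credited_if_quorum_counts_returned(results))  # [8040831, 8040832]
print(credited_if_quorum_counts_valid(results))     # []
```

Under the first reading the observed outcome (credit for the two valid results) is expected; under the second it would look like a bug.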

verty
Joined: 31 Jul 05
Posts: 69
Credit: 16658
RAC: 0

Did you notice that one of the work units finished in about 700 seconds, and was given credit? One of the administrators should definitely check up on that; there is obviously something seriously wrong there.

One more thing: I notice your turnaround time is 3 days, but you have 8 work units outstanding. At that rate it will take about 24 days to finish those work units, but the deadline is 14 days. You should at least halve your 'connect to network' time.
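The arithmetic behind that advice can be sketched as follows. It adopts the post's own simplification that each work unit takes the full 3-day turnaround and that units are processed one at a time; the function names are hypothetical.

```python
import math

def days_to_clear_queue(outstanding_units: int, days_per_unit: float) -> float:
    """Estimated days to finish every queued work unit, one after another."""
    return outstanding_units * days_per_unit

def max_units_within_deadline(days_per_unit: float, deadline_days: float) -> int:
    """Largest queue that can still be finished before the deadline."""
    return math.floor(deadline_days / days_per_unit)

print(days_to_clear_queue(8, 3.0))           # 24.0
print(max_units_within_deadline(3.0, 14.0))  # 4
```

With these figures only 4 of the 8 queued units fit inside the 14-day deadline, which is where the suggestion to at least halve the setting comes from.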

Divide Overflow
Joined: 9 Feb 05
Posts: 91
Credit: 183220
RAC: 0

RE: Did you notice that one

Message 16038 in response to message 16037

Quote:
Did you notice that one of the work units finished in about 700 seconds, and was given credit?


?
Verty, are you looking at the right work unit Jean is talking about?
http://einsteinathome.org/workunit/1906702

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118230473178
RAC: 24450579

RE: RE: Did you notice

Message 16039 in response to message 16038

Quote:
Quote:
Did you notice that one of the work units finished in about 700 seconds, and was given credit?

?
Verty, are you looking at the right work unit Jean is talking about?
http://einsteinathome.org/workunit/1906702

Verty was looking at another of the work units in the results list of the original poster, just 3 places away from the invalid result. The WUID is 1932119. If you check the result ID for this WUID, you will see a result whose crunch time was only 657 seconds but it was validated and awarded full credit, far in excess of what was actually claimed. How can such a result be valid??? Surely something must be seriously wrong??

The main reason I decided to post a reply in this thread is that I've recently observed similar behaviour in another machine using the Darwin OS. Have a look at CPUID 939 and you will notice several validated results which have taken far less than the normal time, including one that took just 1422 seconds instead of around 34,000 seconds.

Can any of the Devs explain what is going on with these extremely short but supposedly valid results? My only concern is for the validity of the science.

Cheers,
Gary.

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

RE: Verty was looking at

Message 16040 in response to message 16039

Quote:

Verty was looking at another of the work units in the results list of the original poster, just 3 places away from the invalid result. The WUID is 1932119. If you check the result ID for this WUID, you will see a result whose crunch time was only 657 seconds but it was validated and awarded full credit, far in excess of what was actually claimed. How can such a result be valid??? Surely something must be seriously wrong??

There is something wrong, but nothing serious. This is the infamous
"No heartbeat from core client"-bug. Sometimes more than one science app
runs per CPU core. One of them exits with that message. When it is
restarted it resumes normally, but its CPU time starts again from zero.

So the science is absolutely valid.

Michael
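A toy model of the effect described above: progress survives the restart, while the CPU-time counter does not. The class, its fields and the method names are hypothetical, and the figures (a unit needing about 34,000 seconds, 657 seconds reported) are borrowed from different examples in this thread purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    total_work: float          # CPU seconds the work unit really needs
    done_work: float = 0.0     # progress saved in the checkpoint
    reported_cpu: float = 0.0  # CPU-time counter sent back with the result

    def run(self, seconds: float) -> None:
        """Crunch for up to `seconds`, stopping when the unit is finished."""
        step = min(seconds, self.total_work - self.done_work)
        self.done_work += step
        self.reported_cpu += step

    def restart_after_heartbeat_loss(self) -> None:
        """Checkpointed progress is kept; only the CPU-time counter is lost."""
        self.reported_cpu = 0.0

task = Task(total_work=34000.0)
task.run(33343.0)                    # most of the work is done...
task.restart_after_heartbeat_loss()  # ...then the app exits and restarts
task.run(10000.0)                    # only the remainder is actually run

print(task.done_work, task.reported_cpu)  # 34000.0 657.0
```

The full computation was carried out, so the result can validate normally, yet the reported CPU time covers only the part run after the restart.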

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118230473178
RAC: 24450579

RE: There is something

Message 16041 in response to message 16040

Quote:

There is something wrong, but nothing serious. This is the infamous
"No heartbeat from core client"-bug ....

OK, I see that error message as part of the error output when you examine the details of the appropriate result ID. So I went and checked out CPUID 939, where there were three quite short-running but otherwise valid results. No such error messages with those three.

Is this the same bug, somehow without the error message??

Cheers,
Gary.

Tern
Joined: 27 Jul 05
Posts: 309
Credit: 99440614
RAC: 0

RE: RE: There is

Message 16042 in response to message 16041

Quote:
Quote:

There is something wrong, but nothing serious. This is the infamous
"No heartbeat from core client"-bug ....

OK, I see that error message as part of the error output when you examine the details of the appropriate result ID. So I went and checked out CPUID 939, where there were three quite short-running but otherwise valid results. No such error messages with those three.

Is this the same bug, somehow without the error message??

Probably a different bug and not Einstein... BOINC Manager 4.43 (for Mac) sometimes got "confused" when a new WU for a different project was downloaded and immediately started running; the WU that had been running and was now paused had its CPU time and Progress reset to 0. Whenever it got to run again, the Progress value was corrected, but the CPU time was not. I've been 6-7 hours into a WU and had this happen, and it soon showed 90% complete with 3 minutes of CPU time and an estimated completion in a few seconds, and it sat that way, very slowly changing, for the last half-hour of processing. When the result was finally sent, every indication (including requested credit) was that I did that WU in half an hour, when it really took closer to eight.

Mac version 4.72 so far seems to not have the same problem.
