Question about validator being valid

Beyond
Joined: 28 Feb 05
Posts: 121
Credit: 2374786212
RAC: 5712628
Topic 196976

I had my first validation failure in quite a while, so I thought I'd check whether there was a problem with the GPU. Here's the WU:

http://einsteinathome.org/workunit/163030395

The results look strange. The two invalid WUs both say they completed successfully and show the same output:

------> Number of samples: 4194304
------> Trial dispersion measure: 436.5 cm^-3 pc
------> Scale factor: 0.0010939
[09:54:36][1624][INFO ] Seed for random number generator is 1168861107.
[09:54:40][1624][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[10:00:42][1624][INFO ] Checkpoint committed!
[10:00:59][1624][INFO ] OpenCL shutdown complete!
[10:00:59][1624][INFO ] Statistics: count dirty SumSpec pages 3861 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[10:00:59][1624][INFO ] Data processing finished successfully!
10:00:59 (1624): called boinc_finish

------> Number of samples: 4194304
------> Trial dispersion measure: 436.5 cm^-3 pc
------> Scale factor: 0.0010939
[16:45:12][17360][INFO ] Seed for random number generator is 1168861107.
[16:45:14][17360][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[16:45:14][17360][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 534 MB (490 MB free / 1024 MB total) -> Used by this application (assuming a single GPU task): 293 MB
[16:48:48][17360][INFO ] Checkpoint committed!
[16:54:54][17360][INFO ] Statistics: count dirty SumSpec pages 3861 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[16:54:54][17360][INFO ] Data processing finished successfully!
16:54:54 (17360): called boinc_finish

Meanwhile, the WUs that validated successfully don't seem to agree with each other:

------> Number of samples: 4194304
------> Trial dispersion measure: 436.5 cm^-3 pc
------> Scale factor: 0.0010939
[18:28:31][1164][INFO ] Seed for random number generator is 1168861107.
[18:28:34][1164][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[18:52:47][1164][INFO ] Checkpoint committed!
[18:52:58][1164][INFO ] OpenCL shutdown complete!
[18:52:58][1164][INFO ] Statistics: count dirty SumSpec pages 3820 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[18:52:58][1164][INFO ] Data processing finished successfully!
18:52:58 (1164): called boinc_finish

------> Number of samples: 4194304
------> Trial dispersion measure: 436.5 cm^-3 pc
------> Scale factor: 0.0010939
[10:42:37][2284][INFO ] Seed for random number generator is 1168861107.
[10:42:51][2284][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581
[10:43:29][2284][INFO ] Checkpoint committed!
[10:44:29][2284][INFO ] Checkpoint committed!
[10:45:29][2284][INFO ] Checkpoint committed!
[10:45:52][2284][INFO ] OpenCL shutdown complete!
[10:45:52][2284][INFO ] Statistics: count dirty SumSpec pages 142 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[10:45:52][2284][INFO ] Data processing finished successfully!
10:45:52 (2284): called boinc_finish

Question: is there a problem with the validator, or am I reading something incorrectly? Or did the WUs just get sent out again because the first two users were so late? If so, shouldn't all four be considered valid? Either way, it looks like there might be some kind of validator problem.
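
For anyone who wants to repeat this comparison without reading the logs by eye, here is a minimal Python sketch. It strips the timestamp and process ID from each stderr line and lists the lines that appear in only one of two logs. The function names are hypothetical, the two excerpts are copied from the logs above, and it compares only the stderr diagnostics, not the actual result files.

```python
import re

# Matches the "[HH:MM:SS][PID]" prefix and the bare "HH:MM:SS (PID):" prefix.
PREFIX = re.compile(r"^(\[\d{2}:\d{2}:\d{2}\]\[\d+\]|\d{2}:\d{2}:\d{2} \(\d+\):)\s*")


def normalize(log_text):
    """Return the non-empty log lines with timestamp/PID prefixes removed."""
    return [PREFIX.sub("", line.strip())
            for line in log_text.splitlines() if line.strip()]


def differing_lines(log_a, log_b):
    """Return, sorted, the normalized lines present in only one of the logs."""
    return sorted(set(normalize(log_a)) ^ set(normalize(log_b)))


# Excerpts from one invalid and one valid result quoted above.
log_invalid = """\
[10:00:42][1624][INFO ] Checkpoint committed!
[10:00:59][1624][INFO ] Statistics: count dirty SumSpec pages 3861 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[10:00:59][1624][INFO ] Data processing finished successfully!
10:00:59 (1624): called boinc_finish
"""

log_valid = """\
[18:52:47][1164][INFO ] Checkpoint committed!
[18:52:58][1164][INFO ] Statistics: count dirty SumSpec pages 3820 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[18:52:58][1164][INFO ] Data processing finished successfully!
18:52:58 (1164): called boinc_finish
"""

diff = differing_lines(log_invalid, log_valid)
for line in diff:
    print(line)
```

On these excerpts the only lines that differ are the two "dirty SumSpec pages" statistics lines. Because the comparison uses sets, repeated lines such as "Checkpoint committed!" and the order of lines are ignored.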

Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

Do you know that the real results are not visible in the stderr output, and that each WU is in fact a set of 8 WUs packed together?

While this doesn't invalidate your question, you can't tell whether the results are equal just from the stderr output.
AFAIK, an occasional validation disagreement caused by normal rounding differences is not totally unexpected and does not indicate a failure, but to answer this, someone from the staff would have to look at the results manually...
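
To illustrate the rounding point, here is a small Python sketch. It shows that adding the same numbers in a different order (as different GPUs or drivers might) can give results that are not bit-identical, and that a tolerance-based comparison still treats them as equal. This is only an illustration: `results_agree` and its tolerance are hypothetical and are not the project's actual validator.

```python
import math


def results_agree(a, b, rel_tol=1e-5):
    """Hypothetical fuzzy check: same length and every pair within rel_tol."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


# The same three values, summed in two different orders.
values = [0.1, 0.2, 0.3]
forward = (values[0] + values[1]) + values[2]
backward = values[0] + (values[1] + values[2])

print(forward == backward)                    # False: not bit-identical
print(results_agree([forward], [backward]))   # True: equal within tolerance
print(results_agree([1.0], [1.1]))            # False: a real disagreement
```

A difference this small is harmless on its own, but if a value sits right next to a threshold, rounding could push it to either side on different hardware, which is one plausible way two otherwise healthy hosts could end up disagreeing.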
