Has anyone here established credentials to judge the validity of results? I haven't because I'm not smart enough. I trust the E@H crew to do that for me.
Cheers, Mike.
This is precisely my position, which is why I was so upset that they let this situation develop at all.
To all those arguing that the optimised apps (that work) are valid, I am only suggesting that we wait for the E@H crew to concede that.
Dead men don't get the baby washed. HTH
OK, here's my go at giving it an analogy.
Imagine YOU are the scientific community. Now suppose you, as a customer, go to a doctor or a dentist, or have someone build your house (or hire virtually any other trained profession). Would you say, "please give me the young, inexperienced but fast one with no qualifications"? NO. Especially with something like a doctor, you would expect, even demand, someone who had gone through years of training and examinations and been fully certified for the job by higher-ups you yourself trust, and you'd always hope for the one with the most experience.
You wouldn't want someone to build your house the quickest way possible. You'd want the one who can do it fairly quickly, but whom you can be very sure knows what they're doing, and who gives you a guarantee for the work when it's done.
The problem isn't whether the stable releases can produce identical results. It's that the scientists who want this data will not accept work done by the trainee, even if he can probably do the work just as well and quicker; they want it done by the one THEY KNOW and THEY certify can do the work.
I personally know that Akos's work is just as good as the official app, because I run it and so do many others, but you have to understand that WE ARE NOT the ones who decide whether the app is up to scratch. Until the scientists have a look at Akos's work and decide it is acceptable, all work done with the patched apps will have to be thrown out and RE-DONE by the official app.
By running an UNofficial app you are actually wasting your time AND the project's, because the work will have to be done again.
So ask yourself, as the person who is paying for the electricity: do you want to run the app a bit slower but have the result be worthwhile, or run it a little faster and waste the work and your electricity altogether?
As I already said: the work will NOT be done again, because there is not even a proper way to filter out the results crunched with an optimized app.
Well, the post has gone off the top of the page, but akosf said himself that, from communications with the project staff (at the time he made that post), about 5000 WUs were going to have to be resent.
Quote:
edit: Probably these units will be send out again. ( about 5000 WUs at moment )
And as the scientific community won't accept the results of those done with patches, there's little choice but to redo them.
Unless that's what you're trying: force them to either accept the results or have to dump them.
Quote:
As I already said: the work will NOT be done again, because there is not even a proper way to filter out the results crunched with an optimized app.
Quote:
Well, the post has gone off the top of the page, but akosf said himself that, from communications with the project staff (at the time he made that post), about 5000 WUs were going to have to be resent.
I read that, but that was (at least I very much suppose so) about the results of the versions that produced invalid results, like 0004 and 0713 (in some cases) etc. I think the live-testing was a bit too much.
But if there is such an easy way to identify them, then it should be easy to simply not accept results from any unofficial version any more.
But I'm not sure, and would finally be glad about any statement from a developer, which should already have come many days ago...
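For illustration only, here is a minimal Python sketch of what such a filter could look like. The record layout and the `app_version` field are invented assumptions for the sake of the example, not Einstein@Home's actual database schema:

```python
# Hypothetical returned-result records. The "app_version" field is an
# assumption for illustration, not the real Einstein@Home schema.
results = [
    {"id": 1, "app_version": "4.79 (official)"},
    {"id": 2, "app_version": "S41 (unofficial patch)"},
    {"id": 3, "app_version": "4.79 (official)"},
]

OFFICIAL = "4.79 (official)"

def accepted(records):
    """Keep only results whose reported app version is the official one."""
    return [r for r in records if r["app_version"] == OFFICIAL]

print([r["id"] for r in accepted(results)])  # prints [1, 3]
```

This only works, of course, if clients honestly report the version string that produced each result, which is exactly the open question in this thread.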
As i already said: the work will NOT be done again because there is not even a proper way to filter out those results crunched with an optimized app.
Well the post has gone off the top of the page but akosf said himself that with communications with the project staff (at the time he'd made that post) about 5000 WU where going to have to be resent.
I read that but that was (at least i very much suppose that) about the results of the versions that produced invalid results like 0004, 0713 (in some cases) etc. i think the live-testing was a bit too much.
That, I agree, seemed maybe the case, but that's maybe even more worrying. There were actually only a couple of patches that produced invalid results. That suggests that enough people downloaded the bad patches and were able to crunch through 5000 units in a very short amount of time. I find it unlikely, because of the length of the units, the speed at which new patches came out, the speed at which bad patches were identified, and the fact that there were very few patches that were bad.
Hm... could be... hard to say... I will just wait now and only crunch under Linux, because there the speed is almost as high as an optimized Windows app (at least on the type of WU I just got there now...).
Quote:
As I already said: the work will NOT be done again, because there is not even a proper way to filter out the results crunched with an optimized app.
No. Akos had ensured that the returns were identified on a per-version basis. He did that precisely for reasons of quality control, and it was upon the failure of certain apps that he identified the erroneous results (hence not acceptable for science use) and thus advised withdrawal.
More generally:
How many of you are responding here without actually checking the prior posts? :-)
By that I mean looking for and then reading what people have posted .... :-)
This is quite old ground, and some of you don't appear to realise we are on about the third lap, I think. So let's get over it team! :-)
If it weren't a duty of a moderator, I would have left this dead horse of a thread ages ago! :-)
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Have you tested and compared the unofficial vs. official apps on each and every possible WU? Describe for us what testing was done in-house by the developers.
If the project scientists could do that, test multiple apps on every possible work unit to find out what's the best, they probably wouldn't need to use BOINC, would they? :)
Maybe I'm wrong, but I thought the primary confusion about the optimized applications and being told to stop using them was regarding how results crunched with an optimized application that validate, especially against the standard application, could be considered bad or be causing problems. Apparently they are causing problems, for vague reasons (causing problems with the database isn't very specific). But the project sets the validator program to their specified tolerances so that they get the accuracy and precision they need. If an optimized-crunched work unit validates, especially when a standard-crunched work unit is in the quorum, then it has met the science requirements, no?
If you (in general) don't believe the checked validity of optimized-crunched work units, then I guess you don't believe the validator is capable of doing its job properly. Since the validator is how the scientists ensure that the work they validate and accept is high quality, I would assume that they know how to set it up properly for their purposes.
Testing variations of the application, hardware, OSes, etc. might be good, but if the validator does its job right, rigorous testing such as the one implied above shouldn't be necessary. Some testing, sure, but certainly not every work unit in full, or they wouldn't need us to help them with the work.
If the issue really is a database thing, i.e. the optimized work units are doing something different enough that the database can't handle them properly (but the science is valid), then I hope they fix the database. We have no details on that, so I don't know what the problem might actually be that would cause valid results to mess things up. But if the database gets fixed, maybe then we can go back to using the optimized applications, which save a bunch of time (something like 6 hours per long work unit for me, I think).
Edit: italics added to call out a section. In response to the building analogy: what if an inspector has to come in and check the final house to make sure it meets the requirements? And what if the fast, inexperienced worker can produce a house that meets all predetermined tolerances, the same as the slower, more experienced worker? Is the house any less good because it was finished sooner? I don't think so, because it still meets the requirements and was built to tolerance. This is my point above about the validator. The validator is like the building inspector. If it says the work meets tolerances, you either believe it and accept the results, or you don't believe it and say the inspector (validator) is inept and can't do his/her job properly. Are the scientists in the latter group? Do we even know that for sure? As noted above, since the validator ensures the quality of the accepted results, I believe the scientists have programmed it properly so that it will only accept results within pre-determined tolerances. I don't think work units that validate should be thrown out simply because a different version of the application crunched them.
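As a toy illustration of that inspector analogy, a quorum-style tolerance check in Python might look like the sketch below. This is an assumption about the general shape of such a check, not Einstein@Home's actual validator, whose comparison logic and tolerances are not public in this thread:

```python
def within_tolerance(a, b, tol=1e-6):
    """Compare two result vectors the way a quorum validator might:
    same length, and every pair of values agreeing within the tolerance."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

official  = [0.12345678, 3.14159265]   # result from the standard app
optimized = [0.12345690, 3.14159260]   # result from a faster build

# The values differ only around the 7th decimal place, inside the
# 1e-6 tolerance, so this check accepts the quorum.
print(within_tolerance(official, optimized))  # prints True
```

The point of the analogy: if a check like this is what "validates" means, then a result that passes it has met the stated requirements regardless of which worker (app) produced it.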
It is causing problems in the database. Now that tells me they CAN tell what is optimized and what is not, because there are issues with how it's interpreted in the database after validation.
If this is true, then the numbers are NOT EXACTLY the same, and are giving results that are bad. Even though certain numbers match closely enough for the validator, they do not match closely enough for the database, and that is causing issues.
I believe what Akos says. He made the apps, he has been at Einstein, he knows what is going on behind the scenes. Please heed his warning.
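If that is what is happening, it would be an instance of a familiar floating-point pitfall: two results can agree within a tolerance yet not be byte-identical, so any downstream step that assumes exact equality breaks. Whether the project's database actually does an exact comparison is pure speculation here; the sketch only shows how "matches for the validator" and "matches exactly" can diverge:

```python
# Two ways of computing "the same" value in floating point.
a = 0.1 + 0.2          # accumulates a tiny rounding error
b = 0.3

tol = 1e-9
passes_validator = abs(a - b) <= tol   # tolerance check: True
exact_match      = (a == b)            # exact equality: False

print(passes_validator, exact_match)   # prints True False
```

So both statements in the post above can be true at once: the numbers are "the same" to the validator and "not exactly the same" to anything that compares them bit for bit.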
I think you didn't understand what Akos wrote. He is writing about the bad results, imho the invalid ones from some of his patched apps; he is NOT writing about "the results" or "all results".
Later on he added the argument about the view of other scientists, who only accept an official app.
This leaves open the question of whether the patched apps that return valid results are producing valid but wrong results. The project should check this soon; otherwise they lose crunchers every day.
cu,
Michael
[edit]typo[/edit]