In my eyes it's a matter of inter-project etiquette to try to align the credits; otherwise credit inflation would happen in a race to attract more users. If intra-project parity cannot be achieved, it would be reasonable to pick the most widely used platform for calibration instead (you have to calibrate the cobblestones to some value, after all). At the moment that would be Intel/Win rather than AMD/Win.
But we are getting a bit off-topic, I'm afraid.
As I said before, I'll respectfully disagree about it being off-topic.
Beyond the credit disparity, there is the more troubling performance-per-watt penalty. A Linux system comparable to mine uses around 20% less power for the same work than I do with the patched 4.17/4.23 apps, and 40-50% less than if I were running the non-patched 4.17 app.
As for whether the Win/Intel platform is selected for the next round of credit dropping, I won't fuss too terribly, provided that the severe penalty that exists now and directly impacts me is addressed to my satisfaction (within 5% of a comparable Linux system)...
Is it too early to ask whether some of the results computed so far will have to be recomputed? Sounds like a rather serious problem affecting all platforms.
All platforms are affected, but not all workunits. People haven't reached a consensus yet on how many there actually are; my current wild guess would be on the order of a few percent. The main question, though, is how reliably we can identify the affected ones without completely re-calculating them all.
BM
There seems to be some thought that Homogeneous Redundancy should be turned on. Is it fair to say that this would only mask "the problem" (credit not being granted) from our perspective as volunteers, and still leave an actual scientific problem on the science/project side, or would turning on HR work?
Thanks...
Yes.
The validation "problem" is only a symptom of an actual "scientific" problem.
Technically, the problem was a variable that was used uninitialized, but only in some cases. The value of such a variable often ends up being zero on Unix-type systems (such as Linux and MacOS; it also depends on the compiler and optimization), and some random value on Windows. Unfortunately, even zero isn't a valid value given the meaning of this variable, so in these cases all results are scientifically invalid.
BM
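For readers unfamiliar with this class of bug, here is a minimal, purely hypothetical C++ sketch (not the actual application code; the struct and field names are invented for illustration) of how an uninitialized variable can happen to read as zero on one platform and as an arbitrary value on another, and why even the "consistent" zero is still wrong:

    #include <cstdio>

    /* Hypothetical stand-in for the real code; names are made up. */
    struct SearchParams {
        int    band_index;   /* deliberately left uninitialized below */
        double freq_step;    /* deliberately left uninitialized below */
    };

    int main() {
        SearchParams p;   /* no initializer: reading p.band_index or
                             p.freq_step is formally undefined behaviour */

        /* On a Unix-type system the freshly mapped stack memory often
           happens to contain zeros, so this may print 0 and 0, and two
           such hosts will even agree with each other. On a typical
           Windows build the same read returns whatever bytes were left
           on the stack, so the results differ and fail validation. */
        std::printf("band_index=%d freq_step=%g\n", p.band_index, p.freq_step);
        return 0;
    }

Either way the value is meaningless; the validator only notices when two hosts happen to disagree, which is why the validation failures are only a symptom of the real problem.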
Did you see my comment about using /RTC if you are using VC++ 2005? That will add run-time checks that can catch uninitialized variables. One other thing it helps with is array out-of-bounds conditions (buffer overflows)...
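To make the /RTC suggestion concrete (this assumes MSVC's documented switches and is not tied to the project's actual build settings): /RTC1 combines /RTCs, which adds stack-frame checks such as detecting overruns of local arrays, with /RTCu, which reports use of a variable before it is initialized. The checks only work in non-optimized builds, so compile with /Od. In a tiny sketch like the one below (the file name is made up), both problems are reported at run time instead of silently computing with garbage:

    /* Build with run-time checks enabled (debug-style settings):
         cl /Zi /Od /RTC1 rtc_demo.cpp
       (rtc_demo.cpp is an invented name for this sketch) */

    int main() {
        int n;                        /* never assigned */
        int table[4] = {0, 1, 2, 3};

        int x = n;      /* /RTCu: 'n' is being used without being
                           initialized */
        table[4] = x;   /* /RTCs: the stack around 'table' is reported as
                           corrupted (write past the end of the array) */
        return 0;
    }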
Yes, I did. Actually, that's how I found the bug (though we are still using Visual Studio .NET 2003), but I also had to get the Runtime Debugger working for this (and to dig out the right workunits).
Your posts are incredibly helpful, thanks a lot! Are you doing Windows programming for a living?
BM
I'm currently unemployed, and I think that there are likely many others who provided much better help... My only background with C++ was maintaining (not initially developing) a credit-card communication DLL that was migrated from C, to C++ (Unix), and then to a Windows DLL. A guy I worked with, who is much smarter than I could ever hope to be, ended up having to add TAPI support to it, so we switched from Release to Debug at that point while he and I worked together to test the change. We ended up shipping the DLL still compiled as Debug with /RTC and wrote output to a text file that we periodically checked for any other issues. We found several overflows and uninitialized variables.
Edit: The overflows and uninitialized variables were in parts of the DLL that used TCP/IP rather than async dialup. We had to get the TAPI functionality out the door as fast as we could, but we had heard of / seen instances of the auth crashing or hanging on the TCP/IP side, so we left the checks in and found what was giving us grief. Of course, that had management in knots for a while: the concept of "debug" code running in a live environment... We convinced them that there wasn't a real performance impact, and we hadn't changed anything (yet), so if the auth crashed, it would've crashed anyway.
Brian
In case anyone notices: my last two results with 4.23 were faster than the first two. There is a reason for it... the same change as I made to 4.17 ;-) This indicates that the code in question is either still called periodically, or that that path was slowed by something in 4.23 but is still faster overall... I don't have a profiler installed, so it was just a guess that it would cause a change in performance. That was my "interesting" thing I was going to try...
Brian...being "naughty" ;-)
Here's something that I just thought of...
While this bug still exists, would it be a good idea not to send out more work units? I know that some folks would complain, but it seems to me that it would lessen the chance of getting bad results.
RE: Did you get enough data
I know that it has improved, but the situations affected are actually very rare anyway; they may add up to 1% of the errors related to checkpointing.
BM