How about the 0xc0000142 crash issues? I don't know if you got my email, as you haven't replied... I wish I knew more of what to help with, but that error is a vexing one...
Quote:
Edit: BTW, SIGABRT still seems to come up for Linux. See this result.
Quote:
What is the effect that happens when you "ABC" again? Is that working against the modf() -> ftol() change, or is there still some activity going to modf() despite the change, meaning there's another "buggy detection" different from the one that was already worked around, or is that string changing some other function?
Brian
Anyway... the effect of the "ABC" patch is that on AMD CPUs that support SSE2, a global flag in the runtime lib is set differently. This flag toggles the (usually) faster SSE2 codepath for several functions, not just modf. What Bernd did was to rewrite the code in the hot loop so that it no longer calls modf but ftol, for which, in VS 2003, only one codepath exists, and it is reasonably fast. The slow codepath will continue to be executed in the new Win apps, but no longer in the hot loop, as I understand it, so the overall effect of "ABC"ing the app should be much smaller now.
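To make the modf-versus-ftol point concrete, here is a minimal sketch in C of the two ways to split off a fractional part. It is not the project's code: the function names frac_modf and frac_cast are made up for illustration, and the only assumption about the compiler is that a truncating double-to-integer cast is the kind of conversion that gets lowered to an ftol-style helper.

```c
#include <assert.h>
#include <math.h>

/* Fractional part via the C library's modf(); the integer part is
   written to ipart and discarded here. */
double frac_modf(double x) {
    double ipart;
    return modf(x, &ipart);
}

/* Fractional part via a truncating double-to-integer conversion.
   Hypothetical replacement for illustration only: it is valid only
   while x fits in a long long, hence the range check. */
double frac_cast(double x) {
    assert(x > -9.0e18 && x < 9.0e18);
    return x - (double)(long long)x;
}
```

Both functions truncate toward zero, so they agree in sign for negative inputs (for example, -3.25 gives -0.25 either way). The cast version only works for values that fit in the integer type, which is the trade-off for avoiding the library call.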
Yep, got it. Sorry for not replying immediately, had two rather chaotic days. Wrote to Rom about it as you suggested.
Yep. But not too many (190 in past week), most from the same 4 machines. Not my highest priority right now.
BM
Just had a 4.24 crap out about halfway through its first result run with 4.24.
85487605
Looks like it failed on a routine task switch restart.
Alinator
Very strange: it restarts, finds the checkpoint file (!), tries to open it but somehow can't (!), and exits with an error message that the checkpoint file isn't there at all ...
CU
BRM
I have noticed one of these as well. At first glance it seems to be the same situation as Alinator's. It happened on the third result since the switch to 4.24.
Before I saw Alinator's report, I had attributed this error to hardware problems. With a large number of older machines, I've run across quite a number of motherboards which have developed the "swollen capacitor syndrome". Being curious by nature and owning a good quality Weller soldering iron, I've attempted the repair of about 10 such motherboards. Until now, my success rate has been 100% since all such repaired systems are back in production.
The result linked above was crunched on a machine where about 8 caps were replaced. I've only replaced caps that are obviously swollen so there are still some original caps left. It has been running fine for about 2 months since the repair but has started locking up about once a day recently. I've been restarting it as required and it has been completing work without any client errors until now. So I don't really know if the client error was associated with more faulty caps or a problem with the 4.24 app. Alinator's seemingly identical error is making me wonder if it's the app.
This weekend I'll probably take the mobo out and see if I can spot any more dodgy caps. If so I'll replace some more of them and see if that cures the lockups. It'll also be interesting to see if I get any more client errors on that particular combination of hardware, 4.24 app, and particular frequency data file, once I cure the lockups.
Cheers,
Gary.
Yep. This has kept me confused ever since I made the error messages a little more verbose. We actually get a lot of these errors; I'll write to Rom about that. Maybe boinc_fopen() does some funny things...
BM
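One way to narrow down a "found it but can't open it" report is to keep "file is missing" and "file exists but could not be opened" apart in the error path. The sketch below does that with plain fopen() and errno. It is a hypothetical helper, not boinc_fopen() and not the app's actual restart code; the names open_checkpoint and ckpt_status are invented here, and it assumes a platform where fopen() sets errno on failure (POSIX does, the C standard does not promise it).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

typedef enum { CKPT_OPENED, CKPT_MISSING, CKPT_UNREADABLE } ckpt_status;

/* Hypothetical helper (not part of the BOINC API): try to open a
   checkpoint file up to `attempts` times and report why it failed. */
ckpt_status open_checkpoint(const char *path, int attempts, FILE **out) {
    int last_errno = 0;
    assert(path != NULL && out != NULL);
    *out = NULL;
    if (attempts < 1) attempts = 1;
    for (int i = 0; i < attempts; ++i) {
        FILE *fp = fopen(path, "rb");
        if (fp != NULL) {
            *out = fp;
            return CKPT_OPENED;
        }
        last_errno = errno;
        if (last_errno == ENOENT) break; /* really absent: retrying won't help */
        /* A real client would wait between attempts; omitted to stay portable. */
    }
    /* Log the underlying reason instead of always claiming "not there". */
    fprintf(stderr, "checkpoint '%s': %s\n", path, strerror(last_errno));
    return last_errno == ENOENT ? CKPT_MISSING : CKPT_UNREADABLE;
}
```

With a split like this, a log line for CKPT_UNREADABLE (for example a sharing or permission error) would point at something holding the file, rather than at a missing checkpoint.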
To answer the question below re 4.24 vs 4.17 with and without the patch, I went ahead and did the 'ABC' patch and got the following results:
4.17 with 'ABC' patch applied yields approx 85.3k sec/WU
4.24 with no patch yields approx 90.9k sec/WU
4.24 with 'ABC' patch applied yields approx 88.3k sec/WU... about a 50 min. penalty/WU (88.3k - 85.3k = 3.0k sec, both patched) for running 4.24 vs 4.17 on this machine.
Seti Classic Final Total: 11446 WU.
Just to keep you updated of our plans, mainly regarding the cross-platform differences:
- Early next week (probably Monday) we'll issue a new validator that should make the transition easier and will probably fix some invalid results by itself
- After the new validator is in place, we'll issue a new set of Apps for public Beta Test (for all platforms) that incorporate the fixes accomplished so far. I'll keep on tracking problems and fixing bugs I find until the very last moment. The new Apps will also incorporate a new feature that we might need.
- If it turns out that we need this feature (using pre-calculated files instead of doing the calculations in the Apps to avoid platform differences there), we will issue new workunits (actually a new workunit generator) that will make use of this feature after the new Apps have been made "official".
- Once we have the validation working properly, I'll work on speeding up the computation in the Apps. Incidentally, the code I currently plan to use for parts of the calculation uses neither modf() nor ftol() anymore; instead it uses bit operations to achieve something similar.
BM
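For readers wondering what "bit operations instead of modf() or ftol()" can look like, here is one possible sketch in C. It is only an illustration of the general technique, not the code planned for the Apps: trunc_bits and frac_bits are invented names, and the code assumes IEEE 754 64-bit doubles (1 sign bit, 11 exponent bits, 52 mantissa bits).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch assumes 64-bit IEEE 754 doubles");

/* Truncate toward zero by clearing the mantissa bits that lie below
   the binary point. No library call, no float-to-int conversion. */
double trunc_bits(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    int exponent = (int)((bits >> 52) & 0x7FF) - 1023; /* unbiased exponent */
    if (exponent < 0) {
        /* |x| < 1 (including zero and denormals): keep only the sign. */
        bits &= UINT64_C(0x8000000000000000);
    } else if (exponent < 52) {
        /* Clear the (52 - exponent) low mantissa bits: the fraction. */
        bits &= ~(UINT64_C(0x000FFFFFFFFFFFFF) >> exponent);
    }
    /* exponent >= 52: value is already integral (or inf/NaN). */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Fractional part with the same sign as x, like modf() returns. */
double frac_bits(double x) {
    return x - trunc_bits(x);
}
```

Because this only masks bits and subtracts, it does not depend on which code path the runtime library picks for modf() on a given CPU, which is the kind of platform difference discussed in this thread.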
Excellent news, and just in time to deal with the 'new' monster workunits (>= 630 credits) that would otherwise cause quite a bit of frustration if crunched for zero credits because of the cross-platform validation issue.
As to performance, I was surprised to see that the new app with ftol instead of modf seems to be slightly *faster*, at least on some modern SSE2-capable Intel (!!!) CPUs. I know it's a tad slower on my Pentium M, but I checked one of the top 3 computers (see link on E@H homepage) and there was no decrease in crunching performance when the switch happened.
CU
BRM
My AMD 3500+ liked 4.24: it went from about 38 hr with 4.17 to ~28 hr with 4.24 on WUs from the same data file. I'm still waiting for my crunch partner to see if the result is valid. If there is more that can be done to speed it up, that's great, but I understand that the validation problem must be looked at first.
Bernd,
You might like to consider posting a short news item (linking to your latest message) on the front page right now. This would give more people who might like to participate in the next beta test some time to do a bit of research before things get going in earnest. There probably aren't a whole lot of participants following this particular thread anymore :).
The other major benefit is for all those people who wouldn't participate in a beta test anyway. At least they should be highly encouraged to see that something is happening to address issues that may currently be turning them off this project.
Just IMHO of course :).
Cheers,
Gary.