Akos, what about the S5R2 hot loop and SSE2? I know SSE1 and 3DNow brought a huge speed increase to the former applications, so the code seems to be fine for vectorization. These two work on 64 bits, whereas you can do 128 bits with SSE2. There's just not much benefit in doing so with P3/4/M and AXP/64. However, Core 2 D/Q and K10 would benefit greatly from such a code path, as they can process 128-bit vectors in one cycle (OK, send and retire one such command per clock per unit :) instead of the two cycles the former CPUs need. I see great potential for optimizations here, which is hardly used, if at all, by any BOINC app today (or did I miss some fancy new project?)
The 'main' hot loop is designed for single precision (unlike S5R1), and the double-precision parts are similar to the S5R1 code, where SSE2 code was tried out but wasn't faster (AXP, A64, P4, PM). So I don't think that the code will be optimised for SSE2. I know the C2D processors would be faster, but not significantly. There is no time for this at the moment.
CPUs with a 64-bit SSE engine can execute an arithmetic instruction (ADD, MUL, SUB) in 5 cycles. CPUs with a 128-bit wide SSE engine (C2D, K10?) execute them in 4 cycles. So the ratio is 5:4, not 2:1. In other words, the difference is just 1 cycle.
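As a back-of-the-envelope check, the two ratios can be put side by side. This is only a sketch that takes the cycle counts quoted above at face value (they are not measured here), and the constant and function names are made up for illustration.

```python
# Cycle counts as quoted in the post above (assumed, not independently measured).
CYCLES_64BIT_ENGINE = 5   # AXP, A64, P4, PM
CYCLES_128BIT_ENGINE = 4  # C2D, K10(?)


def speedup(old_cycles: int, new_cycles: int) -> float:
    """Relative speedup when an instruction takes new_cycles instead of old_cycles."""
    return old_cycles / new_cycles


# Speedup implied by the quoted cycle counts.
quoted = speedup(CYCLES_64BIT_ENGINE, CYCLES_128BIT_ENGINE)
# Speedup one might naively expect from doubling the engine width alone.
naive = 128 / 64

print(f"quoted cycle counts: {quoted:.2f}x")  # 1.25x
print(f"width doubling:      {naive:.2f}x")   # 2.00x
```

So under these numbers a 128-bit engine would buy roughly 25% per arithmetic instruction, not a factor of two.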
It's not only old, slow machines that have a problem with that; so do newer machines that give e@h a low resource share. With BIG units coming in, BOINC has to prioritize them up, with the result of not honoring my chosen resource shares and cutting heavily into the time allocated to my primary project, s@h.
You are aware that BOINC does scheduling on two layers, right? One decides which of the already downloaded WUs to run, and the other decides when to fetch another WU. If E@H runs into deadline trouble and thus consumes more than its share of CPU time, it'll build up long-term debt, which in turn will prevent it from fetching another E@H WU. This effectively means that your resource share split will be honoured over a longer time scale (such as on a monthly basis).
But you are right when saying your resource shares won't be matched daily ...
Metod ...
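To make the two layers a bit more concrete, here is a toy model. It is a rough sketch and not BOINC's real algorithm: the function name, the 10% share, the 30-hour WU and the rule "fetch only when the debt is paid off" are all assumptions chosen for illustration, and the sign convention (debt grows when E@H overuses the CPU) simply follows the wording of the post.

```python
def eah_cpu_fraction(days, share_pct=10, wu_hours=30, hours_per_day=24):
    """Toy model: fraction of CPU time E@H gets over `days` days.

    Hypothetical simplification, not the BOINC client's actual accounting.
    """
    debt = 0               # long-term debt in 1/100 CPU hours; > 0 means E@H overused
    remaining = wu_hours   # hours left in the current E@H WU (0 = none on hand)
    used = 0               # total CPU hours spent on E@H

    for _ in range(days):
        # Layer 1 (CPU scheduling): a WU in deadline trouble hogs the CPU.
        ran = min(remaining, hours_per_day)
        remaining -= ran
        used += ran

        # Debt grows by the time actually used and shrinks by the entitled share.
        debt += 100 * ran - share_pct * hours_per_day

        # Layer 2 (work fetch): a new WU is fetched only once the debt is gone.
        if remaining == 0 and debt <= 0:
            remaining = wu_hours

    return used / (days * hours_per_day)


print(eah_cpu_fraction(1))   # 1.0 -> on day one E@H takes the whole CPU
print(eah_cpu_fraction(25))  # 0.1 -> over 25 days it is back at its 10% share
```

In this model the share is badly off on any single day, yet the work-fetch layer pulls it back to the configured 10% over a few weeks, which is the point made above.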
And I would have to agree with you on the value of the users running 1.0-1.5 or slower chips, on dialup, and would be very interested in how the completed work would break down between the various CPU speeds. I think most people would be surprised. But I gotta agree with you that the backbone of many of these programs is made up of people with older machines that work perfectly well, but they are not being used after they are replaced with the latest and greatest screamin' gamer box. They need to take a close look before they run too many of these folks off for good.
Even if the slow machines themselves are considered expendable, i.e. their contributions would not be greatly missed, I hope their owners aren’t. I expect that any participants who may be forced to withdraw will be less inclined to attach their new or upgraded systems to the project in the future. So accommodating the ‘low-end’ users, to earn their loyalty—rather than sneering at them—would probably pay off in the long term.
You're 100% correct there. Still, there are technical necessities which sometimes force a project to have rather high system requirements. Over at CPDN, for example, they demand 512 MB of memory and at least a 1.6 GHz CPU, and believe me, they do need it. When I attached my old notebook (1.3 GHz and 496 MB of memory, so just a little bit below the recommended minimum configuration), the whole thing was so unstable that I gave up after a few weeks. I never got more than 8 hours into my model. (And no, it wasn't anything about the system, which cooperated with just about every other BOINC project fine, nor was it my fault, 'cause on my more powerful desktop I didn't get any problems. My notebook was, very simply, not powerful enough.) And I don't think they do that to annoy crunchers; it's simply because the app needs it. So, if the current science run and the hardware used for the project on the server side force the developers to make the WUs larger, I don't think that can really be helped. Of course, it is worth considering whether they can extend the deadlines a bit, which in turn would probably mean higher "pendings" for people with medium to fast boxes, but then, having pending credit is not sooooo bad 'cause you always know you'll get it eventually.
I wish all projects would have optional "no nonsense" clients though, with only the math core and no graphics and stuff.
If you install BOINC as a system service, the graphics will be disabled. (Systems with video cards that do their own processing won’t get a lot of speed benefit from this.)
So I don't think that the code will be optimised for SSE2. I know the C2D processors would be faster, but not significantly. There is no time for this at the moment.
How about Penryn's SSE4? That should be out late this year / early next year. Are the new instructions of any use to E@H?
I wish all projects would have optional "no nonsense" clients though, with only the math core and no graphics and stuff.
If you install BOINC as a system service, the graphics will be disabled. (Systems with video cards that do their own processing won’t get a lot of speed benefit from this.)
Not using the graphics isn't the real problem; not loading (and initializing) the .DLL / .SO stuff at all would be a lot better. The lite and graphical versions could even be the same program if the shared library loading were under program control instead of linker control.
p.s.: linking against a module with all dummy functions in it would do that job too of course ;-)
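Both ideas (loading under program control, and a module full of dummy functions) can be sketched in a few lines. This is only an illustration in Python: the module name `eah_graphics`, the `NullGraphics` class and its methods are hypothetical and do not come from any real E@H or BOINC code. In a C application the same pattern would rely on `dlopen()` or `LoadLibrary()` instead of `importlib`.

```python
import importlib


class NullGraphics:
    """Dummy stand-in: same interface as the graphics module, every call is a no-op."""

    def init(self):
        pass

    def render(self, state):
        pass


def load_graphics(enabled, module_name="eah_graphics"):
    """Return a graphics backend, loading the real library only on request.

    `eah_graphics` is a made-up module name used for illustration.
    """
    if not enabled:
        # "Lite" mode: the graphics library is never loaded or initialised.
        return NullGraphics()
    try:
        # Loading happens here, under program control, not at link time.
        return importlib.import_module(module_name)
    except ImportError:
        # Library not installed: fall back to the dummy functions.
        return NullGraphics()


gfx = load_graphics(enabled=False)
gfx.init()
gfx.render(state=None)  # does nothing; the math core runs undisturbed
```

The caller never needs to know which variant it got, so one binary can serve as both the lite and the graphical client.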
I'm an aging baby boomer who graduated with a minor in physics and astronomy many, many years ago, and still like to take my C-8 out at least once a month, but sometimes I wish to h*ll I could figure out what you youngsters are talkin' about.. I just got to a point where I could put together a machine using the first generation Athlons, and now I buy a board, chip, and accessories for the latest dual, and everything somehow looks kind of foreign :(
Gary
In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.....Douglas Adams
My P3s/1266 Tualatins work fine with HadCM3, one dual (1GB RAM) and one single (256MB RAM). I had a full run lately :-)
I wish all projects would have optional "no nonsense" clients though, with only the math core and no graphics and stuff.
Hm, the first S5R2 WU I got needed about 12 hours to complete. The one I have now is at 2.5 hours and tells me: 26 hours to go - wow... That's really long, and for someone who doesn't crunch too often the 14-day limit could become a problem at this length...
(on an Athlon 64 3500+ at 2500 MHz)
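For a rough feel of what that means per day, here is the arithmetic as a small sketch. It assumes the "26 hours to go" estimate is accurate and that the full 14 days are still available; both are assumptions, and the variable names are made up.

```python
elapsed_h = 2.5      # CPU hours already spent on the WU (from the post)
remaining_h = 26.0   # the client's estimate of the time to go (from the post)
deadline_days = 14   # deadline; assumes the WU was only just downloaded

total_h = elapsed_h + remaining_h     # total run time of this WU
per_day = total_h / deadline_days     # average crunching needed per day

print(f"{total_h} h total, {per_day:.2f} h/day")  # 28.5 h total, 2.04 h/day
```

So on this machine the WU needs a bit over two hours of crunching every single day, and a slower box or one that shares the CPU with other projects needs correspondingly more.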