It definitely is. For (Intel) Macs, the compiler knows that it will have at least SSE and SSE2 support (the earliest Intel Macs came with the 'Yonah' Core Duo or Core Solo chips). This will give the Darwin app quite an edge over the Linux and Windows apps that are optimized only for SSE.
Hmmm. That recalls another thread about optimized apps. So, while some parts of the E@H app do have a "smart launcher," that only provides the SSE version? Since you clearly have an SSE2 version floating around, it would make sense to at minimum provide a Windows variant that can be hand installed (via app_info), like I do with Milkyway@Home. Presuming, that is, that the auto-launcher (which is cool) can't handle the distinction. It would provide the project with a significant speedup given the current population of (mostly Windows I imagine) crunchers.
Counterarguments? I'm not presuming I know of any of the "gotchas".
"Better is the enemy of the good." - Voltaire (should be memorized by every requirements lead)
So, while some parts of the E@H app do have a "smart launcher," that only provides the SSE version?
The 'smart launcher' or 'switcher' app, as Bernd calls it, only applies to the GW suite of apps. The ABP1 app doesn't (yet?) employ that level of sophistication. I don't know how big a job it would be to set it up, or even if it's possible. Maybe Bikeman might know :-).
Cheers,
Gary.
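For readers wondering what such a 'switcher' boils down to, here is a minimal sketch in Python. It is not Bernd's code: the function names and the binary names (`app_sse2`, `app_sse`, `app_generic`) are made up for illustration, and it assumes Linux-style `/proc/cpuinfo` text as input. The idea is simply to read the CPU feature flags once and start the most capable build the host can run.

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect feature flags from /proc/cpuinfo-style text (Linux x86 format)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


# Hypothetical build names, ordered from most to least demanding.
VARIANTS = [
    ("sse2", "app_sse2"),
    ("sse", "app_sse"),
]
FALLBACK = "app_generic"  # hypothetical build with no SIMD requirement


def pick_variant(flags):
    """Return the first build whose required CPU flag is present."""
    for required_flag, binary in VARIANTS:
        if required_flag in flags:
            return binary
    return FALLBACK
```

On a Linux host this would be called as `pick_variant(parse_cpu_flags(open("/proc/cpuinfo").read()))`. Windows and Darwin would need a different way to obtain the flags.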
I would not be surprised if Bernd wrote the switcher in a way that you could even use the exact same binary :-). The complexity is not in the switcher itself but in dealing with different logical code branches. Whether that is done by #ifdef or some other mechanism doesn't matter: you have to maintain and test them all, e.g. for cross-validation problems.
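To illustrate the cross-validation worry, here is a small Python sketch. It is an illustration only, not anything from the E@H code base or its validator, and all names and the tolerance value are assumptions. Two code paths add up the same numbers in a different order, the way a scalar loop and a 4-wide vectorised loop would. The results may differ in the last bits, so a comparison between hosts has to allow a tolerance rather than demand bit-identical output.

```python
import math


def sum_sequential(values):
    """Scalar code path: one running total, strictly left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_four_lanes(values):
    """Vector-style code path: four interleaved partial sums, combined at the end."""
    lanes = [0.0, 0.0, 0.0, 0.0]
    for i, v in enumerate(values):
        lanes[i % 4] += v
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3])


def results_validate(a, b, rel_tol=1e-9):
    """Accept two results if they agree within a relative tolerance (assumed value)."""
    return math.isclose(a, b, rel_tol=rel_tol)
```

Every extra branch (SSE, SSE2, plain C) is one more path that has to pass this kind of comparison against all the others, which is where the maintenance cost comes from.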
Anyway... ABP1 is soon to be replaced by ABP2, and I'm optimistic that even without separate versions for SSE2 and SSE, the performance gap between Darwin and (Linux, Windows) will narrow quite a bit because of some extra optimizations in ABP2 that will work fine even on SSE.
CU
Bikeman
I don't know but it looks like the Darwin ABP1 app may be pretty fast.
It definitely is. For (Intel) Macs, the compiler knows that it will have at least SSE and SSE2 support...
... disadvantage of SSE-only optimization is hurting the CUDA apps and the CPU apps on Windows and Linux.
We're not STILL suffering the stupidity of the "Naughty Intel" slow-down sabotage[*] are we?!
... Is this a case for compiling non-crippled optimised versions that assume unconditionally that SSE2/SSE3 is available?
Regards,
Martin
[*] "It is a shame that the Intel compiler, which use to be almost the no-brainer choice if your primary concern was fast code, is now being coerced into being a marketing tool. Crippling the output for non-Intel chips may mean that some published benchmarks may end up bogusly favouring Intel over AMD, but the cost is that if you want to release fast production code I can't recommend the (unpatched) compiler. There are an awful lot of AMD machines out there!"
I don't think that Bikeman was suggesting the code was crippled: he was merely pointing out that the Windows/Linux ABP1 CPU apps might genuinely encounter an SSE-only (not SSE2) CPU in the field, whereas (unless it's a hackintosh) you'll always have SSE2 under Darwin.
Bikeman, how likely is it that the CUDA app would encounter an SSE-only host in the wild? I know it's possible - you can get PCI versions of some of the low-end cards - but it would be reasonable to demand SSE2 to run a decent CUDA card, methinks.
Edit - I've checked the Windows ABP1 v3.12 binary with a hex editor: Swallowtail's smoking gun is not present. I presume you've done the same with the Linux binary?
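The hex-editor check can also be scripted. Below is a minimal Python sketch, assuming the marker to look for is the CPUID vendor string `GenuineIntel`. The thread does not spell out what the 'smoking gun' actually is, so that choice and the function names are assumptions.

```python
def find_marker(data, marker=b"GenuineIntel"):
    """Return every byte offset at which marker occurs in data."""
    offsets = []
    start = data.find(marker)
    while start != -1:
        offsets.append(start)
        start = data.find(marker, start + 1)
    return offsets


def scan_file(path, marker=b"GenuineIntel"):
    """Read a binary from disk and report where the marker occurs."""
    with open(path, "rb") as handle:
        return find_marker(handle.read(), marker)
```

An empty list only says the string is absent. A hit would not by itself prove vendor-dependent dispatching, since the string can appear for harmless reasons.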
So it's actually a comment that the SSEx detect code isn't included and so the 'lowest common denominator' has been compiled for...
Quote:
Bikeman, how likely is it that the CUDA app would encounter an SSE-only host in the wild? I know it's possible - you can get PCI versions of some of the low-end cards - but it would be reasonable to demand SSE2 to run a decent CUDA card, methinks.
Very good thought. Or just make some optimised versions available for those interested? That's only compiler options and a download link...
Quote:
Edit - I've checked the Windows ABP1 v3.12 binary with a hex editor: Swallowtail's smoking gun is not present. I presume you've done the same with the Linux binary?
Now why would you suspect that?... ;-)
Yep, and no 'Naughty Intel' HEX found. However, there are quite a few Intel names in there, such as flag and variable names.
Happy crunchin',
Martin
Indeed, I was NOT hinting at any slow-down code or other sinister things in the code :-), it's just the target-architecture that the apps are compiled for.
CUDA with SSE-only CPUs: Well, I first thought that this isn't an option because all CUDA cards were PCIe ... until somebody showed me CUDA cards for AGP slots. Yes, they exist! So you can have a CUDA card running in an Athlon XP system without problems, e.g. as an upgrade for gaming. Doesn't make that much sense but there will be systems like this.
As I said earlier: I'm confident that ABP2 will not require those particular lines of code which compiled rather poorly under the SSE-only compile but shine with the SSE2 compile. Just be patient :-)
CU
Bikeman
RE: It definitely is. For
This is also true of -any- host running a 64-bit OS. Does Einstein@Home send out 64-bit apps for any or all operating systems?
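The reason is that SSE2 is part of the baseline x86-64 instruction set, so any 64-bit x86 build may assume it. Here is a rough Python sketch of that check. The helper name is made up, and a `False` result only means 'not guaranteed' (e.g. a 32-bit OS on a 64-bit CPU), not 'absent'.

```python
import platform

# Machine names commonly reported for 64-bit x86
# (x86_64 on Linux and macOS, AMD64 on Windows).
X86_64_NAMES = {"x86_64", "amd64"}


def sse2_guaranteed(machine):
    """True if the reported machine type is 64-bit x86, where SSE2 is baseline."""
    return machine.strip().lower() in X86_64_NAMES


if __name__ == "__main__":
    machine = platform.machine()
    print(machine, sse2_guaranteed(machine))
```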
RE: So it's actually a
Yes, I remember testing ABP1 on my Celeron MMX (RIP) for Bernd - it even has SSE-emulation code for those dinosaurs.
RE: I first thought that
Same here. Well, there you go => learn something new every day.
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal