CUDA Application under-performance

Martin Ryba
Joined: 9 Apr 09
Posts: 48
Credit: 153159171
RAC: 16648

RE: It definitely is. For

Message 95707 in response to message 95706

Quote:
It definitely is. For (Intel) Macs, the compiler knows that it will have at least SSE and SSE2 support (the earliest Intel Macs came with the 'Yonah' Core Duo or Core Solo chips). This will give the Darwin app quite an edge over the Linux and Windows apps that are optimized only for SSE.

Hmmm. That recalls another thread about optimized apps. So, while some parts of the E@H app do have a "smart launcher," that only provides the SSE version? Since you clearly have an SSE2 version floating around, it would make sense at a minimum to provide a Windows variant that can be installed by hand (via app_info), as I do with Milkyway@Home. Presuming, that is, that the auto-launcher (which is cool) can't handle the distinction. It would give the project a significant speed-up given the current population of (mostly Windows, I imagine) crunchers.

Counterarguments? I don't claim to know any of the "gotchas".

"Better is the enemy of the good." - Voltaire (should be memorized by every requirements lead)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110078908941
RAC: 23604688

RE: So, while some parts of

Message 95708 in response to message 95707

Quote:
So, while some parts of the E@H app do have a "smart launcher," that only provides the SSE version?


The 'smart launcher' or 'switcher' app, as Bernd calls it, only applies to the GW suite of apps. The ABP1 app doesn't (yet?) employ that level of sophistication. I don't know how big a job it would be to set it up, or even if it's possible. Maybe Bikeman might know :-).

Cheers,
Gary.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691365100
RAC: 255554

RE: RE: So, while some

Message 95709 in response to message 95708

Quote:
Quote:
So, while some parts of the E@H app do have a "smart launcher," that only provides the SSE version?

The 'smart launcher' or 'switcher' app, as Bernd calls it, only applies to the GW suite of apps. The ABP1 app doesn't (yet?) employ that level of sophistication. I don't know how big a job it would be to set it up, or even if it's possible. Maybe Bikeman might know :-).

I would not be surprised if Bernd wrote the switcher in a way that you could even use the exact same binary :-). The complexity is not the switcher itself but dealing with different logical code branches (whether done by #ifdef or some other mechanism doesn't matter, you have to maintain and test them all, e.g. for cross-validation problems).
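To illustrate why the switcher itself is the easy part, here is a minimal sketch of a launcher that picks a build from the CPU's feature flags. It is not Einstein@Home's actual switcher: it assumes Linux (it reads the "flags" line of /proc/cpuinfo), and the binary names are hypothetical.

```python
from pathlib import Path

# Hypothetical build names, most capable first; not the real E@H file names.
VARIANTS = [
    ("sse2", "abp_app_sse2"),  # CPU reports SSE2
    ("sse", "abp_app_sse"),    # SSE-only CPUs, e.g. an Athlon XP
]
FALLBACK = "abp_app_generic"   # hypothetical build with no SSE requirement


def read_cpu_flags(cpuinfo: str = "/proc/cpuinfo") -> set[str]:
    """Return the feature flags of the first CPU listed in /proc/cpuinfo (Linux only)."""
    try:
        text = Path(cpuinfo).read_text()
    except OSError:
        return set()  # unreadable or non-Linux: report no features
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def pick_binary(flags: set[str]) -> str:
    """Return the most capable build whose required feature the CPU reports."""
    for feature, binary in VARIANTS:
        if feature in flags:
            return binary
    return FALLBACK


if __name__ == "__main__":
    print(pick_binary(read_cpu_flags()))
```

The selection is a handful of lines; the real cost, as noted above, is that every entry in the variant list is a separate code path that has to be maintained and cross-validated.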

Anyway... ABP1 is soon to be replaced by ABP2, and I'm optimistic that even without separate versions for SSE2 and SSE, the performance gap between Darwin and (Linux, Windows) will narrow quite a bit because of some extra optimizations in ABP2 that will work fine even on SSE.

CU
Bikeman

Ver Greeneyes
Joined: 26 Mar 09
Posts: 140
Credit: 9562235
RAC: 0

RE: It definitely is. For

Message 95710 in response to message 95706

Quote:
It definitely is. For (Intel) Macs, the compiler knows that he will have at least SSE and SSE2 support


This is also true of -any- host running a 64-bit OS. Does Einstein@Home send out 64-bit apps for any or all operating systems?
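The reason this holds is that SSE and SSE2 are part of the baseline x86-64 instruction set, so every 64-bit x86 CPU has them. A minimal sketch of that check, assuming Python's `platform.machine()` reports "x86_64" (Linux, macOS) or "AMD64" (Windows) for such hosts:

```python
import platform

# Names Python typically reports for 64-bit x86 machines.
X86_64_NAMES = {"x86_64", "amd64"}


def sse2_guaranteed(machine: str) -> bool:
    """True if the machine name is 64-bit x86, whose baseline ISA includes SSE and SSE2.

    False means "not guaranteed", not "absent": a 32-bit x86 CPU may still have SSE2.
    """
    return machine.lower() in X86_64_NAMES


if __name__ == "__main__":
    m = platform.machine()
    print(m, sse2_guaranteed(m))
```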

ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86315601
RAC: 311

RE: RE: I don't know but

Message 95711 in response to message 95706

Quote:
Quote:
I don't know but it looks like the Darwin ABP1 app may be pretty fast.

It definitely is. For (Intel) Macs, the compiler knows that he will have at least SSE and SSE2 support...

... disadvantage of SSE-only optimization is hurting the CUDA apps and the CPU apps on Windows and Linux.


We're not STILL suffering the stupidity of the "Naughty Intel" slow-down sabotage[*] are we?!

... Is this a case for compiling non-crippled optimised versions that assume unconditionally that SSE2/SSE3 is available?

Regards,
Martin

[*] "It is a shame that the Intel compiler, which used to be almost the no-brainer choice if your primary concern was fast code, is now being coerced into being a marketing tool. Crippling the output for non-Intel chips may mean that some published benchmarks may end up bogusly favouring Intel over AMD, but the cost is that if you want to release fast production code I can't recommend the (unpatched) compiler. There are an awful lot of AMD machines out there!"

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2775474034
RAC: 817215

RE: ... Is this a case for

Message 95712 in response to message 95711

Quote:

... Is this a case for compiling non-crippled optimised versions that assume unconditionally that SSE2/SSE3 is available?

Regards,
Martin


I don't think that Bikeman was suggesting the code was crippled: he was merely pointing out that the Windows/Linux ABP1 CPU apps might genuinely encounter an SSE-only (not SSE2) CPU in the field, whereas (unless it's a hackintosh) you'll always have SSE2 under Darwin.

Bikeman, how likely is it that the CUDA app would encounter an SSE-only host in the wild? I know it's possible - you can get PCI versions of some of the low-end cards - but it would be reasonable to demand SSE2 to run a decent CUDA card, methinks.

Edit - I've checked the Windows ABP1 v3.12 binary with a hex editor: Swallowtail's smoking gun is not present. I presume you've done the same with the Linux binary?
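The hex-editor check can also be scripted. The thread doesn't say exactly which byte pattern "Swallowtail's smoking gun" is, so this sketch assumes, for illustration only, that you are looking for the CPUID vendor string "GenuineIntel" that Intel's CPU dispatcher is reported to test. Because CPUID returns that string in three registers, compiled code may contain it as three separate 4-byte constants rather than one contiguous string, so the sketch searches for the pieces as well.

```python
from pathlib import Path

# Illustrative patterns: the whole vendor string, plus the little-endian
# 4-byte chunks CPUID returns it in (EBX="Genu", EDX="ineI", ECX="ntel").
PATTERNS = [b"GenuineIntel", b"Genu", b"ineI", b"ntel"]


def find_patterns(data: bytes, patterns=PATTERNS) -> dict[bytes, list[int]]:
    """Return every offset at which each pattern occurs in data (patterns with no hits are omitted)."""
    hits: dict[bytes, list[int]] = {}
    for pat in patterns:
        offsets = []
        start = data.find(pat)
        while start != -1:
            offsets.append(start)
            start = data.find(pat, start + 1)
        if offsets:
            hits[pat] = offsets
    return hits


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        for pat, offsets in find_patterns(Path(path).read_bytes()).items():
            print(path, pat, [hex(o) for o in offsets[:5]])
```

Treat hits with care: "ntel" alone matches any occurrence of "Intel", including harmless flag or variable names, so only the full string or all three chunks close together suggest a vendor check.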

ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86315601
RAC: 311

RE: RE: ... Is this a

Message 95713 in response to message 95712

Quote:
Quote:
... Is this a case for compiling non-crippled optimised versions that assume unconditionally that SSE2/SSE3 is available?

I don't think that Bikeman was suggesting the code was crippled: he was merely pointing out that the Windows/Linux ABP1 CPU apps might genuinely encounter an SSE-only (not SSE2) CPU in the field, whereas (unless it's a hackintosh) you'll always have SSE2 under Darwin.


So it's actually a comment that the SSEx detect code isn't included and so the 'lowest common denominator' has been compiled for...

Quote:
Bikeman, how likely is it that the CUDA app would encounter an SSE-only host in the wild? I know it's possible - you can get PCI versions of some of the low-end cards - but it would be reasonable to demand SSE2 to run a decent CUDA card, methinks.


Very good thought. Or just make some optimised versions available for those interested? That's only compiler options and a download link...
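As a rough illustration of "only compiler options", here is a sketch of a build matrix for 32-bit x86 variants. It assumes GCC; the source file name and output names are hypothetical. Note that with `-march=pentium3` GCC can use SSE only for single precision, and double-precision math still runs on the x87 unit; SSE2 is what adds double-precision SIMD, which is one reason an SSE2 build can be so much faster for double-heavy code.

```python
import shlex

# Hypothetical GCC build matrix for 32-bit x86 variants.
# -march sets which instruction set extensions the compiler may emit,
# so each binary needs that CPU (or newer) to run.
# -mfpmath=sse moves scalar floating point from x87 to SSE registers;
# with SSE1 only single precision moves, doubles stay on x87.
VARIANT_FLAGS = {
    "generic": ["-m32", "-march=i686"],
    "sse": ["-m32", "-march=pentium3", "-mfpmath=sse"],
    "sse2": ["-m32", "-march=pentium4", "-mfpmath=sse"],
}


def build_command(variant: str, source: str = "app.c", cc: str = "gcc") -> list[str]:
    """Return the compiler invocation for one variant (file names are hypothetical)."""
    flags = VARIANT_FLAGS[variant]
    return [cc, "-O2", *flags, "-o", f"app_{variant}", source, "-lm"]


if __name__ == "__main__":
    for name in VARIANT_FLAGS:
        print(shlex.join(build_command(name)))
```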

Quote:
Edit - I've checked the Windows ABP1 v3.12 binary with a hex editor: Swallowtail's smoking gun is not present. I presume you've done the same with the Linux binary?


Now why would you suspect that?... ;-)

Yep, and no 'Naughty Intel' hex found. However, there are quite a few Intel names in there, such as in flag and variable names.

Happy crunchin',
Martin


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2775474034
RAC: 817215

RE: So it's actually a

Message 95714 in response to message 95713

Quote:
So it's actually a comment that the SSEx detect code isn't included and so the 'lowest common denominator' has been compiled for...


Yes, I remember testing ABP1 on my Celeron MMX (RIP) for Bernd - it even has SSE-emulation code for those dinosaurs.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691365100
RAC: 255554

Indeed, I was NOT hinting at

Indeed, I was NOT hinting at any slow-down code or other sinister things in the code :-), it's just the target-architecture that the apps are compiled for.

CUDA with SSE-only CPUs: Well, I first thought that this isn't an option because all CUDA cards were PCIx ... until somebody showed me CUDA cards for AGP slots. Yes, they exist! So you can have a CUDA card running in an Athlon XP system without problems, e.g. as an upgrade for gaming. Doesn't make that much sense but there will be systems like this.

As I said earlier: I'm confident that ABP2 will not require those particular lines of code, which compiled rather poorly for the SSE-only target but shine in an SSE2 build. Just be patient :-)

CU
Bikeman

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6542
Credit: 287195938
RAC: 97241

RE: I first thought that

Message 95716 in response to message 95715

Quote:
I first thought that this isn't an option because all CUDA cards were PCIx ... until somebody showed me CUDA cards for AGP slots. Yes, they exist!


Same here. Well, there you go: you learn something new every day.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ... Blaise Pascal

... and my other CPU is a Ryzen 5950X :-)
