Let's assume one has a dedicated BOINC crunching box running the stock Debian Linux amd64 kernel. I've been wondering: has anyone seen significant performance gains from compiling a new kernel optimized for HPC-style applications? 'Processor type and features' in the kernel config has some options that could affect scheduler performance in high-load server environments. Or does it make sense to just use the stock kernel (2.6.32-5-amd64)?
Copyright © 2024 Einstein@Home. All rights reserved.
Custom kernels
This may not be related at all - you be the judge. You have jogged my memory about something said here recently about compiled Linux kernels on AMD CPUs. See here.
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Performance-wise this should not be worth it: in BOINC you normally have as many threads running as you've got logical cores, so scheduling is pretty straightforward.
Could be different with demanding GPU setups.. but that's not the topic here.
MrS
Scanning for our furry friends since Jan 2002
RE: Performance-wise this
I wouldn't agree that scheduling is "straightforward"! The BOINC developers have rediscovered that...
However... Regardless of which Linux kernel scheduler (CPU/IO) you choose to use, the OS overhead is so small that you will not notice any improvement from trying compiler optimisation. The one possible exception is using "-Os" rather than the more normal "-O3".
The question is where the balance lies for your system and application: code size, cache size, and cache churn versus instruction execution speed, data bandwidth, and memory bandwidth.
It is possible that the smaller code size from using "-Os" can give a greater speedup than by using optimisations that give faster execution at the expense of greater code size and higher instruction bandwidth...
I use "-Os" for small cache CPUs and slow disk IO systems, and "-O3" for large cache CPUs.
The greatest speedup to be had is in the algorithm and application, and in avoiding data bandwidth bottlenecks.
Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
RE: This may not be related
Well, it doesn't seem to be much related. To me it looks like an issue with either a) the Gentoo distribution or b) a hardware problem on the poster's machine. A faulty RAM chip only needs to flip one bit here and there and the whole calculation will be corrupted. So far my computers have produced only one (1) invalid work unit, and the cause was very likely the other contributor's hardware, since it was producing dozens of errors.
As for custom kernels, I was probing whether compiling a new one could be worth the trouble. If the possible performance gains are in the 0.1% ballpark, it just doesn't seem worth it. The Debian stock kernel already seems to have the scheduler tick (timer frequency) set to 250 Hz, which fits a wide range of systems. The only optimization would be to lower it to 100 Hz, and possibly to compile the kernel for one specific processor family only.
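For anyone who wants to check the tick rate mentioned above on their own box, here is a minimal Python sketch. It assumes the distribution ships the kernel config as /boot/config-<release>, which Debian normally does; the function names are made up for this example.

```python
import os
import re


def parse_config_hz(config_text):
    """Return the integer CONFIG_HZ value from kernel config text, or None."""
    # Match the plain "CONFIG_HZ=250" line, not "CONFIG_HZ_250=y".
    match = re.search(r"^CONFIG_HZ=(\d+)$", config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def running_kernel_hz():
    """Read the config file for the running kernel (path is an assumption)."""
    path = "/boot/config-" + os.uname().release
    try:
        with open(path) as fh:
            return parse_config_hz(fh.read())
    except OSError:
        # File missing or unreadable: we simply don't know the value.
        return None


if __name__ == "__main__":
    print("CONFIG_HZ:", running_kernel_hz())
```

If the file is not there, the function returns None rather than guessing.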
Hi!
I'm not sure this can be tackled with options when compiling your own kernel, but some have claimed that "CPU locality" would benefit BOINC performance: making sure that a process sticks to a given physical core as much as possible during its lifetime. IIRC there were also attempts at self-compiled BOINC clients that would enforce this to some degree, and some rather heated debates about whether that was clever & effective or not.
Not sure who won that debate, tho.
These scheduling related issues aside, the OS-overhead really shouldn't matter for the overall runtime.
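To make the "CPU locality" idea concrete: on Linux a process can be pinned to one core from user space, no custom kernel needed. The sketch below is only an illustration of the mechanism, not what any BOINC client actually did. plan_pinning and apply_pinning are hypothetical helper names; os.sched_getaffinity and os.sched_setaffinity are the real Python calls and are Linux-only.

```python
import os


def plan_pinning(pids, cpus):
    """Assign each PID exactly one CPU, round-robin over the sorted CPU ids."""
    cpu_list = sorted(cpus)
    if not cpu_list:
        raise ValueError("no CPUs available")
    return {pid: cpu_list[i % len(cpu_list)] for i, pid in enumerate(pids)}


def apply_pinning(plan):
    """Pin every PID in the plan to its CPU (Linux only, needs permission)."""
    for pid, cpu in plan.items():
        os.sched_setaffinity(pid, {cpu})


if __name__ == "__main__":
    # Demo on this process only: pin it to one of the CPUs it may already use.
    allowed = os.sched_getaffinity(0)
    apply_pinning(plan_pinning([os.getpid()], allowed))
    print("now allowed on:", os.sched_getaffinity(0))
```

Whether pinning gains anything is exactly the open question here, so treat this as a tool for running your own test rather than a recommendation.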
CU
HBE
RE: I wouldn't agree that
We're talking about the OS scheduling as many BOINC threads as there are logical cores, aren't we?
MrS
RE: Hi! I'm not sure this
There was a suggestion that this gave an improvement in some cases on Windows systems. However, the results were inconsistent enough to suggest it wasn't worth pursuing.
Since then, I believe the latest Windows kernels (Win7 and later?) have better localisation and HT support.
Linux has had good localisation and good HT/NUMA support for some time.
The CPU scheduler should know very much better than BOINC what to do with the process threads! If it doesn't, then you really do need a new OS!!
Happy crunchin',
Martin
RE: RE: I wouldn't agree
That's the simplistic view that Boinc might have. However, the reality can vary dramatically.
I've got Boinc running "4 threads" on this desktop/test system. There are 306 CPU tasks running, with perhaps on average just 6 active at any instant, but many more every minute and for the first few minutes of each hour.
Boinc likely gets pushed around a bit for those brief blizzards of tasks.
The OS scheduler must be doing a good job because CPU utilisation stays pegged at 100%.
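If you want to check the "pegged at 100%" observation yourself, one rough way is to sample the aggregate "cpu" line of /proc/stat twice and compare. This is a sketch under assumptions: it counts idle plus iowait as "not busy" and sums only the first eight counters (user through steal); the function names are made up for the example.

```python
import time


def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line."""
    # Fields after the label: user nice system idle iowait irq softirq steal ...
    fields = [int(x) for x in line.split()[1:9]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    return idle, sum(fields)


def busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 1.0 - idle_delta / total_delta


def sample():
    """Read the current counters from /proc/stat (Linux only)."""
    with open("/proc/stat") as fh:
        return parse_cpu_line(fh.readline())


if __name__ == "__main__":
    first = sample()
    time.sleep(5)
    print("busy: {:.1%}".format(busy_fraction(first, sample())))
```

A short sampling window will catch the brief bursts of other tasks; a longer one averages them away.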
Happy crunchin',
Martin
RE: Hi! I'm not sure this
AFAIK David Anderson was never convinced it would be of significant utility to the general BOINC user (IIRC the only times large gains were seen were when pairing two apps with disparate CPU usage on the same core with HT), so it never made it out of forks like Crunchers and into the general client.
RE: RE: some have claimed
People just want to believe that affinity helps. Several of us have run decent tests for a specific possible case, and the ones I know about (including mine) all came up without benefit. I've not seen the counter-example mentioned by Dan, though I could imagine a case of that character might have been observed. But generalizing about what the key difference was from the "no help" results may be dangerous.
At the risk of protocol violation, I'll quote myself from a recent thread here
In that thread someone brought up the affinity point, and I made a test case for it, which in the environment under test came up with a pretty convincingly null benefit result, whereas the app-mixing effects were readily apparent.