Custom kernels

induktio
Joined: 1 Oct 10
Posts: 15
Credit: 10144774
RAC: 0
Topic 195373

Let's assume one has a dedicated BOINC crunching box running the stock Debian Linux amd64 kernel. I've been wondering whether anyone has seen significant performance gains from compiling a new kernel optimized for HPC-style applications. It seems 'Processor type and features' in the config has some options which could affect scheduler performance in high-load server environments. Or does it make sense to just use the stock kernel (2.6.32-5-amd64)?

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6588
Credit: 312813929
RAC: 182666

Quote:
Let's assume one has a dedicated BOINC crunching box running the stock Debian Linux amd64 kernel. I've been wondering whether anyone has seen significant performance gains from compiling a new kernel optimized for HPC-style applications. It seems 'Processor type and features' in the config has some options which could affect scheduler performance in high-load server environments. Or does it make sense to just use the stock kernel (2.6.32-5-amd64)?


This may not be related at all - you be the judge. You have jogged my memory about something said recently here about self-compiled Linux kernels on AMD CPUs. See here.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 576513560
RAC: 192520

Performance-wise this should not be worth it, as in BOINC you normally have as many threads running as you've got logical cores, so scheduling is pretty straightforward.
It could be different with demanding GPU setups, but that's not the topic here.

MrS

Scanning for our furry friends since Jan 2002

ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86562721
RAC: 1387

Message 99978 in response to message 99977

Quote:
Performance-wise this should not be worth it, as in BOINC you normally have as many threads running as you've got logical cores, so scheduling is pretty straightforward. ...


I wouldn't agree that scheduling is "straightforward"! The BOINC developers have rediscovered that...

However... Regardless of which Linux kernel scheduler (CPU/IO) you choose to use, the OS overhead is so small that you will not notice any improvement from trying compiler optimisation. That is with the possible exception of one option:

Using "-Os" rather than the more normal "-O3".

The question is where the balance lies for your system and application: code size, cache size, and cache churn versus instruction execution speed, data bandwidth, and memory bandwidth.

It is possible that the smaller code size from using "-Os" can give a greater speedup than by using optimisations that give faster execution at the expense of greater code size and higher instruction bandwidth...

I use "-Os" for small cache CPUs and slow disk IO systems, and "-O3" for large cache CPUs.

The greatest speedup to be had is in the algorithm and application, and in avoiding data bandwidth bottlenecks.

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

induktio
Joined: 1 Oct 10
Posts: 15
Credit: 10144774
RAC: 0

Message 99979 in response to message 99976

Quote:

This may not be related at all - you be the judge. You have jogged my memory about something said recently here about self-compiled Linux kernels on AMD CPUs. See here.

Cheers, Mike.


Well, it doesn't seem to be closely related. To me it looks like either a) an issue with the Gentoo distribution, or b) a hardware problem on the poster's machine. A faulty RAM chip only needs to flip one bit here and there and the whole calculation will be corrupted. So far my computers have produced only one (1) invalid work unit, and the cause was very likely the other contributor's hardware, since it was producing dozens of errors.

As for custom kernels, I was probing whether compiling a new one could be worth the trouble. But if the possible performance gains are in the 0.1% ballpark, it just doesn't seem worth it. The Debian stock kernel already seems to have the scheduler tick set to 250 Hz, which fits a wide range of systems. The only optimizations would be to lower it to 100 Hz, and possibly to compile the kernel for one specific processor family.
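
For anyone who wants to check the 250 Hz figure on their own box, here is a minimal Python sketch. It assumes the distribution ships the kernel build config as a text file (on Debian this is typically /boot/config-<kernel release>) and that the tick rate appears there as a CONFIG_HZ=<n> line. The function name and the sample text are made up for illustration.

```python
import re

def timer_frequency_hz(config_text):
    """Return the CONFIG_HZ value from kernel config text, or None if absent."""
    match = re.search(r"^CONFIG_HZ=(\d+)$", config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None

# Hypothetical excerpt in the format of a kernel config file.
sample_config = """\
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
"""

print(timer_frequency_hz(sample_config))
```

To run it against the real file, read the text with something like open("/boot/config-" + platform.release()).read(), assuming that path exists on your system.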

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 715624520
RAC: 965688

Hi!

I'm not sure this can be tackled with options when compiling your own kernel, but some have claimed that "CPU locality" is something that would benefit BOINC performance: making sure that a process sticks to a given physical core during its lifetime as much as possible. IIRC there were also attempts at self-compiled BOINC clients that would enforce this to some degree, and some rather heated debates about whether that was clever & effective or not.

Not sure who won that debate, tho.

These scheduling-related issues aside, the OS overhead really shouldn't matter for the overall runtime.
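
For anyone who wants to try the "CPU locality" idea on Linux without a patched client, here is a rough Python sketch. It is not what any of the forked clients did, just one way to pin already-running processes from the outside using os.sched_setaffinity (Linux only). The function names and the PIDs are hypothetical.

```python
import os

def plan_pinning(pids, cores):
    """Map each PID to one core, cycling through the cores round-robin."""
    cores = sorted(cores)
    return {pid: cores[i % len(cores)] for i, pid in enumerate(pids)}

def apply_pinning(plan):
    """Restrict each PID to its planned core (Linux only)."""
    for pid, core in plan.items():
        os.sched_setaffinity(pid, {core})

# Hypothetical PIDs of four science applications on a four-core box.
plan = plan_pinning([4101, 4102, 4103, 4104], {0, 1, 2, 3})
print(plan)
```

The demo only prints the plan. Calling apply_pinning(plan) needs real PIDs that you are allowed to modify, and os.sched_getaffinity(pid) can be used to read the setting back.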

CU
HBE

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 576513560
RAC: 192520

Message 99981 in response to message 99978

Quote:
I wouldn't agree that scheduling is "straightforward"! The BOINC developers have rediscovered that...

We're talking about the OS scheduling as many BOINC threads as there are logical cores, aren't we?

MrS


ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86562721
RAC: 1387

Message 99982 in response to message 99980

Quote:

Hi!

I'm not sure this can be tackled with options when compiling your own kernel, but some have claimed that "CPU locality" is something that would benefit BOINC performance: making sure that a process sticks to a given physical core during its lifetime as much as possible. IIRC there were also attempts at self-compiled BOINC clients that would enforce this to some degree, and some rather heated debates about whether that was clever & effective or not. ...


There was a suggestion that gave an improvement in some cases on Windows systems. However, the results were erratic enough that it didn't seem worth pursuing.

Since then, I believe the latest Windows (Win7 and later?) kernels now have better localisation and HT support.

Linux has had good localisation and good HT/NUMA support for some time.

The CPU scheduler should know very much better than Boinc as to what to do with the process threads! If it doesn't, then you really do need a new OS!!

Happy crunchin',
Martin


ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86562721
RAC: 1387

Message 99983 in response to message 99981

Quote:
Quote:
I wouldn't agree that scheduling is "straightforward"! The BOINC developers have rediscovered that...

We're talking about the OS scheduling as many BOINC threads as there are logical cores, aren't we?

That's the simplistic view that Boinc might have. However, the reality can vary dramatically.

I've got Boinc running "4 threads" on this desktop/test system. There are 306 CPU tasks running, with perhaps on average just 6 active at any instant, but many more every minute and for the first few minutes of each hour.

Boinc likely gets pushed around a bit for those brief blizzards of tasks.

The OS scheduler must be doing a good job because CPU utilisation stays pegged at 100%.
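
Numbers like these can be read from /proc/loadavg on Linux, whose fourth field is "currently runnable / total" scheduling entities (these count threads, so they may not match a plain process count). A small Python sketch, with a made-up sample line and a hypothetical function name:

```python
def parse_loadavg(line):
    """Split a /proc/loadavg line into load averages and task counts."""
    fields = line.split()
    runnable, total = (int(n) for n in fields[3].split("/"))
    return {
        "load_1min": float(fields[0]),
        "load_5min": float(fields[1]),
        "load_15min": float(fields[2]),
        "runnable": runnable,
        "total": total,
    }

# Hypothetical line shaped like the numbers quoted above.
sample = "4.05 4.02 4.00 6/306 12345"
stats = parse_loadavg(sample)
print(stats["runnable"], "of", stats["total"], "tasks runnable")
```

On a live system, pass in open("/proc/loadavg").read() instead of the sample string.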

Happy crunchin',
Martin


DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

Message 99984 in response to message 99980

Quote:

Hi!

I'm not sure this can be tackled with options when compiling your own kernel, but some have claimed that "CPU locality" is something that would benefit BOINC performance: making sure that a process sticks to a given physical core during its lifetime as much as possible. IIRC there were also attempts at self-compiled BOINC clients that would enforce this to some degree, and some rather heated debates about whether that was clever & effective or not.

Not sure who won that debate, tho.

These scheduling-related issues aside, the OS overhead really shouldn't matter for the overall runtime.

AFAIK David Anderson was never convinced it would be of significant utility to the general BOINC user (IIRC the only times large gains were seen were when pairing two apps with disparate CPU usage on the same core with HT), so it never made it out of forks like Crunchers and into the general client.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7214814931
RAC: 978164

Message 99985 in response to message 99984

Quote:
Quote:
some have claimed that "CPU locality" is something that would benefit BOINC performance: making sure that a process sticks to a given physical core during its lifetime as much as possible.

(IIRC the only times large gains were seen were when pairing two apps with disparate CPU usage on the same core with HT)

People just want to believe that affinity helps. Several of us have run decent tests for a specific possible case, and the ones I know about (including mine) all came up without benefit. I've not seen the counter-example mentioned by Dan, though I could imagine a case of that character might have been observed. But generalizing about what the key difference was from the "no help" results may be dangerous.

At the risk of protocol violation, I'll quote myself from a recent thread here:

Quote:
I looked at the dissimilar-apps benefit as applied to mixing varying amounts of Einstein with SETI for the apps current in November 2007 and started this thread on it.

In that thread someone brought up the affinity point, and I made a test case for it, which in the environment under test came up with a pretty convincingly null benefit result, whereas the app-mixing effects were readily apparent.
