Custom kernels

induktio

Joined: 1 Oct 10

Posts: 15

Credit: 10144774

RAC: 0

10 Oct 2010 23:37:03 UTC

Topic 195373

Let's assume one has a dedicated BOINC crunching box running the stock Debian Linux amd64 kernel. I've been wondering whether anyone has seen significant performance gains from compiling a new kernel optimized for HPC-style applications. It seems 'Processor type and features' in the config has some options which could affect scheduler performance in high-load server environments. Or does it make sense to just use the stock kernel (2.6.32-5-amd64)?

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6591

Credit: 325370405

RAC: 183114

Custom kernels

11 Oct 2010 1:18:39 UTC

Message 99976

Quote:

Let's assume one has a dedicated BOINC crunching box running the stock Debian Linux amd64 kernel. I've been wondering whether anyone has seen significant performance gains from compiling a new kernel optimized for HPC-style applications. It seems 'Processor type and features' in the config has some options which could affect scheduler performance in high-load server environments. Or does it make sense to just use the stock kernel (2.6.32-5-amd64)?

This may not be related at all - you be the judge. You have jogged my memory about something said recently here about compiled linux kernels on AMD's. See here.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 583188137

RAC: 148282

Performance-wise this should

11 Oct 2010 8:11:56 UTC

Message 99977

Performance-wise this should not be worth it, as in BOINC you normally have as many threads running as you've got logical cores, so scheduling is pretty straight forward.
It could be different with demanding GPU setups, but that's not the topic here.

MrS

Scanning for our furry friends since Jan 2002

ML1

Joined: 20 Feb 05

Posts: 347

Credit: 86563414

RAC: 19

RE: Performance-wise this

11 Oct 2010 13:35:25 UTC

Message 99978 in response to message 99977

Quote:

Performance-wise this should not be worth it, as in BOINC you normally have as many threads running as you've got logical cores, so scheduling is pretty straight forward. ...

I wouldn't agree that scheduling is "straight forward"! The boinc developers have rediscovered that...

However... Regardless of which Linux kernel scheduler (CPU/IO) you choose to use, the OS overhead is so small that you will not notice any improvement from trying compiler optimisation. That is with the possible exception of one option:

Using "-Os"

rather than the more normal "-O3".

The question is where the balance lies for your system and application: code size, cache size, and cache churn on one side versus instruction execution speed, data bandwidth, and memory bandwidth on the other.

It is possible that the smaller code size from using "-Os" can give a greater speedup than by using optimisations that give faster execution at the expense of greater code size and higher instruction bandwidth...

I use "-Os" for small cache CPUs and slow disk IO systems, and "-O3" for large cache CPUs.

The greatest speedup to be had is in the algorithm and application, and in avoiding data bandwidth bottlenecks.

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

induktio

Joined: 1 Oct 10

Posts: 15

Credit: 10144774

RAC: 0

RE: This may not be related

11 Oct 2010 14:17:19 UTC

Message 99979 in response to message 99976

Quote:

This may not be related at all - you be the judge. You have jogged my memory about something said recently here about compiled linux kernels on AMD's. See here.

Cheers, Mike.

Well, it doesn't seem to be closely related. To me it looks like either a) an issue with the Gentoo distribution, or b) a hardware problem on the poster's machine. A faulty RAM chip only needs to flip one bit here and there and the whole calculation will be corrupted. So far my computers have produced only one (1) invalid work unit, and the cause was very likely the other contributor's hardware, since it was producing dozens of errors.

As for the custom kernels, I was probing whether compiling a new one could be worth the trouble. But if the possible performance gains are in the 0.1% ballpark, it just doesn't seem worth it. The Debian stock kernel already seems to have the scheduler tick set to 250 Hz, which fits a wide range of systems. The only optimizations would be to lower it to 100 Hz and possibly to compile the kernel for one specific processor family only.
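For anyone who wants to check the tick rate of their own kernel before deciding, here is a minimal sketch. It assumes the Debian convention of shipping the build config as /boot/config-<release> and that the tick rate is recorded in a `CONFIG_HZ=<n>` line; the function names are made up for illustration, and other distributions may keep the config elsewhere.

```python
import os
import re


def configured_hz(config_text):
    """Return the CONFIG_HZ value found in kernel config text, or None."""
    # Match only the numeric setting, not the CONFIG_HZ_250=y style switches.
    match = re.search(r"^CONFIG_HZ=(\d+)$", config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def read_running_kernel_config():
    """Return the config text of the running kernel, or None if unavailable.

    Assumption: the config lives at /boot/config-<release>, as on Debian.
    """
    path = "/boot/config-" + os.uname().release
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return None


if __name__ == "__main__":
    text = read_running_kernel_config()
    print(configured_hz(text) if text else "kernel config not found")
```

This only reports what the kernel was built with; it says nothing about whether a different value would make crunching faster.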

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 765567684

RAC: 1087430

Hi! I'm not sure this can

11 Oct 2010 17:42:29 UTC

Message 99980

Hi!

I'm not sure this can be tackled with options when compiling your own kernel, but some have claimed that "CPU locality" is something that would benefit BOINC performance: making sure that a process sticks to a given physical core during its lifetime as much as possible. IIRC there were also attempts to have self-compiled BOINC clients that would enforce this to some degree, and some rather heated debates whether that was clever & effective or not.

Not sure who won that debate, tho.
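For reference, pinning a process to one logical CPU does not need a custom kernel or a patched client on Linux; it can be done from user space. The sketch below uses Python's `os.sched_setaffinity`, which is only available on Linux; `pin_to_cpu` is a hypothetical helper name, and nothing here shows that pinning actually helps BOINC throughput.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 means the calling process) to one logical CPU.

    Returns the affinity set the kernel reports afterwards. Linux-only.
    """
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed before:", allowed)
    # Pin this interpreter to the first CPU it is currently allowed to use.
    print("allowed after: ", sorted(pin_to_cpu(allowed[0])))
```

Passing the PID of a running science app instead of 0 would pin that process, provided the caller has permission to change it.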

These scheduling related issues aside, the OS-overhead really shouldn't matter for the overall runtime.

CU
HBE

ExtraTerrestria...

Joined: 10 Nov 04

Posts: 770

Credit: 583188137

RAC: 148282

RE: I wouldn't agree that

11 Oct 2010 21:23:42 UTC

Message 99981 in response to message 99978

Quote:

I wouldn't agree that scheduling is "straight forward"! The boinc developers have rediscovered that...

We're talking about the OS scheduling as many BOINC threads as there are logical cores, aren't we?

MrS

Scanning for our furry friends since Jan 2002

ML1

Joined: 20 Feb 05

Posts: 347

Credit: 86563414

RAC: 19

RE: Hi! I'm not sure this

12 Oct 2010 13:51:54 UTC

Message 99982 in response to message 99980

Quote:

Hi!

I'm not sure this can be tackled with options when compiling your own kernel, but some have claimed that "CPU locality" is something that would benefit BOINC performance: making sure that a process sticks to a given physical core during its lifetime as much as possible. IIRC there were also attempts to have self-compiled BOINC clients that would enforce this to some degree, and some rather heated debates whether that was clever & effective or not. ...

There was the suggestion that gave an improvement for some cases on Windows systems. However, the vagaries suggested it wasn't worth pursuing.

Since then, I believe the latest Windows (Win7 and later?) kernels now have better localisation and HT support.

Linux has had good localisation and good HT/NUMA support for some time.

The CPU scheduler should know very much better than Boinc as to what to do with the process threads! If it doesn't, then you really do need a new OS!!

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

ML1

Joined: 20 Feb 05

Posts: 347

Credit: 86563414

RAC: 19

RE: RE: I wouldn't agree

12 Oct 2010 13:58:28 UTC

Message 99983 in response to message 99981

Quote:

Quote:
I wouldn't agree that scheduling is "straight forward"! The boinc developers have rediscovered that...

We're talking about the OS scheduling as many BOINC threads as there are logical cores, aren't we?

That's the simplistic view that Boinc might have. However, the reality can vary dramatically.

I've got Boinc running "4 threads" on this desktop/test system. There are 306 CPU tasks running, with perhaps on average just 6 active at any instant, but many more every minute and for the first few minutes of each hour.

Boinc likely gets pushed around a bit for those brief blizzards of tasks.

The OS scheduler must be doing a good job because CPU utilisation stays pegged at 100%.
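A rough way to see numbers like these on a Linux box is /proc/loadavg, whose fourth field gives the currently runnable scheduling entities over the total. The sketch below only parses that line; `parse_loadavg` is a hypothetical helper, and the sample values in the usage note are illustrative rather than measured.

```python
def parse_loadavg(text):
    """Parse one line in /proc/loadavg format into a small dict.

    Expected layout: "<1min> <5min> <15min> <runnable>/<total> <last_pid>".
    """
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load_1min": float(fields[0]),
        "runnable": int(runnable),
        "total": int(total),
    }


if __name__ == "__main__":
    try:
        with open("/proc/loadavg") as handle:
            print(parse_loadavg(handle.read()))
    except OSError:
        print("/proc/loadavg not available on this system")
```

For example, a line such as "4.05 4.01 3.98 6/306 12345" would be read as 6 runnable entities out of 306. Note that the total counts threads as well as processes, so it will not always match a process list.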

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3562358667

RAC: 0

RE: Hi! I'm not sure this

12 Oct 2010 23:39:46 UTC

Message 99984 in response to message 99980

Quote:

Hi!

I'm not sure this can be tackled with options when compiling your own kernel, but some have claimed that "CPU locality" is something that would benefit BOINC performance: making sure that a process sticks to a given physical core during its lifetime as much as possible. IIRC there were also attempts to have self-compiled BOINC clients that would enforce this to some degree, and some rather heated debates whether that was clever & effective or not.

Not sure who won that debate, tho.

These scheduling related issues aside, the OS-overhead really shouldn't matter for the overall runtime.

AFAIK David Anderson was never convinced it would be of significant utility to the general boinc user (IIRC the only times large gains were seen were when pairing two apps with disparate CPU usage on the same core with HT); so it never made it out of forks like Crunchers and into the general client.

archae86

Joined: 6 Dec 05

Posts: 3161

Credit: 7282395041

RAC: 2039678

RE: RE: some have claimed

12 Oct 2010 23:52:46 UTC

Message 99985 in response to message 99984

Quote:

Quote:
some have claimed that "CPU locality" is something that would benefit BOINC performance: making sure that a process sticks to a given physical core during its lifetime as much as possible.

(IIRC the only times large gains were seen were when pairing two apps with disparate CPU usage on the same core with HT)

People just want to believe that affinity helps. Several of us have run decent tests for a specific possible case, and the ones I know about (including mine) all came up without benefit. I've not seen the counter-example mentioned by Dan, though I could imagine a case of that character might have been observed. But generalizing about what the key difference was from the "no help" results may be dangerous.

At the risk of protocol violation, I'll quote myself in a recent thread here

Quote:

I looked at the dissimilar-apps benefit as applied to mixing varying amounts of Einstein with SETI for the apps current in November 2007 and started this thread on it.

In that thread someone brought up the affinity point, and I made a test case for it, which in the environment under test came up with a pretty convincingly null benefit result, whereas the app-mixing effects were readily apparent.
