The puzzling thing is that when I use the 2 CPUs the crunching time is 2.5 hours for each WU (an average of 1.25 hours per WU), while when I use only 1 CPU (set in the general preferences) the average crunching time is 1 hour. That is (if my maths is correct) 20% faster with one CPU.
Isn't that strange?
You are looking at this the wrong way. You are doing 2 WUs in 1.25 time, because you are doing 2 simultaneously. When you are doing 1 at a time, it takes 2 hours to do two WUs. Yes, there is a loss with HT, because the second CPU is only a logical processor, but overall it is a gain. Part of the reason for the slowdown is the competition for the L2 cache, but it is still faster to do the work with it on.
this is what I read in the original post:
2 results in parallel => 1 * 2.5 hours = 2.5 hours for 2 results
2 results in series => 2 * 1 hour = 2 hours for 2 results
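The two cases can be put side by side with a few lines of Python. This is only a sketch of the arithmetic in the posts above; the function name is made up for illustration, and the 2.5 hour and 1 hour figures are the ones reported in the original post.

```python
def hours_per_result(wall_hours, results_finished):
    """Average wall-clock hours per result for one batch (hypothetical helper)."""
    return wall_hours / results_finished


# Two results crunched side by side, both finishing after 2.5 hours.
parallel = hours_per_result(2.5, 2)

# One result at a time, 1 hour each, so two results take 2 hours.
series = hours_per_result(2 * 1.0, 2)

# Fraction of time saved per result by running one at a time.
saving = 1 - series / parallel

print(f"parallel: {parallel:.2f} h/result")
print(f"series:   {series:.2f} h/result")
print(f"saving:   {saving:.0%}")
```

On these figures the parallel case averages 1.25 hours per result against 1 hour in series, which is the 20% quoted in the first post.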
Mr Pernod stated it right! By using one logical CPU (by setting "on multiprocessors use at most 1 CPU") I crunch 1 WU in 1 hour, which means 2 WUs in 2 hours. By using two logical CPUs (by setting "on multiprocessors use at most 2 CPUs") I crunch 2 WUs in 2 and a half hours.
Moreover, I disabled HT and got 1 WU in 57 minutes and a second one in 71 minutes. So, I enabled it again!
Anyway, thank you all again for your valuable time trying to help me (and others) with this!
Cheers from Cyprus
Constantinos
Gravity increases significantly in Autumn, because apples fall in large numbers during that time!
To conclude that some variable "A" causes result "XYZ" you have to make all known variables constant amongst test runs while simultaneously altering only one variable at a time. So if your 57-minute WU is different than the 71-minute WU how can you draw the conclusion that it was HT that caused the time difference?
I read all the Akosf related msg threads and frankly I'm befuddled by all the conclusions that people draw without properly controlling the variables. If I'm wrong, then people are at least guilty of failing to describe their experimental conditions before publishing their results. This leads people to believe in things that may or may not be true and causes them to make decisions about their operational situation. People are free to do what they want but please use a healthy dose of skepticism before "believing" what you read.
I am not that ignorant about constants and variables. I will come back later with accurate results with and without HT in more detail.
P.S. "People are free to do what they want but please use a healthy dose of skepticism before 'believing' what you read," you said! I agree 100%!
For a given processor with HT enabled (everything else held constant) you will get a benefit in total WU throughput over time for that machine, but not as a simple two for one. Hyperthreading is a (fairly) clever way of dealing with the overhead of task switching. For your interest, from Intel:
Quote:
Multithreaded software divides its workloads into processes and threads that can be independently scheduled and dispatched. In a multiprocessor system, those threads execute on different processors. HT Technology allows a single Pentium 4 processor to function as two virtual or logical processors.
Quote:
Hyper-Threading Technology enables thread-level parallelism (TLP) by duplicating the architectural state on each processor while sharing one set of processor execution resources. When scheduling threads, the operating system treats the two distinct architectural states as separate "logical" processors, which allows multiprocessor capable software to run unmodified on twice as many logical processors. Although Hyper-Threading Technology will not provide the level of performance scaling achieved by adding a second processor, benchmark tests show some server applications can experience a 30 percent gain in performance.
It's in the 'duplicating the architectural state' that the benefit arises.
I would emphasise that it's not just the processor type that is relevant, but also the application software and operating system too. This will impact on what 'Nothing But Idle Time' was ( correctly ) asserting about the 'operational situation'. As above, even Intel had to benchmark HT to get figures.....
Cheers, Mike.
( edit ) Note particularly the 'mere' 30% improvement with HT. That is small enough, when doing comparisons, to overlap in performance with the usual spread of 'difficulty' of the work units that we receive from E@H (meaning a couple of 'short' WUs with no-HT can compare favourably with several 'longs' with-HT).
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
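Mike's point about a 30% gain overlapping with the spread in work-unit difficulty can be sketched numerically. In the snippet below the function name and the difficulty ratios are purely hypothetical illustrations, not measurements from any host, and the 1.30 figure is taken from the Intel quote (which refers to some server applications, not to E@H). The assumption is that if the WUs crunched during the HT run contain more work than those crunched during the no-HT run, the observed throughput ratio shrinks by the same factor.

```python
def apparent_ht_gain(true_gain, difficulty_ratio):
    """Throughput ratio (HT / no-HT) an observer would compute.

    true_gain: HT/no-HT throughput ratio on identical work units.
    difficulty_ratio: work in the WUs crunched during the HT run divided
        by work in the WUs crunched during the no-HT run.
    """
    return true_gain / difficulty_ratio


TRUE_GAIN = 1.30  # the "30 percent" from the Intel quote

# Hypothetical difficulty ratios, NOT measured values.
for ratio in (1.0, 1.2, 1.4):
    gain = apparent_ht_gain(TRUE_GAIN, ratio)
    verdict = "HT looks better" if gain > 1 else "HT looks worse"
    print(f"HT-run WUs {ratio:.1f}x as hard -> apparent gain {gain:.2f} ({verdict})")
```

Under these assumptions, a set of WUs only 1.4 times as hard is enough to make a real 30% gain look like a loss, which is why comparisons on dissimilar WUs are hard to interpret.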
Quote:
Multithreaded software divides its workloads into processes and threads that can be independently scheduled and dispatched
unfortunately Albert is single-threaded.
instead of sharing the workload over multiple threads, albert is running as multiple instances on multi-cpu systems and those multiple programs are in fact competing with each other over the resources of the physical cpu.
this competition is fine up to a certain point of (in)efficiency of the code, but once the code gets more and more efficient (optimized) the use of cpu-resources gets to a point where competition becomes contention and then the efficiency of hyperthreading drops dramatically.
to give you an example:
I have just enabled hyperthreading on my dual xeon 2.4 (running at 2.8/800).
I also have a dual xeon 1.6 (also running at 2.8/800) with hyperthreading disabled.
both machines are running S41.07
let's see what difference in RAC there is between these two machines in a week's time.
if I were a betting man, my money would be on the machine with hyperthreading disabled.
Quote:
unfortunately Albert is single-threaded.
instead of sharing the workload over multiple threads, albert is running as multiple instances on multi-cpu systems and those multiple programs are in fact competing with each other over the resources of the physical cpu.
this competition is fine up to a certain point of (in)efficiency of the code, but once the code gets more and more efficient (optimized) the use of cpu-resources gets to a point where competition becomes contention and then the efficiency of hyperthreading drops dramatically.
Correct.
Quote:
to give you an example:
I have just enabled hyperthreading on my dual xeon 2.4 (running at 2.8/800).
I also have a dual xeon 1.6 (also running at 2.8/800)with hyperthreading disabled.
both machines are running S41.07
let's see what difference in RAC there is between these two machines in a week's time.
if I were a betting man, my money would be on the machine with hyperthreading disabled.
In a similar vein I'll see if I can find one of my HT machines that absolutely hasn't got anything else to distract it. If so I'll try dropping HT for a week, then compare....
For your machines though I'll bet you, err .... a 100 credit note ..... that HT is best. :-)
Cheers, Mike.
PS. Got to make it interesting! Aussies will bet on anything..... a fly crawling up a wall.... even a fly crawling down a wall. :-)
Quote:
...
I am not that ignorant about constants and variables. I will come back later with accurate results with and without HT in more detail.
P.S. "People are free to do what they want but please use a healthy dose of skepticism before 'believing' what you read," you said! I agree 100%!
I certainly don't think you're ignorant and wouldn't accuse anyone of it. And you don't have to prove anything to me.
I merely used this opportunity (perhaps I could have chosen a better place) to point out to the general population that, based on what I've been reading over time, I view all reported "experiences" with Akosf apps with a healthy dose of skepticism. It's not a question of whether they are right or wrong, it's a question of believability. Right now people are in an Akosf frenzy. I take note of what people say and keep it in the back of my mind. But unless someone specifically runs a controlled experiment and fully publishes the operational environment and controlled variables... well, I think people draw conclusions that aren't supported and are misleading to others.
I've tried various newer Akosf apps with my HT machine and I can't find anything better than S39L in the HT mode on my Prescott P4 with i925x/LGA775 chipset and socket (notice I fail to mention the amount of L1 or L2 cache, or memory). I even tried suspending all WUs, leaving only 1 WU to run all by itself, and that WU took longer than when 2 WUs were running. I could report this finding and fail to mention that my wife was working on the computer at the time!
Nonetheless, I've noticed that my machine sometimes reacts just as other Prescott users have reported and sometimes it doesn't. Hence my comment about the operational environment of any particular host not being the same as any other host.
I have noticed that a few individuals are running tests using the same WU repeatedly, which gives me limited confidence in the comparisons they are reporting. Still, it's difficult to get a handle on it because their particular machine and operational environment are never identical to mine. :(
Nothing But Idle Time,
let's make things simple. I have been running E@H for almost a year for the good of science. From the very first days of this experience I have enjoyed the "award" of the credits, as many others do, I imagine, as a small incentive to continue contributing to the project. This double motivation made me wonder whether I could improve the performance of my PC to crunch WUs in as little time as possible.
Anyway, I am not that "obsessed", nor do I have the time to conduct scientific experiments to prove which configuration is best. I disabled and enabled HT once or twice and the overall evaluation was that the computer was crunching WUs a bit faster with HT enabled!
It works for me!
Let's crunch more WU for the science (and for our account!!!!!!!!)
with no hard feelings
Constantinos
I spent my afternoon on controlled trials on the HT advantage/disadvantage then extended the testing as akosf requested for his "S41.07HT" code.
This post is a slightly edited version of the one I posted in the S41.xx observations thread. I put it here, and not just a link, as I think I have direct controlled experimental confirmation of the much-doubted observation of actual science production loss under hyperthreading for the more recent akosf apps.
Summary result
While the original distributed app and the akosf improvements up to S40.04 show a hyperthreading productivity gain of about 20%, S41.07 under direct measurement shows a major hyperthreading productivity loss, with system science output under hyperthreading only 73% of that with hyperthreading disabled.
Details
As in my previous postings, all results were taken on my Gallatin (P4 EE 3.2 GHz, Northwood-descended with 2 Mbyte L3 on-chip cache added, WinXP Pro SP2). The test work unit was the same in every run, r1_0265.5_2113_S4R2a_0. The "other" thread work came from two other short Einstein WUs, the same in each case, though the starting timing offset varied by some tens of seconds.[pre]
Version     HT     nHT    HT/nHT productivity ratio
dist        7584   4630   1.22
S40         1954   1159   1.19
S40.04      1928   1141   1.18
S41.07      2367    870   0.73
S41.07HT    1948   1145   1.18[/pre]
Comments
Yes, doubters, real science productivity loss under HT exists for some code--not just an artifact of careless comparisons of dissimilar work units or operating conditions.
A handful of cases where I've rerun these test cases suggest the timing repeatability is on the order of 1%.
My Gallatin has a slower FSB motherboard than is probably common (133 MHz FSB instead of 200).
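The productivity ratios in the table can be approximately reproduced from the two timing columns. The sketch below assumes (the post does not spell this out) that the HT column is the time for the test WU while a second WU runs alongside it, that the nHT column is the time for the test WU running alone, and that the ratio is therefore two results per HT time divided by one result per nHT time. The function name is hypothetical, and the post does not state the timing units, so none are assumed here.

```python
# Timings copied from the table above: version -> (HT time, nHT time).
times = {
    "dist": (7584, 4630),
    "S40": (1954, 1159),
    "S40.04": (1928, 1141),
    "S41.07": (2367, 870),
    "S41.07HT": (1948, 1145),
}


def ht_productivity_ratio(ht_time, nht_time, ht_tasks=2):
    """Throughput with HT divided by throughput without HT.

    Assumes ht_tasks results finish per ht_time with HT enabled,
    and one result finishes per nht_time with HT disabled.
    """
    return (ht_tasks / ht_time) / (1 / nht_time)


ratios = {version: ht_productivity_ratio(*t) for version, t in times.items()}

for version, ratio in ratios.items():
    print(f"{version:<10}{ratio:.3f}")
```

Under that assumption the computed values agree with the published column to within about 0.01, and S41.07 is the only version for which the ratio drops below 1.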