2) You get the best throughput if you run a mixture of ABP2 and GC.
Now that's a fascinating conclusion, with a good analogy from ethology (the study of animal behaviour): the greatest competition comes from other members of your own species, since they have the closest requirements to yours. Here, a replicated task is likely to be competing for and consuming the same resources as the other threads around it, whereas dissimilar threads have a greater chance of leaving an unused resource lying about for another thread, and hence the HT mechanisms, to pounce upon. It will be a timing/average/statistical thing, but the trend is there.
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
I looked at the dissimilar-apps benefit as applied to mixing varying amounts of Einstein with SETI, for the apps current in November 2007, and started this thread on it.
The particular apps will matter, as will the internal architecture of the CPU on which they are run. But as a general proposition, the odds seem to favor at least some productivity benefit from mixing.
Sadly, the current pattern of work issue and the way the BOINC software chooses what to run next both seem to reduce the mismatched-app case rate below what it would be if simply random, at least for SETI and for Einstein.
Simplistically:
Another view is to consider (passenger) buses along their routes. Buses all following the same route must queue up behind each other at any hold-up, whereas buses following different routes avoid being forced to queue behind a bus held up on another route.
Recent CPUs have a mix of "execution units" that you can view as different bus routes in the above analogy.
It would be very interesting if Boinc could build up statistics for which mix of applications gives the best throughput... Even better if it could schedule to take advantage of that!
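As an illustration only (nothing below is real BOINC client code, and every name is hypothetical), the statistics-gathering idea might be sketched like this: record each completed task against the mix of applications that were running alongside it, then pick the mix with the best throughput.

```python
from collections import defaultdict

# Hypothetical sketch of per-mix throughput bookkeeping.
# Keys are the set of application names running concurrently.
stats = defaultdict(lambda: {"cpu_seconds": 0.0, "tasks": 0})

def record_completion(running_apps, cpu_seconds):
    """Log one finished task against the app mix it ran alongside."""
    mix = tuple(sorted(set(running_apps)))
    stats[mix]["cpu_seconds"] += cpu_seconds
    stats[mix]["tasks"] += 1

def best_mix():
    """Return the mix with the most tasks completed per CPU-second."""
    return max(stats, key=lambda m: stats[m]["tasks"] / stats[m]["cpu_seconds"])

# Fabricated sample numbers, just to exercise the bookkeeping:
record_completion(["ABP2", "GC"], 7000.0)
record_completion(["ABP2", "ABP2"], 8100.0)
print(best_mix())  # the mixed ABP2+GC run wins here
```

A scheduler that wanted to exploit this would then prefer to start a task whose application, combined with what is already running, matches the best-performing mix seen so far.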
I'm currently not running SETI on this system, except for Astropulse running on the ATI graphics card (a Lunatics release from Raistmer within the last month).
I think the SETI work varies enough from unit to unit that I'd need to do something much more careful than a simple average to get a useful HT improvement number from a small sample.
I thought about this some more and decided I should be able to add CPU SETI to app_info without disturbing my stock of GPU Astropulse. But, as I often do, I botched something and trashed my queue. The end result was that I got a big gulp of CPU multibeam, which I am running off right now. As it happens, I ran HT and nHT on several results of the exact same Angle Range, so for that AR I have got a pretty good figure on the nHT vs. HT performance on my rig (for the case of fully stocked SETI running at that general AR).
Angle Range: 0.427597
Hyperthreaded: 8 results, average CPU time 7039.03 s
Non-hyperthreaded: 4 results, average CPU time 4090.13 s
HT productivity improvement observed for this case: 16%
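The 16% figure follows directly from the averages above, since throughput is the number of concurrently running tasks divided by the average CPU time per task:

```python
# Sanity check of the quoted 16% HT improvement.
# Throughput = concurrently running tasks / average CPU time per task.
ht_throughput = 8 / 7039.03    # results/sec with Hyper-Threading (8 tasks)
nht_throughput = 4 / 4090.13   # results/sec without (4 tasks)

improvement = ht_throughput / nht_throughput - 1
print(f"HT throughput improvement: {improvement:.1%}")  # prints 16.2%
```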
As of a few years ago, the computational characteristics of work at AR greater than about 1.05 were quite different from those at lower AR, and to a lesser degree the extremely low-AR stuff was different yet again. I'd be little surprised if the improvement in those regimes differed appreciably from this 16%. But I suspect the improvement observed here is probably close to the average across the broad middle range of AR for my particular rig.
It appears that the SETI site had some sort of progressive failure just a few hours before the normal start of their maintenance shutdown, so I'll suspend until Friday or so, and perhaps if I'm ambitious survey my SETI queue to see if it would support comparison in VHAR or VLAR regime.
Backing away from the detail for a moment, it is well to mention two other considerations.
1. Some folks who push to the ragged edge on overclocking have observed that their rig has a lower maximum safe clock in HT than nHT. If one makes this choice the HT advantage may thus be less.
2. Though I neglected to log numbers, I've previously observed rather substantial power consumption increase going from nHT to HT on this rig--the proportionate increase in CPU power consumption considerably exceeds the increase in productivity. I don't recall whether it is so bad as to make the system level power consumption per unit work worse or not--but doubt it (this rig has a pretty substantial non-CPU overhead to be amortized).
When buying new machines I tend to split my spending between Intel and AMD machines. For regular computing tasks, AMD is a pretty good value. On the high end, Intel seems to have the edge.
And that probably won't change anytime soon. A chip is not a chip is not a chip in this case. Intel gets to the result of a calculation one way and AMD gets the same result another way; think of it as the bus routes posted earlier. Which way is faster? Right now, AMD's is. BUT AMD takes shortcuts, and that is why Intel is better for the high-end stuff. Adding 1 + 1 is easy, but doing thousands of calculations over nm-sized chips is much more difficult. The Intel way goes from a to b to c to d, etc. The AMD way says, "Hey, I already know what the answer is," skips some intermediate steps, and displays the result; that is how AMD gets a slower chip to produce results as fast as Intel's faster chips. The problem is that AMD's answer may not be "accurate enough" for the high-end stuff. Some BOINC projects do better with Intel and some with AMD; this is why. For my friend's new non-BOINC machine we are going to go with an AMD six-core system; if he were doing BOINC, we would have put different things into the equation and might have ended up with an Intel-based system instead.
The problem with comparing anything in computers is the requirement that all else be equal, and it never is.
Myself, I have been partial to AMD since I first used their math co-processor, which kicked Intel's ass by more than an order of magnitude in all cases. Of course Intel caught up some time in the last 20 years, so call me sentimental.
The problem is that two otherwise identical machines will perform differently depending upon what they do, hence the test benchmarks used by the PC mags. Fine, but nobody runs a balanced set of applications the way a benchmark does, so the results are meaningless for each of us. Other than the usual email and browsing, most everything I do is number crunching, from POV-Ray to graphics processing to capturing TV, processing out the commercials (MythTV), and watching the result, plus pure-physics BOINC projects. It is all pipeline in and out, which I guess barely fills the L2 caches, so the L3 caches are of no interest.
Another problem is that Intel is much more expensive for the same advertised specs that most people read. To minimize the price differential, the makers skimp on other things. Look for slower RAM, or DDR2 instead of DDR3, on Intel machines; or, if you feel like researching the supporting chipset, discover that it is cheaper for a good reason.
So in one sense the gamers have the right idea, although for the wrong reasons. Vendors cater to them with all the performance possible at a given price level, without cutting corners, in order to be competitive; for those vendors a high price is a bragging right for the buyer. What all that does for BOINC projects, however, is problematic. For projects like SETI, where the only thing of interest is an FFT, one expects near-optimum performance no matter what they do: FFTs have been around for over 40 years, and it is hard to pick a stupid way to do them.
The wrong reason is that bang for the buck is not linear with machine price. Right now I can pay more for a hot graphics card than for a refurbished quad-core AMD machine, and there is no BOINC-versus-graphics-card table of prices. Clearly, the less pure number crunching there is in a project, the less the graphics card contributes. For SETI there should be a major contribution, but unless I am missing something, I do not see that it can add much at all to protein folding. (Yes, I expect someone to tell me what I am missing. ;)
I sort of get the idea that a basic $50 graphics card added to any $400-500 machine will get most of the performance of a high-end gaming machine on a core-for-core basis. I have been trying to benchmark this, but other problems have cropped up, so I have not been able to benchmark the pre-graphics-card machine yet.
Thanks for the hard numbers!
Agreed.
MrS
Scanning for our furry friends since Jan 2002
Host # 2 now - AuthenticAMD Six-Core AMD Opteron(tm) Processor 8431 [Family 16 Model 8 Stepping 0].
:)
If this host stopped processing ABP WUs, it would have an even higher RAC, at least more than 20,000. :)
25,190.41 sec per GW task (~250 Cr)
32,871.80 sec per ABB task (200 Cr)
That's the reason why all my AMD hosts don't do ABP WUs.
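For what it's worth, the credit-rate arithmetic behind that choice can be checked from the per-task times quoted above (taking the approximate "~250 Cr" GW figure at face value):

```python
# Credit earned per CPU-second for the two task types quoted above.
# (The GW credit is approximate, per the "~250 Cr" in the post.)
gw_rate = 250 / 25190.41    # ~0.0099 credits/sec for GW tasks
abb_rate = 200 / 32871.80   # ~0.0061 credits/sec for ABB tasks

print(f"GW tasks pay about {gw_rate / abb_rate:.2f}x the ABB credit rate")
```

On those numbers a GW task pays roughly 63% more credit per CPU-second than an ABB task on this host, which is why skipping ABP work raises RAC.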