Kenny Frew
Joined: 8 May 05
Posts: 4
Credit: 349401
RAC: 0
20 Jun 2005 4:37:00 UTC
Topic 189379
I've been running SETI for over 2 years, but I decided to put it through the BOINC manager. I've got them set at equal shares, but one is always preempted. Can't they run at the same time?
No, they can't run at the same time unless you have a hyperthreading processor. If they ran at the same time, they would be fighting each other for CPU cycles. Instead, the projects switch back and forth depending on your resource share.
Further detail for those who are interested.
The reason BOINC switches so infrequently is to cut down the cost - each switchover costs a little time, and the more often you do it the less time is available to the processes.
Even on a single-core non-hyperthreaded machine BOINC could leave the operating system to 'run them both at the same time', in which case Windows/Linux/macOS/etc. would manage the switching.
By doing the switchovers less often than any operating system would, BOINC can make it more efficient. Recently BOINC has also added intelligent control for things like doing a rush job when one project is close to a deadline.
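BOINC's real scheduler is more involved than this, but the switch-by-resource-share idea can be sketched in a few lines of Python. This is a toy illustration of the principle, not BOINC's actual code, and the project names and slice counts are just placeholders:

```python
# Toy illustration of resource-share scheduling (NOT BOINC's real code):
# one project runs per time slice; each slice goes to whichever project
# is furthest behind its fair share of the CPU time handed out so far.

def schedule(shares, n_slices):
    total = sum(shares.values())
    run_time = {p: 0 for p in shares}
    order = []
    for _ in range(n_slices):
        # elapsed + 1 counts the slice about to be handed out
        elapsed = sum(run_time.values()) + 1
        lagging = max(shares, key=lambda p: shares[p] / total - run_time[p] / elapsed)
        run_time[lagging] += 1
        order.append(lagging)
    return order

# Two projects at equal shares simply alternate slices - which is why,
# on one CPU, one project is always preempted while the other runs.
order = schedule({"SETI": 50, "Einstein": 50}, 10)
```

With unequal shares (say 75/25) the same rule hands roughly three slices to one project for every one to the other, so accumulated run time tracks the shares.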
[pedantry]
It goes even further:
Actually, on a fine enough timescale, they can't truly run at the same time even on a hyperthreading processor; that takes a dual-core or dual-CPU machine.
Inside a single core hyperthreading processor two threads 'fight each other for cpu cycles'. Hyperthreading means that the control and management of the switching between those two threads is done on the chip itself by the hardware, so that from the operating system it looks like they are running at the same time. Certainly BOINC treats a single core hyperthreaded chip as if it was a double core, and cannot track the pre-emption between the two 'hyperthreads'.
Also, hyperthreading switches between the threads many times a second, while BOINC switches around every hour; so on a human timescale the hyperthreaded tasks are running 'at the same time', whereas on that same timescale BOINC visibly switches back and forth.
[/pedantry]
You sound as if HyperThreading didn't make sense for BOINC - but that is not the case.
Let's say we have a Dual Xeon system and run a hypothetical WU:
- running only one WU will need 4:51
- running 2 WUs will need 5:11 - but that's 2 results in that time
- running 4 WUs hyperthreaded will need 7:26 - but that's 4 in that time
So the average time per result when running 2 WUs is about 2:36; running 4 WUs finishes 1 WU every 1:52 on average. The time for each single WU is worse, but the overall WU throughput is better if you use HT.
(Actually this is not a hypothetical example, it's the SETI Classic reference WU on a Xeon 1800 running Linux.)
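For anyone who wants to check the arithmetic, here it is in a few lines of Python (the mm:ss figures are the ones quoted above):

```python
# Per-result time from the quoted run times: (minutes:seconds) / number of WUs
def per_wu_seconds(minutes, seconds, n_wus):
    return (minutes * 60 + seconds) / n_wus

single = per_wu_seconds(4, 51, 1)  # 291.0 s per result, one WU at a time
dual   = per_wu_seconds(5, 11, 2)  # 155.5 s per result (about 2:36)
ht     = per_wu_seconds(7, 26, 4)  # 111.5 s per result (about 1:52)
# Latency per WU rises at each step, but throughput improves: ht < dual < single
```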
______
The advantage of multiple CPUs and/or HT grows when the separate BOINC client threads carry different types of load - like one that mostly needs the FPU, one that needs a lot of integer work, and one that is mostly limited by memory throughput.
E.g. you get the most out of your computer if you manage to set the resource shares so that it runs CPDN on one (virtual) CPU, Einstein on the next, SETI on the third and LHC on the last one.
It's nearly impossible to make BOINC run projects like that but a mix of different projects still has advantages on those systems.
You can read his good explanation of how hyper-threading works here.
I think the original is in here somewhere, or over at SETI, but I searched the Wiki instead. ;)
You sound as if HyperThreading didn't make sense for BOINC - but that is not the case.
I did not mean to suggest that, sorry.
HyperThreading makes a lot of sense, and the reason is cache stalls.
A stall is possible whenever a chip is designed with its internal clock running faster than the memory bus: the program needs a number that is not yet in the on-chip cache.
A non-hyperthreaded CPU has to wait for the numbers to come from main memory, and as we know main memory runs slower than the chip (maybe 400MHz instead of 2GHz, in round numbers), so during a stall the CPU runs at maybe 20% of its potential speed.
Hyperthreading helps here. As soon as one thread stalls, the chip swaps to the other thread, and meanwhile the on-chip cache manager starts grabbing the missing data for the first thread. If it works perfectly, that data will arrive before the second thread stalls.
Even if it is less than perfect, there will be considerable saving from whatever overlap was successful. Once both threads are stalled together you've reached the max benefit, but you are still no worse off than before and you keep whatever gain you got from the overlap.
This is all done by the chip; the programmer does not have to know about it. The really clever thing about hyperthreading is the way they make all this invisible to the operating system, so it 'looks like' a second processor.
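A back-of-the-envelope model makes the gain concrete. This is my own sketch under idealised assumptions (fixed compute and stall totals per thread, perfect overlap), not a description of any real chip:

```python
# Two identical threads, each needing `compute` units of CPU work and
# `stall` units of waiting on main memory.

def serial_time(compute, stall):
    # One hardware thread, run back-to-back: every stall is dead time.
    return 2 * (compute + stall)

def ht_time(compute, stall):
    # Idealised hyperthreading: while one thread waits on memory the
    # core runs the other, so a stall only costs time once both
    # threads are waiting at once.
    return max(2 * compute, compute + stall)

serial_time(60, 40)  # 200 time units
ht_time(60, 40)      # 120 time units - the overlapped stalls come for free
```

Even when the threads are stall-bound (say 10 units of compute against 90 of stall), the idealised HT time of 100 units is half the serial 200: you keep whatever gain the overlap gave you, and you are never worse off than running the threads back-to-back.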
As you say, a mix of projects would be even better from the point of view of the best interleaving of cache stalls, but the effect is not as big as you might think. Inside most apps there is a sequence of tasks that make different demands on the processor. Even if you start two absolutely identical processes, within a few minutes one of the apps has randomly got ahead of the other, and while they remain out of phase (which is a lot of the time) they can still fit nicely round each other.
Your figures show very neatly that you do not get 2x the performance when you turn on hyperthreading, but you do get a significant gain.
ps: Terminology
I say 'stall', the industry jargon is 'fault'. I prefer to say 'stall' because some users get upset when you say their cache is producing faults...
... As you say, a mix of projects would be even better from the point of view of the best interleaving of cache stalls, but the effect is not as big as you might think. ...
On a dual CPU machine (two physical CPUs) it makes quite a difference.
When switching from two CPDN models to one CPDN model plus one Einstein WU the CPDN timesteps went from 4.6 to 3.9 :-)
The effect on my Xeon HT is not that strong - but I guess that's mostly because I cannot keep it from downloading three CPDN WUs and sometimes pausing two of them. There is an effect, though: the full run times went down when I added more Einstein shares. It takes quite a while until the effect is visible, as it's a slowish Xeon with only FSB100.
It would be interesting to measure the effect on a P4/HT. If anyone has done this, I would be very interested in the result.
Thanks Heffed - now I get it.