One observation that no one seems to mention: on my HT machine, if 2 Albert WUs are running simultaneously, each runs (sometimes significantly) longer than when 1 Albert WU runs alongside a WU from a different project, like Rosetta. Whether it is resource contention or something else, it does affect the run time on my machine.
RE: One observation that
Hyperthreading works by sharing unused parts of the CPU between two different processes. Since 2 copies of the same program are more likely to be attempting to use the same parts of the CPU at the same time, you will get less of a gain compared to running 2 different programs. It also depends on what the programs are doing. If they need lots of data moved from memory or disk, then the CPU will be waiting for that data more often, but the other thread may have its data ready.
Setting affinity offers no help and may even slow things down on HT CPUs. This is because the OS may be more efficient at swapping if it has its choice of swapping the task or the data. However, this difference is not very large, and I would recommend setting affinity on a dual Xeon box, or if a single setting has to cover both multi-CPU and HT CPUs.
edit: The next update to the CPU scheduler (hopefully in 5.4.x) should increase the odds of running different projects on each CPU. How much depends on whether it is a side effect or an actual goal.
BOINC WIKI
BOINCing since 2002/12/8
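For anyone who wants to repeat the affinity experiment discussed above, here is a minimal sketch of pinning a process to one logical CPU. It assumes Linux, where Python exposes `os.sched_setaffinity`; the helper name `pin_to_cpu` is made up for illustration and is not part of BOINC or any of the science apps.

```python
import os


def pin_to_cpu(cpu_index, pid=0):
    """Pin a process (pid 0 = the calling process) to one logical CPU.

    Returns the resulting affinity set, or None on platforms that do
    not expose os.sched_setaffinity (the call is Linux-only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(pid)
    if cpu_index not in allowed:
        raise ValueError(
            f"CPU {cpu_index} is not available to this process: {sorted(allowed)}"
        )
    # Restrict the process to exactly one logical CPU.
    os.sched_setaffinity(pid, {cpu_index})
    return os.sched_getaffinity(pid)
```

To compare runs, time the same type of WU with and without pinning. On an HT CPU the two logical CPUs share one physical core, so pinning alone does not remove the contention described above.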
On my A64x2 2.4 GHz, A36 and S37a both run at about the same speed, while C37 runs a minute or two slower: 1:35:30 vs. 1:37:00 (both numbers are approximate, but since I keep a 3-day queue I can see shifts due to the DCF changing, as opposed to a new dataset having a small delta ETA).
MT A1500 (preXP) is taking about 2:30 with A36. I haven't tried either of the new clients because 1 or 2 percent isn't worth the hassle of plugging a keyboard, etc. into it. (I normally control it via VNC, but ZA won't take input that way to allow the new app net access.)
S38 is ready, test in progress. :)
RE: S38 is ready, test in
Great! What systems is it designed for? Thanks! :)
RE: RE: S38 is ready,
SSE is enough.
I'm thinking about a faster SSE2 version...
(That needs less data-type conversion and a big L2 cache. :) )
RE: S38 is ready, test in
S38 isn't perfect yet. Probably a bug was implemented successfully. :)
Comment: The precision of calculation is good enough.
RE: Setting affinity offers
That's what has happened on my computer. When I set separate affinities for the 2 units being crunched, their estimated time remaining increases.
me-[at]-rescam.org
Comparison of C37 and S37a on my Athlon64 2800+
C37:
Long WU's
Min time: 7023.17 sec
Max time: 7153.86 sec
Avg time: 7064.49 sec
No of results: 6
Short WU's
Min time: 2069.39 sec
Max time: 2176.25 sec
Avg time: 2083.95 sec
No of results: 19
S37a:
Long WU's
Min time: 5596.66 sec
Max time: 6918.09 sec
Avg time: 6106.02 sec
No of results: 10
Short WU's
Min time: 1624.65 sec
Max time: 2058.17 sec
Avg time: 1868.14 sec
No of results: 15
It's very interesting, the difference between min and max time when using S37a.
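To put numbers on that comparison, here is a small sketch that works out the average time saved and the min-to-max spread from the figures posted above. The helper names `spread` and `avg_saving_pct` are only illustrative, and with 6 to 19 results per group the percentages should be read as rough indications.

```python
# Times in seconds, copied from the results posted above.
results = {
    "C37": {
        "long": {"min": 7023.17, "max": 7153.86, "avg": 7064.49},
        "short": {"min": 2069.39, "max": 2176.25, "avg": 2083.95},
    },
    "S37a": {
        "long": {"min": 5596.66, "max": 6918.09, "avg": 6106.02},
        "short": {"min": 1624.65, "max": 2058.17, "avg": 1868.14},
    },
}


def spread(stats):
    """Difference between the slowest and fastest result, in seconds."""
    return stats["max"] - stats["min"]


def avg_saving_pct(old, new):
    """Percentage by which the new average time is lower than the old one."""
    return 100.0 * (old["avg"] - new["avg"]) / old["avg"]


for kind in ("long", "short"):
    c37, s37a = results["C37"][kind], results["S37a"][kind]
    print(
        f"{kind} WUs: S37a average is {avg_saving_pct(c37, s37a):.1f}% lower; "
        f"spread C37 {spread(c37):.2f} s, S37a {spread(s37a):.2f} s"
    )
```

On these figures S37a averages roughly 13.6% faster on long WUs and 10.4% faster on short ones, while its spread on long WUs is about ten times that of C37 (about 1321 s against about 131 s).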
RE: RE: S38 is ready,
@akosf
I understand that C37 is what I need to install for an HT P4 3.2 GHz Prescott. I just want to make sure I've done the right thing by installing S38 on my P3 1.2 GHz CPUs?
Join the #1 Aussie Alliance on Einstein
RE: I understand that C37
The Pentium III supports SSE instructions, and S38 uses a better method than C37.
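If you are not sure whether a given box has SSE or SSE2, here is a rough sketch of one way to check. It assumes an x86 Linux machine, where `/proc/cpuinfo` has a `flags` line; the function names `cpu_flags` and `sse_level` are made up for this example, and the script only reports the instruction set without choosing an app for you.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def sse_level(flags):
    """Report the highest of SSE2 / SSE found in the flag set."""
    if "sse2" in flags:
        return "sse2"
    if "sse" in flags:
        return "sse"
    return "none"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(sse_level(cpu_flags(f.read())))
    except OSError:
        print("no /proc/cpuinfo on this platform")
```

A Pentium III should report `sse`, and a Pentium 4 should report `sse2`.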