Times (Elapsed / CPU) for BRP5/6/6-Beta on various CPU/GPU combos - RESULTS ONLY

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,551
Credit: 79,147,158,313
RAC: 66,193,244

HOST 07 - g640-02

[pre]
CPU: Pentium Dual Core G640 @ 2.8GHz (Sandy Bridge architecture)
Cores/Threads: 2 / 2
Motherboard: Asrock H61M-VS
PCIe slot x16 Version 2 (single slot)
1st GPU: NVIDIA GTX550Ti 1GB (3x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 2GB DDR3 1333MHz
Concurrency: 3 @ 0.2 CPUs + 0.33 GPUs
CPU Tasks: 2 x FGRP4
Free CPU cores: 0
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: NVIDIA 331.49
BOINC Version: 7.2.42 64bit
RAC improvement: Late Feb, RAC=~33,000. As at 24-Mar-15, RAC=~44,700 (on my hosts, RACs are auto-recorded and saved every 8 hours).

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5 V1.39         27,021  27,179  27,437    33    4,346  4,393  4,809   32      388   Long term stats prior to arrival of BRP6.
BRP6 V1.39         34,640  34,880  35,828   185    5,893  6,160  6,716  130       32   Limited sample size - joined beta test quickly.
BRP6 early betas        -       -       -     -        -      -      -    -        -   All results discarded, see note below.
BRP6-beta V1.52    21,736  25,036  29,262 1,133      583    652    805   31      100   Contiguous tasks all well after start of beta 1.52.

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.
The correct data points could be identified by selecting on the 'nm' field - BRP5 start with 'PB...' and BRP6 with 'PM...'. The transition
from BRP6-1.39 to 1st beta is very clear but the transition from early betas to 1.52 is impossible to judge with confidence. I decided
to exclude all data from the end of 1.39 up until 1.52 was firmly established. At that time, I selected the next 100 contiguous results.
If the job log actually included the app version used, it would be an easier process to extract results for each app version.
[/pre]
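For anyone who would rather script the extraction than use a spreadsheet, the selection on the 'nm' field and the summary statistics described in the Note could be sketched in Python along these lines. This is only a sketch under assumptions: it assumes each job log line is a timestamp followed by tag/value pairs, with 'et' holding the elapsed time and 'ct' the CPU time (only the 'nm' field is mentioned in the Note). The function names and the sample lines are invented for illustration, and the sample standard deviation is used, which may not match the spreadsheet function actually used.

```python
import statistics


def parse_job_log_line(line):
    """Turn one job-log line into a dict of its tagged fields.

    Assumed layout (not confirmed by the post): a leading timestamp,
    then alternating tag/value tokens such as 'nm <task name>'.
    """
    tokens = line.split()
    record = {"time": tokens[0]}
    # Pair each tag (odd positions) with the value that follows it.
    for tag, value in zip(tokens[1::2], tokens[2::2]):
        record[tag] = value
    return record


def summarise(lines, name_prefix, field):
    """Min / mean / max / sample S.D. of `field` for tasks whose name
    starts with `name_prefix` ('PB' for BRP5, 'PM' for BRP6)."""
    values = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_job_log_line(line)
        if record.get("nm", "").startswith(name_prefix):
            values.append(float(record[field]))
    return {
        "n": len(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "max": max(values),
        "sd": statistics.stdev(values),  # sample S.D.; needs n >= 2
    }


# Invented sample lines, for illustration only (not real results).
sample_lines = [
    "1426000000 ue 30000.0 ct 4400.0 fe 280000000000000 nm PB_example_task_1 et 27100.0 es 0",
    "1426030000 ue 30000.0 ct 4390.0 fe 280000000000000 nm PB_example_task_2 et 27200.0 es 0",
    "1426060000 ue 30000.0 ct 4410.0 fe 280000000000000 nm PB_example_task_3 et 27300.0 es 0",
    "1426090000 ue 39000.0 ct 6100.0 fe 450000000000000 nm PM_example_task_1 et 34800.0 es 0",
    "1426120000 ue 39000.0 ct 6200.0 fe 450000000000000 nm PM_example_task_2 et 35000.0 es 0",
]

brp5_elapsed = summarise(sample_lines, "PB", "et")
brp6_cpu = summarise(sample_lines, "PM", "ct")
print(brp5_elapsed)
print(brp6_cpu)
```

Because the job log does not record the app version, a script like this can only separate BRP5 from BRP6 by task name. Picking out a particular app version still means choosing the range of lines by hand.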

COMMENTS

  • * The long term statistics for BRP5 are really very stable - low standard deviations for both CPU and elapsed times.
    * The sample average run time for BRP6 1.39 is just over 28% larger than the value for BRP5 - a slight improvement on the predicted 33%.
    * The sample average run time for BRP6 1.52 is only 0.72 of that for 1.39 - a 28% reduction in elapsed time.
    * The 1.52 app gives a surprisingly large std deviation for run time with little variation in CPU time - see attached chart below.
    * I'm wondering if this variability is related to using a GPU concurrency of 3. I'll check this next on a similar host running 2x.

====####====

Cheers,
Gary.

Gary Roberts

HOST 08 - g645-01

[pre]
CPU: Pentium Dual Core G645 @ 2.9GHz (Sandy Bridge architecture)
Cores/Threads: 2 / 2
Motherboard: Asrock H61M-HVS
PCIe slot x16 Version 2 (single slot)
1st GPU: NVIDIA GTX550Ti 1GB (2x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 4GB DDR3 1333MHz
Concurrency: 2 @ 0.2 CPUs + 0.50 GPUs
CPU Tasks: 2 x FGRP4
Free CPU cores: 0
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: NVIDIA 331.49
BOINC Version: 7.2.42 64bit
RAC improvement: Mid Feb, RAC=~36,000 (enhanced by 3x - see Note). As at 25-Mar-15, RAC=~42,600 (on this host, RACs are auto-recorded and saved twice per day).

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5 V1.39         24,985  25,086  25,840   117        -      -      -    -      148   Data recorded (@ 3x) prior to host rebuild - see Note.
BRP6 V1.39              -       -       -     -        -      -      -    -        -   This app not used on this host.
BRP6 early betas        -       -       -     -        -      -      -    -        -   The only beta app used was 1.52-beta.
BRP6-beta V1.52    15,646  17,883  22,012 1,060      501    563    764   37      148   Contiguous tasks all well after start of 1.52-beta.

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.
At the time the beta test was announced, this host was running an old (2012-2013) 32bit Linux with old 310.xx NVIDIA drivers. It had been
left that way deliberately, because it was returning a 10% higher RAC compared to other similar hosts that had been upgraded. Before joining
the beta test, the host was upgraded as listed above. It was a complete rebuild - getting rid of old Windows and SUSE partitions and wiping
the hard disk. I restored the PCLinuxOS /home partition and upgraded BOINC and the science apps to 64 bit versions before reinstalling the
64 bit setup as listed above. The job log was preserved through all this. The BRP5 1.39 data was at a concurrency of 3x. All 1.52-beta data
is at a concurrency of 2x. For the 1.39 data, I chose the same sample size (148) and these are contiguous points just before the rebuild.
[/pre]

COMMENTS

  • * The long term run time average for BRP5 shows the ~10% advantage of the old OS / driver - check against the 1.39 data for Host 07 in previous message.
    * The BRP6-beta results still show considerable variability (SD=1,060) even at x2 concurrency.
    * The average elapsed time per task is 8,942 secs compared to 8,345 secs per task when done at x3 (see Host 07). That's nearly 7% improvement for x3 over 2x.
    * There is a very surprising regular cycle in elapsed time - see attached 2nd chart below. If you can explain that please join the discussion thread :-).
    * Changing the concurrency to 2x has lowered the performance by ~7% and changed the variability remarkably without seemingly improving it.

====####====

Cheers,
Gary.

Gary Roberts

HOST 09 - g540-03

[pre]
CPU: Celeron Dual Core G540 @ 2.5GHz (Sandy Bridge architecture)
Cores/Threads: 2 / 2
Motherboard: Asrock H61M-HVS
PCIe slot x16 Version 2 (single slot)
1st GPU: NVIDIA GTX650 1GB (2x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 2GB DDR3 1333MHz
Concurrency: 2 @ 0.2 CPUs + 0.50 GPUs
CPU Tasks: 2 x FGRP4
Free CPU cores: 0
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: NVIDIA 331.49
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~31,700. As at 26-Mar-15, RAC=~39,300 (on this host, RACs are auto-logged every 8 hours).

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5 V1.39         19,839  20,234  20,315    44    1,932  2,081  2,122   28      300   Contiguous data recorded prior to beta test start.
BRP6 V1.39              -       -       -     -        -      -      -    -        -   Reliable data not available.
BRP6 early betas        -       -       -     -        -      -      -    -        -   Reliable data not available.
BRP6-beta V1.52    23,209  23,441  24,226   103      682    739  1,645   99      111   Contiguous tasks all well after start of 1.52-beta.

Average run time per task (BRP5) = 20234/2 = 10,117 seconds. Average run time per task (beta-1.52) = 23441/2 = 11,721 seconds.

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.
Due to limitations with what is recorded in the job log, it is not possible to distinguish reliably between BRP6-1.39 and the following early betas.
[/pre]

COMMENTS

  • * Using a factor of say 1.3 for the expected BRP5 to BRP6 transition, the average run time for BRP6-1.39 (non-beta) could be predicted as 20,234x1.3=~26,300secs. This gives a speedup for v1.52 in the vicinity of 11%.
    * The variability of elapsed time is quite low (SD=103) with only a very small number of outliers - see charts below.
    * The 11% estimated speedup for v1.52 is rather low when compared to the ~28% value seen with a GTX550Ti in Host 07.
    * Perhaps low to mid-range Kepler series GPUs like the GTX650 don't respond quite so well to the optimisations in beta-1.52.
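
The estimate in the first comment can be written out as a short Python sketch. The 1.3 scale factor and the two mean elapsed times are the ones quoted for this host; the function names are made up for illustration.

```python
def predicted_brp6_time(brp5_mean_elapsed, scale=1.3):
    """Predicted BRP6-1.39 elapsed time from the BRP5 mean, using the
    assumed BRP5 -> BRP6 scale factor."""
    return brp5_mean_elapsed * scale


def speedup_percent(predicted, measured):
    """How much shorter the measured time is than the prediction, in %."""
    return 100.0 * (predicted - measured) / predicted


predicted = predicted_brp6_time(20234)          # Host 09 BRP5 mean elapsed
speedup = speedup_percent(predicted, 23441)     # Host 09 beta-1.52 mean elapsed
print(f"predicted BRP6-1.39: {predicted:.0f} secs")
print(f"estimated speedup for 1.52: {speedup:.1f}%")
```

The result depends directly on the chosen scale factor, so it is an estimate rather than a measurement.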

====####====

HOST 10 - g540-05

[pre]
CPU: Celeron Dual Core G540 @ 2.5GHz (Sandy Bridge architecture)
Cores/Threads: 2 / 2
Motherboard: Intel DH61WW
PCIe slot x16 Version 2 (single slot)
1st GPU: NVIDIA GTX650 1GB (3x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 2GB DDR3 1333MHz
Concurrency: 3 @ 0.2 CPUs + 0.33 GPUs
CPU Tasks: 2 x FGRP4
Free CPU cores: 0
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: NVIDIA 331.49
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~32,300. As at 26-Mar-15, RAC=~36,100 (on this host, RACs are auto-logged every 8 hours).

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5 V1.39         27,532  28,285  28,734   453    2,069  2,112  2,145   14      300   Contiguous data recorded prior to beta test start.
BRP6 V1.39              -       -       -     -        -      -      -    -        -   Reliable data not available.
BRP6 early betas        -       -       -     -        -      -      -    -        -   Reliable data not available.
BRP6-beta V1.52    32,197  35,107  38,402 1,900      698    744    960   34      100   Contiguous tasks all well after start of 1.52-beta.

Average run time per task (BRP5) = 28285/3 = 9,428 seconds. Average run time per task (beta-1.52) = 35107/3 = 11,702 seconds.

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.
Due to limitations with what is recorded in the job log, it is not possible to distinguish reliably between BRP6-1.39 and the following early betas.

[/pre]

COMMENTS

  • * Using a factor of say 1.3 for the expected BRP5 to BRP6 transition, the average run time for BRP6-1.39 (non-beta) could be predicted as 28,285x1.3=~36,770sec. For beta-1.52 the measured value is 35,107sec - less than 5% apparent gain for the optimised app.
    * The charts below show that there are consequences for running 3x on a GTX650 - the elapsed times are divided into two distinct clusters, separated by a considerable time gap. For the old BRP5-1.39 app, the gap is around 1000secs (3rd chart), and around 4000secs for the beta-1.52 app (2nd chart). In each case, there are about twice as many 'slow' results as 'fast' results, and grouped in such a fashion as to suggest that one of the concurrent tasks is always running faster than the other two in any trio. This behaviour seems specific to GTX650s - doesn't really seem to be apparent in the GTX550Ti results of Host 07/08 in the two previous messages.
    * It would seem that GTX650s are adversely affected by running 3x. There was a performance gain of around 700sec per task when running 3x with BRP5-1.39 but this has not translated to beta-1.52. The average elapsed time per task for beta-1.52 is 11,702 seconds - only a 19 second advantage over the 2x value of 11,721 seconds.
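
One rough way to separate the 'fast' and 'slow' clusters described above, without relying on a chart, is to sort the elapsed times and cut at the widest gap between neighbours. The Python sketch below shows the idea. The elapsed times in it are invented to mimic the pattern (two slow results per fast one); they are not taken from the job log, and the function name is illustrative only.

```python
def split_at_largest_gap(elapsed_times):
    """Sort the times and cut them into 'fast' and 'slow' groups at the
    widest gap between neighbouring values."""
    ordered = sorted(elapsed_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    widest = max(gaps)
    cut = gaps.index(widest) + 1
    return ordered[:cut], ordered[cut:], widest


# Invented elapsed times in seconds, for illustration only.
elapsed = [36400, 32300, 36600, 36500, 32500, 36700]

fast, slow, gap = split_at_largest_gap(elapsed)
print(f"{len(fast)} fast, {len(slow)} slow, separated by {gap} secs")
```

This only makes sense when the times really do fall into two groups. On a host with a single cluster, the cut would land at an arbitrary point.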

====####====

Cheers,
Gary.

Gary Roberts

HOST 11 - phenom-07

[pre]
CPU: Phenom II x2 555 Dual Core @ 2.6GHz
Motherboard: Asrock N68-VS3 UCC
PCIe slot x16 Version 1.x (single slot)
1st GPU: NVIDIA GTX650Ti 1GB (3x for both 1.39 apps and 2x for 1.52 - see Note below data)
2nd GPU: -
3rd GPU: -
RAM: 2 x 2GB DDR3 1333MHz
Concurrency: 3 @ 0.2 CPUs + 0.33 GPUs for 1.39 BRP5/6 and 2 @ 0.2 CPUs + 0.50 GPUs for BRP6-beta 1.52
CPU Tasks: 2 x FGRP4
Free CPU cores: 0
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: NVIDIA 331.49
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~29,000. As at 26-Mar-15, RAC=~41,000 (on this host, RACs are auto-logged every 8 hours).

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5 V1.39         28,883  24,025  24,173    50    4,412  4,479  4,834   36      300   Contiguous data prior to beta test @ 3x concurrency.
BRP6 V1.39         30,445  30,571  30,759    84    6,134  6,194  6,273   35       30   BRP6 1.39 still @ 3x concurrency
BRP6 early betas        -       -       -     -        -      -      -    -        -   Reliable data not available - concurrency -> 2x here.
BRP6-beta V1.52    17,250  18,657  20,814   319      673    753    969   39      153   Contiguous tasks (@ 2x) well after the change.

Average run time/task (BRP5) = 24025/3 = 8,008 secs. Average run time/task (BRP6-1.39) = 30571/3 = 10,190 seconds - ie 27% longer than for BRP5.
Average run time per task (BRP6-1.52) = 18657/2 = 9,329 seconds - ie 8.5% improvement over BRP6-1.39

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.
Due to concerns about hardware stability when the beta test started, the concurrency was reduced to 2x prior to starting 1.52 tasks.
[/pre]
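
The per-task figures quoted for this host come from dividing the mean elapsed time by the number of concurrent GPU tasks. A minimal Python sketch of that arithmetic, using the Host 11 means and concurrency values from the table (the function names are illustrative only):

```python
def per_task_time(mean_elapsed, concurrency):
    """Average run time per task when `concurrency` tasks share the GPU."""
    return mean_elapsed / concurrency


def percent_change(old, new):
    """Signed change from `old` to `new`, in % (negative means shorter)."""
    return 100.0 * (new - old) / old


brp5 = per_task_time(24025, 3)        # BRP5 1.39 @ 3x
brp6_139 = per_task_time(30571, 3)    # BRP6 1.39 @ 3x
brp6_152 = per_task_time(18657, 2)    # BRP6-beta 1.52 @ 2x

print(f"BRP5 -> BRP6-1.39: {percent_change(brp5, brp6_139):+.1f}%")
print(f"BRP6-1.39 -> 1.52: {percent_change(brp6_139, brp6_152):+.1f}%")
```

Dividing by the concurrency is what makes results at 2x and 3x comparable. The raw elapsed times on their own are not.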

COMMENTS

  • * This host was a little flaky when running the 1.39 apps @ 3x so I decided to change to 2x for the beta test. The issues have stopped, resulting in an increased uptime. This is partly responsible for the improved RAC, on top of the benefit from the optimised app.
    * The 1.52 run times do show some variability not evident for either of the 1.39 versions. This variability is less than what was seen with a GTX650 GPU earlier.
    * Previously, this host (with a 650Ti) struggled to produce 30K RAC when all my GTX650s could deliver around 30K to 33K. The increase to 41K is very pleasing and at least has overtaken that of most GTX650s. I think the motherboard has been part of the problem.

====####====

HOST 12 - phenom-02 Updated 6-Apr-15 with data @ x2 concurrency

[pre]
CPU: Phenom II x4 B55 Quad Core @ 3.2GHz
Cores/Threads: 4 / 4
Motherboard: Asrock 880GM-LE
PCIe slot x16 Version 1.x (single slot)
1st GPU: AMD HD7770 1GB (3x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 4GB DDR3 1333MHz
Concurrency: 3 @ 0.67 CPUs + 0.33 GPUs
CPU Tasks: 2 x FGRP4
Free CPU cores: 2
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: fglrx driver and OpenCL libs from Catalyst 13.12
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~38,000. As at 26-Mar-15, RAC=~46,100 (on this host, RACs are auto-logged every 8 hours).

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5 V1.39         21,458  22,911  23,986   330    2,133  2,192  2,331   22      300   Contiguous data recorded prior to beta test start.
BRP6 V1.39         28,444  28,952  29,663   296    2,777  2,803  2,836   19       16   Small sample size only.
BRP6 early betas        -       -       -     -        -      -      -    -        -   Reliable data not available.
BRP6-beta V1.52    27,485  27,996  28,377   159    1,403  1,524  1,857   59      153   Contiguous tasks all well after start of 1.52-beta.
BRP6-beta V1.52    18,911  19,406  19,813   229    1,341  1,388  1,473   26       60   Contiguous tasks after change to x2 concurrency.

Average run time/task (BRP5) = 22911/3 = 7,637 secs. Average run time/task (BRP6-1.39) = 28952/3 = 9,651 seconds - ie 26% longer than for BRP5.
Average run time per task (BRP6-1.52) = 27996/3 = 9,332 seconds - ie 3.3% improvement over BRP6-1.39

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.

[/pre]

The only change to the above data table is the addition of the final line showing results for 60 tasks @ x2 concurrency. This gives:-
Average run time per task (v1.52 @ x2) = 19406/2 = 9,703 seconds - ie almost 4% increase in average run time compared to running 3x.

COMMENTS

  • * The sample size for the BRP6-1.39 results is quite small, so the run time average (only 26% larger than for BRP5) may be artificially too low.
    * There don't seem to be any problems with running 3x on the HD7770 GPU and the run time variability is quite good - better than for the BRP5 app on this host.
    * The further improvement from BRP6-1.39 to beta-1.52 is only 3.3%, or perhaps a bit higher if the 1.39 average is artificially low. This was disappointing.
    * The low standard deviation of 1.52 run time even at 3x and the RAC improvement from 38K to 46K are both pleasing.
    * The x2 concurrency results added above show there is a 3.8% gain by running at 3x.
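
The 2x versus 3x comparison for this host can also be expressed as throughput (tasks per day). The Python sketch below uses the two beta-1.52 means from the table; the function name is made up for illustration. It shows why both "almost 4%" and "3.8%" appear above: one figure is the increase in run time going from 3x to 2x, the other is the reduction going from 2x to 3x.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(mean_elapsed, concurrency):
    """Tasks completed per day when `concurrency` tasks run at once,
    each taking `mean_elapsed` seconds."""
    return SECONDS_PER_DAY * concurrency / mean_elapsed


rate_3x = tasks_per_day(27996, 3)   # beta-1.52 @ 3x
rate_2x = tasks_per_day(19406, 2)   # beta-1.52 @ 2x

throughput_gain = 100.0 * (rate_3x / rate_2x - 1.0)   # extra tasks/day at 3x
time_reduction = 100.0 * (1.0 - rate_2x / rate_3x)    # shorter per-task time at 3x

print(f"3x: {rate_3x:.2f} tasks/day, 2x: {rate_2x:.2f} tasks/day")
print(f"throughput gain at 3x: {throughput_gain:.1f}%")
print(f"per-task time reduction at 3x: {time_reduction:.1f}%")
```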

====####====

Cheers,
Gary.

Gary Roberts

HOST 13 - phenom-08

[pre]
CPU: Phenom II x4 B55 Quad Core @ 3.0GHz
Motherboard: Asrock N68-VS3 FX
PCIe slot x16 Version 2 (single slot)
1st GPU: AMD HD7850 2GB (4x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 2GB DDR3 1333MHz
Concurrency: 4 @ 0.5 CPUs + 0.25 GPUs
CPU Tasks: 2 x FGRP4
Free CPU cores: 2
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: fglrx driver and OpenCL libs from Catalyst 13.12
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~65,000. As at 28-Mar-15, RAC=~91,000 (on this host, RACs are auto-logged every 8 hours).

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5 V1.39         19,072  19,361  19,706   100    1,811  2,089  2,204   94      300   Contiguous data prior to BRP6 start.
BRP6 V1.41         24,865  26,049  26,581   469    2,713  2,747  2,805   19       30   Low number of results
BRP6 early betas        -       -       -     -        -      -      -    -        -   Reliable data not available.
BRP6-beta V1.52    17,408  17,859  18,467   181    1,273  1,299  1,399   19      200   Contiguous tasks after the change to 1.52-beta.

Average run time/task (BRP5) = 19361/4 = 4,840 secs. Average run time/task (BRP6-1.41) = 26049/4 = 6,512 seconds - ie 34% longer than for BRP5.
Average run time per task (BRP6-1.52) = 17859/4 = 4,465 seconds - ie 31.5% improvement over BRP6-1.41

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.
[/pre]

COMMENTS

  • * This host has a PCIe v2 slot but GPU tasks were always slower than for similar hosts with a different motherboard (but still v2). A weak motherboard??
    * The standard BRP6 results were 34% slower than for BRP5, so it looked like the slow trend would continue. This figure is usually 26-28%.
    * The BRP6-beta results show a big improvement (31.5% faster than non-beta) and the RAC (now over 90K) has responded accordingly.
    * The beta-1.52 tasks are completing faster than the previous BRP5 tasks, despite the size difference, so the beta app has really solved the issue.

====####====

HOST 03 - fx_6300-01 (This host was reported in the opening post of this thread - much more detail given here)

[pre]
CPU: AMD FX 6300 Hexa-Core @ 3.5GHz
Cores/Threads: 6 / 6 (These are integer cores - there are only 3 FPUs, each one shared between two integer cores)
Motherboard: Asrock N68-VS3 FX
PCIe slot x16 Version 2 (single slot)
1st GPU: AMD HD7850 2GB (4x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 4GB DDR3 1333MHz
Concurrency: 4 @ 0.76 CPUs + 0.25 GPUs
CPU Tasks: 3 x FGRP4
Free CPU cores: 3
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: fglrx driver and OpenCL libs from Catalyst 13.12
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~65,000. As at 28-Mar-15, RAC=~87,000 (on this host, RACs are auto-logged every 8 hours).

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5 V1.39         18,586  18,942  19,107   113    1,814  2,013  2,076   43      300   Contiguous data recorded prior to beta test start.
BRP6 V1.41         24,636  25,633  25,991   325    2,601  2,640  2,697   20       38   Small sample size only.
BRP6 early betas        -       -       -     -        -      -      -    -        -   Reliable data not available.
BRP6-beta V1.52    18,045  18,133  18,222    42    1,285  1,324  1,421   22      200   Contiguous tasks after start of 1.52-beta.

Average run time/task (BRP5) = 18942/4 = 4,736 secs. Average run time/task (BRP6-1.41) = 25633/4 = 6,408 seconds - ie 35% longer than for BRP5.
Average run time per task (BRP6-1.52) = 18133/4 = 4,533 seconds - ie 29% improvement over BRP6-1.41

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.

[/pre]

COMMENTS

  • * This host has the same motherboard and GPU as Host 13 reported above, with twice the RAM. The main difference is the CPU - FX 6300 x6 vs the older Phenom II x4. With the beta app, the older X4 CPU is actually slightly outperforming it.
    * For both of these hosts, the GPU performance is comparable with that of far more modern hosts. This means that full GPU performance (at least for HD7850s) seems to be relatively independent of the CPU architecture - a real incentive for people wanting to add a fairly cheap GPU to an older host for a big credit boost.
    * It's quite pleasing that the beta-1.52 results are showing little variability for a decent sample size of 200.
    * The BRP6-beta app is again proving to be vital for good performance on older or under-performing AMD GPU hosts like these.

====####====

Cheers,
Gary.

Gary Roberts

HOST 14 - q8400-07

[pre]
CPU: Intel Q8400 Core 2 Quad (2.66GHz) @ 3.12GHz
Motherboard: Asus P5KPL-AM SE
PCIe slot x16 Version 1.x (single slot)
1st GPU: AMD HD7850 2GB (4x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 2GB DDR2 800MHz
Concurrency: 4 @ 0.5 CPUs + 0.25 GPUs
CPU Tasks: 2 x FGRP4
Free CPU cores: 2
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: fglrx driver and OpenCL libs from Catalyst 13.12
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~45,000. As at 30-Mar-15, RAC=~87,000 (on this host, RACs are auto-logged every 8 hours).

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5 V1.39         27,856  28,163  28,322    74    3,663  3,784  3,966   55      300   Contiguous data prior to BRP6 start.
BRP6 V1.41         38,771  39,029  39,221   213    4,989  5,330  5,556  262        4   Only 4 were completed before switching
BRP6 early betas        -       -       -     -        -      -      -    -        -   Reliable data not available.
BRP6-beta V1.52    17,611  18,273  18,744   138    1,324  1,502  1,771   56      300   Contiguous tasks after the change to 1.52-beta.

Average run time/task (BRP5) = 28163/4 = 7,041 secs. Average run time/task (BRP6-1.41) = 39029/4 = 9,757 seconds - ie 39% longer than for BRP5.
Average run time per task (BRP6-1.52) = 18273/4 = 4,568 seconds - ie 53% improvement over BRP6-1.41 - unreliable because of very low sample size.

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.
The 4 non-optimised BRP6 results are insufficient for useful comparison but I decided to list them since I had them. The run times appear to be too high.
[/pre]

COMMENTS

  • * This host suffers from two factors - a PCIe v1.x slot and slow DDR2 800MHz RAM. The standard apps for both BRP5 and BRP6 perform quite poorly.
    * With the beta-1.52 optimisations, both these impediments seem to be largely overcome. The average run time per task is virtually comparable with values for much more modern hosts with the latest CPU architecture and RAM types.
    * The improvement is so spectacular that the beta-1.52 app is now 35% faster than the BRP5 1.39 app, despite the 30% larger task size.
    * To put things in context for the beta-1.52 app, compare the average run time for this host (4,568 sec) with that of the next host reported below, with a Haswell refresh CPU (4,505 sec) - only 63 sec difference.

====####====

HOST 04 - g3258-01 (This host was reported in the opening post of this thread - much more detail given here)

[pre]
CPU: Intel G3258 Pentium Dual Core 3.2GHz @ 3.9GHz (Haswell Refresh)
Cores/Threads: 2 / 2
Motherboard: Asrock H81M-DGS
PCIe slot x16 Version 2 (single slot)
1st GPU: AMD HD7850 2GB (4x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 4GB DDR3 1333MHz
Concurrency: 4 @ 0.45 CPUs + 0.25 GPUs
CPU Tasks: 1 x FGRP4
Free CPU cores: 1
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: fglrx driver and OpenCL libs from Catalyst 13.12
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~73,000. As at 30-Mar-15, RAC=~89,000 (on this host, RACs are auto-logged every 8 hours).

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5 V1.39         15,446  15,601  15,756    64    1,141  1,176  1,317   21      300   Contiguous data recorded prior to beta test start.
BRP6 V1.41         19,926  20,103  20,806   192    1,510  1,533  1,571   15       27   Small sample size only.
BRP6 early betas        -       -       -     -        -      -      -    -        -   Reliable data not available.
BRP6-beta V1.52    17,746  18,018  18,760   140      589    618    662   13      300   Contiguous tasks after start of 1.52-beta.

Average run time/task (BRP5) = 15601/4 = 3,900 secs. Average run time/task (BRP6-1.41) = 20103/4 = 5,026 seconds - ie 29% longer than for BRP5.
Average run time per task (BRP6-1.52) = 18018/4 = 4,505 seconds - ie 10.4% improvement over BRP6-1.41

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.

[/pre]

COMMENTS

  • * This host was able to cope very well with only 1 free CPU core for 4 concurrent GPU tasks when BRP5 was running. With the big CPU component reduction for beta-1.52, this will continue to be the case.
    * The average run time per task of only 4,505 secs and the low 618 secs CPU component are both very pleasing values to find.
    * The BRP6-beta app is again showing quite low variability of results over a significant sample size.

====####====

Cheers,
Gary.

FourOh
Joined: 25 Jan 13
Posts: 1
Credit: 100,004,609
RAC: 0

HOST 1 - http://einsteinathome.org/host/11091174

[pre]
CPU: 2x Xeon X5650
Cores/Threads: 12/24 HT Enabled
Motherboard: Dell T5500
PCIe slot x16 PCIe V2.0
1st GPU: R9 280X 3GB 1000MHz / 1500MHz
2nd GPU: -
3rd GPU: -
RAM: 9 x 4GB 1333MHz DDR3
Concurrency: 3 @ 1 CPUs + 0.33 GPUs
CPU Tasks: 21 x Various BOINC Projects
Free CPU cores: None
OS: Win 7 64-bit
Driver: AMD Catalyst 14.9
BOINC Version: 7.4.36 64bit

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5                    -       -       -     -        -      -      -    -        -
BRP6 (non-beta)         -       -       -     -        -      -      -    -        -
BRP6-beta <1.52         -       -       -     -        -      -      -    -        -
BRP6-beta 1.52      7,471   7,975   8,505   242    3,910  4,372  4,816  208      100

[/pre]
COMMENTS

  • * Core GPU Clock Adjusted between 950Mhz - 1020Mhz to control heat.
    * Average Production BRP6-Beta 143k/day

====####====

HOST 2 - http://einsteinathome.org/host/11722552
[pre]
CPU: AMD A8-3870K
Cores/Threads: 4
Motherboard: ASUS F1-A75M Pro
PCIe slot x16 PCIe V2.0
1st GPU: HD 7970 1000MHz / 1400MHz
2nd GPU: -
3rd GPU: -
RAM: 2 x 4GB 1600MHz DDR3
Concurrency: 4 @ .5 CPUs + 0.25 GPUs
CPU Tasks: 2 x Various BOINC Projects
Free CPU cores: None
OS: Win 7 64-bit
Driver: AMD Catalyst 13.12
BOINC Version: 7.4.42 64bit

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5                    -       -       -     -        -      -      -    -        -
BRP6 (non-beta)         -       -       -     -        -      -      -    -        -
BRP6-beta <1.52         -       -       -     -        -      -      -    -        -
BRP6-beta 1.52      9,228  11,007  11,758   364    1,562  1,954  2,633  242      100

[/pre]
COMMENTS

  • * Default Clocks 925MHz / 1375MHz
    * Average Production BRP6-Beta 138k/day

====####====

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 738,985,628
RAC: 0

HOST 2 - http://einsteinathome.org/host/8516714

[pre]
CPU: 1x Intel 920
Cores/Threads: 4/4 HT Disabled
Motherboard: EVGA x58 E759
PCIe slot x16 PCIe V2.0
1st GPU: EVGA 780 Ti SC ACX
2nd GPU: -
3rd GPU: -
RAM: 3 x 2GB DDR3 1600 MHz CL6
Concurrency: 1 @ 0.2 CPUs + 1.0 GPUs
CPU Tasks: None
Free CPU cores: 3
OS: Win XP 64-bit
Driver: NVIDIA 337.50 64-bit
BOINC Version: 7.4.42 64bit

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5                    -       -       -     -        -      -      -    -        -
BRP6 (non-beta)         -       -       -     -        -      -      -    -        -
BRP6-beta <1.52         -       -       -     -        -      -      -    -        -
BRP6-beta 1.52      2,382   2,437   2,564    40      328    364    399   17       36

[/pre]
COMMENTS

  • * GPU Data: Load: 92%, Boost Frequency: 1149 MHz, Memory Frequency: 3600 MHz, Peak Power Draw: 258w
    * CPU Data: Load: 3-7%
    * Average Production BRP6 v1.52: 156K/Day

====####====

Gary Roberts

HOST 01 - q8400-15 (This host was reported in the opening post of this thread - much more detail given here)

[pre]
CPU: Intel Core 2 Quad (2.66GHz) @ 3.12 GHz
Cores/Threads: 4 / 4
Motherboard: Asus P5KPL AM/PS
PCIe slot x16 Version 1.x (single slot)
1st GPU: AMD HD7850 2GB (4x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 2GB DDR2 800MHz
Concurrency: 4 @ 0.5 CPUs + 0.25 GPUs
CPU Tasks: 2 x FGRP4
Free CPU cores: 2
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: fglrx driver and OpenCL libs from Catalyst 13.12
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~44,000. As at 06-Apr-15, RAC=~84,000 (on this host, RACs are auto-logged every 8 hours).

                     Elapsed Time Statistics         CPU time Statistics
                  -----------------------------   -------------------------   Sample
Search                Min    Mean     Max  S.D.      Min   Mean    Max S.D.     Size   Notes / Comments
================  ======= ======= ======= =====   ====== ====== ====== ====   ======   ================
BRP5 V1.39         27,631  27,887  28,074    79    3,521  3,713  3,933   53      300   Contiguous data prior to BRP6 start.
BRP6 V1.41         36,923  38,092  38,793   590    4,792  4,914  5,003   62       17   Only 17 were completed before switching
BRP6 early betas        -       -       -     -        -      -      -    -        -   Reliable data not available.
BRP6-beta V1.52    17,268  17,719  19,010   483    1,278  1,417  1,829  123      300   Contiguous tasks after the change to 1.52-beta.

Average run time/task (BRP5) = 27887/4 = 6,972 secs. Average run time/task (BRP6-1.41) = 38092/4 = 9,523 seconds - ie 36% longer than for BRP5.
Average run time per task (BRP6-1.52) = 17719/4 = 4,430 seconds - ie 53% shorter than for BRP6-1.41 - a bit doubtful because of small sample size.

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.
The sample of 17 non-optimised BRP6 results is too small for full confidence but the results do agree quite well with what was found for the very similar HOST 14.
[/pre]

COMMENTS

  * This host suffers from two handicaps - a PCIe v1.x slot and slow DDR2 800MHz RAM. The standard apps for both BRP5 and BRP6 perform quite poorly on it.
  * With the beta-1.52 optimisations, both these impediments seem to be largely overcome. The average run time per task is now comparable with values for much more modern hosts with the latest CPU architecture and RAM types.
  * The improvement is so spectacular that the beta-1.52 app now takes 36% less time per task than the BRP5 1.39 app, despite the 30% larger task size.
  * To put things in context for the beta-1.52 app, compare the average run time for this host (4,430 sec) with that of the next host reported below, with a Haswell i3 CPU (4,521 sec) - this host is 91 sec faster per task on average.
  * The 2nd plot below of 300 consecutive beta-1.52 run times shows an interesting change starting around #235. The step change in run time may indicate a different data 'beam' with a different (less effective) response to the optimisations. Otherwise it's a pretty strange change to happen so suddenly with no change in crunching conditions.
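
One simple way to check a suspected step like this, rather than judging it by eye, is to find the split point that best separates the series into two levels. A minimal sketch follows. The run times in it are synthetic values made up for illustration (not this host's data), and find_step() is a hypothetical helper.

```python
import statistics

def find_step(times, min_side=5):
    """Return (index, mean_before, mean_after) for the split that minimises
    the total squared deviation from the two segment means."""
    best = None
    for i in range(min_side, len(times) - min_side + 1):
        before, after = times[:i], times[i:]
        m1, m2 = statistics.mean(before), statistics.mean(after)
        cost = sum((t - m1) ** 2 for t in before) + sum((t - m2) ** 2 for t in after)
        if best is None or cost < best[0]:
            best = (cost, i, m1, m2)
    if best is None:
        raise ValueError("series too short for the chosen min_side")
    return best[1], best[2], best[3]

# Synthetic series: 12 tasks near 17,500 s, then 8 tasks near 18,600 s.
synthetic = ([17500 + (k % 3) * 20 for k in range(12)]
             + [18600 + (k % 3) * 20 for k in range(8)])

index, before, after = find_step(synthetic)
print(index, before, after)
```

Fed the 300 elapsed times in completion order, a genuine step near #235 should show up as an index close to that. The function always returns some split, even when there is no real step, so the difference between the two means needs to be judged against the S.D. of the run times.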

====####====

HOST 02 - i3_4130-02 (This host was reported in the opening post of this thread - much more detail given here)

[pre]
CPU: Intel i3 4130 Dual Core 3.4GHz (Haswell architecture)
Cores/Threads: 2 / 4
Motherboard: Asrock H81M-HDS
PCIe slot x16 Version 2 (single slot)
1st GPU: AMD HD7850 2GB (4x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 4GB DDR3 1333MHz
Concurrency: 4 @ 0.5 CPUs + 0.25 GPUs
CPU Tasks: 2 x FGRP4
Free CPU cores: 2 (Virtual)
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: fglrx driver and OpenCL libs from Catalyst 13.12
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~78,000. As at 07-Apr-15, RAC=~88,000 (on this host, RACs are auto-logged every 8 hours).

Elapsed Time Statistics CPU time Statistics
----------------------------------- ------------------------------ Sample
Search Min Mean Max S.D. Min Mean Max S.D. Size Notes / Comments
====== ======= ======= ======= ===== ====== ====== ====== ==== ====== ================
BRP5 V1.39 14,891 15,830 16,161 207 1,657 1,730 2,290 50 300 Contiguous data recorded prior to beta test start.
BRP6 V1.41 19,796 20,031 20,343 156 2,219 2,258 2,294 20 32 Small sample size only.
BRP6 early betas - - - - - - - - - Reliable data not available.
BRP6-beta V1.52 17,875 18,085 18,376 79 908 961 1,408 46 300 Contiguous tasks after start of 1.52-beta.

Average run time/task (BRP5) = 15830/4 = 3,958 secs. Average run time/task (BRP6-1.41) = 20031/4 = 5,008 seconds - ie 27% longer than for BRP5.
Average run time per task (BRP6-1.52) = 18085/4 = 4,521 seconds - ie 9.7% improvement over BRP6-1.41

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.

[/pre]
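
The per-task figures above are simply the mean elapsed time divided by the number of concurrent tasks (4 on this host), and the percentages compare those per-task values. A short sketch of that arithmetic, using the means from the table (the function names are my own, for illustration only):

```python
CONCURRENT = 4  # tasks sharing the GPU at once (4x)

def per_task(mean_elapsed, concurrent=CONCURRENT):
    """Effective run time per task when `concurrent` tasks share the GPU."""
    return mean_elapsed / concurrent

def pct_change(old, new):
    """Percentage change from old to new (negative means shorter)."""
    return (new - old) / old * 100.0

brp5 = per_task(15830)  # BRP5 V1.39 mean elapsed time
brp6 = per_task(20031)  # BRP6 V1.41 mean elapsed time
beta = per_task(18085)  # BRP6-beta V1.52 mean elapsed time

print(f"BRP5 {brp5:.0f} s, BRP6-1.41 {brp6:.0f} s, beta-1.52 {beta:.0f} s")
print(f"BRP6-1.41 vs BRP5: {pct_change(brp5, brp6):+.1f}%")
print(f"beta-1.52 vs BRP6-1.41: {pct_change(brp6, beta):+.1f}%")
```

The same two functions work for any of the hosts in this thread by swapping in that host's mean elapsed times and its concurrency.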

COMMENTS

  * This host shows a much higher average CPU component (961 secs) than HOST 04, which has a Haswell Refresh CPU. Looks like Haswell Refresh made quite a difference :-).
  * The average run time per task of 4,521 secs is pretty close to the value that all my hosts with 7850 GPUs are achieving, irrespective of the age and architecture of the CPU. This continues to suggest that GPU crunch times are now pretty much decoupled from whatever CPU, memory and PCIe bus bottlenecks affected the former apps.
  * The BRP6-beta app is again showing quite low variability of results over a significant sample size.

====####====

Cheers,
Gary.

HOST 05 - i3_2120-01 (This host was reported in the opening post of this thread - much more detail given here)

[pre]
CPU: Intel i3-2120 Dual Core 3.3GHz (Sandy Bridge architecture)
Cores/Threads: 2 / 4
Motherboard: Asus P8H61-MLE
PCIe slot x16 Version 2 (single slot)
1st GPU: AMD HD7850 2GB (4x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 4GB DDR3 1333MHz
Concurrency: 4 @ 0.5 CPUs + 0.25 GPUs
CPU Tasks: 2 x FGRP4
Free CPU cores: 2 (Virtual)
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: fglrx driver and OpenCL libs from Catalyst 13.12
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~76,000. As at 09-Apr-15, RAC=~91,000 (on this host, RACs are auto-logged twice per day).

Elapsed Time Statistics CPU time Statistics
----------------------------------- ------------------------------ Sample
Search Min Mean Max S.D. Min Mean Max S.D. Size Notes / Comments
====== ======= ======= ======= ===== ====== ====== ====== ==== ====== ================
BRP5 V1.39 15,268 16,206 16,824 369 2,146 2,212 2,342 39 300 Contiguous data recorded prior to beta test start.
BRP6 V1.41 - - - - - - - - - Data not available.
BRP6-beta V1.47 17,327 18,424 21,371 655 1,320 1,487 2,436 272 162 1st stage of optimisation - times show high variability.
BRP6-beta V1.52 17,609 18,318 18,944 206 1,284 1,367 1,627 47 300 Contiguous tasks after start of 1.52-beta.

Average run time/task (BRP5) = 16206/4 = 4,052 secs. Average run time/task (BRP6-beta 1.47) = 18424/4 = 4,606 secs.
Average run time per task (BRP6-beta 1.52) = 18318/4 = 4,580 seconds - only a very small further improvement over 1st stage of optimisation.

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.

[/pre]

COMMENTS

  * Sandy Bridge i3s have a much larger CPU component than later architectures like Haswell and Haswell Refresh.
  * Despite this, the 4,580 sec average run time per task is still close to the 4,532 secs average for all HD7850 hosts assessed so far.
  * The above results are a good example of how the 2nd stage of optimisation greatly reduced the variability of run times seen after the 1st stage (S.D. of elapsed time down from 655 to 206 secs).
  * The BRP6-beta 1.52 app is again showing quite low variability of results over a significant sample size of 300 tasks.
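
To put a number on the variability comparison, the S.D. can be expressed as a percentage of the mean (the coefficient of variation). A small sketch using the elapsed time figures from the table above. This is just one reasonable measure, not necessarily how the comparison was originally judged.

```python
def cv_percent(sd, mean):
    """Coefficient of variation: S.D. as a percentage of the mean."""
    return sd / mean * 100.0

# (mean, S.D.) of elapsed time in seconds, taken from the table for this host
elapsed = {
    "BRP5 V1.39": (16206, 369),
    "BRP6-beta V1.47": (18424, 655),
    "BRP6-beta V1.52": (18318, 206),
}

cv = {app: cv_percent(sd, mean) for app, (mean, sd) in elapsed.items()}
for app, value in cv.items():
    print(f"{app}: {value:.1f}%")
```

On these figures the relative spread for beta 1.52 is roughly a third of that for beta 1.47, even though the mean run time barely changed between the two.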

====####====

HOST 06 - hebe (This host was reported in the opening post of this thread - much more detail given here)

[pre]
CPU: Intel Core 2 Quad (Kentsfield 2.66GHz) @ 3.04 GHz
Cores/Threads: 4 / 4
Motherboard: Asus P5QPL-AM
PCIe slot x16 Version 1.x (single slot)
1st GPU: AMD HD7850 2GB (4x)
2nd GPU: -
3rd GPU: -
RAM: 2 x 2GB DDR2 800MHz
Concurrency: 4 @ 0.5 CPUs + 0.25 GPUs
CPU Tasks: 2 x FGRP4
Free CPU cores: 2
OS: PCLinuxOS 2014.04 64bit - kernel 3.12.16-pclos3
Driver: fglrx driver and OpenCL libs from Catalyst 13.12
BOINC Version: 7.2.42 64bit
RAC (approx): Mid Feb, RAC=~55,000. As at 09-Apr-15, RAC=~89,000 (on this host, RACs are auto-logged every 8 hours).

Elapsed Time Statistics CPU time Statistics
----------------------------------- ------------------------------ Sample
Search Min Mean Max S.D. Min Mean Max S.D. Size Notes / Comments
====== ======= ======= ======= ===== ====== ====== ====== ==== ====== ================
BRP5 V1.39 22,196 22,638 23,104 206 4,436 4,606 4,777 78 300 Contiguous data prior to BRP6 start.
BRP6 V1.41 29,590 29,997 30,631 263 6,275 6,486 6,840 136 18 Only 18 were completed before switching to 1.52-beta
BRP6 early betas - - - - - - - - - Reliable data not available.
BRP6-beta V1.52 17,857 18,610 19,236 170 1,533 1,678 2,143 68 300 Contiguous tasks after the change to 1.52-beta.

Average run time/task (BRP5) = 22638/4 = 5,660 secs. Average run time/task (BRP6-1.41) = 29997/4 = 7,499 seconds - ie 32% longer than for BRP5.
Average run time per task (BRP6-1.52) = 18610/4 = 4,653 seconds - ie 38% shorter than for BRP6-1.41, although that comparison is a bit doubtful because of the small BRP6-1.41 sample size (18).

Note: All statistics were done in LibreOffice after importing directly from job_log_einstein.phys.uwm.edu.txt found in the BOINC data directory.
The sample of 18 non-optimised BRP6 results is too small for full confidence but the 32% increase in run time does agree quite well with BRP6 tasks being 30% larger than BRP5 tasks.
[/pre]

COMMENTS

  * This host also suffers from a PCIe v1.x slot and slow DDR2 800MHz RAM but it always did perform significantly better on BRP5 than other similar hosts (eg Host 01 and Host 14).
  * With the beta-1.52 optimisations, this host is now very much on par with all similar Q8400 hosts - the previous advantage has been completely negated. The average run time per task is quite similar across all HD7850 equipped hosts, irrespective of CPU architecture and RAM type.
  * The BRP6 beta-1.52 app takes 18% less time per task than the BRP5 1.39 app, despite the 30% larger task size for BRP6.
  * To date I've examined 8 separate hosts with HD7850 GPUs. The average run time per task for the whole 8 is 4,532 secs. At 4,653 secs, this host is the slowest. The fastest is Host 01 at 4,430 secs - also a Q8400 equipped host.
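
For a quick check of how tightly the HD7850 hosts cluster, each per-task time reported above can be compared with the 4,532 sec average for all 8 hosts. A minimal sketch using only the four hosts detailed above (the other four are not listed here, so they are left out):

```python
FLEET_MEAN = 4532  # secs per task, average over all 8 HD7850 hosts (from the text)

# beta-1.52 average run time per task, in seconds, as reported for each host
per_task = {
    "HOST 01": 4430,
    "HOST 02": 4521,
    "HOST 05": 4580,
    "HOST 06": 4653,
}

def deviation_percent(value, reference=FLEET_MEAN):
    """Signed difference from the reference, as a percentage of the reference."""
    return (value - reference) / reference * 100.0

deviations = {host: deviation_percent(t) for host, t in per_task.items()}
for host, dev in deviations.items():
    print(f"{host}: {per_task[host]} s ({dev:+.1f}%)")

spread = max(per_task.values()) - min(per_task.values())
print(f"fastest to slowest: {spread} s")
```

Both the fastest and the slowest host sit within about 3% of the 8-host average, which is the point being made about CPU architecture and RAM type no longer mattering much.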

====####====

Cheers,
Gary.
