Times (Elapsed / CPU) for BRP5/6/6-Beta on various CPU/GPU combos - DISCUSSION Thread

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,870
Credit: 115,890,054,486
RAC: 35,373,817

I have barely enough results

I have barely enough results for v1.52 on my HOST 05, but I've just updated its table in the opening post of the RESULTS thread and decided to add them anyway. I had extra results for 1.47, so I added a new line for 162 results (the original 107 plus 55 more) whilst leaving the previous 107-result line intact for comparison.

The results for 1.52 look very promising.

Cheers,
Gary.

MarkHNC
Joined: 31 Aug 12
Posts: 37
Credit: 170,965,842
RAC: 0

RE: RE: My 1st beta 1.52

Quote:
Quote:
My 1st beta 1.52 is running on my Intel HD4000

Actually we are not sure yet whether we want to keep serving the rather big BRP6 work units to the Intel iGPUs. At least the less powerful among them like the HD 2500 will take longer to crunch than we usually like tasks to take to complete. It is quite possible that we'll stop BRP6 beta on the Intel iGPUs after some initial tests.

HB

I have an i3 3220/HD 2500 that runs at 100% CPU and 100% GPU nearly 24/7 and that picked up two of these big tasks. I have it set up to run two Einstein tasks at a time, because it was peaking around 70% GPU utilization when running the Arecibo tasks one at a time. I also run three WCG units alongside, mostly FAAH-Vina with the occasional CEP2. So it is now running two of these biggies.

First is 50%, elapsed 23h30m, remaining 11h52m
Second is 48%, elapsed 24h49m, remaining 12h58m
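
For anyone who wants a projected total from figures like these, here is a minimal sketch that just adds the elapsed time to the client's estimated remaining time. The helper names are made up for illustration, and the result is only as good as the remaining-time estimate itself.

```python
import re


def parse_hm(text: str) -> int:
    """Convert a duration such as '23h30m' into whole minutes."""
    match = re.fullmatch(r"(\d+)h(\d+)m", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


def format_hm(minutes: int) -> str:
    """Format whole minutes back into the 'XhYYm' style used above."""
    return f"{minutes // 60}h{minutes % 60:02d}m"


def projected_total(elapsed: str, remaining: str) -> str:
    """Projected total run time = elapsed + estimated remaining."""
    return format_hm(parse_hm(elapsed) + parse_hm(remaining))


# Figures taken from the two tasks listed above.
for label, elapsed, remaining in [
    ("First", "23h30m", "11h52m"),
    ("Second", "24h49m", "12h58m"),
]:
    print(label, projected_total(elapsed, remaining))
```

On those figures the two tasks come out at roughly 35h22m and 37h47m each.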

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,924,540,827
RAC: 885,109

RE: RE: RE: My 1st beta

Quote:
Quote:
Quote:
My 1st beta 1.52 is running on my Intel HD4000

Actually we are not sure yet whether we want to keep serving the rather big BRP6 work units to the Intel iGPUs. At least the less powerful among them like the HD 2500 will take longer to crunch than we usually like tasks to take to complete. It is quite possible that we'll stop BRP6 beta on the Intel iGPUs after some initial tests.

HB


I have an i3 3220/HD 2500 that runs at 100% CPU and 100% GPU nearly 24/7 and that picked up two of these big tasks. I have it set up to run two Einstein tasks at a time, because it was peaking around 70% GPU utilization when running the Arecibo tasks one at a time. I also run three WCG units alongside, mostly FAAH-Vina with the occasional CEP2. So it is now running two of these biggies.

First is 50%, elapsed 23h30m, remaining 11h52m
Second is 48%, elapsed 24h49m, remaining 12h58m


Once this pair have finished, you might find it interesting to run the next pair with the CPU at 75% utilisation. What were your run (elapsed) times for the small Arecibo tasks on the HD 2500?

MarkHNC
Joined: 31 Aug 12
Posts: 37
Credit: 170,965,842
RAC: 0

RE: RE: RE: RE: My

Quote:
Quote:
Quote:
Quote:
My 1st beta 1.52 is running on my Intel HD4000

Actually we are not sure yet whether we want to keep serving the rather big BRP6 work units to the Intel iGPUs. At least the less powerful among them like the HD 2500 will take longer to crunch than we usually like tasks to take to complete. It is quite possible that we'll stop BRP6 beta on the Intel iGPUs after some initial tests.

HB


I have an i3 3220/HD 2500 that runs at 100% CPU and 100% GPU nearly 24/7 and that picked up two of these big tasks. I have it set up to run two Einstein tasks at a time, because it was peaking around 70% GPU utilization when running the Arecibo tasks one at a time. I also run three WCG units alongside, mostly FAAH-Vina with the occasional CEP2. So it is now running two of these biggies.

First is 50%, elapsed 23h30m, remaining 11h52m
Second is 48%, elapsed 24h49m, remaining 12h58m


Once this pair have finished, you might find it interesting to run the next pair with the CPU at 75% utilisation. What were your run (elapsed) times for the small Arecibo tasks on the HD 2500?

So far, what has been downloaded into the cache is more Arecibo. I'll try to remember to watch for BRP6.

Last 40 Arecibo:
Average: 3,538.75; Median: 3,551.86; Min: 3,047.78; Max: 3,688.01
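
A summary line in this format can be produced with a few lines of Python. This is only a sketch: the function names are hypothetical, and the sample list holds made-up placeholder times, not the 40 results summarised above.

```python
from statistics import mean, median


def summarise(times):
    """Return average, median, min and max of a list of run times (seconds)."""
    return {
        "Average": mean(times),
        "Median": median(times),
        "Min": min(times),
        "Max": max(times),
    }


def format_summary(times):
    """Render the summary in the 'Average: x; Median: y; ...' style."""
    stats = summarise(times)
    return "; ".join(f"{name}: {value:,.2f}" for name, value in stats.items())


# Placeholder run times in seconds (illustrative only, not real task data).
sample = [3500.0, 3600.0, 3100.0, 3650.0]
print(format_summary(sample))
```

Paste the elapsed times from the host's task list into `sample` to get the real figures.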

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,924,540,827
RAC: 885,109

RE: RE: RE: I have an

Quote:
Quote:
Quote:

I have an i3 3220/HD 2500 that runs at 100% CPU and 100% GPU nearly 24/7 and that picked up two of these big tasks. I have it set up to run two Einstein tasks at a time, because it was peaking around 70% GPU utilization when running the Arecibo tasks one at a time. I also run three WCG units alongside, mostly FAAH-Vina with the occasional CEP2. So it is now running two of these biggies.

First is 50%, elapsed 23h30m, remaining 11h52m
Second is 48%, elapsed 24h49m, remaining 12h58m


Once this pair have finished, you might find it interesting to run the next pair with the CPU at 75% utilisation. What were your run (elapsed) times for the small Arecibo tasks on the HD 2500?

So far, what has been downloaded into the cache is more Arecibo. I'll try to remember to watch for BRP6.

Last 40 Arecibo:
Average: 3,538.75; Median: 3,551.86; Min: 3,047.78; Max: 3,688.01


I have HD 4000 and HD 4600, but I don't think there's a big difference from HD 2500. By reducing CPU loading, I get something below 700 seconds (in round figures) for tasks running one at a time. I'll go get some accurate figures.

Edit: last 40 validated on each host
HD 4000 Average: 665.92 Median: 664.79 Min: 658.34 Max: 681.67
HD 4600 Average: 653.09 Median: 652.31 Min: 646.25 Max: 665.45

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 569,007,006
RAC: 150,680

HD2500 has just 6 EUs,

HD2500 has just 6 EUs, compared to 16 in HD4000 and 20 in HD4600. Compute performance varies from the expected factor of 2.3 slower than HD4000 (SmallLux) to almost a factor of 4 for the fluid simulation. Based on the shaders and Richard's HD4000 we could expect 1773s per single WU, or 3547s for 2 of them if there was no scaling benefit. That's pretty close to the mean 3539s per 2 WUs which Mark reported. With BRP4 and app 1.39 there was an advantage of ~10% for running 2 WUs concurrently on my HD4000 - this may make up for a slightly lower clock speed on Mark's i3.
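
The shader-count estimate above can be reproduced with a short sketch. It assumes run time is inversely proportional to the number of EUs and takes 665 s as a round figure for Richard's HD4000 tasks; both the function name and that rounding are my assumptions.

```python
def scale_by_eus(reference_seconds: float, reference_eus: int, target_eus: int) -> float:
    """Estimate run time on a GPU with a different EU count.

    Assumes run time is inversely proportional to the number of EUs,
    with clock speed and memory bandwidth ignored.
    """
    return reference_seconds * reference_eus / target_eus


HD4000_SECONDS = 665.0  # assumption: round figure for one WU on the HD4000
single = scale_by_eus(HD4000_SECONDS, reference_eus=16, target_eus=6)
pair = 2 * single  # two WUs, assuming no benefit from running concurrently

reported_mean = 3538.75  # Mark's mean for tasks run two at a time on the HD2500
difference = 100 * (pair - reported_mean) / reported_mean

print(f"single WU: {single:.0f} s, two WUs: {pair:.0f} s")
print(f"difference from reported mean: {difference:.1f}%")
```

With those inputs the estimate lands within a fraction of a percent of the reported mean.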

Richard, do you see a performance difference between HD4000 and HD4600 with 1.52? I always suspected there was none with 1.39 because the main memory bandwidth limited throughput. Now that this bottleneck is significantly widened I'd expect to see some performance difference. It's 16 vs. 20 EUs, after all.

BTW: my Intel is only getting 1.39 since the first two successful ones. I'd love to use 1.52 even on BRP4 work, but don't want the hassle of an anonymous platform.

MrS

Scanning for our furry friends since Jan 2002

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,870
Credit: 115,890,054,486
RAC: 35,373,817

I've updated the stats table

I've updated the stats table for HOST 01 in the opening post of the results thread. I've added a lot more to the 1.47 sample. I've also added a preliminary stats line for 1.52 which I'll update with more results when available. The further improvement from 1.52 on this host is quite impressive!!

Cheers,
Gary.

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3,428,774,161
RAC: 1,762,881

If someone is curious how a

If someone is curious how a Tesla performs here ;-)
Tesla K20c
BRP6 v1.39, running 2 WUs
705 MHz, ECC ON: 60 C, 100 W, ~13000 secs
758 MHz, ECC OFF: 61 C, 108 W, ~11200 secs

Disabling ECC improves memory bandwidth, which is important for current BRP tasks. This is likely to change for the new versions. But sorry, I haven't opted into the Beta yet. Too much other work.
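
Since the two runs differ in core clock as well as ECC, here is a rough sketch of how the speed-up might be split between them. It assumes run time scales inversely with core clock, which a bandwidth-bound task probably does not do, so the clock share is likely overstated and the ECC share understated. The function name is hypothetical.

```python
def split_speedup(slow_seconds, fast_seconds, slow_mhz, fast_mhz):
    """Split an observed speed-up into a clock part and a residual part.

    Assumption: run time is inversely proportional to core clock.
    Whatever is left after removing the clock ratio is attributed to
    the other change (here, ECC being switched off).
    """
    total = slow_seconds / fast_seconds
    clock = fast_mhz / slow_mhz
    residual = total / clock
    return total, clock, residual


# Approximate figures from the two Tesla K20c runs above.
total, clock, residual = split_speedup(13000, 11200, 705, 758)
print(f"total speed-up:   {total:.3f}x")
print(f"clock ratio:      {clock:.3f}x")
print(f"residual (ECC?):  {residual:.3f}x")
```

On these approximate timings, about half of the 16% overall gain is left over once the clock difference is factored out.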

-----

Daniels_Parents
Joined: 9 Feb 05
Posts: 101
Credit: 1,877,689,213
RAC: 0

[pre] Comparing GTX670 and

[pre]
Comparing GTX670 and GTX560Ti    2 x GTX670             2 x GTX560Ti

GPU Clock (MHz)                  1097                   900
Memory Clock (MHz)               3005                   2106
Memory Size (GB)                 2                      1
Memory Bandwidth (GB/s)          192                    128
Memory Interface Width (bit)     256                    256
Voltage (V)                      1.162                  1.062
CUDA Cores                       1344                   384

CPU                              i7 950 / 3.051 GHz     920 / 2.7 GHz
Memory                           12 GB / DDR3 1066      12 GB / DDR3 1333
Motherboard                      EVGA E768              Gigabyte GA-EX58 Extreme
PCI-E                            x16 / x16              x16 / x8
Operating System                 Win7 x64 Home Premium  Win7 x64 Home Premium

RAC (cobblestones, approx.)      97'000                 100'000

All GPUs are now running 4 x BRP6-Beta 1.52 each, with the remaining CPU cores working on other tasks (GPU usage 97% as shown by Precision-X).
The system with the GTX560Ti cards is slightly more successful.

From my point of view, the system with the GTX670 cards should perform significantly better ... am I wrong?

[/pre]

I know I am a part of a story that starts long before I can remember and continues long beyond when anyone will remember me [Danny Hillis, Long Now]

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,924,540,827
RAC: 885,109

RE: Richard, do you see a

Quote:

Richard, do you see a performance difference between HD4000 and HD4600 with 1.52? I always suspected there was none with 1.39 because the main memory bandwidth limited throughput. Now that this bottleneck is significantly widened I'd expect to see some performance difference. It's 16 vs. 20 EUs, after all.

BTW: my Intel is only getting 1.39 since the first two successful ones. I'd love to use 1.52 even on BRP4 work, but don't want the hassle of an anonymous platform.

MrS


I don't think we're comparing like with like. My posted times were for v1.34 (BRP4) work - I don't think v1.39 (BRP6) was ever released for intel_gpu, we only started running them with the Beta.

I've never seen anything like a 25% difference between the 4000 and 4600 - the 4600 (in an i5) has been better than the 4000 (in an i7), but only by the sort of 10 second margin you saw in the figures I posted yesterday. Both figures have improved slightly with newer drivers, but that has to be done carefully because of the validation issues we've discussed elsewhere.

I tried running a dozen or so tasks 2-up on the HD 4600 overnight, but only saw about a 3% improvement on the overall throughput:

HD 4600 Average: 1,267.31 Median: 1,261.45 Min: 1,248.90 Max: 1,310.41
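
For reference, the ~3% figure can be checked with a quick sketch. It compares the 2-up average above against the single-task HD 4600 average of 653.09 s posted earlier; the two batches were run at different times, so treat the result as approximate. The function name is hypothetical.

```python
def throughput_gain(single_seconds: float, concurrent_seconds: float, n_concurrent: int) -> float:
    """Fractional throughput gain from running n tasks concurrently.

    Each concurrent task takes `concurrent_seconds` of elapsed time, so the
    effective time per task is that figure divided by the number running.
    """
    per_task = concurrent_seconds / n_concurrent
    return single_seconds / per_task - 1.0


# HD 4600: 653.09 s average one at a time, 1,267.31 s average two at a time.
gain = throughput_gain(653.09, 1267.31, 2)
print(f"throughput gain running 2-up: {gain:.1%}")
```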

I'll watch out for a comparison between 4000 and 4600 with the v1.52 BRP6 Beta, but it'll take time to gather enough numbers.
