Times (Elapsed / CPU) for BRP5/6/6-Beta on various CPU/GPU combos - DISCUSSION Thread

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 760
Credit: 182,682,256
RAC: 7,736

GPU utilization on the HD4000

GPU utilization on the HD4000 is pretty good for a single task: ~95%, with short periodic dips to 0. I suppose that's when the CPU calculations happen. Running 2 concurrently nails the value pretty much at 99%, but the gain will probably be significantly less than for the previous app.

MrS

Scanning for our furry friends since Jan 2002

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,233
Credit: 44,699,593,019
RAC: 38,483,250

RE: RE: @Gary I will post

Quote:
Quote:
@Gary I will post my results here as I haven't heard whether the format proposed for multiple GPU cards is OK. I will not have much time in the next few weeks, and I would like to put up what I have.

I have no problem with your modified format - in fact it's an improvement - much easier to take in the sequence of min -> mean -> max -> SD for both Elapsed and CPU. I'll move your set of results to the RESULTS thread later on.


Done.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,233
Credit: 44,699,596,484
RAC: 38,482,835

Many thanks to Holmis for

Many thanks to Holmis for posting results with graphs.

Makes it very easy to see how many outliers there are and the highly skewed nature of the distribution of times - certainly not a standard bell curve :-). Love the fact that the min and the mean are so close together - lots of tasks are benefiting very strongly from the optimisations.

I find it a bit puzzling to see beta outliers that are significantly worse than any values in the non-optimised app set. For example, there must be something else going on for that really big outlier just past the middle of the set of points.

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 1,941
Credit: 273,418,652
RAC: 99,882

RE: RE: My 1st beta 1.52

Quote:
Quote:
My 1st beta 1.52 is running on my Intel HD4000

Actually we are not sure yet whether we want to keep serving the rather big BRP6 work units to the Intel iGPUs. At least the less powerful among them like the HD 2500 will take longer to crunch than we usually like tasks to take to complete. It is quite possible that we'll stop BRP6 beta on the Intel iGPUs after some initial tests.

HB


Confirming that BRP 6 completes in 9 hours on my HD 4000 and 4600.

BUT that's only by controlling the running environment, either by freeing a whole CPU core or by tying it to realtime priority using Process Lasso. Without that support, processing speed drops from 'several clicks (0.001%) per second' to 'several seconds per click'. I haven't got the patience to let one run with the default project settings, but I'd predict two days at least - which is too long.

I suggest that you change the default plan class setting for intel_gpu to assign a whole CPU core, so that on unmanaged hosts, tasks run at the 9-hour mark rather than the 2-day mark.

(I'd also suggest that this extremely high CPU dependency be investigated as a bug, but that's for another day)

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2,557,091
RAC: 44

My runtimes for my GTX760

My runtimes for my GTX760 went from ~10,500 secs for the v1.39 BRP5-cuda32-nv301 app down to ~5,400 secs for the v1.50 BRP6-Beta-cuda32-nv301 app.
The HD7770 was managing times of ~10,500 (+/-1000) secs on the v1.47 BRP6-Beta-opencl-ati app. Both GPUs are on a PCIe2 x8 link, CPUs idle.

Host 2542754

Sorry, it's not a full-time Einstein host, and it has been rather temperamental over its life, so much so that I removed its 4.7GHz overclock the other week.
This didn't help with the screen lockups I had been seeing (since I got the GTX760) while running the Einstein BRP5/6/6-Beta apps.
It's been much more stable since it returned to running Seti, although its times have increased a bit without the overclock.
Time for a 4th Gen i7, I think.

Claggy

Holmis
Joined: 4 Jan 05
Posts: 1,118
Credit: 796,101,326
RAC: 154,002

RE: Many thanks to Holmis

Quote:

Many thanks to Holmis for posting results with graphs.

Makes it very easy to see how many outliers there are and the highly skewed nature of the distribution of times - certainly not a standard bell curve :-). Love the fact that the min and the mean are so close together - lots of tasks are benefiting very strongly from the optimisations.

I find it a bit puzzling to see beta outliers that are significantly worse than any values in the non-optimised app set. For example, there must be something else going on for that really big outlier just past the middle of the set of points.


Happy to help, will return with updated graphs once I've had the chance to run a few v1.52 tasks to see if that app is even better.

As to the long-running task, one explanation might be that it's my only computer and is thus used for all the things I do, mostly browsing the web and watching videos. If I use it for more demanding tasks I usually stop Boinc.
Using the iGPU also seems to impact the performance of the 660Ti and, to some degree, the CPU tasks.

Holmis
Joined: 4 Jan 05
Posts: 1,118
Credit: 796,101,326
RAC: 154,002

Early results from v1.52 does

Early results from v1.52 do look like it might have eliminated the variability in run time.

Do take note that this is data based on a very small sample size and any conclusions might be totally wrong.

[pre]
                 Elapsed Time Statistics             CPU time Statistics
             -------------------------------   -------------------------------  Sample
Search          Min    Mean     Max  Std Dev      Min    Mean     Max  Std Dev    Size  Notes / Comments
===========  ======  ======  ======  =======   ======  ======  ======  =======  ======  ================
BRP6b v1.52   10442   10646   11063      186      858    1022    1191      108      12  Data from the online database, very small sample size.
[/pre]
I'll continue to collect data and return with more numbers and updated graphs once I feel confident that the sample size is big enough to draw conclusions from. Say 50+ results or so.
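For anyone wanting to produce the same four columns from their own task list, here is a minimal Python sketch. It assumes the times are in seconds and that "Std Dev" means the sample standard deviation; the thread does not say which variant was used, and the values in `elapsed` are hypothetical, not real task data.

```python
import statistics


def summarize(times):
    """Return (min, mean, max, sample std dev) for a list of times in seconds."""
    return (
        min(times),
        statistics.mean(times),
        max(times),
        statistics.stdev(times),  # sample SD (n - 1); an assumption, see note above
    )


if __name__ == "__main__":
    # Hypothetical elapsed times, for illustration only.
    elapsed = [10450, 10600, 10700, 11050]
    lo, mean, hi, sd = summarize(elapsed)
    print(f"{lo:>7.0f} {mean:>7.0f} {hi:>7.0f} {sd:>8.0f}")
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population figure instead, which will be slightly smaller for small samples.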

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,233
Credit: 44,699,603,414
RAC: 38,483,502

RE: Early results from

Quote:

Early results from v1.52 do look like it might have eliminated the variability in run time.

Do take note that this is data based on a very small sample size and any conclusions might be totally wrong.


For easy comparison, I've pasted in your previous results to save having to jump threads :-). Interesting, but as you say, too early to really tell.

Quote:

[pre]
                 Elapsed Time Statistics             CPU time Statistics
             -------------------------------   -------------------------------  Sample
Search          Min    Mean     Max  Std Dev      Min    Mean     Max  Std Dev    Size  Notes / Comments
===========  ======  ======  ======  =======   ======  ======  ======  =======  ======  ================
GPU-BRP5      13456   14874   19130     1696     4666    5629    7356      886      13  Data from the online database, small sample size.
GPU-BRP6      10797   18690   25095     2119     4224    7543   11054     1351      48  Data from the online database
GPU-BRP6b      8058   12268   34133     4585     1042    3356   24500     4597      64  Data from the online database, decent sample size.
BRP6b v1.52   10442   10646   11063      186      858    1022    1191      108      12  Data from the online database, very small sample size.
[/pre]
I'll continue to collect data and return with more numbers and updated graphs once I feel confident that the sample size is big enough to draw conclusions from. Say 50+ results or so.
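One rough way to put a number on how extreme the outliers are is to express the Max as a count of standard deviations above the Mean. The sketch below does that for the elapsed-time figures of two rows of the table; the helper name `sd_above_mean` is made up for illustration, and with a distribution this skewed the result is only a crude indicator.

```python
def sd_above_mean(value, mean, std_dev):
    """How many standard deviations `value` lies above `mean`."""
    return (value - mean) / std_dev


# Elapsed-time (mean, max, std dev) copied from the table above.
rows = {
    "GPU-BRP6b": (12268, 34133, 4585),
    "BRP6b v1.52": (10646, 11063, 186),
}

if __name__ == "__main__":
    for name, (mean, maximum, sd) in rows.items():
        print(f"{name}: max is {sd_above_mean(maximum, mean, sd):.1f} SD above the mean")
```

On these figures the GPU-BRP6b maximum sits roughly 4.8 SD above its mean, against roughly 2.2 SD for v1.52, which fits the impression of reduced variability but says nothing yet about whether a larger v1.52 sample will stay that tight.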


Cheers,
Gary.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 760
Credit: 182,682,256
RAC: 7,736

RE: Confirming that BRP 6

Quote:
Confirming that BRP 6 completes in 9 hours on my HD 4000 and 4600.


A bit shy of 16 hours for 2 of them on my HD4000 @ 1.30 GHz. This increases maximum throughput from 9500 RAC to a bit over 13k RAC :)
(sample size: 2)
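The throughput comparison behind running tasks concurrently can be sketched as tasks completed per day. This is only an illustration: `tasks_per_day` is a hypothetical helper, the 16 hours is rounded up from "a bit shy of 16", and the 9-hour single-task figure is the one quoted from a different host, so the ratio need not match the RAC change reported here.

```python
def tasks_per_day(concurrent_tasks, hours_per_batch):
    """Tasks finished per 24 hours when `concurrent_tasks` run side by side
    and each batch takes `hours_per_batch` hours of wall-clock time."""
    return concurrent_tasks * 24.0 / hours_per_batch


if __name__ == "__main__":
    single = tasks_per_day(1, 9.0)   # quoted 9-hour single task (another host)
    double = tasks_per_day(2, 16.0)  # two tasks at once, about 16 hours
    print(f"single: {single:.2f} tasks/day, double: {double:.2f} tasks/day")
```

Running two at once pays off whenever the pair finishes in less than twice the single-task time.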

MrS

Scanning for our furry friends since Jan 2002

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 738,423,020
RAC: 0

With the new BRP6 version

With the new BRP6 version 1.52 application, I am seeing that my 3rd 7970 connected at x8 3.0 is able to keep up with the other two cards connected at x16 3.0 now. With previous BRP application versions, the card connected at x8 3.0 would run at least 20% slower than the other two cards.

Jeroen
