Times (Elapsed / CPU) for BRP5/6/6-Beta on various CPU/GPU combos - DISCUSSION Thread

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 586751404
RAC: 116273

Quote:
Take a look at Host 08, the data for which I've just posted.


I estimate the period of those oscillations at ~10 WUs. At 8940 s/WU that's pretty close to a period of 1 day. Is something else CPU-intensive running daily? Are the FGRP4 tasks on this host taking about 24h? Maybe they do something special at the beginning or end of a WU, which might interfere with supporting the GPUs.
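A quick check of that arithmetic, as a minimal sketch. It assumes the WUs in the plotted series follow one another at roughly 8940 s spacing; the function name is only illustrative.

```python
# Convert an oscillation period measured in work units (WUs) into hours.
# 10 WUs per cycle is an estimate read off the plot, not a measured value.
SECONDS_PER_WU = 8940
WUS_PER_CYCLE = 10


def cycle_hours(wus_per_cycle, seconds_per_wu):
    """Length of one oscillation cycle in hours."""
    return wus_per_cycle * seconds_per_wu / 3600


period = cycle_hours(WUS_PER_CYCLE, SECONDS_PER_WU)
print(f"{period:.1f} h")  # 24.8 h
```

So 10 WUs at 8940 s each works out to a little under 25 hours, which is why a daily cause looks plausible.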

MrS

Scanning for our furry friends since Jan 2002

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5885
Credit: 119092567750
RAC: 23449120

Thanks very much for having a look at this.

Quote:
Is something else CPU-intensive running daily?


Nope :-). All my hosts with GPUs do nothing but crunch.

I do have a server machine (that also crunches) that continuously runs a bash script which visits every single host in the fleet every 8 hours. It spends a little under 6 minutes per host and finishes the lot with about 20-30 minutes to spare before starting the next loop. Within that 6-minute per-host window it does the following things on the host:

    * Checks that the host is alive and well and that BOINC is running.
    * Attempts to restart BOINC if it has stopped but the host seems otherwise OK.
    * Checks with rsync (using a local data file cache) that the host's cache of LATeahxxx data files is fully up to date in order to avoid unnecessary data file downloads from EAH servers.
    * Increments the host's work cache from 0.5 days to 3.5 days to force task fetch at a time when the host's data file cache is fully up-to-date.
    * Keeps a log on the server host of details of new tasks, new data files and the host's current status and RAC.
    * Uploads to the local server cache (again using rsync), any new LATeahxxx data file that happens to be downloaded from EAH in the 6 min window, so that it can be immediately available to subsequent hosts in the loop when they update their data file caches.
    * Does a similar set of things for other projects (like milkyway and POGS) that might be running on the host. It works out the attached projects at the time of visit.
    * Sets the work cache back to whatever it was (0.5 days) when task fetch is over, just before moving on to the next host in the loop.
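As a rough check of the loop timing described above, the sketch below works out how many hosts fit into one 8-hour loop at about 6 minutes each with 20-30 minutes to spare. The fleet size is not stated in the post, so the result is only an estimate implied by the quoted figures, and the function name is made up for illustration.

```python
# Timing budget for one pass over the fleet (all figures from the post above;
# the resulting host count is an inference, not something the post states).
LOOP_HOURS = 8.0
MINUTES_PER_HOST = 6.0  # the post says "a little under 6 minutes"


def max_hosts(spare_minutes, minutes_per_host=MINUTES_PER_HOST,
              loop_hours=LOOP_HOURS):
    """Whole number of hosts that fit in one loop with the given spare time."""
    usable_minutes = loop_hours * 60 - spare_minutes
    return int(usable_minutes // minutes_per_host)


print(max_hosts(30), max_hosts(20))  # 75 76
```

Because each visit takes slightly less than 6 minutes, the real number could be a little higher than this.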

The script doesn't seem to have any effect on the server host, let alone the slave hosts, during the 6-minute window every 8 hours.

Quote:
Are the FGRP4 tasks on this host taking about 24h?


Nope :-). They only take around 6 hours each on that host and around 6.6 hours each on the (slightly slower) 3x host.

I've just posted another pair of hosts, this time with a GTX650 in each, one running at 2x and the other at 3x. These new results are quite an eye-opener and (I think) show that GTX650s and the new app are not a particularly good match for each other for some reason.

When I first started using GPUs, I bought a GTX550Ti (before Kepler came out) which didn't seem to be doing very well at all (very early version of the BRPx app). The app got improved and the 550Ti did a lot better, so I bought a low-cost 650 which also did surprisingly well - not far short of what the 550Ti was doing and at a significantly lower power draw. On the basis of that I bought quite a few more 650s. At the time and for about the same money, AMD had a 7770, so I bought one for comparison. The AMD drivers at that time were quite sub-standard and the 7770 gave about 20% less performance than a 650 at a slightly higher power draw. As the drivers improved, so did the 7770 performance and it now beats a 650, probably by more than 20%.

When AMD GPUs started to dominate the top hosts list, I bought a HD7850 on special for $AUD149 (about $US130 at the time) which wasn't all that much more than what the 650s had cost. It performed so well running at 4x that I bought a whole lot more while they were still on special. They are now showing quite big gains still running at 4x with the new app.

Somewhere along the way, I did buy a 650Ti that didn't deliver enough improvement over a 650 to warrant further attention. I'm now keen to have a look at it and see how it's handling the new app.

Cheers,
Gary.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 586751404
RAC: 116273

Regarding your GTX650: I wouldn't worry too much about those two task clusters. They don't harm overall productivity (3x is still slightly ahead of 2x). I'd probably move to 2 concurrent tasks on those GPUs, though.

The Ti version has appreciably more shaders, so it might react better to 3x. On the other hand, it's quite limited by its 128-bit memory bus, which especially hurts Einstein BRP. This probably explains the limited gain compared to your GTX650s. If it's really badly limited by memory bandwidth, you might as well reduce the power target on that GPU (if this is possible under Linux), or move it to some other project.

MrS

Scanning for our furry friends since Jan 2002

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:

Thanks very much for having a look at this.

Quote:
Is something else CPU-intensive running daily?

Nope :-). All my hosts with GPUs do nothing but crunch.

Looking at the 24-hour periodic cycle, I noticed that the WUs with longer elapsed times occur during the day. Is there perhaps some temperature-related issue?

I also noticed this CUDA error on all your tasks (on all your NVIDIA hosts I looked at): "Module nvidia_uvm not found". It does not seem to be causing a problem, but I do not see this error elsewhere.

The GTX650 is rather odd. As MrS suggests, try running at 2x, or see what happens at 4x and 1x.

Mumak
Joined: 26 Feb 13
Posts: 335
Credit: 3600993604
RAC: 1502045

Has anybody tried to run BRP6 (Beta) together with BRP4G?
I just got some BRP4G units, which caused a mix, and noticed that on an AMD HD7950 the units got stuck when running together. It doesn't seem to happen every time, since some of them passed OK...

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5885
Credit: 119092567750
RAC: 23449120

I've been continuing to post results for different setups in the results thread. There has been enough time to get large numbers of results for beta-1.52 and I'm particularly pleased with how it's turning out on all my hosts (whatever vintage) that have been equipped with a HD7850 GPU. That GPU was originally chosen because this particular post by Robert had quite impressed me when I first started thinking about adding more GPUs to the mix. So if you're still lurking, thank you Robert :-).

I didn't get any 7870s - they were far too expensive - but I figured a 7850 would just be a bit lower down that best efficiency line on the graph and for the very cheap price and lower power draw, they would suit me just fine.

(I had rather thoughtlessly wandered off-topic into a new conversation at this point. If you are wondering where it has gone to, check here.)

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5885
Credit: 119092567750
RAC: 23449120

I'd just like to thank FourOh for making a very nice submission to the results thread. It's a pretty impressive 'first post' :-).

Obviously, FourOh is not new to BOINC projects - there's a rather long list of projects that have benefited from these contributions over quite a period. I'm happy that Einstein is also a beneficiary.

Once again, thank you very much, FourOh.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5885
Credit: 119092567750
RAC: 23449120

I have updated the data for my HOST 12 which is the second host listed in this particular message.

That host has a 1GB AMD HD7770 GPU (3x concurrency) and I wondered if the performance was being adversely affected by running at 3x. After posting the 3x results, I changed the concurrency to 2x. 60 tasks have now been completed so I thought that would be enough to see the difference with confidence.

Anything that changed has been flagged in blue to make it fairly obvious. In summary, you get nearly 4% extra GPU output by running at 3x. The cost of 3x is on the CPU side: 2 cores were left free at 3x, whereas only 1 needs to be free at 2x, so 3x runs one fewer CPU task. It seems that the extra GPU production and the faster CPU task time at 3x must have more than made up for that lost CPU task, because the RAC has been drifting down since the change to 2x was made. I'm now going to change it back to 3x.
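For anyone wanting to make the same 2x versus 3x comparison on their own host, here is a minimal sketch of the throughput arithmetic. The elapsed times in it are placeholders chosen for illustration, not the measurements from HOST 12, and the function names are hypothetical.

```python
# GPU throughput at a given concurrency: N tasks run side by side, and each
# takes `elapsed_s` seconds of wall-clock time to finish.
SECONDS_PER_DAY = 86400


def tasks_per_day(concurrency, elapsed_s):
    """GPU tasks completed per day at the given concurrency."""
    return concurrency * SECONDS_PER_DAY / elapsed_s


def relative_gain(conc_a, elapsed_a, conc_b, elapsed_b):
    """Fractional throughput gain of setup A over setup B."""
    return tasks_per_day(conc_a, elapsed_a) / tasks_per_day(conc_b, elapsed_b) - 1


# Placeholder elapsed times (NOT the posted results): 14450 s at 3x, 10000 s at 2x.
gain = relative_gain(3, 14450, 2, 10000)
print(f"{gain:.1%}")  # 3.8%
```

This only covers the GPU side. Any CPU tasks gained or lost through the number of free cores have to be weighed separately, which is why the RAC trend is the more complete indicator.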

Cheers,
Gary.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 586751404
RAC: 116273

Gary wrote:
This host shows a much higher CPU component average of 961 secs compared to that of HOST 04 which has a Haswell Refresh CPU. Looks like Haswell Refresh made quite a difference :-)


Gary, this quote in the results thread caught my attention. I've seen your numbers, yet the explanation should be something else. "Haswell Refresh" uses just the same silicon, configured for 100 MHz more here and there. It's what has traditionally been called a "speed bump", as in the "good old times" when Intel would present a faster CPU every few months and pass the older ones down to lower price points, without calling it a new generation, a refresh, etc.

There might be a new stepping involved, though. However, I'm not aware of any other report where the performance per clock of Haswell refresh differs in any way from Haswell. I suspect there is some other reason lurking deep in the system.

MrS

Scanning for our furry friends since Jan 2002

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Quote:
Gary, this quote in the results thread caught my attention. I've seen your numbers, yet the explanation should be something else. "Haswell refresh" is using just the same silicon, configured for 100 MHz more here and there.


I have both an i7-4770 and an i7-4790, and that is the only difference I see. In fact, they apparently use the same stepping (Revision C0, CPUID 0x306C3) according to Core Temp, so it may be only some process tweaks that separate them.
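For reference, the CPUID value quoted above can be split into its family, model and stepping fields. The sketch below follows the standard x86 CPUID signature layout (leaf 1, EAX); the function name is just for illustration.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf-1 EAX signature into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


family, model, stepping = decode_cpuid_signature(0x306C3)
print(family, hex(model), stepping)  # 6 0x3c 3
```

Two CPUs reporting the same signature share the same family, model and stepping fields, which is consistent with the "same silicon" reading.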
