Binary Radio Pulsar Search (Perseus Arm Survey) "BRP5"

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

I'm not terribly bothered

I'm not terribly bothered about the actual credit, but it's always interesting to dig into things and try and understand what is going on.

Since BRP5 was launched around 24-May, I've calculated the following values for 10 of my hosts:-
BRP4a: Average run time for BRP4 tasks prior to BRP5 introduction
BRP4b: Average run time for BRP4 tasks since BRP5 introduction
BRP5: Average run time for BRP5

The "Relative change" shows the average run time relative to BRP4a; my hosts appear to range from 9.09 to 12.73 (but note the outlyers are slow hosts which haven't done a lot of BRP5 tasks yet).

The calculations were done by simply adding up the time taken for each of the 3 categories, and dividing by the number of tasks completed in the category (to give the average time per task). Note that the figures will be more accurate on the faster hosts (since they have done more tasks).

These hosts spend all their time on E@H, except for hosts 6570151 and 6123309 which spend 25% of their time on A@H.

If anyone wants to send me their 'job_log_einstein.phys.uwm.edu.txt', I should be able to run the same calculations pretty easily - it would be particularly interesting to do this for the hosts that are in the 7x-8x BRP4 time range.

Beyond
Joined: 28 Feb 05
Posts: 121
Credit: 2333926212
RAC: 5255178

RE: RE: Can anyone tell

Quote:
Can anyone tell me what other projects (e.g. SETI) grant on average per GPU hour?

Hi Bernd. Don't know about SETI; CreditNew is bizarre and erratic to say the least. Probably one of the reasons for the mass exit from the project.

GPUGrid grants about 8250 credits/hour on a relatively slow GTX 650 Ti. Double that on a GTX 670. POEM, MilkyWay, Collatz, Donate and DistrRTgen are all higher (some much higher) for both ATI/AMD and NVidia GPUs. The WCG GPU app is gone but it granted very low credits (but higher than Einstein ;-).


The one I didn't list was PrimeGrid since I haven't run it for a while, so I ran a PG WU yesterday on a very slow HD 7770 GPU to check:

http://www.primegrid.com/result.php?resultid=455017827

3371 credits for 5,899.13 seconds run time, so 0.571 credits/second
The same GPU averaged 3870 seconds for 2x Einstein WUs for 1000 credits, so 0.258 credits/second.
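(So on this card PrimeGrid pays roughly 2.2x what Einstein does: 0.571 / 0.258 ≈ 2.2.)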

The PrimeGrid credits/second is still much lower than GPUGrid, POEM, Donate, Collatz, MW, DistrRTgen or Moo! I think that covers all the current GPU projects except SETI, which basically HAS to run CreditNew because of, well, you know why...

Personally I think all GPU projects should, as much as possible, offer the same level of credits/hour. Unfortunately they don't. Einstein is the project that diverges most from the norm (except perhaps for SETI, which doesn't have a choice).

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 112

RE: it doesn't reflect the

Quote:
it doesn't reflect the fact that there is a big variance of runtimes of BRP4 and BRP5 WUs.


The variance that I found occurs when my GPU is doing something else on top of the OpenCL calculations. In my case, BRP4s run at an average of ~2,300 seconds when I leave my system alone. If I go play Minecraft, for which I haven't yet set up an exclusive_gpu_app exclusion, the times rise to 3,400-3,600 seconds for those same tasks.

Equally so for BRP5s, which run at around 17,500 seconds when left alone. Go play Minecraft and it's 21,500 seconds.

Now, although Minecraft is Java based, it uses OpenGL for the drawing of the pretty pictures on the screen. And guess what? OpenCL and OpenGL use the same processing parts of the GPU, thus playing Minecraft while doing OpenCL calculations will slow down these calculations.

So check what you're doing with your system as well, and see whether it uses OpenGL or Direct3D.
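If you do want BOINC to get out of the way automatically, a minimal cc_config.xml sketch would look something like this (javaw.exe is only my guess at the Minecraft process name - check Task Manager for the actual executable, then have BOINC re-read its config files or restart the client):

[pre]<cc_config>
  <options>
    <!-- Suspend GPU computing while this application is running -->
    <exclusive_gpu_app>javaw.exe</exclusive_gpu_app>
  </options>
</cc_config>[/pre]

The file goes in the BOINC data directory.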

Chris
Joined: 9 Apr 12
Posts: 61
Credit: 45056670
RAC: 0

RE: If anyone wants to send

Quote:
If anyone wants to send me their 'job_log_einstein.phys.uwm.edu.txt', I should be able to run the same calculations pretty easily - it would be particularly interesting to do this for the hosts that are in the 7x-8x BRP4 time range.

I didn't know the job log existed. Can someone explain the columns?

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

As far as I know:- Job log

As far as I know:-

Job log format
==============
nnnnnnnnn = Completion time
ue = ?            # estimated run time?
ct = CPU time     # shown on the task report as 'CPU Time'
fe = ?            # fixed for the job - estimate of FP ops?
nm = Task name
et = Elapsed time # the time shown in the Manager at 'ready to report' / as 'Run time' on the website


See also message 124729.
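For what it's worth, here's a rough sketch of the sort of script that could compute the per-app averages from that log (the assumption that BRP5 task names start with "PB" is mine - adjust the pattern to whatever your own log shows):

[pre]#!/usr/bin/env python
# Rough sketch: average elapsed time per task family from a BOINC job log.
# Usage: python joblog_avg.py job_log_einstein.phys.uwm.edu.txt
# Assumption: BRP5 (Perseus Arm) task names start with "PB"; everything else
# is lumped together as "other". Tweak classify() to split things further.
import sys
from collections import defaultdict

def classify(name):
    return 'BRP5' if name.startswith('PB') else 'other'

totals = defaultdict(float)
counts = defaultdict(int)

with open(sys.argv[1]) as log:
    for line in log:
        tokens = line.split()
        if not tokens:
            continue
        # After the leading timestamp, fields come as "key value" pairs (ue, ct, fe, nm, et).
        rec = dict(zip(tokens[1::2], tokens[2::2]))
        family = classify(rec.get('nm', ''))
        totals[family] += float(rec.get('et', 0.0))   # et = elapsed time in seconds
        counts[family] += 1

for family in sorted(counts):
    print("%s: %d tasks, average elapsed time %.0f s"
          % (family, counts[family], totals[family] / counts[family]))[/pre]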

Eric_Kaiser
Joined: 7 Oct 08
Posts: 16
Credit: 25699305
RAC: 0

RE: So check what you're

Quote:

So check what you're doing with your system as well, and see if that uses OpenGL or DirectX3D.


Sleeping. Runs overnight. No other applications running. BOINC reduced to tray. No widgets. No screensaver.
I'm not investigating any further.
All activities and WUs will be cancelled this evening.
My apologies for starting to post here.

Time will tell whether I crunch for Einstein again or not.

Over & out

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

I ran a few tasks on a GTX

I ran a few tasks on a GTX 680 last night. The card is installed in a PCI-E 2.0 x16 slot and the OS is Linux.

BRP4: 788 - 799 seconds
BRP5: 7348, 7377 seconds

This is running 1 task at a time. The ratio is 9.2:1 - 9.3:1 based on the minimum and maximum runtimes of the BRP4 and BRP5 tasks. I am noticing on my tri-GPU AMD system that PCI-E bandwidth may have a more significant impact with BRP5 than with BRP4. The tasks running on the single card installed in an x8 3.0 slot appear to run 10-15% slower than the tasks on the cards installed in x16 3.0 slots. I need to do additional monitoring to confirm whether this is the case. The majority of tasks run in the 12.8-14K second range while others run in the 15-16K second range. From what I can tell, the longer-running tasks are associated with the card in the x8 slot.
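One way I plan to double-check the slot behaviour (assuming Linux; the bus ID below is just a placeholder) is to look at the negotiated link width per card:

[pre]# List the NVIDIA cards and note their bus IDs
lspci | grep -i nvidia
# Show the negotiated PCI-E speed/width for one card (replace 02:00.0 with your bus ID)
sudo lspci -vv -s 02:00.0 | grep -E 'LnkCap|LnkSta'
# nvidia-smi -q also reports a "GPU Link Info" section with the current PCIe generation and width[/pre]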

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

RE: I ran a few tasks on a

Quote:
I ran a few tasks on a GTX 680 last night. The card is installed in a PCI-E 2.0 x16 slot and the OS is Linux.
...
I am noticing on my tri AMD system that PCI-E bandwidth may have a more significant impact with BRP5 than BRP4. The tasks running on the single card installed in an x8 3.0 slot appear to be running 10-15% slower than the tasks running on the cards installed in x16 3.0 slots.

I've been testing 1x and 2x on a Linux PCIe 2.0 system with a GTX580.

1x: ~12,000s
2x: ~19,000s (9,500s/task)

As this is quite a difference, I'm wondering if even higher utilisation would be better (compared to BRP4, where NVIDIA cards at least don't seem to improve much beyond 2x).
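For anyone who wants to experiment, the BRP "GPU utilization factor" in the project preferences is the supported way to change this; on clients new enough to support app_config.xml, something along these lines should do the same job (the app name is my assumption - check client_state.xml for the exact name, and put the file in the Einstein project directory):

[pre]<app_config>
  <app>
    <name>einsteinbinary_BRP5</name>   <!-- assumed app name; verify against client_state.xml -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>       <!-- 0.5 = two tasks per GPU, 0.33 = three, and so on -->
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>[/pre]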

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7219624931
RAC: 975722

Neil Newell wrote: I'm

Neil Newell wrote:
I'm wondering if even higher utilisation would be better (compared to BRP4, where NVIDIA at least don't seem to improve much beyond 2x).


I ran controlled comparison tests for BRP5 running 1x, 2x, 3x, and 4x on one GTX660 host.
As both power consumption and GPU load were down at 3x compared with BRP4 on this system, and I am currently running only a single CPU job on this 4-core host, I thought I might get an appreciable speedup of 4x over 3x. However, the improvement in throughput was very small and came at the cost of degraded system-level power efficiency, so I've reverted to 3x. For this system the 3x benefit over 2x is moderate but definite, and includes a power-efficiency improvement.

On previous experiments with BRP4, over a broader range of CPU and GPU job counts, I noticed that for my system the shape of the GPU "copies" performance curve changes with the number of CPU jobs. Specifically, the optimum number of GPU jobs is lower at higher CPU job counts, and the penalty for running above the optimum is much more severe there as well.

Here are my observations:
[pre]CPU  BRP5  watts  hours/CPU  hours/BRP5  GPU_load    RAC  RAC/watt
  1     1  156.7      3.833       3.496       72%  29030     185.2
  1     2  170.7      3.946       5.417       89%  36974     216.6
  1     3  174.5      3.979       7.689       93%  38973     223.3
  1     4  176.3      4.033      10.227       94%  39042     221.5[/pre]

The RAC column is a long-term full-utilization estimate computed from the elapsed times observed for CPU and GPU jobs, using a GW credit estimate of 251.2 and a BRP5 credit of 4000.
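For anyone wanting to check the arithmetic, here is a minimal sketch of that calculation (it reproduces the RAC column to within a few credits):

[pre]# RAC estimate = steady-state credits/day, using 251.2 credits per GW (CPU) task
# and 4000 per BRP5 task, from the observed hours per task.
rows = [  # (CPU jobs, BRP5 jobs, hours/CPU task, hours/BRP5 task)
    (1, 1, 3.833,  3.496),
    (1, 2, 3.946,  5.417),
    (1, 3, 3.979,  7.689),
    (1, 4, 4.033, 10.227),
]
for n_cpu, n_gpu, h_cpu, h_gpu in rows:
    credits_per_day = 24.0 * (n_cpu / h_cpu * 251.2 + n_gpu / h_gpu * 4000.0)
    print("%dx BRP5: ~%.0f credits/day" % (n_gpu, credits_per_day))[/pre]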

People running more CPU jobs or weaker host systems may see the benefit drop off faster and turn negative sooner. A more capable host, in particular one using PCIe 3.0 (my motherboard is 2.0), may see more benefit from running larger numbers of jobs.

Host details:

GTX660 GIGABYTE|GV-N660OC-2GD running at factory clock
i5-2500K CPU running at factory clock (Sandy Bridge quad core no HT 3.3 GHz)
Windows 7 64-bit
BOINC 7.0.28
Process Lasso used to set the einsteinbinary*cuda* executable priority to Above Normal
no CPU affinity set

The power measurements were made using an Ensupra meter purchased from the usual eBay vendor. For this particular purpose it has advantages in resolution and display over the otherwise estimable Kill-a-Watt models.

tbret
Joined: 12 Mar 05
Posts: 2115
Credit: 4862535627
RAC: 119544

RE: I ran controlled

Quote:

I ran controlled comparison tests for BRP5 running 1x, 2x, 3x, and 4x on one GTX660 host

Thank you for doing this and reporting your results.
