Binary Radio Pulsar Search (Perseus Arm Survey) "BRP5"

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

I'm not terribly bothered

I'm not terribly bothered about the actual credit, but it's always interesting to dig into things and try and understand what is going on.

Since BRP5 was launched around 24-May, I've calculated the following values for 10 of my hosts:-
BRP4a: Average run time for BRP4 tasks prior to BRP5 introduction
BRP4b: Average run time for BRP4 tasks since BRP5 introduction
BRP5: Average run time for BRP5

The "Relative change" shows the average run time relative to BRP4a; my hosts appear to range from 9.09 to 12.73 (but note the outliers are slow hosts which haven't done many BRP5 tasks yet).

The calculations were done by simply adding up the time taken for each of the 3 categories, and dividing by the number of tasks completed in the category (to give the average time per task). Note that the figures will be more accurate on the faster hosts (since they have done more tasks).

These hosts spend all their time on E@H, except for hosts 6570151 and 6123309 which spend 25% of their time on A@H.

If anyone wants to send me their 'job_log_einstein.phys.uwm.edu.txt', I should be able to run the same calculations pretty easily - it would be particularly interesting to do this for the hosts that are in the 7x-8x BRP4 time range.

Beyond
Joined: 28 Feb 05
Posts: 120
Credit: 1675144614
RAC: 5110851

RE: RE: Can anyone tell

Quote:
Quote:
Can anyone tell me what other projects (e.g. SETI) grant on average per GPU hour?

Hi Bernd. Don't know about seti; creditnew is bizarre and erratic, to say the least. Probably one of the reasons for the mass exit from the project.

GPUGrid grants about 8250 credits/hour on a relatively slow GTX 650 Ti. Double that on a GTX 670. POEM, MilkyWay, Collatz, Donate and DistrRTgen are all higher (some much higher) for both ATI/AMD and NVidia GPUs. The WCG GPU app is gone but it granted very low credits (but higher than Einstein ;-).


The one I didn't list was PrimeGrid, since I haven't run it for a while. So I ran a PG WU yesterday on a very slow HD 7770 GPU to check:

http://www.primegrid.com/result.php?resultid=455017827

3371 credits for 5,899.13 seconds run time, so 0.571 credits/second
The same GPU averaged 3870 seconds for 2x Einstein WUs for 1000 credits, so 0.258 credits/second.

The PrimeGrid credits/second is still much lower than GPUGrid, POEM, Donate, Collatz, MW, DistrRTgen or Moo! I think that covers all the current GPU projects except seti, which basically HAS to run creditnew because of, well you know why...
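Converting the figures above into the same units (credits/hour) for a rough comparison - different cards, so ballpark only:

[pre]
# Convert the credits/second figures above to credits/hour for comparison.
rates_cps = {"PrimeGrid (HD 7770)": 0.571, "Einstein (HD 7770)": 0.258}
for project, cps in rates_cps.items():
    print(f"{project}: {cps * 3600:.0f} credits/hour")
# GPUGrid was quoted directly as ~8250 credits/hour (on a GTX 650 Ti).
[/pre]

That works out to roughly 2,050 credits/hour for PrimeGrid and 930 for Einstein on that card.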

Personally I think all GPU projects should, as much as possible, offer the same level of credits/hour. Unfortunately they don't. Einstein is the project that diverges most from the norm (except perhaps for seti, which doesn't have a choice).

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

RE: it doesn't reflect the

Quote:
it doesn't reflect the fact that there is a big variance of runtimes of BRP4 and BRP5 WUs.


The variance that I found is when my GPU is doing something else on top of the OpenCL calculations. In my case, BRP4s run at an average of ~2,300 seconds when I leave my system alone. If I go play Minecraft, for which I haven't yet set up an exclusive_gpu_app exclusion, then the times will be 3,400-3,600 seconds for those same tasks.

Equally so for BRP5s, which run at around 17,500 seconds when left alone. Go play Minecraft and it's 21,500 seconds.

Now, although Minecraft is Java-based, it uses OpenGL to draw the pretty pictures on the screen. And guess what? OpenCL and OpenGL use the same processing parts of the GPU, so playing Minecraft while doing OpenCL calculations will slow those calculations down.

So check what you're doing with your system as well, and see whether that uses OpenGL or Direct3D.
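For reference, the exclusion goes in cc_config.xml in the BOINC data directory. A minimal sketch, assuming Minecraft runs under javaw.exe (check Task Manager for the actual process name on your system):

[pre]
<cc_config>
  <options>
    <!-- suspend GPU computing while this executable is running -->
    <exclusive_gpu_app>javaw.exe</exclusive_gpu_app>
  </options>
</cc_config>
[/pre]

Then have BOINC re-read its config files (or restart the client) for it to take effect.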

Chris
Joined: 9 Apr 12
Posts: 61
Credit: 45056670
RAC: 0

RE: If anyone wants to send

Quote:
If anyone wants to send me their 'job_log_einstein.phys.uwm.edu.txt', I should be able to run the same calculations pretty easily - it would be particularly interesting to do this for the hosts that are in the 7x-8x BRP4 time range.

I didn't know the job log existed. Can someone explain the columns?

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

As far as I know:- Job log

As far as I know:-

Job log format
==============
nnnnnnnnn = Completion time
ue =  # estimated time?
ct = CPU time # Shown on task report as 'CPU Time'
fe = # fixed for job - estimate of FP ops?
nm = Task name
et = Elapsed time # time shown on Manager 'ready to report'/as 'Run time' on website


See also message 124729.
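To make the averaging concrete, here's a minimal Python sketch that parses this format and prints per-category average run times. The 'p2030'/'PM' name patterns below are only placeholders (I'm not claiming those are the actual BRP4/BRP5 prefixes) - substitute whatever distinguishes the task names in your own log. The BRP4a/BRP4b before-and-after split would additionally filter on the completion timestamp in the first field.

[pre]
from collections import defaultdict

# Placeholder name patterns - adjust to whatever identifies BRP4/BRP5
# task names in your own job log.
CATEGORIES = {"BRP4": "p2030", "BRP5": "PM"}

totals = defaultdict(lambda: [0.0, 0])   # category -> [total elapsed, task count]

with open("job_log_einstein.phys.uwm.edu.txt") as log:
    for line in log:
        tok = line.split()
        # First token is the completion timestamp; the rest are key/value
        # pairs (ue, ct, fe, nm, et) as described above.
        fields = dict(zip(tok[1::2], tok[2::2]))
        name = fields.get("nm", "")
        try:
            elapsed = float(fields.get("et", ""))
        except ValueError:
            continue
        for category, pattern in CATEGORIES.items():
            if pattern in name:
                totals[category][0] += elapsed
                totals[category][1] += 1

for category, (total, count) in totals.items():
    print(f"{category}: {count} tasks, average {total / count:.0f} s")
[/pre]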

Eric_Kaiser
Joined: 7 Oct 08
Posts: 16
Credit: 25699305
RAC: 0

RE: So check what you're

Quote:

So check what you're doing with your system as well, and see if that uses OpenGL or DirectX3D.


Sleeping. Runs overnight. No other applications running. BOINC reduced to tray. No widgets. No screensaver.
I'm not investigating any further.
All activities and WUs will be cancelled this evening.
My apologies for starting to post here.

Time will tell whether I crunch for Einstein again or not.

Over & out

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 57

I ran a few tasks on a GTX

I ran a few tasks on a GTX 680 last night. The card is installed in a PCI-E 2.0 x16 slot and the OS is Linux.

BRP4: 788 - 799 seconds
BRP5: 7348, 7377 seconds

This is running 1 task at a time. The ratio is 9.2:1 - 9.3:1 based on the minimum and maximum runtimes of the BRP4 and BRP5 tasks.

I am also noticing on my tri-GPU AMD system that PCI-E bandwidth may have a more significant impact with BRP5 than with BRP4. The tasks running on the single card installed in an x8 3.0 slot appear to be running 10-15% slower than the tasks running on the cards installed in x16 3.0 slots. I need to do additional monitoring to confirm whether this is the case. The majority of tasks run in the 12.8-14K second range while other tasks run in the 15-16K second range. From what I can tell, the longer-running tasks are associated with the card in the x8 slot.
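If it helps pin down the x8-vs-x16 question, here's a minimal Linux sketch that prints the negotiated PCIe link for each GPU. It assumes the kernel exposes the current_link_speed/current_link_width sysfs attributes, which recent kernels do:

[pre]
import glob, os

def read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "n/a"

for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    # PCI class 0x03xxxx = display controller (covers AMD and NVIDIA GPUs)
    if not read(os.path.join(dev, "class")).startswith("0x03"):
        continue
    print(os.path.basename(dev),
          "link speed:", read(os.path.join(dev, "current_link_speed")),
          "width:", read(os.path.join(dev, "current_link_width")))
[/pre]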

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

RE: I ran a few tasks on a

Quote:
I ran a few tasks on a GTX 680 last night. The card is installed in a PCI-E 2.0 x16 slot and the OS is Linux.
...
I am also noticing on my tri-GPU AMD system that PCI-E bandwidth may have a more significant impact with BRP5 than with BRP4. The tasks running on the single card installed in an x8 3.0 slot appear to be running 10-15% slower than the tasks running on the cards installed in x16 3.0 slots.

I've been testing 1x and 2x on a Linux PCIe 2.0 system with a GTX 580.

1x: ~12,000s
2x: ~19,000s (9,500s/task)

As this is quite a difference, I'm wondering if even higher utilisation would be better (compared to BRP4, where NVIDIA cards at least don't seem to improve much beyond 2x).
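Put in throughput terms, using the figures above:

[pre]
# Rough throughput comparison for the GTX 580 figures above.
runs = {"1x": (1, 12000.0), "2x": (2, 19000.0)}   # tasks in flight, wall time (s)
for label, (n_tasks, wall_s) in runs.items():
    print(f"{label}: {wall_s / n_tasks:.0f} s/task, {n_tasks / wall_s * 3600:.2f} tasks/hour")
[/pre]

So 2x is roughly a 26% throughput gain over 1x on this card, which is why I'm curious whether 3x would give more.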

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057694931
RAC: 1600150

Neil Newell wrote: I'm

Neil Newell wrote:
I'm wondering if even higher utilisation would be better (compared to BRP4, where NVIDIA at least don't seem to improve much beyond 2x).


I ran controlled comparison tests for BRP5 running 1x, 2x, 3x, and 4x on one GTX660 host.
As both power consumption and GPU load were down at 3x compared to BRP4 on this system, and I am currently running only a single CPU job on this 4-core host, I thought I might get an appreciable speedup of 4x over 3x. But the improvement in throughput was very small, and came at the cost of degraded system-level power efficiency (RAC/watt), so I've reverted to 3x. For this system the 3x benefit over 2x is moderate but definite, including a power efficiency improvement.

On previous experiments with BRP4, over a broader range of CPU and GPU job counts, I noticed that for my system the shape of the performance curve versus the number of simultaneous GPU task "copies" changes with the number of CPU jobs. Specifically, the optimum number of GPU jobs is lower at higher CPU job counts, and the penalty for exceeding the optimum is also much more severe there.

Here are my observations:
[pre]
CPU  BRP5  watts  hours/CPU  hours/BRP5  GPU_load    RAC  RAC/watt
  1     1  156.7      3.833       3.496       72%  29030     185.2
  1     2  170.7      3.946       5.417       89%  36974     216.6
  1     3  174.5      3.979       7.689       93%  38973     223.3
  1     4  176.3      4.033      10.227       94%  39042     221.5
[/pre]

The RAC column is a long-term full-utilization estimate computed from the elapsed times observed for CPU and GPU jobs, using a GW credit estimate of 251.2 per task and a BRP5 credit of 4000 per task.
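For anyone who wants to reproduce that estimate, a sketch of the arithmetic (elapsed times in hours from the table above):

[pre]
# Credits/day at full utilisation, from per-task elapsed times (hours)
# and per-task credits (251.2 for a GW CPU task, 4000 for a BRP5 GPU task).
GW_CREDIT, BRP5_CREDIT = 251.2, 4000.0

def estimated_rac(cpu_jobs, brp5_jobs, hours_per_cpu, hours_per_brp5):
    credits_per_hour = (cpu_jobs * GW_CREDIT / hours_per_cpu
                        + brp5_jobs * BRP5_CREDIT / hours_per_brp5)
    return credits_per_hour * 24

print(round(estimated_rac(1, 3, 3.979, 7.689)))   # ~38970, matching the 3x row
[/pre]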

People running more CPU jobs or weaker host systems may see the benefit drop off faster and turn negative sooner. Possibly a more capable host, in particular one using PCIe 3.0 (my motherboard is 2.0), may see more benefit from larger numbers of jobs.

Host details:

GTX660 GIGABYTE|GV-N660OC-2GD running at factory clock
i5-2500K CPU running at factory clock (Sandy Bridge quad core no HT 3.3 GHz)
Windows 7 64-bit
BOINC 7.0.28
Process Lasso used to set einsteinbinary*cuda* executable priority to above normal
no CPU affinity set

The power measurements were made using an Ensupra meter purchased from the usual eBay vendor. For this particular purpose it has advantages in resolution and display over the otherwise estimable Kill-a-Watt models.

tbret
Joined: 12 Mar 05
Posts: 2115
Credit: 4818495798
RAC: 220982

RE: I ran controlled

Quote:

I ran controlled comparison tests for BRP5 running 1x, 2x, 3x, and 4x on one GTX660 host

Thank you for doing this and reporting your results.
