Hello,
I have bought a new rig and decided to share my thoughts (statistics) on running multiple WUs per GPU.
My rig is an AMD FX-8350 with 8GB RAM and two Gigabyte R280X cards.
So far I have gone through Gamma-ray pulsar search #3 v1.11 (FGRPopencl-ati). FGRP v1.11 seems to be heavily CPU bound. CPU utilization in most phases is about 8% (66% of a single core); at the end (~91%-100% progress) it reaches 12.5% (100% of a single core). GPU utilization never exceeded 60% (with a single task it was about 30-40%). Statistics (FWR = for whole rig):
1 task per R280X takes 6050s average | 1 * 2 * 86.4ks / 6.05ks = ~28.5 tasks/day FWR = 18.81k credits
2 tasks per R280X take 7550s average | 2 * 2 * 86.4ks / 7.55ks = ~45.75 tasks/day FWR = 30.2k credits
3 tasks per R280X take 9780s average | 3 * 2 * 86.4ks / 9.78ks = ~53.00 tasks/day FWR = 35k credits
4 tasks per R280X take 11840s average | 4 * 2 * 86.4ks / 11.84ks = ~58.40 tasks/day FWR = 38.5k credits
Since FGRP v1.11 is CPU bound, there was no reason to go beyond 8 tasks total. It would be possible to run 12 tasks total and max out the CPU for the whole run, but I doubt the improvement would exceed 5%...
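For anyone who wants to redo these numbers: each line is tasks per GPU * number of GPUs * 86,400 s/day / average seconds per task. Below is a minimal Python sketch of that arithmetic. The ~660 credits per FGRP task is inferred from the credit column (e.g. 30.2k / 45.75), and the 0.66 core per task comes from the CPU figures above; both are assumptions, not official values.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(tasks_per_gpu: int, gpus: int, avg_seconds: float) -> float:
    """Rig-wide completed tasks per day when every GPU runs `tasks_per_gpu`
    tasks concurrently and each task takes `avg_seconds` of wall time."""
    return tasks_per_gpu * gpus * SECONDS_PER_DAY / avg_seconds


# FGRP v1.11 averages from the post: tasks per GPU -> seconds per task.
FGRP_RUNS = {1: 6050, 2: 7550, 3: 9780, 4: 11840}
# Assumption: ~660 credits per task, inferred from the credit column.
FGRP_CREDITS_PER_TASK = 660
# Assumption: each FGRP task keeps ~66% of one FX-8350 core busy.
CORE_SHARE_PER_TASK = 0.66
CPU_CORES = 8

for n, secs in FGRP_RUNS.items():
    tpd = tasks_per_day(n, 2, secs)
    credits = tpd * FGRP_CREDITS_PER_TASK
    cores = n * 2 * CORE_SHARE_PER_TASK  # total tasks * core share per task
    print(f"{n} per GPU: {tpd:6.2f} tasks/day, {credits / 1000:5.2f}k credits, "
          f"~{cores:.1f} of {CPU_CORES} cores busy")
```

With 8 tasks total this estimates ~5.3 of 8 cores busy; 12 tasks would come to ~7.9 cores, in line with the point above that 12 tasks would roughly saturate the CPU.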
I will move to GW search next.
Copyright © 2024 Einstein@Home. All rights reserved.
Multi WU Statistics
I have tested Binary Radio Pulsar Search (Perseus Arm Survey) v1.39 (BRP5-opencl-ati). BRP seems to be GPU bound: GPU utilization with 2/3 tasks stayed at ~80%, while CPU utilization is minimal (<10% of a single core per task). BRP definitely seems better optimized for GPUs than FGRP.
Statistics:
1 task per R280X takes 6000s average | 1 * 2 * 86.4ks / 6.00ks = ~28.8 tasks/day FWR = 95.90k credits
2 tasks per R280X take 8900s average | 2 * 2 * 86.4ks / 8.90ks = ~38.8 tasks/day FWR = 129.20k credits
3 tasks per R280X take 12400s average | 3 * 2 * 86.4ks / 12.40ks = ~41.8 tasks/day FWR = 139.20k credits
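The diminishing returns are easier to see as the gain from each extra concurrent task per GPU. A small Python sketch using the BRP5 averages above; the helper names are illustrative only.

```python
SECONDS_PER_DAY = 86_400
GPUS = 2

# BRP5 v1.39 averages from the post: tasks per GPU -> seconds per task.
BRP5_RUNS = {1: 6000, 2: 8900, 3: 12400}


def throughput(runs: dict[int, float]) -> dict[int, float]:
    """Rig-wide tasks/day for each concurrency level."""
    return {n: n * GPUS * SECONDS_PER_DAY / secs for n, secs in runs.items()}


def marginal_gains(runs: dict[int, float]) -> dict[int, float]:
    """Percent throughput gained when stepping up to each concurrency level
    from the one below it."""
    tp = throughput(runs)
    levels = sorted(tp)
    return {hi: 100 * (tp[hi] / tp[lo] - 1) for lo, hi in zip(levels, levels[1:])}


for n, gain in marginal_gains(BRP5_RUNS).items():
    print(f"going to {n} tasks/GPU: +{gain:.1f}% tasks/day")
```

This gives roughly +35% for the second task and under +8% for the third, consistent with the GPU already sitting around 80% utilization.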
I assume that BRP Search (Arecibo GPU) will behave exactly the same, since it is the same application. So I will most likely enable Beta applications and finally move on to the GW search.
Hello,
It seems that running multiple BRP4G-opencl-ati or BRP5-opencl-ati tasks is not achievable on the Gigabyte R9 280X, or else my GPUs are somehow to blame (or the drivers are faulty).
Gamma-ray pulsar search #3 v1.11 (FGRPopencl-ati) was perfectly stable (no validate errors and nothing marked invalid), even at 4 tasks per GPU.
I have tested Binary Radio Pulsar Search (Arecibo, GPU) v1.39 (BRP4G-opencl-ati), and it behaves similarly to BRP5-opencl-ati.
Statistics:
1 task per R280X takes 1690s average | 1 * 2 * 86.4ks / 1.69ks = ~102.25 tasks/day FWR = 102.25k credits
2 tasks per R280X take 3000s average | 2 * 2 * 86.4ks / 3.00ks = ~115.20 tasks/day FWR = 115.20k credits
3 tasks per R280X take 3900s average | 3 * 2 * 86.4ks / 3.90ks = ~132.92 tasks/day FWR = 132.92k credits
However:
At 3 tasks per R280X, about 80% of tasks ended with a validate error or were marked invalid.
At 2 tasks per R280X, about 60% of tasks ended with a validate error or were marked invalid.
At 1 task per R280X, exactly 0% had validate errors and 10% were marked invalid.
While BRP5-opencl-ati was less error prone, it also experienced:
About 15% validate errors, and 15% marked as invalid on 3 tasks per R280X.
(no meaningful stats for 1/2 task(s) per R280X for BRP5-opencl-ati).
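Raw tasks/day is misleading once errors appear; what counts is the throughput that actually validates, i.e. raw * (1 - bad share). A minimal Python sketch for the BRP4G numbers above, assuming the quoted percentages are the combined share of tasks that either failed validation or were marked invalid.

```python
# BRP4G v1.39 rig-wide throughput from the post: tasks per GPU -> tasks/day.
BRP4G_TASKS_PER_DAY = {1: 102.25, 2: 115.20, 3: 132.92}
# Assumption: combined share of tasks with a validate error or marked invalid.
BAD_FRACTION = {1: 0.10, 2: 0.60, 3: 0.80}


def valid_tasks_per_day(raw: float, bad_fraction: float) -> float:
    """Throughput that actually validates: raw * (1 - bad share)."""
    return raw * (1.0 - bad_fraction)


best = max(
    BRP4G_TASKS_PER_DAY,
    key=lambda n: valid_tasks_per_day(BRP4G_TASKS_PER_DAY[n], BAD_FRACTION[n]),
)

for n, raw in BRP4G_TASKS_PER_DAY.items():
    ok = valid_tasks_per_day(raw, BAD_FRACTION[n])
    print(f"{n} task(s)/GPU: {raw:6.2f} raw -> {ok:6.2f} valid tasks/day")
print(f"best setting: {best} task(s) per GPU")
```

Under that assumption, 1 task per GPU (~92 valid tasks/day) clearly beats 2 (~46) and 3 (~27).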
I have already run some Gravitational Wave S6 Directed Search (CasA) v1.08 (GWopencl-ati-Beta) tasks, but only at 1 task per GPU. Results are stable (no validate errors and nothing marked invalid).
I too have a Gigabyte R9 280X GPU and I can't even run a single BRP4G WU there - all end up with validation errors.
It has been suggested that this card may need a lower GPU memory clock. I haven't tried this yet, but will soon...
Indeed, after a few days of Astropulse runs I also concluded that the memory frequency should be decreased.
1250MHz seems to be ok. 1375MHz gives validate errors/computation errors.
My old statistics for Gravitational Wave S6 Directed Search (CasA) v1.08 (GWopencl-ati-Beta)
1 task per R280X takes 900s average | 1 * 2 * 86.4ks / 0.90ks = ~192.00 tasks/day FWR = 75.04k credits
2 tasks per R280X take 1050s average | 2 * 2 * 86.4ks / 1.05ks = ~329.14 tasks/day FWR = 128.64k credits
3 tasks per R280X take 1200s average | 3 * 2 * 86.4ks / 1.20ks = ~432.00 tasks/day FWR = 168.84k credits
4 tasks per R280X take 1500s average | 4 * 2 * 86.4ks / 1.50ks = ~460.80 tasks/day FWR = 180.10k credits
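Another way to read the GW table is scaling efficiency: throughput at n tasks per GPU divided by n times the single-task throughput. Because throughput is n * GPUs * 86,400 / t_n, this reduces to t_1 / t_n. A small Python sketch using the GW averages above (function name is illustrative):

```python
# GW S6 CasA v1.08 averages from the post: tasks per GPU -> seconds per task.
GW_RUNS = {1: 900, 2: 1050, 3: 1200, 4: 1500}


def scaling_efficiency(runs: dict[int, float]) -> dict[int, float]:
    """Fraction of perfect linear scaling reached at each concurrency level.

    (n * G * 86400 / t_n) / (n * G * 86400 / t_1) simplifies to t_1 / t_n.
    """
    t1 = runs[1]
    return {n: t1 / t for n, t in runs.items()}


for n, eff in scaling_efficiency(GW_RUNS).items():
    speedup = n * eff  # throughput relative to 1 task per GPU
    print(f"{n} task(s)/GPU: {speedup:.2f}x throughput, {eff:.0%} of linear")
```

4 tasks per GPU still yield 2.4x the single-task throughput, even though each task runs at only 60% of linear efficiency.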
I have also repeated BRP5 runs during few last days:
Binary Radio Pulsar Search (Perseus Arm Survey) v1.39 (BRP5-opencl-ati)
(400W) (1100MHz, 1250MHz) (66-67%)
1 task per R280X takes 5820s average | 1 * 2 * 86.4ks / 5.82ks = ~29.69 tasks/day FWR = 98.96k credits
(427W) (1100MHz, 1250MHz) (78-79%)
2 tasks per R280X take 9380s average | 2 * 2 * 86.4ks / 9.38ks = ~36.84 tasks/day FWR = 122.79k credits
(430W) (1100MHz, 1250MHz) (84-85%)
3 tasks per R280X take 13140s average | 3 * 2 * 86.4ks / 13.14ks = ~39.45 tasks/day FWR = 131.47k credits
(440W) (1100MHz, 1375MHz) (84-85%)
3 tasks per R280X take 13060s average | 3 * 2 * 86.4ks / 13.06ks = ~39.69 tasks/day FWR = 132.23k credits
3 tasks at 1375MHz memory frequency are 80s faster, but I get validate errors at that frequency. 3 tasks at 1250MHz also gave me a small number of validate errors, which is why I decided to stick with 2 tasks at 1250MHz. Credits per day drop only slightly (122.79k vs 131.47k), but stability should be 100%.
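To put that trade-off in numbers, here is a quick percent-change sketch in Python over the three BRP5 configurations above, using only the credits/day figures and the 80s saving from the post (the labels are just illustrative keys).

```python
# BRP5 v1.39 credits/day (in thousands) for the configurations in the post.
CREDITS_K_PER_DAY = {
    "3 tasks @ 1375MHz": 132.23,
    "3 tasks @ 1250MHz": 131.47,
    "2 tasks @ 1250MHz": 122.79,
}
BASELINE = "3 tasks @ 1250MHz"


def pct_change(new: float, old: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old


# 80 s saved per task out of 13140 s when memory is raised to 1375MHz.
time_saving_pct = 100.0 * 80 / 13140

for name, credits in CREDITS_K_PER_DAY.items():
    delta = pct_change(credits, CREDITS_K_PER_DAY[BASELINE])
    print(f"{name}: {credits:.2f}k credits/day ({delta:+.2f}% vs {BASELINE})")
print(f"1375MHz saves {time_saving_pct:.2f}% of runtime per task")
```

Dropping from 3 to 2 tasks at 1250MHz costs roughly 6.6% of credits/day, while the 1375MHz memory clock buys only about 0.6%.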
Sadly, BRP4 is no longer available :( so I cannot repeat those runs.
I'll be repeating the Gravitational Wave runs with 2 and 3 tasks per R280X at 1250MHz and will post my conclusions here. I have already repeated the 4 tasks/R280X run at 1250MHz.