Gamma-ray pulsar binary search #1 on GPUs

LunaticM
Joined: 6 Dec 15
Posts: 3
Credit: 16,953,703
RAC: 0

The granted credit of my last

The granted credit of my last result is abnormal... any suggestions why?

Tasks (all from workunit 12460666, application "Gamma-ray pulsar binary search #1 on GPUs v1.17 (FGRPopencl-nvidia) windows_x86_64"):

Task 266821383: sent 22 Dec 2016 18:02:32 UTC, reported 22 Dec 2016 21:19:05 UTC, completed and validated; run time 2,519.50 s, CPU time 2,402.97 s, claimed credit 24.97, granted credit 1,365.00
Task 266744089: sent 22 Dec 2016 3:20:56 UTC, reported 22 Dec 2016 18:02:31 UTC, completed and validated; run time 2,694.85 s, CPU time 2,395.45 s, claimed credit 24.89, granted credit 3,465.00
Task 266744036: sent 22 Dec 2016 3:20:56 UTC, reported 22 Dec 2016 16:57:50 UTC, completed and validated; run time 2,682.26 s, CPU time 2,358.56 s, claimed credit 24.51, granted credit 3,465.00
Mad_Max
Joined: 2 Jan 10
Posts: 147
Credit: 1,769,020,927
RAC: 736,473

Yes, my observation and

LunaticM wrote:

On my system with 3 tasks running simultaneously on a GTX 1070, the run time increased from about 730 s to 2,500 s. I would recommend multiplying the granted credit by 3-4.

 

Also, even with 3 tasks running, GPU utilization still doesn't max out; sometimes it drops to 50%.

Yes, my observations and statistics are about the same: on AMD GPUs the new WU batch runs 3-3.5 times longer.

I also opened the WU internal data, and it looks like the old WUs processed 350 binary points from the Fermi data set per WU, while the new batch processes 1255 binary points per WU.

So that is 1255/350 = 3.58x more scientific data processed per WU, and ~3.5x longer runtimes is good linear scaling.

Granted credit should probably be set at the ~3.58x level too; 5x more credit looks like a moderate overvaluation.
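The arithmetic above can be sketched quickly. This is a minimal check using only the numbers quoted in this thread; the ~730 s to ~2,500 s runtimes are LunaticM's figures at 3 tasks per GPU, and ~700 is the approximate old per-task credit:

```python
# Scaling check using the numbers quoted in this thread.
old_points = 350      # binary points per old WU
new_points = 1255     # binary points per new WU
data_scale = new_points / old_points
print(f"data per WU scaled by {data_scale:.2f}x")        # ~3.59x

runtime_scale = 2500 / 730   # LunaticM's runtimes at 3 tasks per GPU
print(f"runtime scaled by {runtime_scale:.2f}x")         # ~3.42x

# Credit at the same ratio, versus the 5x that was actually applied:
old_credit = 700             # approximate old per-task credit
print(f"proportional credit: ~{old_credit * data_scale:.0f}")  # ~2510
print(f"5x credit: {old_credit * 5}")                          # 3500
```

Since runtime grew by roughly the same factor as the data volume, the workload does look linear in the number of binary points.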

Mad_Max
Joined: 2 Jan 10
Posts: 147
Credit: 1,769,020,927
RAC: 736,473

Christian Beer wrote:Other

Christian Beer wrote:

Other changes I just did while I wait on the benchmarks to finish: I doubled the speedup of all FGRPB1G apps so you should see an effect on DCF within the next days.

If some of you could monitor your DCF and report any changes, that would be great.

Yes, I see the <flops></flops> count rising ~2x for <app_name>hsgamma_FGRPB1G</app_name>.

In my case it went from ~22 GFLOPS to ~45 GFLOPS. DCF is correcting too, but not by much, because you also increased the flops_estimation for the new WU batch from 105,000 GFLOPs to 525,000 GFLOPs (5x) while real runtimes are only ~3.5x longer.

So the net result is that DCF goes up only from ~0.2 to ~0.3 while BOINC runs FGRPB1G (and DCF is still >1 while BOINC runs CPU tasks from E@H).

Looks like we need at least another 3x increase in speedup (the estimate of FGRPB1G app speed) if you want to keep DCF near ~1 and matched to the other E@H subprojects, or else a lower flops_estimation for FGRPB1G WUs.
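As a rough consistency check of those numbers, assuming the simplified BOINC model where estimated runtime is rsc_fpops_est divided by the app's flops rating and DCF converges toward actual/estimated runtime:

```python
# Simplified model (assumption): estimated runtime = rsc_fpops_est / flops,
# and DCF converges toward actual_runtime / estimated_runtime.

# Old batch: 105,000 GFLOPs estimate, ~22 GFLOPS app speed, observed DCF ~0.2.
old_est = 105_000e9 / 22e9     # ~4,773 s estimated
old_actual = 0.2 * old_est     # ~955 s implied actual runtime

# New batch: 5x the estimate, 2x the app speed, ~3.5x the real runtime.
new_est = 525_000e9 / 45e9     # ~11,667 s estimated
new_actual = old_actual * 3.5  # ~3,341 s
print(f"predicted new DCF: ~{new_actual / new_est:.2f}")  # ~0.29, matching the ~0.3 observed

# Pulling DCF back toward 1 needs the estimate cut by roughly this factor:
print(f"further speedup increase needed: ~{new_est / new_actual:.1f}x")  # ~3.5x
```

The predicted ~0.29 matches the observed ~0.3, which is why roughly another 3-3.5x speedup increase (or a correspondingly lower flops_estimation) would be needed to bring DCF near 1.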

archae86
Joined: 6 Dec 05
Posts: 2,930
Credit: 3,715,267,075
RAC: 4,737,623

Christian Beer wrote:If some

Christian Beer wrote:
If some of you could monitor your DCF and report any changes, that would be great.

I have four machines running this work.  I've spent some hours since you wrote this request checking the DCF now and again, and recording the min and max values seen on each machine.

In all four cases there is reason to expect the DCF value to breathe.  Three of the four run purely GPU work, only of this type, but each has two GPU units, of somewhat unmatched speed.  I run the GPUs at 2X, and as none of the machines has more than four (even virtual) cores, I currently run zero CPU tasks on them.  The fourth machine has only a single GPU, but as it has eight virtual cores (four physical) I do run a couple of GW CPU tasks on it in addition to the two FGRPB1 tasks.  That one, of course, "breathes" in DCF depending on how many GPU tasks have finished since the most recent CPU task. 

So here are the answers:

Min   Max   GPU1  GPU2   CPU tasks
0.65  1.08  1050  none   2
0.46  0.61  1060  1050   none
0.57  0.73  970   750Ti  none
0.20  0.25  1070  1060   none

Before the recent change the 1070 + 1060 machine got DCF values as low as not much above 0.1!

I suspect DCF results from machines with a single GPU, not running CPU tasks, and not running work from other projects will stabilize to a tighter range, and may be more useful for the intended purpose.  Still, I suspect the results may vary quite a lot with particular GPU model, and also enough to matter with host platform characteristics.  More reports are needed.

 

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 374,625,939
RAC: 140,251

All my GPUs are still waiting

All my GPUs are still waiting for 32bit version since no other GPU work is available for them now.

Christian Beer
Moderator
Joined: 9 Feb 05
Posts: 595
Credit: 97,275,212
RAC: 12,980

I'm not sure if plan_class

I'm not sure if plan_class changes are instantaneous or take some time, so I'll wait a bit more before I change anything. archae86's values look a lot like what I would have expected from doubling the speedup. Maybe Mad_Max's values will also stabilize around 0.4.

As I said, I've done some benchmarks in the meantime on a system with a GTX 750 Ti (and an idle CPU). The BRP4G workunits took almost exactly 1 h to finish. Using the p_fpops value of this system and the formulas in the scheduler, I calculated a theoretical speedup of 20 for the BRP4G app, while the real speedup was set to 15. Since that ran for quite some time and worked well, I'm trying to emulate this ratio for FGRPB1 too, although the new search uses much more CPU than BRP4G, which is not considered in the formulas used to calculate estimated runtimes.

The theoretical speedup of FGRPB1G is 29 (runtime on the reference system was 1 h 18 min), while it is currently set to 14. Keeping the same theoretical-to-real ratio as BRP4G, my next step is to set the speedup to 20 and see what that does to your estimates and DCF calculation. But that will not happen today.
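The ratio argument can be written out as a small sketch. Nothing here goes beyond the numbers in the post; the scheduler's actual formulas are not reproduced:

```python
# BRP4G: theoretical speedup 20 (from p_fpops and the 1 h benchmark),
# while the real setting that worked well was 15.
brp4g_real_over_theoretical = 15 / 20   # 0.75

# FGRPB1G: theoretical speedup 29 (1 h 18 min on the reference system).
fgrpb1g_theoretical = 29
proposed = fgrpb1g_theoretical * brp4g_real_over_theoretical
print(f"FGRPB1G speedup at the same ratio: {proposed:.2f}")  # 21.75, rounded down to 20
```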

To also clear up the credit issue: the values we used in the beginning (~700 per task) were just rough estimates and rather arbitrary. When we increased the science payload by 5x, we also increased the credit by 5x (to ~3500) without checking runtimes. A check in the validator then prevented this new value from being used, and credit was clamped to 700 per task. Soon after I fixed that, I finished the benchmark and calculated that the new credit-per-task value, based on BRP4G (1000 credits for 1 h of GPU time), should be much lower. Since the amount of credit is assigned when a workunit is created, there are now three different credit values in the system. I fixed that by clamping the value to 1365 in the validator just now. So in terms of credit we should be back at the same level we had with the BRP4G search. Of course you need to give the system some time until your RAC reflects that and stabilizes again.
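The validator-side fix might look roughly like this. This is a hypothetical sketch, not the actual Einstein@Home validator code; only the cap value of 1365 and the three stored credit values come from the post above:

```python
# Hypothetical sketch of the clamp described above. Workunits already in the
# system carry one of three stored credit values (~700, ~3500, or the newer
# benchmark-based value); the validator caps what is actually granted.
CREDIT_CAP = 1365.0   # roughly 1000 credits per hour of GPU time, as with BRP4G

def granted_credit(stored_wu_credit: float) -> float:
    """Grant the credit stored at workunit creation, clamped to the cap."""
    return min(stored_wu_credit, CREDIT_CAP)

print(granted_credit(3465.0))  # 1365.0
print(granted_credit(700.0))   # 700.0
```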

The issue with wrong estimated time and DCF is still under investigation.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513,211,304
RAC: 0

Thanks Christian, will the

Thanks Christian, will the FGRPSSE (CPU tasks) values remain the same? 

Christian Beer
Moderator
Joined: 9 Feb 05
Posts: 595
Credit: 97,275,212
RAC: 12,980

AgentB wrote:Thanks

AgentB wrote:
Thanks Christian, will the FGRPSSE (CPU tasks) values remain the same? 

We didn't touch the CPU search so there is no need to change anything there.

rbpeake
Joined: 18 Jan 05
Posts: 247
Credit: 219,347,129
RAC: 5,475

Just curious, what is the

Just curious, what is the difference between the CPU and the GPU searches, scientifically?  Thanks!

Filipe
Joined: 10 Mar 05
Posts: 148
Credit: 245,604,954
RAC: 4

Stranger7777 wrote:All my

Stranger7777 wrote:
All my GPUs are still waiting for 32bit version since no other GPU work is available for them now.

 

+1

 
