Observations on FGRPB1 1.15 for Windows

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7222904931
RAC: 970666

In reporting times here, I suggest mentioning the host CPU, not just the GPU.  I currently believe that, although some of the work is sent to the GPU, the GPU finishes that work so quickly that total elapsed time is dominated by the CPU contribution.  As it happens, I have three hosts with moderately dissimilar GPUs.  On my dual-GPU machine, which of the two GPUs happens to run a particular task has little, if any, systematic influence on the task's elapsed time.

My fastest machine running this work is now taking about 2 hours 36 minutes elapsed time.  It has an i5-4690K CPU running at stock clocks.  The GPUs on that host are a GTX 1070 and a 6 GB GTX 1060.  I don't yet have stabilized values for the other machines.

Not only is the relationship between reported completion and actual completion non-linear, but it also differs drastically from machine to machine within my set.  While some of the machines creep slowly up the last few percent toward 100, and either reach it or get very close, the fast machine actually finishes just after reporting slightly under 90% completion.  While this is annoying, I'm mentioning it as a warning to other users, not as an improvement request to the developer(s).  I think development effort would, at the moment, be far better spent on moving more of the calculation from CPU to GPU than on perfecting completion reporting.

Jesse Viviano
Joined: 8 Jun 05
Posts: 33
Credit: 133045917
RAC: 0

I now see spikes even though I closed the programs that would cause activity. You are right that the GPU is being used, but it is not being used effectively. I thought the spikes were due to my own activity.

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1590322374
RAC: 776358

This does not bode well. I look forward to getting onto them after I run out of BRPs.

Mad_Max
Joined: 2 Jan 10
Posts: 154
Credit: 2213374725
RAC: 383692

Strange - I get heavy run-time variation between my hosts despite the fact that they have similar GPUs (Radeon 7850/7870) and similar processors (two AMD FX-8320s and an AMD Phenom II X6).

The progress of the WUs is also very nonlinear - the longer they run, the slower they get:
~50% after 1 hour of work
~75% after 2 hours
~90% after 3 hours
~95% after 4 hours
~98% after 5 hours
now @ 6 hours and not completed yet

Lol, it reminds me of the "paradox" of Achilles and the tortoise (Achilles can never overtake the tortoise)!
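
Doing the rough math on those numbers: it looks like about half of the remaining reported progress clears each hour, i.e. roughly 1 - 2^(-t) after t hours. That would give 50 / 75 / 87.5 / 93.75 / 96.9 percent - close to what I see. A counter like that never quite reaches 100, exactly like Achilles.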

And the app does NOT freeze or stop working. CPU load is about the same the whole time, jumping around in the 30-100% range of one CPU core per task. GPU load stays at the same level too: 20-40% for one task and 40-70% for two concurrent tasks on one GPU.
Restarting does nothing - the WU successfully resumes from its checkpoint but continues to run very slowly.

A hopelessly inefficient version!
My hosts can do such WUs in 7-8 hours on one CPU core alone (without using the GPU at all) by running the Gamma-ray pulsar binary search #1 v1.05 (FGRPSSE) CPU app instead.
So there is almost no speedup from using the GPU - just a pure waste of resources: the speed is about the same, while electricity consumption (CPU+GPU vs CPU only) is roughly 3 times higher.

juan BFP
Joined: 18 Nov 11
Posts: 839
Credit: 421443712
RAC: 0

This new app is insane!

Very low GPU usage (<40% with 3 WUs at a time) and very long times to complete (> 4 hrs).

Looks like it doesn't really use the GPU.

Any clues how to fix that?

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1590322374
RAC: 776358

Since I'm about out of BRPs and going to switch over to these, I have a few questions.

On my GTX660, how many instances of this new app do I want to run? I have been very happy running 3 on BRPs.

Do I want to free up a core to feed the GPU? Currently I don't.

All 3 of my boxes are Windoz.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117617926310
RAC: 35260076

archae86 wrote:
I think Gary Roberts is well-known to advocate lowering your request work queue settings with new work types or other major configuration changes.  I, personally, have lowered my requested amount from a bit over two days to about a half day.  Initial indications on my machines are that the first estimate for completion time is well under the truth, so without this precaution I'd have considerably over-fetched.

Firstly, thank you very much for starting this thread.  It's so helpful to have a separate place for people to report experiences and get tips on all sorts of things without polluting the real news content of the technical news announcements.  Hopefully, people will continue the discussion here rather than there :-).

And yes, I know from bitter experience that it's much safer to ask for just a few tasks of a new run, for a number of reasons.  It's much easier to safely get more once you know how the first few are going.

Secondly, I don't run any Windows machines so I'll be interested in following your reports.  Hopefully, the Windows version will eventually become as efficient as the Linux version, but in the meantime, for those who want to experiment with running multiple concurrent GPU tasks without consuming all the available CPU cores, I would suggest using app_config.xml to reduce the requirement of 1 CPU core per GPU task to something less.  That way you can override the default settings provided by the GPU utilization factor mechanism: make your changes in app_config.xml and use the 're-read config files' option in BOINC Manager to put them into effect immediately.

As an example, if you had at least a 2GB card (based on Linux experience - may be different for Windows), you could try running 2 GPU tasks supported by 1 CPU core (instead of the default 2) by using the following app_config.xml file, placed in the Einstein project directory.  Note that I'm posting this from a borrowed computer in a hospital room so it's from (possibly faulty) memory.  Check the BOINC documentation on app_config.xml.

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

If 0.5 CPUs per GPU task is not enough CPU support, you could try (with sufficient GPU memory) running 3 GPU tasks supported by 2 CPU cores.  That would effectively reserve 0.67 CPUs per GPU task.  To do this, just set gpu_usage to 0.33 and cpu_usage to 0.67 in the above example, followed by a 're-read config files' in BOINC Manager.  Please realise that I don't have access to a Windows machine to try any of this so anyone following this will need to experiment to find the optimum conditions.  It may well be that one core per GPU task really is the best option anyway.
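
Spelled out, that three-task variant would look like this (same from-memory caveat as above, so please check it against the documentation before relying on it):

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>0.33</gpu_usage>
            <cpu_usage>0.67</cpu_usage>
        </gpu_versions>
    </app>
</app_config>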

With only some of the calculation being done on the GPU, more tasks running concurrently may make better use of the GPU, but at the expense of interfering with the CPU support for each of those tasks.  Hopefully the current inefficiency of the Windows app is just a temporary problem.

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7222904931
RAC: 970666

Betreger wrote:

On my GTX660, how many instances of this new app do I want to run? I have been very happy running 3 on BRPs.

Do I want to free up a core to feed the GPU? Currently I don't.


For the duration (I hope short) of this specific 1.15 application, I think we Windows sufferers should switch our mindset.

Don't think of the CPU task as support for the GPU.  Think of the GPU as support for a CPU task.

In practice this means forgetting about "reserving a core" (which is a misnomer anyway).  You'll get less done, not more, if you leave any CPU cycles unused with this application (not true of lots of other applications).  The GPU keeps up with the CPU easily enough that the difference between, say, a 1060 and a 1070, or between a 750Ti and a 970, is not noticeable.

To simplify things I'll assume you want to run only 1.15 "GPU" tasks.  Say you have 4 cores in total and two GPU cards (this is real for me on my fastest machine).  Then if, in Computing preferences, you allow use of at most 100% of CPU time, and set the Einstein project preference for the machine's location/venue "GPU utilization factor of FGRP apps" to 0.5, you'll find that once boincmgr on your machine catches on to your desire it will start four 1.15 tasks in total, two assigned to each GPU, each consuming 100% of a core (or what is left after any overhead on your machine).
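
(If you'd rather express that with a local file than with the web preference, I believe an app_config.xml along the lines of Gary's example, but keeping a full core per task, comes to the same thing - I haven't tried this myself, so treat it as a sketch:)

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
    </app>
</app_config>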

In this specific case, if you have pure CPU tasks in your queue, they will go to a "waiting to run" state until BOINC thinks you are in schedule trouble, at which point it will suspend some 1.15 "GPU" work and run one or more CPU tasks in "high priority" mode.  

So if you choose to run this way you'll want to turn off the setting at Project Preferences|your location|Resource Settings|Use CPU|Request CPU-only tasks from this project.

On the other hand, if you like to run a mix of CPU and "GPU" work, just set a low enough multiplicity that BOINC does not start so many 1.15 "GPU" tasks as to consume all allowed cores.

Gary's advice may become useful if some future version of the application is not so nearly a 100% CPU user, so that it becomes worthwhile to trick the scheduler into starting more tasks.  For the current version I'm pretty sure doing so will prove at least slightly counter-productive, though I'd be happy for someone to post actual experimental results.

RaymondFO*
Joined: 22 Jan 13
Posts: 4
Credit: 1821515104
RAC: 604465

Windows 8.1 running an NVIDIA 980 Ti: I'm now at over 3.5 hours.  Not worth it.

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Raymondo,

Totally agree - it's simply not worth it! A GPU task should use the GPU, not try to swamp the CPU.

The BRP4G setup, recently stopped for lack of work, used 0.02 CPUs per WU, which was fine here and would also have been OK at 0.1. But running 1 CPU + 1 GPU with the GPU heavily underused, leading to excessive run times, is a waste of resources.

As is the server's habit of providing more work than can be done before the time limit expires - I had to abort several WUs that I couldn't possibly complete before the deadline.

The server's estimate of 30-odd minutes for a task that can take over 4 hours is unworkable.

[edit] Just checked: the WU did validate, but the timings were daft:

Run time: 13,838.86 s    CPU time: 13,822.67 s

Those figures are total run time and CPU time used, which quite clearly demonstrates the lack of GPU usage.
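
To put a number on it: 13,822.67 / 13,838.86 ≈ 99.9%, so the CPU was pegged for essentially the entire run - at most about 0.1% of the elapsed time is left over that could have been spent waiting on the GPU.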

So I've set E@H to NNT until either more BRP4G work is available or the project can provide an app that makes proper use of the GPU.

Cliff,

Been there, Done that, Still no damm T Shirt.
