Gamma-ray pulsar binary search #1 on GPUs

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Bernd, I'm assuming that if we don't wish to decrease the CPU usage we can just leave out any command lines? I'm comfortable with a full core for each work unit and have the headroom for it. Use of the command line would slow down the processing time for each one, correct?

Zalster

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Bernd Machenschalk wrote:
I'm publishing 1.20 for Windows and Linux (FGRPopencl-Beta-nvidia). This features some automatic measurement and adjustment of the optimal sleep time. Just add "<cmdline> --sleepTimeFFT -20 </cmdline>" to your app_config.xml.

I've done a very short test with this command line, and the CPU utilization dropped from a full thread (shown as ~15% in Windows Task Manager on my 4-core, 8-thread i7) down to 1-2%. But it also had the effect that the GPU utilization dropped from always being above 97% when running x2 down to mostly showing 0%, with spikes up to ~50% every other second.
With these performance numbers I predict (based on very rough eyeballing) that tasks will take multiple hours to complete, compared to ~30 min with a full CPU thread as support.

My conclusion as of now is to continue without the command line in place and let the GPU tasks have a full CPU thread as support.
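
For anyone who still wants to experiment with it: the <cmdline> fragment goes inside an <app_version> block in app_config.xml. A minimal sketch of the whole file, assuming the hsgamma_FGRPB1G app name and the Beta plan class from Bernd's post (check your client_state.xml for the exact names on your host), with a full CPU thread per task and two tasks per GPU:

<app_config>
  <app_version>
    <app_name>hsgamma_FGRPB1G</app_name>
    <plan_class>FGRPopencl-Beta-nvidia</plan_class>
    <cmdline>--sleepTimeFFT -20</cmdline>
    <avg_ncpus>1.0</avg_ncpus>
    <ngpus>0.5</ngpus>
  </app_version>
</app_config>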

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245204476
RAC: 13352

Zalster wrote:
Bernd, I'm assuming that if we don't wish to decrease the CPU usage we can just leave out any command lines? I'm comfortable with a full core for each work unit and have the headroom for it. Use of the command line would slow down the processing time for each one, correct?

Yes, the computation code is identical to 1.18, and if you don't pass it any additional command-line arguments, it will behave exactly the same.

In the optimal case, computation should not be slowed down by putting the CPU to sleep while the GPU works, but finding the optimal sleep times for a particular setup (GPU, CPU, HT, thread priority, parallel tasks) can be tedious. Thus the "auto-tuning" feature (a negative value for --sleepTimeFFT) was introduced, in the hope that it would be helpful. It did help on the one particular system that I tested it with. If it doesn't on yours, well, sorry for the confusion.

Thanks for testing anyway!

BM

juan BFP
Joined: 18 Nov 11
Posts: 839
Credit: 421443712
RAC: 0

I know each host is different, but on my particular host... https://einsteinathome.org/host/12316949

I tried it and noticed something interesting. Apparently the 1.20 builds are slower than 1.18 by a small difference, with or without sleepTimeFFT. With 1.18 the crunching times for a WU (running 2 at a time) were 1150-1200 s; with 1.20 they rise to 1170-1230 s.

My questions are simple: why are the times basically the same whether you pass the parameter or not? And why, without the parameter, do the times remain a little higher than with the 1.18 builds?


Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245204476
RAC: 13352

I'm promoting 1.20 out of "Beta" status to avoid a work shortage because of "Beta" restrictions.

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245204476
RAC: 13352

FWIW, I had better run times when I used "--sleepTimeFFT -1000" rather than -20 (the numerical value is the time in microseconds that gets reserved for the measuring itself).
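
In app_config.xml that's the same fragment as before, just with the different value:

<cmdline>--sleepTimeFFT -1000</cmdline>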

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245204476
RAC: 13352

juan BFP wrote:
With 1.18 the crunching times for a WU (running 2 at a time) were 1150-1200 s; with 1.20 they rise to 1170-1230 s.

My questions are simple: why are the times basically the same whether you pass the parameter or not? And why, without the parameter, do the times remain a little higher than with the 1.18 builds?

With the additional parameters not set, there is one more little conditional (if ...) for the CPU to process after each kernel launch, i.e. while the kernel is running on the GPU. This is the only difference between 1.18 and 1.20, and it should only matter on a very slow CPU (paired with a fast GPU).

As far as I can see the runtime difference is ~2%. We try to keep our WUs of equal size (at least the ones that get the same credit), but our prediction of the run time for a specific workunit isn't perfect. We usually tolerate a variation of up to 5-10%. Maybe you were just unlucky picking up tasks. How many tasks are these numbers based on?

The parameter puts the CPU to sleep while the GPU is working (on the FFT). If it is tuned correctly, this shouldn't affect the overall run time at all. If the value is too large, the CPU sleeps for too long before taking over again after the GPU is done, and the overall run time increases. If it is too small and the CPU wakes up too early, you see little or no effect on the CPU utilization.

BM

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3315642142
RAC: 1118987

In the recent past I have

In the recent past I have noticed that these tasks take slightly different times to finish when using the same app.
When I look at the results on my Fury machine, initially a task took 520 s to finish, later for a few days (weeks?) it was 420-440 s, and since Feb 14 it's 520 s again. All with v1.18.

So the observed difference in run time might be due to such different task flavors (different search parameters or work sets?) rather than the new application.


Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1432986347
RAC: 600695

I was very pleased with the improvement in throughput that 1.18 gave my GTX 660 running 2 at a time. I ran the four 1.19 apps I was sent and saw no change, and since then the 1.20 apps. Throughput is the same as with 1.18, but the screen lag is greatly reduced. As an Nvidia user I eagerly await the CUDA app.

Harri Liljeroos
Joined: 10 Dec 05
Posts: 3679
Credit: 2923290151
RAC: 1033176

What is the correct way to remove the cmdline from an application? If I remove it from app_config.xml and restart BOINC, the cmdline is not removed from client_state.xml; the last version of it still remains there and will be used by the application when BOINC is restarted. I am using Win 7 x64 with BOINC 7.6.22. I also tried going from <app_version></app_version> to <app></app> tags, which do not have <cmdline></cmdline> available, but that did not remove it either.

I had to edit client_state.xml manually to remove the <cmdline></cmdline>, and this is always risky.

Could I have typed <cmdline> </cmdline> with just an empty " " to remove it?
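
That is, leaving the tag in place but empty, hoping the client copies the empty value over the stale one when it re-reads the file. A sketch of what I mean, untested, with the app name and plan class assumed to match this search (check client_state.xml for the exact names):

<app_config>
  <app_version>
    <app_name>hsgamma_FGRPB1G</app_name>
    <plan_class>FGRPopencl-Beta-nvidia</plan_class>
    <cmdline> </cmdline>
  </app_version>
</app_config>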
