The O2-All Sky Gravitational Wave Search on GPUs - discussion thread.

solling2

Joined: 20 Nov 14

Posts: 219

Credit: 1579934531

RAC: 65274

24 Jul 2019 11:53:31 UTC

Message 172282

I agree with Gary's assumption in the other thread that there may be two different problems around.

Previous versions:

Problem 1 was computation errors after about 7 seconds, which is what I had with half of the 1.04 tasks.

Problem 2 is validate errors. Those become apparent only after a task finishes. We had those with versions 1.04 and 1.05.

Current version 1.06:

Problem 1 apparently doesn't occur any more. Instead, half of my 1.06 tasks run extremely slowly from the first seconds onward.

DF1DX has seen Problem 2 already, unfortunately. The ones I finished are currently in the waiting status.

archae86

Joined: 6 Dec 05

Posts: 3165

Credit: 7394941687

RAC: 1978637

24 Jul 2019 20:20:26 UTC

Message 172296

It lives--I have validations!

I despaired of getting any useful result when not only I but others were getting 100% prompt validate errors.

However, since about the time Bernd posted his intention to adjust the validator, I have gotten no new validate errors. Further, my old ones have been scrubbed away.

I now actually have three Validated tasks. One of them ran at 1X on v1.05 in 4,374 elapsed seconds, and two of them ran on v1.06 at 2X in 5,550 and 5,556 elapsed seconds. Although all three show validated status, only the v1.05 one has been granted credit so far.

Yes, on my system under current conditions running this work at 2X gives a huge productivity boost over 1X. It also raised the reported GPU load average during the main running time from 44% to 69%. As I have plenty of CPU cores on the host system, I intend to try 3X pretty soon.
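As a rough check on the size of that boost, the elapsed times quoted above can be turned into tasks per day. This is only a sketch: it assumes the three reported times are representative, that both concurrent tasks at 2X take about the same time, and it ignores the fact that the 1X and 2X runs may have used different application versions. The function name `tasks_per_day` is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(elapsed_s: float, concurrency: int) -> float:
    """Tasks finished per day by one GPU that runs `concurrency` tasks
    at once, each needing `elapsed_s` seconds of wall-clock time."""
    return concurrency * SECONDS_PER_DAY / elapsed_s


# Elapsed times quoted in the post above.
rate_1x = tasks_per_day(4374, 1)
rate_2x = tasks_per_day((5550 + 5556) / 2, 2)  # mean of the two 2X tasks

print(f"1X: {rate_1x:.1f} tasks/day")
print(f"2X: {rate_2x:.1f} tasks/day")
print(f"2X relative to 1X: {rate_2x / rate_1x:.2f}")
```

On these three data points, 2X comes out at a bit under 1.6 times the 1X throughput. The same function can be applied to any "N tasks at once, T seconds each" figure reported later in the thread.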

Other notes:

1. I think this application in the current form is very different in performance characteristics to the current Einstein Gamma-Ray Pulsar GPU application. One should be deeply skeptical of the validity of recent relationships on the importance or lack of it of CPU performance, PCIe grade, and so on.

2. I use Process Lasso, and found that it had given the CPU support application a tight core affinity list (just two of the six available) and a low priority. Simply allowing any available CPU to be used, and raising the priority gave a large immediate beneficial effect on GPU load, so presumably on required task elapsed time.

3. Once the affinity list was broadened, these tasks have continued to report consuming a little more than one second of CPU time per second of elapsed time. So multiple threads are probably active occasionally. I suspect that you may find productivity more sensitive than usual to whatever other work the host is running.

In my personal case the host in question has an AMD RX 570 and an Intel i5-9400F CPU. The CPU has six physical cores, but a rather slow clock rate within the current generation. As this application clearly makes heavy use of the CPU, I suspect some of you using CPUs with much faster cores may get rather better elapsed times than I, especially at 1X. On the flip side, those of you taking Gary's path (which works splendidly on the current GRP application) of using slow old processors may suddenly find that a disadvantage.

I speculate that an ideal host for this specific application would have a fast CPU with several real cores, probably only one GPU, and perhaps only a moderately fast GPU. I suspect a Radeon VII would sit around waiting for CPU support a lot, though I'm reluctant to give it a try just yet.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119681430606

RAC: 25302192

24 Jul 2019 20:31:00 UTC

Message 172298 in response to message 172282

solling2 wrote:

... there may be two different problems around.

Which is why I fired off that report to Bernd. At least the 'validate error' now seems to have been sorted.

I was also intrigued about the comment of some sort of 'workaround' to do with AMD drivers. I presume that must relate to more modern hardware using amdgpu. The thought occurred to me that it might be interesting to do a comparison between the two different generations - say an RX 560 using amdgpu and an R7 370 using fglrx. They both have reasonably similar outputs for FGRPB1G so (now that the validate error is sorted) I'll set up a pair of otherwise similar hosts to run the new 1.06 GW app and see how they go in a head to head comparison.

I just have to find a pair of suitable candidates :-).

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119681430606

RAC: 25302192

24 Jul 2019 22:17:19 UTC

Message 172300 in response to message 172296

archae86 wrote:

... As I have plenty of CPU cores on the host system, I intend to try 3X pretty soon.

Those (and probably higher concurrency) results will be very interesting indeed. Thanks for doing that.

archae86 wrote:

... On the flip side, those of you taking Gary's path (which works splendidly on the current GRP application) of using slow old processors may suddenly find that a disadvantage.

I think you will be proved quite right about that. I'm very hopeful that's the case. After all, it does get a bit tedious replacing capacitors on 10 year old motherboards as they progressively deteriorate over time :-).

I had parents who both served in the armed forces. The war was still raging in Papua New Guinea and islands to the north when I was born. In the immediate post-war years things were very tight and a 'waste not, want not' mentality was drilled into me at a very early age. So it is quite natural for me not to discard stuff that can still do the job, even if that means extra care and attention to keep it serviceable. However, there does come a time ....

The future for this project is continuous GW detection and if it takes modern/fast CPUs to do that - bring it on, I say :-).

Cheers,
Gary.

mikey

Joined: 22 Jan 05

Posts: 12941

Credit: 1884482703

RAC: 27980

24 Jul 2019 22:46:26 UTC

Message 172302 in response to message 172300

Gary Roberts wrote:

archae86 wrote:
... As I have plenty of CPU cores on the host system, I intend to try 3X pretty soon.

Those (and probably higher concurrency) results will be very interesting indeed. Thanks for doing that.

archae86 wrote:
... On the flip side, those of you taking Gary's path (which works splendidly on the current GRP application) of using slow old processors may suddenly find that a disadvantage.

I think you will be proved quite right about that. I'm very hopeful that's the case. After all, it does get a bit tedious replacing capacitors on 10 year old motherboards as they progressively deteriorate over time :-).

I had parents who both served in the armed forces. The war was still raging in Papua New Guinea and islands to the north when I was born. In the immediate post-war years things were very tight and a 'waste not, want not' mentality was drilled into me at a very early age. So it is quite natural for me not to discard stuff that can still do the job, even if that means extra care and attention to keep it serviceable. However, there does come a time ....

The future for this project is continuous GW detection and if it takes modern/fast CPUs to do that - bring it on, I say :-).

I would love to see a minimum CPU specs page, similar to the GPU minimum specs page, to help people who try to run the workunits on CPUs that aren't up to par. For instance, it could name an i5 or equivalent quad-core CPU at a given GHz speed. The idea is that a workunit needs to be completed before the deadline, but being able to complete only one workunit within that deadline isn't going to be very productive for new users, or for most people. Personally I would prefer something like 10 workunits within the deadline, but that may be a bit aggressive.

cecht

Joined: 7 Mar 18

Posts: 1618

Credit: 3030593572

RAC: 1432036

25 Jul 2019 0:01:01 UTC

Message 172304

Yes! After reading today's posts, I checked, and 11 of my 1.05 tasks have validated (avg GPU time: 3750 sec; avg CPU time: 2370 sec). I'm waiting for tasks to load to run with the 1.06 app. I only have a 4-core(thread) Pentium, but am hopeful I can run 2x tasks.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86

Joined: 6 Dec 05

Posts: 3165

Credit: 7394941687

RAC: 1978637

25 Jul 2019 4:52:02 UTC

Message 172307 in response to message 172296

archae86 wrote:

I now actually have three Validated tasks. One of them ran at 1X on v1.05 in 4,374 elapsed seconds, and two of them ran on v1.06 at 2X in 5,550 and 5,556 elapsed seconds. Although all three show validated status, only the v1.05 one has been granted credit so far.

Some hours have passed since I made this comment. In the interim, I have zero new validations, with ten pending. Two v1.06 tasks continue to show "completed and validated" but with zero credit. The v1.05 task which showed as validated a few hours ago now shows "completed, marked as invalid", but continues to show credit?!

I intend to do a trial at 3X after I sleep.

For a time this afternoon I had the multiplicity set to 2X both for Gamma-Ray Pulsar and the new GW work. I watched a mismatched pair run for a while. The GPU temperature suggested that the GPU was pretty busy (well above the temperature when running two GW tasks). The GRP task moved along at nearly the 1X GRP rate. The GW task really crawled.

So people who allow mixed loads of these two flavors (by downloading both and having the same multiplicity setting for them) will get really wild fluctuations in task queue. When a mixed-load GW task finally completes, the Task Duration Correction Factor will bang up to a very high level, triggering Panic Mode operation, whereas overfetching will occur when a long string of near-1X-speed GRP tasks complete in sequence.

If memory serves, it might be that with GRP running 2X and GW running 3X the two won't run simultaneously. There will still be wild fluctuations in Task Duration Correction Factor, but more like 10-fold or less rather than perhaps 100-fold.
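The asymmetry described above (the correction factor shoots up after one very slow task, then only drifts back down over many fast ones) can be illustrated with a toy model. This is not the BOINC client's actual code. The update rule, the 1,000-second estimate, and the run times below are all assumptions chosen to show the shape of the effect, and the model assumes a single correction factor shared by both task types.

```python
def update_dcf(dcf: float, estimated_s: float, actual_s: float) -> float:
    """Toy update rule (an assumption, not BOINC's real implementation):
    jump straight up when a task overruns the current correction,
    otherwise move only 10% of the way toward the observed ratio."""
    ratio = actual_s / estimated_s
    if ratio > dcf:
        return ratio
    return 0.9 * dcf + 0.1 * ratio


# Hypothetical numbers: every task is estimated at 1,000 s. One crawling
# GW task takes 10,000 s, then five GRP tasks finish on estimate.
ESTIMATE_S = 1_000.0
runtimes_s = [10_000.0] + [1_000.0] * 5

dcf = 1.0
history = []
for actual_s in runtimes_s:
    dcf = update_dcf(dcf, ESTIMATE_S, actual_s)
    history.append(dcf)
    print(f"task took {actual_s:>7.0f} s -> correction factor {dcf:.2f}")
```

In this sketch one slow task multiplies every remaining estimate by ten at once, which is the kind of jump that can push the client into Panic Mode, while the following fast tasks pull the factor down only gradually.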

As usual, short work fetch queue length would be a prudent precaution in building up experience with this new material.

cecht

Joined: 7 Mar 18

Posts: 1618

Credit: 3030593572

RAC: 1432036

25 Jul 2019 20:04:37 UTC

Message 172329

I am having good results at both 2x and 3x tasks run with the 1.06 app, though no wingman validations for those completed tasks yet. Below are some average task time comparisons for running at different GPU and CPU multipliers, as set in app_config.xml. My two RX 570s were run at the same settings that I use for the binary pulsar search #1 tasks: mining BIOS, P-state mask 0,6, no power limit. This Linux host has a modest 2-core(4-thread) Pentium G5600 CPU @ 3.90GHz.

app_config GPU\CPU   task time, s   CPU time, s
     1 \ 0.9            3750           2370
     0.5 \ 0.8          2435           1463
     0.333 \ 0.5        2153           1081

So I am getting a benefit of reduced task times running 3X tasks, even though my CPU is nearly tapped out. At that setting of 0.333 GPU, 0.5 CPU, the CPU usage average is ~90% for two cores, ~83% for the other two cores. When one task gets to 99% completion, however, one core taps out at 100% usage while progress of the task hangs for about 4 minutes until it completes. I've only been running at 3X for a few hours, so have kept task completions well staggered. Not sure what would happen if (when) four or more tasks simultaneously get to the 99% stage and my CPU gets totally tapped out.
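For anyone who wants to try the multipliers from the table, here is a small sketch that writes out an app_config.xml for each GPU \ CPU pair. The `gpu_usage` and `cpu_usage` elements inside `gpu_versions` follow the usual BOINC app_config.xml layout. The application name `hypothetical_gw_app` is a placeholder, not the real name. It has to be replaced with the short application name the project actually uses, and the helper `render_app_config` is made up for illustration.

```python
import xml.etree.ElementTree as ET

APP_CONFIG_TEMPLATE = """<app_config>
  <app>
    <name>{app_name}</name>
    <gpu_versions>
      <gpu_usage>{gpu_usage}</gpu_usage>
      <cpu_usage>{cpu_usage}</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""

# Placeholder only: substitute the project's real short application name.
APP_NAME = "hypothetical_gw_app"


def render_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return app_config.xml text reserving `gpu_usage` of a GPU and
    `cpu_usage` of a CPU core for each task of `app_name`."""
    return APP_CONFIG_TEMPLATE.format(
        app_name=app_name, gpu_usage=gpu_usage, cpu_usage=cpu_usage
    )


# The three GPU \ CPU pairs from the table: 1X, 2X and 3X per GPU.
settings = [(1, 0.9), (0.5, 0.8), (0.333, 0.5)]
configs = [render_app_config(APP_NAME, gpu, cpu) for gpu, cpu in settings]

# Parse the 3X config back to confirm it is well-formed XML.
parsed_3x = ET.fromstring(configs[2])
print(configs[2])
```

A `gpu_usage` of 0.333 lets the client schedule three tasks on one GPU. The `cpu_usage` value only affects how many CPU cores the scheduler budgets per task, not how much CPU the task actually consumes.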

The GPU plots over time look pretty much the same for each app_config setting, with load fluctuating a lot, while average load, temp, and power rise only a few points with each additional simultaneous task added. When running tasks at 3X, average GPU load is ~60% and average GPU power is ~55 W (compared to ~100% and ~80 W when running binary pulsar tasks at 3X).

As Archae86 recommended, I'm only running the GW GPU tasks; I've unchecked the binary pulsar application in my E@H Project Preferences.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Richie

Joined: 7 Mar 14

Posts: 656

Credit: 1702989778

RAC: 0

25 Jul 2019 21:52:39 UTC

Message 172333

RX 580, Xeon X5660 @ 3.78GHz, Windows 10

3x, avg 7728 s elapsed ... 2576 s per task

5x, avg 11290 s elapsed ... 2258 s per task

6x, avg 13475 s elapsed ... 2246 s per task

Those configurations did run without problems, but results from those calculations might be junk. I see Bernd has disabled this thing for now.

cecht

Joined: 7 Mar 18

Posts: 1618

Credit: 3030593572

RAC: 1432036

25 Jul 2019 22:48:24 UTC

Message 172334 in response to message 172333

Richie wrote:

Those configurations did run without problems, but results from those calculations might be junk. I see Bernd has disabled this thing for now.

Nice multiplexing Richie! The multiplex results I reported are for O2AS20-500 tasks (Continuous Gravitational Wave search O2 All-Sky v1.06 () x86_64-pc-linux-gnu; GW-opencl-ati); the Server Status page shows that workunits for that application are still being sent out, but the three work generators for the O1OD1 series of programs are disabled. I'm still waiting for validation of 45 pending O2AS20-500 tasks and am still hoping they aren't junk! :)

Ideas are not fixed, nor should they be; we live in model-dependent reality.
