All-Sky Gravitational Wave Search on O3 data (O3ASHF1)

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4349

Credit: 253327228

RAC: 38617

7 Nov 2023 18:44:33 UTC

Message 218955 in response to message 218947

This is not a matter of the application; the application binaries are identical. The distinction is in the workunits, notably in their command lines. The reason this distinction is implemented via different plan classes is that in BOINC you can specify the (CPU) memory usage per workunit, while the GPU VRAM usage can only be specified in the plan class. Thus, different VRAM requirements need different plan classes.

And no, there are no plans to issue "old" and "new" workunits in parallel. Bookkeeping (e.g. about completed frequency ranges) would be a nightmare.

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4149

Credit: 49610471849

RAC: 37218038

7 Nov 2023 19:26:46 UTC

Message 218958

ah, ok. unfortunate, but I understand the constraints better. I thought it was just a difference in the app itself.

archae86

Joined: 6 Dec 05

Posts: 3165

Credit: 7375321687

RAC: 2170758

9 Nov 2023 2:29:06 UTC

Message 219026 in response to message 218562

Bernd Machenschalk wrote:

We are planning a change to the current run. We will trade in a bit of runtime for memory. The workunits that we plan to produce in the future will run a bit longer (~10%).

On the three machines in my flotilla, the productivity degradation appears to be considerably more than 10%.

Today I conducted what I believe to be a well-controlled comparison on one machine.

During the trial the machine ran more than ten consecutive WUs of a given flavor sequentially, with pretty small variation in reported elapsed time. In each case I took the trouble to arrange the start-stop times of the three WUs in progress to be pretty evenly spaced (this actually matters a lot for recent GW GPU work, as the CPU vs. GPU usage varies greatly during the run time).

Running at 3X multiplicity, the ...ati WU (previous style) elapsed time averaged about 25 minutes, while the ...ati-2 WU (new style) elapsed time averaged about 34 minutes.

In other words, on this system as currently operated, the old style was about 36% more productive than the new style.
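The 36% figure can be checked with a short sketch. It uses only the numbers reported above (25 and 34 minutes average elapsed time at 3X multiplicity); the function name `tasks_per_hour` is made up for illustration, and the calculation assumes the elapsed times are steady-state averages.

```python
def tasks_per_hour(elapsed_minutes: float, multiplicity: int) -> float:
    """Completed tasks per hour when `multiplicity` tasks run concurrently."""
    return multiplicity * 60.0 / elapsed_minutes


# Figures reported in the post: 3X multiplicity, average elapsed minutes.
old_rate = tasks_per_hour(25.0, 3)   # "...ati" (previous style)
new_rate = tasks_per_hour(34.0, 3)   # "...ati-2" (new style)

# How much more productive the old style was, relative to the new style.
old_advantage_pct = (old_rate / new_rate - 1.0) * 100.0

# The same gap expressed as the output lost when moving from old to new.
new_loss_pct = (1.0 - new_rate / old_rate) * 100.0

print(f"old: {old_rate:.2f} tasks/h, new: {new_rate:.2f} tasks/h")
print(f"old style advantage: {old_advantage_pct:.1f}%")
print(f"new style output loss: {new_loss_pct:.1f}%")
```

Note that the same gap reads as roughly 36% when measured against the new style and roughly 26% when measured against the old style, so it matters which baseline is quoted.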

As I understand it, the major purpose of the change was to increase the total task output of the user base on this particular work. The concern to be checked is whether the hoped-for improvement from adding new machines not previously able to run the work might be more than wiped out by the productivity loss on the machines that were already running it (a loss which may be more than 36%, as some users may withdraw their machines in discouragement).

I'm not proposing that my machine is typical--in fact I could enumerate more than one way that it is not, but just suggesting that the proposition needs to be checked.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4349

Credit: 253327228

RAC: 38617

9 Nov 2023 7:26:00 UTC

Message 219028

I can currently see that on average the overall runtime of a "new" task compared to an "old" task on the same host is not longer by more than 10%. This is averaged over 1102 Windows and 116 Linux hosts, both with NVIDIA GPUs, based on the >100k tasks they reported.

An issue with running multiple instances of this app (in general, not only on the same GPU) that are started at the same time is that they take a significant time to read the input, depending on the I/O system, because all instances share the one I/O system. The way the current tasks are processed, this now happens twice: once at the beginning and once in the middle of a task. Given the relatively short overall run time (<1h), a few minutes during initialization could make a big difference here. Looking at stderr, does the App spend significant time between "Loading SFTs" and "Search FstatMethod used"?

Ben Scott

Joined: 30 Mar 20

Posts: 54

Credit: 1833923930

RAC: 2841663

9 Nov 2023 8:58:43 UTC

Message 219031 in response to message 219026

I have to concur: on both my machines, running 2x and 3x for best throughput, the times are much more than 10% longer. My results are similar to what archae86 is reporting.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4349

Credit: 253327228

RAC: 38617

9 Nov 2023 12:53:00 UTC

Message 219039

Actually, the issue reported here might be responsible for a significant slowdown on multi-GPU systems. A fix (app version 1.07) is being released.

archae86

Joined: 6 Dec 05

Posts: 3165

Credit: 7375321687

RAC: 2170758

9 Nov 2023 15:41:39 UTC

Message 219044 in response to message 219028

Bernd Machenschalk wrote:

I can currently see that on average the overall runtime of a "new" task compared to an "old" task on the same host is not longer by more than 10%.

Great--this means the trouble I am experiencing will not dominate the overall productivity result.

Quote:

Looking at stderr, does the App spend significant time between "Loading SFTs" and "Search FstatMethod used"?

Looking at the stderr from the machine on which I did my careful comparison, I see these milestones for a WU that took 2043 elapsed seconds to run (running at 3X with evenly spaced start timing).

17:25:41.2398 (12280) [normal]: Start of BOINC application
17:25:41.9587 (12280) [normal]: Loading SFTs
17:26:30.2127 (12280) [normal]: Search FstatMethod used
17:26:55.9027 (12280) [normal]: CG:18271827 FG:250000
(That is the last time-stamped entry before the first-pass "dot progress" begins and continues for many lines.)

17:39:17.1455 (12280) [normal]: Finished main analysis.
17:43:15.7492 (12280) [normal]: Finished recalculating toplist
(That appears to end the first pass. The second-pass milestones I chose to show follow.)

17:43:16.0148 (12280) [normal]: Parsed user input successfully
17:43:16.5930 (12280) [normal]: Loading SFTs matching
17:43:44.7350 (12280) [normal]: Search FstatMethod used
17:44:04.0953 (12280) [normal]: Finished reading input data
17:44:10.2517 (12280) [normal]: CG:18271827 FG:250000 f1dotmin_fg:-2.773529411765e-009
(Then the "dot progress" for the second pass begins.)

17:56:31.7073 (12280) [normal]: Finished main analysis
18:00:41.0919 (12280) [normal]: Finished recalculating toplist
18:00:41 (12280): called boinc_finish
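
The time spent between "Loading SFTs" and "Search FstatMethod used" can be read off the timestamps above. The following is a minimal sketch of that arithmetic, using only the four relevant lines copied from this stderr; the helper names `parse_time` and `load_durations` are made up for illustration, and the code assumes the log does not cross midnight.

```python
from datetime import datetime

# The four relevant lines, copied from the stderr excerpt in this post.
LOG = """\
17:25:41.9587 (12280) [normal]: Loading SFTs
17:26:30.2127 (12280) [normal]: Search FstatMethod used
17:43:16.5930 (12280) [normal]: Loading SFTs matching
17:43:44.7350 (12280) [normal]: Search FstatMethod used
"""

ELAPSED_SECONDS = 2043  # total elapsed time reported for this WU


def parse_time(line: str) -> datetime:
    """Parse the leading HH:MM:SS.ffff timestamp of a log line."""
    return datetime.strptime(line.split()[0], "%H:%M:%S.%f")


def load_durations(log: str) -> list[float]:
    """Seconds from each 'Loading SFTs' line to the next 'Search FstatMethod used'."""
    durations = []
    start = None
    for line in log.splitlines():
        if "Loading SFTs" in line:
            start = parse_time(line)
        elif "Search FstatMethod used" in line and start is not None:
            durations.append((parse_time(line) - start).total_seconds())
            start = None
    return durations


durations = load_durations(LOG)
total = sum(durations)
share_pct = 100.0 * total / ELAPSED_SECONDS

for i, seconds in enumerate(durations, start=1):
    print(f"pass {i}: {seconds:.1f} s between the two markers")
print(f"total: {total:.1f} s, about {share_pct:.1f}% of {ELAPSED_SECONDS} s")
```

For this WU the two intervals come to about 48 s and 28 s, under 4% of the elapsed time. This only measures the span between those two markers, not any other input reading the app may do.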

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 302

Credit: 11345325535

RAC: 12414931

9 Nov 2023 19:38:52 UTC

Message 219053 in response to message 219039

Bernd Machenschalk wrote:

Actually, the issue reported here might be responsible for a significant slowdown on multi-GPU systems. A fix (app version 1.07) is being released.

Feeling this one! It is interesting to watch an RTX A6000 try to work through 8 of these work units at the same time while the other GPU is idle. I am trying to drain these tasks, but it looks like it is going to take some time.

Ben Scott

Joined: 30 Mar 20

Posts: 54

Credit: 1833923930

RAC: 2841663

9 Nov 2023 21:03:21 UTC

Message 219055

On a related note, the run times are different but not nearly as much as the estimated compute size the work units come with. The old ones are 144,000 GFLOPS while the new ones are 720,000 GFLOPS. This causes some confusion with the estimated run time when both are loaded at the same time.
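
A rough sketch of why the estimates diverge is given below. It assumes a simplified model in which the client's runtime estimate scales in proportion to the workunit's estimated computation size; the real BOINC estimator has more inputs, so this is an illustration, not a description of the client. The 25 and 34 minute figures are the single-machine averages reported earlier in this thread and may not be typical.

```python
# Estimated computation sizes quoted in the post (in GFLOPs as displayed).
OLD_EST_SIZE = 144_000
NEW_EST_SIZE = 720_000

# Average elapsed minutes reported earlier in the thread for one machine at 3X.
OLD_ELAPSED_MIN = 25.0
NEW_ELAPSED_MIN = 34.0

# How much larger the new workunits claim to be.
size_ratio = NEW_EST_SIZE / OLD_EST_SIZE

# How much longer they actually ran on that machine.
runtime_ratio = NEW_ELAPSED_MIN / OLD_ELAPSED_MIN

# ASSUMPTION: estimate is proportional to the estimated size. If the estimate
# is calibrated on old tasks, new tasks are overestimated by this factor.
overestimate_factor = size_ratio / runtime_ratio

print(f"estimated size ratio: {size_ratio:.1f}x")
print(f"observed runtime ratio: {runtime_ratio:.2f}x")
print(f"new-task estimate too long by about {overestimate_factor:.1f}x")
```

Under that assumption, a host whose estimates were accurate for old tasks would show new tasks as taking several times longer than they really do, which matches the confusion described when both kinds are loaded together.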

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4149

Credit: 49610471849

RAC: 37218038

10 Nov 2023 1:02:21 UTC

Message 219060

The GFLOPs change happened a while ago, with the change from a 1000cr to a 5000cr reward; the reward is scaled by increasing the set computation size.
