All-Sky Gravitational Wave Search on O3 data (O3ASHF1)

Mike Hewson

Moderator

Joined: 1 Dec 05

Posts: 6594

Credit: 339805787

RAC: 432010

27 Sep 2023 21:52:43 UTC

Message 217557 in response to message 217549

Ian&Steve C. wrote:

Bernd said:

Quote:
To reach a better sensitivity, we'll make the resulting list of candidates even longer. This means not only larger result files to be uploaded (well, shouldn't be much of a problem nowadays), but also a longer time taken for the "recalc" step ("recalculating toplist statistics" is written in stderr). This step is done purely on the CPU. We are working on porting it to the GPUs, but the memory access pattern of this step is so unpredictable that we don't get much speedup from that yet (accessing "global" memory on the GPU is still terribly slow). We hope to get an improved version of the App out during the run.

So it's probably a combination of required sensitivity and code optimization that makes the final part better suited to the CPU. It sounds like they will try to port this to the GPU at some point, and at that time I would guess the app will run much faster and more efficiently.

The GPU's global memory, as the name suggests, is not associated with any particular compute unit. I had thought/heard that the slowdown (possibly two orders of magnitude) came from page swaps of the GPU's global memory (i.e. oversubscription and cache misses). Thus those really fast compute units become I/O bound (limited by the GPU-to-CPU interconnect).
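Bernd's point about an "unpredictable" memory access pattern can be illustrated without any paging at all. The toy model below (an illustration, not the project's recalc code) counts how many separate memory transactions one warp needs when each of its threads loads one element. The 32-thread warp matches NVIDIA hardware; the 128-byte segment size and 4-byte elements are simplifying assumptions, and caches and page migration are ignored.

```python
import random

WARP_SIZE = 32       # threads that issue one load instruction together
SEGMENT_BYTES = 128  # assumed size of one global-memory transaction
ELEM_BYTES = 4       # each thread loads one float32


def segments_touched(indices: list[int]) -> int:
    """Distinct memory segments one warp needs when thread t loads array[indices[t]]."""
    return len({(i * ELEM_BYTES) // SEGMENT_BYTES for i in indices})


def mean_segments(pattern: str, n_elems: int, n_warps: int, rng: random.Random) -> float:
    """Average segments per warp for a sequential or a random (scattered) gather."""
    total = 0
    for w in range(n_warps):
        if pattern == "sequential":
            # neighbouring threads read neighbouring elements (coalesced)
            idx = [w * WARP_SIZE + t for t in range(WARP_SIZE)]
        else:
            # each thread reads an unpredictable element (scattered gather)
            idx = [rng.randrange(n_elems) for _ in range(WARP_SIZE)]
        total += segments_touched(idx)
    return total / n_warps


rng = random.Random(0)
seq = mean_segments("sequential", 1 << 24, 1000, rng)
rnd = mean_segments("random", 1 << 24, 1000, rng)
print(f"sequential: {seq:.2f} segments/warp, random: {rnd:.2f} segments/warp")
```

In this model a sequential read costs one transaction per warp while a scattered read costs close to 32, so the same arithmetic waits on roughly 32 times as much memory traffic. That is one plausible reason a straight GPU port of the recalc step would not speed up much.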

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ... Blaise Pascal

... and my other CPU is a Ryzen 5950X :-)

Aurum

Joined: 12 Jul 17

Posts: 77

Credit: 3473477040

RAC: 4004021

28 Sep 2023 17:05:05 UTC

Message 217596

It seems to me that the run time estimates for Meerkat and All-Sky are somehow confounded. Yesterday one computer finished its All-Sky WUs and then ran only Meerkats for a day with my queue set to 1/0. The TimeLeft (from BoincTasks) dropped from 7 hours to 00:25:49. Then in Preferences I turned off Meerkat and turned on All-Sky, and with my queue set to 1/0.5 it downloaded 155 O3AS WUs with a TimeLeft of 00:06:11, on top of the 79 Meerkat WUs I already had. This is very strange behavior, since the previous download of All-Sky WUs had a TimeLeft of about 1.5 hours.

I have another computer that downloaded Meerkats with 11-hour run time estimates. Unless Meerkat runs alone for a day, a 15-minute task can start with an estimate anywhere from 3 to 11 hours.

There also seems to be interference with FGRP5 CPU WUs. With the queue set to 1/0, if the TimeLeft for the GPU WUs totals a day, I cannot download any FGRP5 WUs, even when no CPU WUs of any kind are running.

CPU WUs and GPU WUs should be managed separately but they appear to be combined on E@H.

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4152

Credit: 49795594936

RAC: 39578754

28 Sep 2023 17:13:09 UTC

Message 217597

The estimated runtime comes from your client.

It's because the project (as do all projects AFAIK) uses the same DCF (duration correction factor) for all task types, and almost always one task type needs a wildly different DCF than another to produce an accurate runtime estimate. So as you process one kind of task, the DCF shifts toward the value required for that task, which makes the runtime estimates for the other task types more and more inaccurate.
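The tug-of-war on a single shared DCF can be reproduced with a toy client. The update rule below (jump straight up when a task runs longer than predicted, drift 10% per task toward the observed ratio otherwise) is an assumed simplification of the BOINC client's behaviour, and the task names and runtimes are hypothetical numbers loosely modelled on the reports above.

```python
from dataclasses import dataclass


@dataclass
class SharedDcfClient:
    """Toy client with ONE duration correction factor shared by all task types."""
    dcf: float = 1.0

    def estimate_h(self, raw_estimate_h: float) -> float:
        # Displayed estimate = project-supplied estimate scaled by the shared DCF.
        return raw_estimate_h * self.dcf

    def task_finished(self, raw_estimate_h: float, actual_h: float) -> None:
        ratio = actual_h / raw_estimate_h
        if ratio > self.dcf:
            self.dcf = ratio  # assumed rule: jump up to the observed ratio at once
        else:
            self.dcf = 0.9 * self.dcf + 0.1 * ratio  # assumed rule: drift down 10% per task


# Hypothetical numbers: both task types carry a 1 h raw estimate,
# but one really takes 15 min and the other 1.5 h.
RAW_H = {"meerkat": 1.0, "allsky": 1.0}
ACTUAL_H = {"meerkat": 0.25, "allsky": 1.5}

client = SharedDcfClient()
client.task_finished(RAW_H["allsky"], ACTUAL_H["allsky"])  # DCF jumps to 1.5

for _ in range(40):  # a day of short tasks only
    client.task_finished(RAW_H["meerkat"], ACTUAL_H["meerkat"])
allsky_after_meerkat = client.estimate_h(RAW_H["allsky"])

client.task_finished(RAW_H["allsky"], ACTUAL_H["allsky"])  # one long task returns
meerkat_after_allsky = client.estimate_h(RAW_H["meerkat"])

print(f"All-Sky estimate after 40 Meerkats: {allsky_after_meerkat:.2f} h (really 1.5 h)")
print(f"Meerkat estimate after 1 All-Sky:   {meerkat_after_allsky:.2f} h (really 0.25 h)")
```

After a day of short tasks the long task is predicted at a fraction of its real runtime (so the client over-fetches), and a single long task then inflates every short-task estimate about sixfold, which matches the "3 to 11 hours for a 15-minute task" pattern.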

There's not really a good solution to this other than running only one kind of task, or keeping a really small work cache so that the inaccurate runtimes don't affect the scheduler requests as much.
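Why a small cache helps can be shown with simple arithmetic. The sketch below is a hypothetical, heavily simplified work-fetch model (one resource, an empty buffer, a single "cache days" number instead of BOINC's separate minimum and additional buffers). The estimate is wrong by the same factor either way; the cache size only decides how much real work that error turns into.

```python
import math


def tasks_requested(cache_days: float, est_task_h: float, concurrent: int = 1) -> int:
    """Tasks a simplified client asks for to fill an empty buffer of cache_days."""
    buffer_h = cache_days * 24 * concurrent
    return math.ceil(buffer_h / est_task_h)


def real_backlog_days(n_tasks: int, actual_task_h: float, concurrent: int = 1) -> float:
    """How long those tasks really take once their true runtime shows up."""
    return n_tasks * actual_task_h / (24 * concurrent)


# Hypothetical: the estimate says 7.5 min, the task really takes 1.5 h.
EST_H, ACTUAL_H = 0.125, 1.5
for cache_days in (1.5, 0.25):
    n = tasks_requested(cache_days, EST_H)
    print(f"cache {cache_days} d -> {n} tasks -> {real_backlog_days(n, ACTUAL_H):.1f} d of real work")
```

With these numbers a 1.5-day cache pulls in 18 days of real work, while a 0.25-day cache caps the damage at 3 days.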

San-Fernando-Valley

Joined: 16 Mar 16

Posts: 555

Credit: 10800549517

RAC: 12242961

28 Sep 2023 17:59:18 UTC

Message 217600

Great explanation of the things going on CPU and GPU. Thanks to all.

It is very "educational" and interesting (for me).

I'm just wondering whether it is important to dissect how the credits/run times/etc. are precalculated.

For the sake of understanding the way it works it's fine.

But I think it is irrelevant to the objective of crunching.

No matter how the calcs are being done (poor or great) I personally don't mind/care ...

I just want to crunch - which I am trying to do.

I'm very satisfied (probably no one cares about my position) with the way things work here.

cheers

S-F-V

Keith Myers

Joined: 11 Feb 11

Posts: 5057

Credit: 19223274887

RAC: 6080863

28 Sep 2023 19:13:52 UTC

Message 217602 in response to message 217597

Most projects have ditched the DCF mechanism for tasks, with Einstein being one of the few (single-digit) holdouts.

Most BOINC projects have moved to a separate APR (average processing rate) calculation for each application type.

This method gives correct runtime estimates for each task type.
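The idea behind a per-app APR can be sketched as one observed processing rate per application, so finishing one kind of task never disturbs the estimate for another. This is a hypothetical illustration: the class, the plain running mean and the numbers are assumptions, and the real BOINC server averages per host and app version with its own weighting rules.

```python
from collections import defaultdict


class PerAppEstimator:
    """Hypothetical per-app runtime estimator in the spirit of BOINC's APR."""

    def __init__(self, default_flops: float) -> None:
        self.default_flops = default_flops  # used until an app has history
        self.rates: dict[str, list[float]] = defaultdict(list)  # app -> observed FLOPS

    def record(self, app: str, fpops_est: float, elapsed_s: float) -> None:
        # Processing rate this app actually achieved on this host.
        self.rates[app].append(fpops_est / elapsed_s)

    def apr(self, app: str) -> float:
        samples = self.rates.get(app)
        return sum(samples) / len(samples) if samples else self.default_flops

    def estimate_s(self, app: str, fpops_est: float) -> float:
        return fpops_est / self.apr(app)


est = PerAppEstimator(default_flops=1e12)
FPOPS = 1e15  # hypothetical: both apps claim the same work per task
for _ in range(20):  # interleave the two apps; neither pollutes the other
    est.record("meerkat", FPOPS, 900.0)
    est.record("allsky", FPOPS, 5400.0)
print(round(est.estimate_s("meerkat", FPOPS)), round(est.estimate_s("allsky", FPOPS)))
```

Because each app keeps its own rate, interleaving task types leaves both estimates at their true values, which is exactly what a single shared DCF cannot do.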

Scrooge McDuck

Joined: 2 May 07

Posts: 1126

Credit: 18827242

RAC: 11341

28 Sep 2023 21:00:54 UTC

Message 217606

It might be a good idea to put the content of Ian's last comment, appropriately described, directly on the project's FAQ page:

https://einsteinathome.org/faq

Every Einstein cruncher initially stumbles over the strangely fluctuating run time estimates when crunching different task types (the default). Estimates are far too short, then far too long. By observing tasks for long enough, you can get closer to the cause and realize what the only solution is: keep the task cache small (or confine yourself to one task type). But probably only experienced long-time crunchers, who may also have read the responsible source code of the BOINC client (or run it in full debug mode, flooding the messages window), fully understand the reasons. So I think a short FAQ entry about this 'feature' would be very useful.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4349

Credit: 253464341

RAC: 36465

26 Oct 2023 13:15:00 UTC

Message 218562

We are planning a change to the current run. We will trade a bit of runtime for memory. The workunits that we plan to produce in the future will run a bit longer (~10%), but take significantly less memory, so that GPUs with 4 GB can run them. A few modifications need to be made to the app to achieve this. The app version 1.06 that we just published already incorporates the necessary changes. With the current workunits, however, it should behave identically to the previous version 1.04.

In (our part of) Germany there is a long weekend ahead, and I'm reluctant to deploy such a change to the workunits right before a longer period of reduced attention on our side. So this change of workunits will likely happen in the middle of next week.

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4152

Credit: 49795594936

RAC: 39578754

26 Oct 2023 13:21:39 UTC

Message 218564

That's great to hear, Bernd.

Was the change mainly to widen the pool of available devices for an overall increase in crunching power?

What is the estimate for the new minimum amount of VRAM needed for the new app/tasks?

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 303

Credit: 11402147784

RAC: 12825462

26 Oct 2023 14:07:29 UTC

Message 218566

That is good news, overall. Originally, you were working on moving the recalculation step from the CPU to the GPU, but said it didn't seem to speed up the work. Is anything in the works related to this? That recalculation step is really intensive. With the drop in VRAM requirements, I would assume more users will want to run more of these work units at the same time, which can lead to some interesting bottlenecks (from what I have seen on our systems).

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4349

Credit: 253464341

RAC: 36465

26 Oct 2023 14:07:38 UTC

Message 218567 in response to message 218564

Was the change mainly to widen the pool of available devices for an overall increase in crunching power?

Yes, indeed.

What is the estimate for the new minimum amount of VRAM needed for the new app/tasks?

We're still running tests, and it seems that we have only a limited range of machines (and, in particular, OSes), and the app doesn't behave the same on all of them. Also, the memory consumption is data dependent (i.e. it depends on the individual WU).

But the maximum should be below what was required for the previous run, O3MD1, which was 3.5 GB max.
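Since memory use varies per WU, the 3.5 GB O3MD1 maximum quoted above works as a conservative per-task bound when deciding how many tasks to run side by side on one card. The sketch below is a hypothetical rule of thumb: `reserve_gb` (VRAM kept back for the driver and desktop) is an assumed value, and VRAM is not the only limit, since each task still needs CPU time for its recalc step.

```python
import math


def concurrent_tasks(vram_gb: float, per_task_gb: float = 3.5, reserve_gb: float = 0.25) -> int:
    """Tasks that fit side by side in VRAM, assuming every task hits the upper bound."""
    usable_gb = vram_gb - reserve_gb  # reserve_gb: hypothetical allowance for driver/desktop
    return max(0, math.floor(usable_gb / per_task_gb))


for vram in (4, 8, 12, 24):
    print(f"{vram} GB card -> {concurrent_tasks(vram)} task(s) at once")
```

Under these assumptions a 4 GB card runs one task, while a 24 GB card could run six, which is where the CPU-side recalc bottleneck mentioned above would start to bite.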
