All-Sky Gravitational Wave Search on O3 data (O3ASHF1)

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,591
Credit: 322,275,621
RAC: 267,171

Ian&Steve C. wrote:

Bernd said:

Quote:
To reach a better sensitivity, we'll make the resulting list of candidates even longer. This means not only larger result files to be uploaded (well, shouldn't be much of a problem nowadays), but also a longer time taken for the "recalc" step ("recalculating toplist ststistics" is written in stderr). This step is done purely on the CPU. We are working on porting it to the GPUs, but the memory access pattern of this step is so unpredictable that we don't get much speedup from that yet (accessing "global" memory on the GPU is still terribly slow). We hope to get an improved version of the App out during the run.


so probably a combination of required sensitivity and code optimization making the final part better suited to the CPU. sounds like they will be trying to port this to the GPU at some point, and at that time i would guess the app will run much faster and more efficiently.

The GPU's global memory, as the name suggests, is not associated with any particular compute unit. I had thought/heard that it was page swaps of the GPU's global memory (i.e. oversubscription and cache misses) that caused the slowdown (possibly two orders of magnitude slower). Thus those really fast compute units are I/O bound (by the GPU-to-CPU interconnect).

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Aurum
Joined: 12 Jul 17
Posts: 77
Credit: 3,412,397,040
RAC: 133

It seems to me that the estimated run times for Meerkat and All-Sky are confounded somehow. Yesterday one computer finished its All-Sky WUs and then ran only Meerkats for a day with my queue set to 1/0. The TimeLeft (from BoincTasks) dropped from 7 hours to 00:25:49. Then in Preferences I turned off Meerkat and turned on All-Sky, and with my queue set to 1/0.5 it DLed 155 O3AS WUs with a TimeLeft of 00:06:11, and I already had 79 Meerkat WUs. Very strange behavior, since the previous DL of All-Sky had WUs with a TimeLeft of about 1.5 hours.

I have another computer that DLed Meerkats with 11 hour run times. Unless Meerkat runs alone for a day it can start with any time from 3 to 11 hours for a 15 minute task.

There also seems to be interference with FGRP5 CPU WUs. With the queue set to 1/0, if the TimeLeft for the GPU WUs totals a day, then I cannot DL any FGRP5 WUs even if there are no CPU WUs of any kind running.

CPU WUs and GPU WUs should be managed separately but they appear to be combined on E@H.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4,015
Credit: 47,628,015,028
RAC: 43,896,123

the estimated runtime comes from your client.

it's because the project (as do all projects AFAIK) shares the same DCF (duration correction factor) for all task types. and almost always one task type needs a wildly different DCF than another to produce an accurate runtime estimate. so as you process one kind of task, the DCF shifts toward the value required for that task, which makes the runtime estimates for the other task types more and more inaccurate.

there's not really a good solution to this other than just running only one kind of task, or a really small work cache so that the inaccurate runtimes don't impact the schedule requests as much.
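To make the drift concrete, here is a minimal Python sketch of one DCF shared by two task types. It is a toy model, not the BOINC client's code: the update rule (jump up at once when a task overruns, drift down by 10% of the gap otherwise), the task names and all the numbers are assumptions chosen for illustration.

```python
# Toy model: one duration correction factor (DCF) shared by two task types.
# The update rule is a simplification assumed for illustration; it is not
# copied from the BOINC client source.

def update_dcf(dcf, raw_estimate, actual):
    """Move the shared DCF toward the ratio the finished task needed."""
    needed = actual / raw_estimate
    if needed > dcf:
        return needed                    # assumed: raise immediately on an overrun
    return dcf + 0.1 * (needed - dcf)    # assumed: lower slowly otherwise

# Hypothetical task types (hours): uncorrected estimate vs. real runtime.
TASKS = {
    "type_a": {"raw": 1.0, "actual": 0.25},   # would need a DCF of 0.25
    "type_b": {"raw": 1.0, "actual": 1.5},    # would need a DCF of 1.5
}

def crunch(dcf, kind, count):
    """Finish `count` tasks of one type and return the resulting DCF."""
    task = TASKS[kind]
    for _ in range(count):
        dcf = update_dcf(dcf, task["raw"], task["actual"])
    return dcf

def estimate(dcf, kind):
    """Runtime estimate shown for a task type under the current shared DCF."""
    return dcf * TASKS[kind]["raw"]

# Run only type_a for a while, then look at what type_b is estimated at.
dcf_after_a = crunch(1.0, "type_a", 50)
est_b_after_a = estimate(dcf_after_a, "type_b")

# A single type_b task then drags the DCF the other way.
dcf_after_b = crunch(dcf_after_a, "type_b", 1)
est_a_after_b = estimate(dcf_after_b, "type_a")

print(f"after 50 type_a tasks: DCF={dcf_after_a:.2f}, "
      f"type_b estimate={est_b_after_a:.2f} h (real 1.50 h)")
print(f"after 1 type_b task:   DCF={dcf_after_b:.2f}, "
      f"type_a estimate={est_a_after_b:.2f} h (real 0.25 h)")
```

Whichever type ran last ends up with a sensible estimate and the other one is far off, which is the pattern described above: estimates that are much too short after a stretch of one task type and much too long after a stretch of the other.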

_________________________________________________________________________

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 459
Credit: 10,383,957,880
RAC: 12,808,833

Great explanation of the things going on with CPU and GPU. Thanks to all.

It is very "educational" and interesting (for me).

I'm just wondering if it is important to dissect how the credits/run times/etc. are precalculated.

For the sake of understanding the way it works it's fine.

But I think it is irrelevant to the objective of crunching.

No matter how the calcs are being done (poor or great) I personally don't mind/care ...

I just want to crunch - which I am trying to do.

I'm very satisfied (probably no one cares about my position) with the way things work here.

cheers

S-F-V

Keith Myers
Joined: 11 Feb 11
Posts: 4,992
Credit: 18,835,037,477
RAC: 5,789,769

Most projects have ditched the DCF mechanism for tasks, Einstein being one of the single digit holdouts.

Most BOINC projects moved to a separate APR (average processing rate) calculation for each application type.

This method gives correct runtime estimates for each task type.
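As a rough illustration of the per-application idea, here is a minimal Python sketch that keeps a separate average processing rate for each application and estimates runtime as work divided by that rate. The class, the application names, the work units and the numbers are all hypothetical; this is not the BOINC server's implementation.

```python
from collections import defaultdict

class PerAppEstimator:
    """Keeps one average processing rate per application (hypothetical helper)."""

    def __init__(self):
        self.total_work = defaultdict(float)     # work done, arbitrary units
        self.total_seconds = defaultdict(float)  # time spent on that work

    def record(self, app, work, seconds):
        """Add one finished task to the history of its own application only."""
        self.total_work[app] += work
        self.total_seconds[app] += seconds

    def rate(self, app):
        """Average processing rate; needs at least one recorded task for `app`."""
        return self.total_work[app] / self.total_seconds[app]

    def estimate(self, app, work):
        """Estimated runtime in seconds for a new task of size `work`."""
        return work / self.rate(app)

estimator = PerAppEstimator()
estimator.record("meerkat", work=100.0, seconds=900.0)    # hypothetical 15 min task
estimator.record("allsky", work=100.0, seconds=5400.0)    # hypothetical 1.5 h task

# Finishing more tasks of one application never touches the other's rate.
estimator.record("meerkat", work=100.0, seconds=900.0)

print(round(estimator.estimate("meerkat", 100.0)))
print(round(estimator.estimate("allsky", 100.0)))
```

Because each application has its own history, a long run of one task type cannot skew the estimates for another, which is the difference from a single shared correction factor.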

Scrooge McDuck
Joined: 2 May 07
Posts: 1,068
Credit: 18,101,543
RAC: 12,399

It might be a good idea to put the content of Ian's last comment, appropriately described, directly on the project's FAQ page:

https://einsteinathome.org/faq

Every Einstein cruncher initially stumbles over the strangely fluctuating run time estimates when crunching different task types (the default). Estimates are far too short, then far too long. By observing tasks for a long enough time, you can get closer to the cause and realize what the only solution is: keep the task cache small (or confine yourself to one task type). But probably only experienced long-time crunchers who may also have read the responsible source code of the BOINC client (or run it in full debug mode, flooding the messages window) fully understand the reasons. So, I think a short FAQ entry about this 'feature' would be very useful.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,330
Credit: 251,180,315
RAC: 41,915

We are planning a change to the current run. We will trade in a bit of runtime for memory. The workunits that we plan to produce in the future will run a bit longer (~10%), but take significantly less memory, such that GPUs with 4GB can run these. A few modifications need to be made to the app to achieve this. The app version 1.06 that we just published already incorporates the necessary changes. With the current workunits, however, it should behave identically to the previous 1.04 version.

In (our part of) Germany there is a long weekend ahead, and I'm reluctant to deploy such a change of the workunits before a longer period of reduced attention from our side. So this change of workunits will likely happen mid next week.

BM

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4,015
Credit: 47,628,015,028
RAC: 43,896,123

that's great to hear Bernd.

was the change mainly to widen the pool of available devices for an overall increase in crunching power?

what is the estimate for the new minimum amount of VRAM needed for the new app/tasks?

_________________________________________________________________________

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 258
Credit: 10,732,260,146
RAC: 11,746,300

That is good news, overall. Originally, you all were working on moving the recalculation step from CPU to GPU but said it didn't seem to speed up the work. Is anything in the works related to this? That recalculation step is really intensive. With the drop in VRAM requirements, I would assume more users will want to run more of these work units at the same time, which can lead to some interesting bottlenecks (from what I have seen on our systems).

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,330
Credit: 251,180,315
RAC: 41,915

Ian&Steve C. wrote:

was the change mainly to widen the pool of available devices for an overall increase in crunching power?

yes indeed.


Ian&Steve C. wrote:

what is the estimate for the new minimum amount of VRAM needed for the new app/tasks?

We're still running tests, and it seems that we have only a limited range of machines and, in particular, operating systems, and the app doesn't behave the same on all of them. Also, the memory consumption is data dependent (i.e. it depends on the individual WU).

But the max should be below what was required for the previous run O3MD1, which was 3.5GB max.

BM
