Gravitational Wave search O3AS Engineering

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250486688

RAC: 34888

26 Apr 2021 10:44:30 UTC

Topic 225277

While we wait for the LSC to release the data from their third observation run ("O3"), we will test the workunit setup and application planned to analyze that new data in a short "Engineering" run (on generated mock data). This will be the locality scheduling application for the next few weeks.

The app is currently in "Beta test" status, and workunit generation is limited until we find the app reliable enough. We will probably release the app to the wider public tomorrow and then generate workunits continuously.

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2957216360

RAC: 715705

Some initial observations

26 Apr 2021 11:04:59 UTC

Message 185225

Some initial observations already posted in comment 185223

Betreger

Joined: 25 Feb 05

Posts: 992

Credit: 1589715718

RAC: 767910

A reminder for those who are

20 Jul 2021 22:26:23 UTC

Message 187475

A reminder for those who are running 2x on this app.

If you run them with staggered starting times you will get a major improvement in throughput.

My GTX1660s needs about 33m without staggered starts, whereas staggering them reduces run time to ~28m.
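
As a rough check on those numbers, here is a minimal sketch of the throughput arithmetic. It assumes two tasks run side by side on one GPU and that the quoted 33m and ~28m are per-task run times; the function name `tasks_per_hour` is only illustrative.

```python
def tasks_per_hour(concurrent_tasks: int, minutes_per_task: float) -> float:
    """Throughput of one GPU running `concurrent_tasks` tasks side by side."""
    return concurrent_tasks * 60.0 / minutes_per_task


# Run times quoted in the post, assumed to be per-task times when running 2x.
unstaggered = tasks_per_hour(2, 33.0)
staggered = tasks_per_hour(2, 28.0)

# Relative improvement in throughput from staggering the start times.
gain_percent = (staggered / unstaggered - 1.0) * 100.0

print(f"unstaggered: {unstaggered:.2f} tasks/hour")
print(f"staggered:   {staggered:.2f} tasks/hour")
print(f"gain:        {gain_percent:.1f}%")
```

Under those assumptions the gain works out to roughly 18% more tasks per hour.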

petri33

Joined: 4 Mar 20

Posts: 123

Credit: 4040215819

RAC: 7024302

Betreger wrote:A reminder

21 Jul 2021 12:42:24 UTC

Message 187516 in response to message 187475

Betreger wrote:

A reminder for those who are running 2x on this app.

If you run them with staggered starting times you will get a major improvement in throughput.

My GTX1660s needs about 33m wo staggered starts whereas staggering them reduces run time to ~28m.

If we could have an app_mutex on the GPU with 2x tasks, the two tasks would overlap for about 20 seconds, one doing CPU work and initial GPU setup while the second does full GPU calculations. When the GPU calculations of the second one finish, the first one could start its full GPU work. The second one would then spend 200+ seconds checking on the CPU while the first one uses the GPU. When the second one finishes its CPU work, it would report, start a new task, and do 20 seconds of CPU work and minor GPU setup.

Task #1       |Task #2
----------------------------
...           | ...
begin new WU  | GPU
CPU setup     | GPU
mutex wait GPU| mutex release
GPU           | CPU check WU
GPU           | report WU
GPU           | begin new WU
GPU           | CPU setup
mutex release | mutex wait GPU
CPU check WU  | GPU
Report WU     | GPU
...           | ...

That approach was used in the SETI NVIDIA Linux app to run only one GPU-intensive part of a task at a time while overlapping the initial setup, cleanup, and report stages.
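
The scheme in the table can be sketched roughly as follows. This is only an illustration, not the SETI or Einstein@Home implementation: it uses Python threads and a `threading.Lock` as a stand-in for the proposed app_mutex, whereas real BOINC tasks are separate processes and would need an inter-process lock. The names `gpu_mutex`, `run_task`, and `worker`, and the scaled-down timings, are all hypothetical.

```python
import threading
import time

gpu_mutex = threading.Lock()  # hypothetical stand-in for the proposed app_mutex
log = []                      # ordered (task name, phase) events
log_lock = threading.Lock()   # protects the shared event log


def record(name, phase):
    with log_lock:
        log.append((name, phase))


def run_task(name, setup_s=0.01, gpu_s=0.03, check_s=0.02):
    record(name, "cpu_setup")
    time.sleep(setup_s)       # CPU setup may overlap the other task's GPU phase
    with gpu_mutex:           # "mutex wait GPU" ... "mutex release"
        record(name, "gpu_start")
        time.sleep(gpu_s)     # only one task at a time is in this phase
        record(name, "gpu_end")
    record(name, "cpu_check")
    time.sleep(check_s)       # CPU-only wrap-up runs while the other task uses the GPU
    record(name, "report")


def worker(slot, n_tasks):
    for i in range(n_tasks):
        run_task(f"{slot}-wu{i}")


def max_concurrent_gpu(events):
    """Peak number of tasks that were inside the GPU phase at the same time."""
    active = peak = 0
    for _, phase in events:
        if phase == "gpu_start":
            active += 1
            peak = max(peak, active)
        elif phase == "gpu_end":
            active -= 1
    return peak


threads = [threading.Thread(target=worker, args=(f"slot{k}", 3)) for k in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrent GPU phases:", max_concurrent_gpu(log))
```

The point of the design is that the lock covers only the GPU-heavy section, so setup and the CPU check of one task are free to run while the other task holds the GPU.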

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3953

Credit: 46838872642

RAC: 64340683

petri33 wrote: Betreger

21 Jul 2021 15:22:04 UTC

Message 187527 in response to message 187516


Perfect use case for this! Especially since there are several minutes of CPU-only processing at the tail end of every GW task.

GWGeorge007

Joined: 8 Jan 18

Posts: 3062

Credit: 4967027686

RAC: 1414543

Curious George here, Why

21 Jul 2021 16:33:29 UTC

Message 187532

Curious George here,

Why couldn't we use this sort of formula for O2MD1 tasks?

George

Proud member of the Old Farts Association

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3953

Credit: 46838872642

RAC: 64340683

GWGeorge007 wrote: Curious

21 Jul 2021 16:37:05 UTC

Message 187533 in response to message 187532

GWGeorge007 wrote:

Curious George here,

Why couldn't we use this sort of formula for O2MD1 tasks?

Could have, but I think the CPU "wrap up" portion at the end of the task is significantly longer with O3AS than it was with O2MD, so it has a bigger effect now than before.

But in either case, it's up to the Einstein developers to implement something like this; it is logic that needs to be coded into the application.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117592169802

RAC: 35210726

GWGeorge007 wrote:Why

21 Jul 2021 22:57:00 UTC

Message 187551 in response to message 187532

GWGeorge007 wrote:

Why couldn't we use this sort of formula for O2MD1 tasks?

Because O2MD1 is a CPU only search and doesn't use the GPU.

Before the O3AS engineering test run started there was a GPU search using O2 data called O2MDF. It's been gone for a while and there are no apps listed for it.

If you were talking about O2MDF rather than O2MD1, it's highly unlikely that it would be used again. The emphasis would now be to use GPUs for just the new O3 data since that is the data most likely to provide a detection.

Cheers,
Gary.

mmonnin

Joined: 29 May 16

Posts: 291

Credit: 3398986540

RAC: 3004860

Nothing but transient http

23 Jul 2021 10:14:11 UTC

Message 187636

Nothing but transient HTTP upload errors on O3AS tasks for the past day or more. I had just switched from GRPB on several computers and hadn't had issues with those tasks. Several O3AS tasks did get credit or are pending, but I have hundreds now failing to upload.

mmonnin

Joined: 29 May 16

Posts: 291

Credit: 3398986540

RAC: 3004860

Can't upload and can't

23 Jul 2021 22:21:02 UTC

Message 187668

Can't upload and can't download different work because of the uploads.
Changed back to Gamma ray and reset project.
STILL was getting O3 tasks over and over.
I could change locations and the project would still send this junk.
I could completely de-select ATI or NV and it would still send tasks for those cards.
I could select CPU only, a GPU app, everything else No and STILL get GPU work.
Each PC was getting 12 tasks no matter the speed or GPU count.
Server is F'd up

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117592169802

RAC: 35210726

mmonnin wrote:Each PC was

23 Jul 2021 22:47:00 UTC

Message 187669 in response to message 187668

mmonnin wrote:

Each PC was getting 12 tasks no matter the speed or GPU count.

Sounds like the server is doing what it always does if you don't properly remove allocated work. Resetting the project doesn't remove what has already been allocated. You need to abort it first and make sure that work you don't want is properly reported before you reset a project.

The other thing to check is that you don't have the 'Allow non-preferred apps' setting set to 'yes' for the location a particular host is assigned to.

EDIT: I was about to set up a host to run O3AS work to check if there really is a widespread upload problem. I haven't started running any of this yet so don't know the situation. I'm surprised that a lot more people aren't complaining if this is truly widespread. This is a serious issue (being a weekend) since it doesn't take long for stuck uploads to destroy new work fetch and completely jam up a system.

Are you sure it's not something to do with your ISP for example?

Cheers,
Gary.
