Gravitational Wave search O3AS Engineering

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,023
Credit: 214,156,517
RAC: 42,842
Topic 225277

While we wait for the LSC to release the data from their third observation run ("O3"), we will test the workunit setup and application planned to analyze that new data in a short "Engineering" run (on generated mock data). This will be the locality scheduling application for the next few weeks.

The app is currently in "Beta test" status, and workunit generation is limited until we find the app reliable enough. We will probably release the app to the wider public tomorrow and then generate workunits continuously.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,038
Credit: 671,050,144
RAC: 1,312,187

Some initial observations already posted in comment 185223

Betreger
Joined: 25 Feb 05
Posts: 953
Credit: 678,498,639
RAC: 299,465

A reminder for those who are running 2x on this app.

If you run them with staggered starting times you will get a major improvement in throughput.

My GTX 1660s need about 33 minutes per task without staggered starts, whereas staggering them reduces the run time to ~28 minutes.
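As a rough check of what those figures mean for throughput, here is a small sketch. It assumes that both numbers are per-task elapsed times with two tasks running at once on one card; the function name and the tasks-per-day framing are only illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def tasks_per_day(concurrent_tasks, minutes_per_task):
    """Tasks finished per day when `concurrent_tasks` run side by side."""
    return concurrent_tasks * MINUTES_PER_DAY / minutes_per_task


# Figures quoted above: ~33 min without staggering, ~28 min with it (2x).
unstaggered = tasks_per_day(2, 33)
staggered = tasks_per_day(2, 28)
gain_percent = (staggered / unstaggered - 1) * 100

print(f"{unstaggered:.1f} -> {staggered:.1f} tasks/day (+{gain_percent:.1f}%)")
# prints: 87.3 -> 102.9 tasks/day (+17.9%)
```

Under those assumptions the gain is simply the ratio 33/28, so it does not depend on how many tasks run at once.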

petri33
Joined: 4 Mar 20
Posts: 65
Credit: 920,627,609
RAC: 6,743,675

Betreger wrote:

A reminder for those who are running 2x on this app.

If you run them with staggered starting times you will get a major improvement in throughput.

My GTX 1660s need about 33 minutes per task without staggered starts, whereas staggering them reduces the run time to ~28 minutes.

If we could have an app_mutex on the GPU when running 2x tasks, the two tasks would overlap for only about 20 seconds: one doing its CPU work and initial GPU setup while the other does its full GPU calculations. When the GPU calculations of the second task finish, the first task could start its full GPU stage. The second would then spend 200+ seconds on CPU checking while the first uses the GPU. When the second finishes its CPU stage it would report, start a new task, and do 20 seconds of CPU work and minor GPU setup.

GPU #1        |GPU #2
----------------------------
...           | ...
begin new WU  | GPU
CPU setup     | GPU
mutex wait GPU| mutex release
GPU           | CPU check WU
GPU           | report WU
GPU           | begin new WU
GPU           | CPU setup
mutex release | mutex wait GPU
CPU check WU  | GPU
Report WU     | GPU
...           | ...

This was used in the SETI NVIDIA Linux app to run only one GPU-intensive part of a task at a time and to overlap the initial setup, cleanup and report stages.
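The idea can be sketched with two threads standing in for the two task slots and an ordinary lock standing in for the proposed app_mutex. This is only a toy simulation, not how the SETI or Einstein apps are written: the names `gpu_mutex`, `run_slot` and `simulate` are made up here, `time.sleep` replaces the real work, and the durations are scaled down from the 20 s / 200+ s figures above.

```python
import threading
import time

gpu_mutex = threading.Lock()  # hypothetical stand-in for the proposed app_mutex
log_lock = threading.Lock()   # protects the shared event list
events = []                   # (slot, phase, start, end)


def phase(slot, name, seconds):
    """Pretend to work for `seconds` and record when the phase ran."""
    start = time.monotonic()
    time.sleep(seconds)
    end = time.monotonic()
    with log_lock:
        events.append((slot, name, start, end))


def run_slot(slot, n_tasks, setup_s, gpu_s, check_s):
    """One task slot: CPU setup, then GPU under the mutex, then CPU check."""
    for _ in range(n_tasks):
        phase(slot, "cpu_setup", setup_s)   # may overlap the other slot's GPU stage
        with gpu_mutex:                     # "mutex wait GPU" ... "mutex release"
            phase(slot, "gpu", gpu_s)       # only one slot is ever in here
        phase(slot, "cpu_check", check_s)   # check and report while the other slot computes


def simulate(n_tasks=3, setup_s=0.02, gpu_s=0.10, check_s=0.08):
    """Run two slots side by side and return the total wall-clock time."""
    events.clear()
    threads = [
        threading.Thread(target=run_slot, args=(slot, n_tasks, setup_s, gpu_s, check_s))
        for slot in (1, 2)
    ]
    t0 = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - t0


def gpu_phases_overlap():
    """True if any two recorded GPU phases ran at the same time."""
    gpu = sorted((s, e) for _, name, s, e in events if name == "gpu")
    return any(nxt[0] < prev[1] for prev, nxt in zip(gpu, gpu[1:]))


if __name__ == "__main__":
    elapsed = simulate()
    print(f"elapsed {elapsed:.2f}s, GPU phases overlap: {gpu_phases_overlap()}")
```

In a real application the two tasks are separate processes, so the lock would have to be something shared between processes (a named mutex or a lock file, for example) rather than a `threading.Lock`.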

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 858
Credit: 4,758,307,838
RAC: 31,829,495

petri33 wrote:

Betreger wrote:

A reminder for those who are running 2x on this app.

If you run them with staggered starting times you will get a major improvement in throughput.

My GTX 1660s need about 33 minutes per task without staggered starts, whereas staggering them reduces the run time to ~28 minutes.

If we could have an app_mutex on the GPU when running 2x tasks, the two tasks would overlap for only about 20 seconds: one doing its CPU work and initial GPU setup while the other does its full GPU calculations. When the GPU calculations of the second task finish, the first task could start its full GPU stage. The second would then spend 200+ seconds on CPU checking while the first uses the GPU. When the second finishes its CPU stage it would report, start a new task, and do 20 seconds of CPU work and minor GPU setup.

GPU #1        |GPU #2
----------------------------
...           | ...
begin new WU  | GPU
CPU setup     | GPU
mutex wait GPU| mutex release
GPU           | CPU check WU
GPU           | report WU
GPU           | begin new WU
GPU           | CPU setup
mutex release | mutex wait GPU
CPU check WU  | GPU
Report WU     | GPU
...           | ...

This was used in the SETI NVIDIA Linux app to run only one GPU-intensive part of a task at a time and to overlap the initial setup, cleanup and report stages.

 

Perfect use case for this, especially since there are several minutes of CPU-only processing at the tail end of every GW task.

GWGeorge007
Joined: 8 Jan 18
Posts: 361
Credit: 525,957,291
RAC: 2,913,662

Curious George here,

Why couldn't we use this sort of formula for O2MD1 tasks?

George

Ian&Steve C.
Joined: 19 Jan 20
Posts: 858
Credit: 4,758,307,838
RAC: 31,829,495

GWGeorge007 wrote:

Curious George here,

Why couldn't we use this sort of formula for O2MD1 tasks?

We could have. But I think the CPU "wrap up" portion at the end of the task is significantly longer with O3AS than it was with O2MD, so it has a bigger effect now than before.

In either case, it's up to the Einstein developers to implement something like this. It is logic that needs to be coded into the application.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,491
Credit: 62,231,384,681
RAC: 55,581,923

GWGeorge007 wrote:
Why couldn't we use this sort of formula for O2MD1 tasks?

Because O2MD1 is a CPU only search and doesn't use the GPU.

Before the O3AS engineering test run started there was a GPU search using O2 data called O2MDF.  It's been gone for a while and there are no apps listed for it.

If you were talking about O2MDF rather than O2MD1, it's highly unlikely that it would be used again.  The emphasis would now be to use GPUs for just the new O3 data since that is the data most likely to provide a detection.

 

Cheers,
Gary.

mmonnin
Joined: 29 May 16
Posts: 281
Credit: 1,821,287,397
RAC: 3,519,524

Nothing but transient HTTP upload errors on O3AS tasks for more than a day now. I had just switched from GRPB on several computers and haven't had issues with those tasks. Several O3AS tasks did get credit or are pending, but I have hundreds now failing to upload.

mmonnin
Joined: 29 May 16
Posts: 281
Credit: 1,821,287,397
RAC: 3,519,524

Can't upload and can't download different work because of the uploads.
Changed back to Gamma ray and reset project.
STILL was getting O3 tasks over and over.
I could change locations and the project would still send this junk.
I could completely de-select ATI or NV and it would still send tasks for those cards.
I could select CPU only, a GPU app, everything else No and STILL get GPU work.
Each PC was getting 12 tasks no matter the speed or GPU count.
Server is F'd up

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,491
Credit: 62,231,384,681
RAC: 55,581,923

mmonnin wrote:
Each PC was getting 12 tasks no matter the speed or GPU count.

Sounds like the server is doing what it always does if you don't properly remove allocated work.  Resetting the project doesn't remove what has already been allocated.  You need to abort it first and make sure that work you don't want is properly reported before you reset a project.

The other thing to check is that you don't have 'Allow non-preferred apps' set to 'yes' for the location a particular host is assigned to.

EDIT:  I was about to set up a host to run O3AS work to check if there really is a widespread upload problem.  I haven't started running any of this yet, so I don't know the situation.  I'm surprised that a lot more people aren't complaining if this is truly widespread.  It would be a serious issue (this being a weekend) since it doesn't take long for stuck uploads to block new work fetch and completely jam up a system.

Are you sure it's not something to do with your ISP for example?

Cheers,
Gary.
