Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47054332642
RAC: 65223194

There's definitely a correlation between certain freq and DF ranges and run time: some tasks run 7-8 mins, while others run more like 4 mins (NVIDIA RTX cards). I haven't looked hard enough to see exactly which freq/DF combos run fast or slow, but a spot check did show runtime differences that matched up with freq/DF.

And the CPU definitely seems to be a major bottleneck: I see only barely slower runtimes on a 2070 vs a 2080 Ti on an identically spec'd system. Without the CPU bottleneck, the 2080 Ti should be close to 2x as fast.
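
A spot check like this can be scripted rather than done by eye. Here is a minimal sketch in Python. It assumes that "freq" is the second frequency in the task name and that "DF" is its offset from the first frequency; that reading of the name is a guess, not something confirmed in this thread. The function names are made up for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed task-name layout, e.g.
#   h1_0552.20_O2C02Cl4In0__O2MDFS3_Spotlight_552.90Hz_603_1
# first number = base frequency of the data file, second = search frequency.
NAME_RE = re.compile(r"^[hl]1_(\d+\.\d+)_.*_(\d+\.\d+)Hz_\d+_\d+$")

def freq_and_df(task_name):
    """Return (freq, DF) in Hz; DF is assumed to be freq minus the base frequency."""
    m = NAME_RE.match(task_name)
    if m is None:
        raise ValueError(f"unrecognised task name: {task_name}")
    base, freq = float(m.group(1)), float(m.group(2))
    return freq, round(freq - base, 2)

def mean_runtime_by_df(tasks):
    """tasks: iterable of (task_name, run_time_sec). Returns {DF: mean run time}."""
    groups = defaultdict(list)
    for name, secs in tasks:
        _, df = freq_and_df(name)
        groups[df].append(secs)
    return {df: mean(values) for df, values in sorted(groups.items())}
```

Feeding it (task name, run time) pairs copied from the task list of one host should show whether the slow and fast tasks really cluster by DF.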

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47054332642
RAC: 65223194

Backlogs                          FGRPB1G  FGRP5  O2MDF  O2MD1  BRP4  Total
Workunits waiting for validation        7      0  2,334      0     1  2,342

I guess the validators aren't running for S3 yet.

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18753577077
RAC: 7133072

I actually don't even SEE an S3 validator created yet in the server tables.

Just the old S2 validator.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3960
Credit: 47054332642
RAC: 65223194

Yeah I saw that too. 

_________________________________________________________________________

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 409
Credit: 10219883455
RAC: 21865205

Suddenly getting many validate errors on S3 since yesterday:

Task 1063943092

Name: h1_0552.20_O2C02Cl4In0__O2MDFS3_Spotlight_552.90Hz_603_1

Workunit ID: 520519145

Created: 27 Jan 2021 8:40:23 UTC

Sent: 27 Jan 2021 11:19:05 UTC

Report deadline: 3 Feb 2021 11:19:05 UTC

Received: 27 Jan 2021 11:51:05 UTC

Server state: Over

Outcome: Validate error

Client state: Done

Exit status: 0 (0x00000000)

Computer: 12432012

Run time (sec): 575.19

CPU time (sec): 506.27

Peak working set size (MB): 448.93

Peak swap size (MB): 3155.76

Peak disk usage (MB): 4.46

Validation state: Invalid

Granted credit: 0

Application: Gravitational Wave search O2 Multi-Directional GPU v2.09 (GW-opencl-nvidia) windows_x86_64


Stderr output

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
putenv 'LAL_DEBUG_LEVEL=3'
2021-01-27 12:36:17.1905 (8072) [normal]: This program is published under the GNU General Public License, version 2
2021-01-27 12:36:17.1905 (8072) [normal]: For details see http://einstein.phys.uwm.edu/license.php
2021-01-27 12:36:17.1905 (8072) [normal]: This Einstein@home App was built at: Jul 29 2020 10:47:46

2021-01-27 12:36:17.1905 (8072) [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/einstein_O2MDF_2.09_windows_x86_64__GW-opencl-nvidia.exe'.
Activated exception handling...
[DEBUG} GPU type: 1
[DEBUG} got GPU info from BOINC
[DEBUG} got VendorID 4318
2021-01-27 12:36:17.2374 (8072) [debug]: BSGL output files
2021-01-27 12:36:17.2374 (8072) [debug]: Flags: LAL_DEBUG, OPTIMIZE, HS_OPTIMIZATION, GC_SSE2_OPT, X64, SSE, SSE2, GNUC X86 GNUX86
2021-01-27 12:36:17.2374 (8072) [debug]: Set up communication with graphics process.
2021-01-27 12:36:17.2530 (8072) [normal]: Parsed user input successfully

DEPRECATION WARNING: program has invoked obsolete function XLALGetVersionString(). Please see XLALVCSInfoString() for information about a replacement.
Code-version: %% LAL: 6.21.0.1 (CLEAN 2d5112416ed80b559c941e5fa76095b3fd4e61a8)
%% LALPulsar: 1.18.2.1 (CLEAN 2d5112416ed80b559c941e5fa76095b3fd4e61a8)
%% LALApps: 6.25.1.1 (CLEAN 2d5112416ed80b559c941e5fa76095b3fd4e61a8)

2021-01-27 12:36:17.7685 (8072) [normal]: Reading input data ...
2021-01-27 12:36:17.7685 (8072) [normal]: Loading SFTs matching '..\..\projects\einstein.phys.uwm.edu\h1_0552.20_O2C02Cl4In0.X32F;..\..\projects\einstein.phys.uwm.edu\l1_0552.20_O2C02Cl4In0.X32F;..\..\projects\einstein.phys.uwm.edu\h1_0552.25_O2C02Cl4In0.ZYol;..\..\projects\einstein.phys.uwm.edu\l1_0552.25_O2C02Cl4In0.ZYol;..\..\projects\einstein.phys.uwm.edu\h1_0552.30_O2C02Cl4In0.tp-e;..\..\projects\einstein.phys.uwm.edu\l1_0552.30_O2C02Cl4In0.tp-e;..\..\projects\einstein.phys.uwm.edu\h1_0552.35_O2C02Cl4In0.D3rB;..\..\projects\einstein.phys.uwm.edu\l1_0552.35_O2C02Cl4In0.D3rB;..\..\projects\einstein.phys.uwm.edu\h1_0552.40_O2C02Cl4In0.VR1L;..\..\projects\einstein.phys.uwm.edu\l1_0552.40_O2C02Cl4In0.VR1L;..\..\projects\einstein.phys.uwm.edu\h1_0552.45_O2C02Cl4In0.mBq1;..\..\projects\einstein.phys.uwm.edu\l1_0552.45_O2C02Cl4In0.mBq1;..\..\projects\einstein.phys.uwm.edu\h1_0552.50_O2C02Cl4In0.e6c-;..\..\projects\einstein.phys.uwm.edu\l1_0552.50_O2C02Cl4In0.e6c-;..\..\projects\einstein.phys.uwm.edu\h1_0552.55_O2C02Cl4In0.koAu;..\..\projects\einstein.phys.uwm.edu\l1_0552.55_O2C02Cl4In0.koAu;..\..\projects\einstein.phys.uwm.edu\h1_0552.60_O2C02Cl4In0.PXD1;..\..\projects\einstein.phys.uwm.edu\l1_0552.60_O2C02Cl4In0.PXD1;..\..\projects\einstein.phys.uwm.edu\h1_0552.65_O2C02Cl4In0.iz6d;..\..\projects\einstein.phys.uwm.edu\l1_0552.65_O2C02Cl4In0.iz6d;..\..\projects\einstein.phys.uwm.edu\h1_0552.70_O2C02Cl4In0.uS5U;..\..\projects\einstein.phys.uwm.edu\l1_0552.70_O2C02Cl4In0.uS5U;..\..\projects\einstein.phys.uwm.edu\h1_0552.75_O2C02Cl4In0.Ee7E;..\..\projects\einstein.phys.uwm.edu\l1_0552.75_O2C02Cl4In0.Ee7E;..\..\projects\einstein.phys.uwm.edu\h1_0552.80_O2C02Cl4In0.iVbC;..\..\projects\einstein.phys.uwm.edu\l1_0552.80_O2C02Cl4In0.iVbC;..\..\projects\einstein.phys.uwm.edu\h1_0552.85_O2C02Cl4In0.df8O;..\..\projects\einstein.phys.uwm.edu\l1_0552.85_O2C02Cl4In0.df8O;..\..\projects\einstein.phys.uwm.edu\h1_0552.90_O2C02Cl4In0.0nZi;..\..\projects\einstein.phys.uwm.edu\l1_0552.90_O2
C02Cl4In0.0nZi;..\..\projects\einstein.phys.uwm.edu\h1_0552.95_O2C02Cl4In0.5dna;..\..\projects\einstein.phys.uwm.edu\l1_0552.95_O2C02Cl4In0.5dna;..\..\projects\einstein.phys.uwm.edu\h1_0553.00_O2C02Cl4In0.6A-3;..\..\projects\einstein.phys.uwm.edu\l1_0553.00_O2C02Cl4In0.6A-3;..\..\projects\einstein.phys.uwm.edu\h1_0553.05_O2C02Cl4In0.MZBb;..\..\projects\einstein.phys.uwm.edu\l1_0553.05_O2C02Cl4In0.MZBb;..\..\projects\einstein.phys.uwm.edu\h1_0553.10_O2C02Cl4In0.GSp9;..\..\projects\einstein.phys.uwm.edu\l1_0553.10_O2C02Cl4In0.GSp9;..\..\projects\einstein.phys.uwm.edu\h1_0553.15_O2C02Cl4In0.QxpN;..\..\projects\einstein.phys.uwm.edu\l1_0553.15_O2C02Cl4In0.QxpN;..\..\projects\einstein.phys.uwm.edu\h1_0553.20_O2C02Cl4In0.0kxE;..\..\projects\einstein.phys.uwm.edu\l1_0553.20_O2C02Cl4In0.0kxE;..\..\projects\einstein.phys.uwm.edu\h1_0553.25_O2C02Cl4In0.yBD4;..\..\projects\einstein.phys.uwm.edu\l1_0553.25_O2C02Cl4In0.yBD4;..\..\projects\einstein.phys.uwm.edu\h1_0553.30_O2C02Cl4In0.GwkW;..\..\projects\einstein.phys.uwm.edu\l1_0553.30_O2C02Cl4In0.GwkW;..\..\projects\einstein.phys.uwm.edu\h1_0553.35_O2C02Cl4In0.vg_0;..\..\projects\einstein.phys.uwm.edu\l1_0553.35_O2C02Cl4In0.vg_0;..\..\projects\einstein.phys.uwm.edu\h1_0553.40_O2C02Cl4In0.9bPk;..\..\projects\einstein.phys.uwm.edu\l1_0553.40_O2C02Cl4In0.9bPk;..\..\projects\einstein.phys.uwm.edu\h1_0553.45_O2C02Cl4In0.7KQc;..\..\projects\einstein.phys.uwm.edu\l1_0553.45_O2C02Cl4In0.7KQc;..\..\projects\einstein.phys.uwm.edu\h1_0553.50_O2C02Cl4In0.audb;..\..\projects\einstein.phys.uwm.edu\l1_0553.50_O2C02Cl4In0.audb;..\..\projects\einstein.phys.uwm.edu\h1_0553.55_O2C02Cl4In0.ycsj;..\..\projects\einstein.phys.uwm.edu\l1_0553.55_O2C02Cl4In0.ycsj;..\..\projects\einstein.phys.uwm.edu\h1_0553.60_O2C02Cl4In0.T3zr;..\..\projects\einstein.phys.uwm.edu\l1_0553.60_O2C02Cl4In0.T3zr;..\..\projects\einstein.phys.uwm.edu\h1_0553.65_O2C02Cl4In0.WQgs;..\..\projects\einstein.phys.uwm.edu\l1_0553.65_O2C02Cl4In0.WQgs' into catalog ...2021-01-27 
12:36:50.9325 (8072) [normal]: done.
2021-01-27 12:36:50.9325 (8072) [normal]: Validating SFTs ... 2021-01-27 12:37:15.9266 (8072) [normal]: success.
2021-01-27 12:37:17.0045 (8072) [normal]: Search FstatMethod used: 'ResampOpenCL'
2021-01-27 12:37:17.0045 (8072) [normal]: Recalc FstatMethod used: 'DemodSSE'
2021-01-27 12:37:17.0045 (8072) [normal]: OpenCL Device used for Search/Recalc and/or semi coherent step: 'Quadro GV100 (Platform: NVIDIA CUDA, global memory: 32768 MiB)'
2021-01-27 12:37:17.0045 (8072) [normal]: OpenCL version is used for the semi-coherent step!
2021-01-27 12:37:39.2180 (8072) [normal]: Number of segments: 22, total number of SFTs in segments: 9902
2021-01-27 12:37:39.2648 (8072) [normal]: Finished reading input data.
% --- GPS reference time = 1177858472.0000 , GPS data mid time = 1177806642.5000
2021-01-27 12:37:39.2648 (8072) [normal]: dFreqStack = 7.061150e-007, df1dot = 4.521800e-012, df2dot = 2.284100e-018, df3dot = 0.000000e+000
% --- Setup, N = 22, T = 604800 s, Tobs = 19646545 s, gammaRefine = 37, gamma2Refine = 45, gamma3Refine = 1

DEPRECATION WARNING: program has invoked obsolete function InitDopplerSkyScan(). Please see XLALInitDopplerSkyScan() for information about a replacement.
2021-01-27 12:37:39.2805 (8072) [normal]: INFO: No checkpoint checkpoint.cpt found - starting from scratch
% --- Cpt:0, total:32, sky:1/1, f1dot:1/32

0.% --- CG:1573198 FG:70821 f1dotmin_fg:-5.638670177698e-008 df1dot_fg:1.222108108108e-013 f2dotmin_fg:-1.116671111111e-018 df2dot_fg:5.075777777778e-020 f3dotmin_fg:0 df3dot_fg:1
...................INFO: Major Windows version: 6
c
................................................................................c
....................................................................................................c
....................................................................................................c
....................................................................................................c
....................................................................................................c
........................................................................................................................c
....................
2021-01-27 12:45:30.0908 (8072) [normal]: Finished main analysis.
2021-01-27 12:45:30.0908 (8072) [normal]: Recalculating statistics for the final toplist...
2021-01-27 12:45:49.0082 (8072) [normal]: Finished recalculating toplist statistics.
2021-01-27 12:45:49.0082 (8072) [debug]: Writing output ... toplist2 ... toplist3 ... done.

DEPRECATION WARNING: program has invoked obsolete function FreeDopplerSkyScan(). Please see XLALDestroyDopplerSkyScan() for information about a replacement.
2021-01-27 12:45:49.8673 (8072) [debug]: resultfile '../../projects/einstein.phys.uwm.edu/h1_0552.20_O2C02Cl4In0__O2MDFS3_Spotlight_552.90Hz_603_1_0' (len 95), current config file: 0
2021-01-27 12:45:49.8673 (8072) [debug]: renaming '../../projects/einstein.phys.uwm.edu/h1_0552.20_O2C02Cl4In0__O2MDFS3_Spotlight_552.90Hz_603_1_0-BSGLtL' to '../../projects/einstein.phys.uwm.edu/h1_0552.20_O2C02Cl4In0__O2MDFS3_Spotlight_552.90Hz_603_1_1'
2021-01-27 12:45:49.8830 (8072) [debug]: renaming '../../projects/einstein.phys.uwm.edu/h1_0552.20_O2C02Cl4In0__O2MDFS3_Spotlight_552.90Hz_603_1_0-BtSGLtL' to '../../projects/einstein.phys.uwm.edu/h1_0552.20_O2C02Cl4In0__O2MDFS3_Spotlight_552.90Hz_603_1_2'
Code-version: %% LAL: 6.21.0.1 (CLEAN 2d5112416ed80b559c941e5fa76095b3fd4e61a8)
%% LALPulsar: 1.18.2.1 (CLEAN 2d5112416ed80b559c941e5fa76095b3fd4e61a8)
%% LALApps: 6.25.1.1 (CLEAN 2d5112416ed80b559c941e5fa76095b3fd4e61a8)

FPU status flags: COND_1 PRECISION
2021-01-27 12:45:49.8830 (8072) [debug]: worker done. return(0) to caller
2021-01-27 12:45:49.8830 (8072) [normal]: done. calling boinc_finish(0).
12:45:49 (8072): called boinc_finish

</stderr_txt>
]]>
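
The timestamps in a stderr log like the one above show where the wall-clock time goes, which helps when comparing a task that validates with one that does not. A minimal sketch in Python, assuming the `date time (pid) [level]: message` line layout seen in this log; the helper names are made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as
#   2021-01-27 12:37:39.2648 (8072) [normal]: Finished reading input data.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \(\d+\) \[\w+\]: (.*)$"
)

def parse_events(stderr_text):
    """Return a list of (datetime, message) for every timestamped line."""
    events = []
    for line in stderr_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((stamp, m.group(2)))
    return events

def seconds_between(events, start_text, end_text):
    """Seconds from the first message containing start_text
    to the first message containing end_text."""
    start = next(t for t, msg in events if start_text in msg)
    end = next(t for t, msg in events if end_text in msg)
    return (end - start).total_seconds()
```

Calling `seconds_between(parse_events(log), "Finished reading input data", "Finished main analysis")` on the log above should give roughly 471 seconds, i.e. most of the 575-second run time is the main analysis.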




 

Does anybody have an idea?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117703249097
RAC: 35085088

San-Fernando-Valley wrote:
Suddenly getting many validate errors on S3 since yesterday:

It probably just means that the validator needs tweaking for the conditions of the new S3 task series.  My guess is that adjustments will be made and the tasks will then be resubmitted for validation - or something like that.

Teething problems like this have happened before.

As I mentioned earlier on, this transition seems quite unusual and quite rushed, so it's probably not a great surprise if there are some issues.  I'm sure it will all get fixed in the end :-).

Cheers,
Gary.

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 409
Credit: 10219883455
RAC: 21865205

Thanks for your response, Gary.

It seems to have happened only yesterday, the 27th, with 28 such tasks affected.

Seems to be OK now.

Thought I'd mention it.

We are having troublesome times!

Tom M
Joined: 2 Feb 06
Posts: 6460
Credit: 9585091206
RAC: 6910486

Looks like my Gravitational Wave uploads are hanging.

Got more than 40 waiting, and the basic backoffs are running.

Fri 29 Jan 2021 06:52:02 PM CST | Einstein@Home | Started upload of h1_0557.10_O2C02Cl4In0__O2MDFS3_Spotlight_557.70Hz_520_1_1
Fri 29 Jan 2021 06:52:17 PM CST | Einstein@Home | Temporarily failed upload of h1_0557.10_O2C02Cl4In0__O2MDFS3_Spotlight_557.70Hz_520_1_0: transient HTTP error
Fri 29 Jan 2021 06:52:17 PM CST | Einstein@Home | Backing off 00:00:15 on upload of h1_0557.10_O2C02Cl4In0__O2MDFS3_Spotlight_557.70Hz_520_1_0
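
With dozens of uploads stuck, tallying the event log is quicker than reading it. A minimal sketch in Python, assuming the `date | project | message` layout and the exact message wording shown in the lines above; the function name is made up for illustration.

```python
import re

FAILED_RE = re.compile(r"Temporarily failed upload of (\S+): (.+)$")
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)$")

def summarize_uploads(log_text):
    """Return ({file: last error text}, {file: last backoff in seconds})."""
    errors, backoffs = {}, {}
    for line in log_text.splitlines():
        # The message is the last " | "-separated field of each event-log line.
        message = line.split(" | ")[-1].strip()
        failed = FAILED_RE.search(message)
        if failed:
            errors[failed.group(1)] = failed.group(2)
            continue
        backoff = BACKOFF_RE.search(message)
        if backoff:
            hours, minutes, seconds = (int(x) for x in backoff.groups()[:3])
            backoffs[backoff.group(4)] = hours * 3600 + minutes * 60 + seconds
    return errors, backoffs
```

`len(errors)` then gives the number of distinct result files that have failed at least once, and `backoffs` shows how long each one is currently waiting.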

 

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 409
Credit: 10219883455
RAC: 21865205

... yep, tasks are not uploading ...

HK-Steve
Joined: 9 May 17
Posts: 1
Credit: 933434813
RAC: 0

2754 pending validations and 60 uploading, with the project in backoff.
