Multi-Directed Continuous GW production work begins

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7273015063
RAC: 1833069
Topic 203088

It appears that within the last 24 hours, production distribution of the Multi-Directed Continuous Gravitational Wave Search has begun.

I thought a user thread for observations could be useful, though problems probably should be brought up in the Problems and Bug Reports forum.

As with the Tuning runs, it appears that there are two flavors, G and CV.  I think the intention is to distribute the CV flavor to more capable machines.  However, in my case a Haswell host that I consider one of my more capable is getting G work.

For those of us used to the extremely constant work content (and thus run time on well-stabilized machines) of several recent Einstein applications, this work shows a remarkable variation in expected run times.  G type work distributed to that Haswell host currently displays estimated elapsed times spanning a range of more than 50:1!

CV work comes in two sub-flavors as disclosed by the task name, VelaJr and CasA.  Of these, the tasks distributed to my most capable host (a different model Haswell) have already shown a 30:1 estimate range for CasA, and a 6:1 range for VelaJr.  I doubt the tasks I happen to have received represent the full range.

A couple of administrative overhead observations:

The enabling designations for the production runs on the account's project preferences web page appear to be new (differing from those for the tuning runs), so they may be turned on by default. If so, lots of USA users may come back from Thanksgiving distractions to find unexpected new work in progress.

In my case I had most of my hosts not running CPU work.  The newly running Multi-Directed Continuous GW work takes enough longer than the runtime estimated based on my GPU work that several of my hosts fell into "panic mode" shortly after they arrived. While this is not a dire problem, I've chosen to lower the requested work queue from 3.5 to 2.5 days on two machines, hoping to quell it.

 

ku4eto
Joined: 29 Oct 16
Posts: 25
Credit: 152116
RAC: 0

Yup, I got my first GW task around 3 hours ago: https://einsteinathome.org/workunit/263559925

I am running this on a VM with 2 cores of an E5-2660 v3 @ 2.2 GHz. The estimated time was ~1.05, yet it took around two and a half hours to complete.

For some reason, it was running fine, then the progress bar would stop for 2-3 seconds (at 56.731, for example) and start again after a few seconds.

Another interesting thing is that the other person I was paired with has almost the same CPU, but with 16 assigned cores. Yet his run time and CPU time are half of mine.

The run version for my task was v1.00 G for some reason; I have seen v1.01 on previous GW work.

niktak11
Joined: 9 Apr 16
Posts: 1
Credit: 12363138
RAC: 0

The new tasks I got are expected to finish around the 18 hour mark, running on a 4770K at 4.2 GHz.

Maybe the initial tasks that you guys got were shorter?

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 195050004
RAC: 378208

Please note that I already wrote about the search in the first post of Multi-Directed Gravitational Wave Search.

I'll add a paragraph about the varying runtimes you see (which is normal behavior).

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7273015063
RAC: 1833069

niktak11 wrote:
Maybe the initial tasks that you guys got were shorter?

I already mentioned the very wide range of estimated work content on tasks I've received.  I'm just beginning to have enough completed results to give some ground truth on actual elapsed times.  I deliberately expedited some work predicted to have short run times, and have observed as little as just under 6 minutes.  The longest time for a task actually completed so far was a little over 9 hours.  Admittedly that last is on a laptop, but it is a Skylake running only a single CPU task, and it delivers FGRPSSE-Beta times of just under 5 hours, so it is not such a slow machine.

ku4eto
Joined: 29 Oct 16
Posts: 25
Credit: 152116
RAC: 0

Completed another GW WU with ... 4400 seconds of run time, ~75 mins. Over twice as fast as the first one. Running a third one now, but not sure how long it will take. I will have to complete a task on the desktop PC to see if I will be able to get a GW there.

EDIT: It took as long as an FGRPSSE job: ~7 hours (24,400 seconds CPU time)! I guess it should give almost as much credit as the FGRPSSE?

Also, it seems like my VM is now receiving only GW WUs (aside from 1-2 final FGRPSSE). All of them are AVX ones.
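For anyone comparing run times quoted in seconds against ones quoted in hours, a small conversion helper may be handy. This is just a sketch; `format_hms` is a made-up name, and it assumes the figures above (4400 and 24,400) are both in seconds.

```python
def format_hms(total_seconds):
    """Format a duration given in seconds as h:mm:ss."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# The two run times mentioned above, assumed to be in seconds.
print(format_hms(4400))   # 1:13:20
print(format_hms(24400))  # 6:46:40
```

So the second task ran a little under 75 minutes, and the third a little under 7 hours.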

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7273015063
RAC: 1833069

ku4eto wrote:
I guess it should give almost as much credit as the FGRPSSE?

Maybe.  I imagine one use of the tuning run was to allow the estimates for relative unit execution time to be made pretty accurate.  And I think in general the project strives to match credit award rates across the total user population by application.

But there is a huge range of size on this Multi work.  I've personally observed individual validated credit unit awards ranging from 20 through 1000 in just the first day, so that may well not be the full range.  Further, even if they succeed in matching the average credit rate across all hosts there is no guarantee that your particular host is matched.  I had only one host that was running FGRPSSE work, and it has only one validated Multi unit.  On that single credit rate comparison I saw the Multi award rate to be 78% of the FGRPSSE. 
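A comparison like that one can be worked out as credit per hour for each application, then the ratio of the two. Here is a rough sketch; the function names are hypothetical and the credit and run-time figures are made up purely so the ratio comes out at 78%, they are not my actual task results.

```python
def credit_per_hour(credit, elapsed_seconds):
    """Credit awarded per hour of elapsed run time."""
    return credit / (elapsed_seconds / 3600.0)


def relative_award_rate(credit_a, seconds_a, credit_b, seconds_b):
    """Award rate of application A as a fraction of application B's rate."""
    return credit_per_hour(credit_a, seconds_a) / credit_per_hour(credit_b, seconds_b)


# Made-up figures: a Multi task and an FGRPSSE task, both 5 hours long.
ratio = relative_award_rate(390, 18000, 500, 18000)
print(f"{ratio:.0%}")  # 78%
```

With only one validated unit on each side, a number like this says very little about the long-run average.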

I'm just glad this major effort is up and running now.

 

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7273015063
RAC: 1833069

archae86 wrote:
In my case I had most of my hosts not running CPU work.  The newly running Multi-Directed Continuous GW work takes enough longer than the runtime estimated based on my GPU work that several of my hosts fell into "panic mode" shortly after they arrived. While this is not a dire problem, I've chosen to lower the requested work queue from 3.5 to 2.5 days on two machines, hoping to quell it.

An additional reason for more "panic mode" than expected is that some of the Multi work issued since the production start has a 7-day deadline, shorter than usual for Einstein production.  Most of the rest seems to have been issued with a 14-day deadline.  I don't know what the distinction is, nor whether 7-day deadline issuance continues at the current time.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

archae86 wrote:
In my case I had most of my hosts not running CPU work.  The newly running Multi-Directed Continuous GW work takes enough longer than the runtime estimated based on my GPU work that several of my hosts fell into "panic mode" shortly after they arrived. While this is not a dire problem, I've chosen to lower the requested work queue from 3.5 to 2.5 days on two machines, hoping to quell it.

The recent versions of BOINC are supposed to calculate this separately for the GPU and the CPU.  It has not been a problem for me with 7.6.33 (Win7 64-bit and Ubuntu 16.xx).  I have a wide variety of CPU and GPU work, but the only time I see a panic mode is when there is an initial misestimation of the run times, combined with an unrealistically short deadline.  I usually keep the default buffer though (0.1 + 0.5 days), and sometimes shorter.
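To show how a run-time misestimate and a short deadline combine, here is a toy single-core model. It is only a sketch and not how the BOINC client actually decides anything; the `Task` class, the `tasks_at_risk` function, the `true_runtime_factor` parameter, and all the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str              # hypothetical label
    est_hours: float       # estimated run time in hours
    deadline_hours: float  # hours remaining until the deadline


def tasks_at_risk(queue, true_runtime_factor=1.0):
    """Return names of tasks that would finish after their deadline if the
    queue ran one task at a time, in order, on a single core.

    true_runtime_factor scales every estimate to mimic a misestimation.
    """
    elapsed = 0.0
    at_risk = []
    for task in queue:
        elapsed += task.est_hours * true_runtime_factor
        if elapsed > task.deadline_hours:
            at_risk.append(task.name)
    return at_risk


# Ten tasks estimated at 6 hours each, all with a 7-day (168 hour) deadline.
queue = [Task(f"t{i}", 6.0, 168.0) for i in range(10)]
print(tasks_at_risk(queue))        # []
print(tasks_at_risk(queue, 3.0))   # ['t9']
```

In this toy case the same queue is safe if the estimates hold, but the last task is in trouble if everything takes three times as long as estimated. With a 14-day deadline none of them would be.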

ku4eto
Joined: 29 Oct 16
Posts: 25
Credit: 152116
RAC: 0

archae86 wrote:
archae86 wrote:
In my case I had most of my hosts not running CPU work.  The newly running Multi-Directed Continuous GW work takes enough longer than the runtime estimated based on my GPU work that several of my hosts fell into "panic mode" shortly after they arrived. While this is not a dire problem, I've chosen to lower the requested work queue from 3.5 to 2.5 days on two machines, hoping to quell it.

An additional reason for more "panic mode" than expected is that some of the Multi work issued since the production start has a 7-day deadline, shorter than usual for Einstein production.  Most of the rest seems to have been issued with a 14-day deadline.  I don't know what the distinction is, nor whether 7-day deadline issuance continues at the current time.


Only the very first GW I had came with a 7-day deadline; all the others are 14-day.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5874
Credit: 118376575478
RAC: 25571597

archae86 wrote:
ku4eto wrote:
I guess it should give almost as much credit as the FGRPSSE?

Maybe.  I imagine one use of the tuning run was to allow the estimates for relative unit execution time to be made pretty accurate.  And I think in general the project strives to match credit award rates across the total user population by application.

Exactly!  It's always a compromise and there will always be 'winners' and 'losers' in the credit stakes.

But if you stop to think for a moment about the enormity of what is being undertaken, credit issues will fade into the background.  We know GWs exist and can be detected.  We have the big 'Wow!' signal from BH/BH mergers to thank for that - something that E@H was never going to be the first to detect.  But the E@H 'supercomputer' - you, the ordinary citizen scientist group - is ideally placed to find the even bigger 'Wow event', the first detection of the incredibly hard to detect continuous GW emission from spinning massive objects like neutron stars that aren't coalescing and giving the big, sudden 'spike' type event.

archae86 wrote:
I'm just glad this major effort is up and running now.

Absolutely!!

 

Cheers,
Gary.
