Multi-Directed Continuous GW production work begins

archae86

Joined: 6 Dec 05

Posts: 3164

Credit: 7368721687

RAC: 2233206

24 Nov 2016 15:06:23 UTC

Topic 203088

It appears that production distribution of the Multi-Directed Continuous Gravitational Wave Search began within the last 24 hours.

I thought a user thread for observations could be useful, though problems probably should be brought up in the Problems and Bug Reports forum.

As with the Tuning runs, there appear to be two flavors, G and CV. I think the intention is to distribute the CV flavor to more capable machines. However, in my case a Haswell host, which I think is one of my more capable machines, is getting G work.

For those of us used to the extremely constant work content (and thus run time on well-stabilized machines) of several recent Einstein applications, this work has remarkably variable expected run times. G type work distributed to that Haswell host currently shows estimated elapsed times spanning a range of more than 50:1!

CV work comes in two sub-flavors, as disclosed by the task name: VelaJr and CasA. Of these, the tasks distributed to my most capable host (a different model Haswell) have already shown a 30:1 estimate range for CasA and a 6:1 range for VelaJr. I doubt the tasks I happen to have received represent the full range.

A couple of administrative overhead observations:

The enabling designations for the production runs seen on one's account's project preferences web page appear to be new (differing from those for the tuning runs), so they may be turned on by default. Lots of USA users may therefore come back from Thanksgiving distractions to find unexpected new work in progress.

In my case I had most of my hosts not running CPU work. The newly running Multi-Directed Continuous GW work takes enough longer than the runtime estimated based on my GPU work that several of my hosts fell into "panic mode" shortly after they arrived. While this is not a dire problem, I've chosen to lower the requested work queue from 3.5 to 2.5 days on two machines, hoping to quell it.

ku4eto

Joined: 29 Oct 16

Posts: 25

Credit: 152116

RAC: 0

24 Nov 2016 16:02:49 UTC

Message 152096

Yup, I got my first GW around 3 hours ago: https://einsteinathome.org/workunit/263559925

I am running this on a VM with 2 cores of an E5-2660 v3 @ 2.2 GHz. The estimated time was ~1.05, yet it took around two and a half hours to complete.

For some reason, it was running well, then the progress bar would stop for 2-3 seconds (at 56.731, for example), and after a few seconds it would start again.

Another interesting thing is that the other person I was paired with has almost the same CPU, but with 16 assigned cores. Yet his run time and CPU time are half of mine.

The run version for my task was v1.00 G for some reason; I have seen v1.01 on previous GW work.

niktak11

Joined: 9 Apr 16

Posts: 1

Credit: 12363138

RAC: 0

24 Nov 2016 18:47:35 UTC

Message 152102

The new tasks I got are expected to finish around the 18 hour mark, running on a 4770K at 4.2 GHz.

Maybe the initial tasks that you guys got were shorter?

Christian Beer

Joined: 9 Feb 05

Posts: 595

Credit: 197516671

RAC: 28098

24 Nov 2016 18:57:30 UTC

Message 152103

Please note that I already wrote about the search in the first post of Multi-Directed Gravitational Wave Search.

I'll add a paragraph about the varying runtimes you see (which is normal behavior).

archae86

Joined: 6 Dec 05

Posts: 3164

Credit: 7368721687

RAC: 2233206

24 Nov 2016 20:26:18 UTC

Message 152104 in response to message 152102

niktak11 wrote:

Maybe the initial tasks that you guys got were shorter?

I already mentioned the very wide range of estimated work content on tasks I've received. I'm just beginning to have enough completed results to give some ground truth on actual elapsed times. I deliberately expedited some work predicted to have short run times, and have observed as little as just under 6 minutes. The longest time for a task actually completed so far was a little over 9 hours. Admittedly that last one was on a laptop, but it is a Skylake running only a single CPU task, and it delivers FGRPSSE-Beta times of just under 5 hours, so it is not such a slow machine.

ku4eto

Joined: 29 Oct 16

Posts: 25

Credit: 152116

RAC: 0

25 Nov 2016 9:41:08 UTC

Message 152105

Completed another GW WU with 4400 seconds of run time (~75 mins). That is over twice as fast as the first one. Running a third one now, but not sure how long it will take. I will have to complete a task on the desktop PC to see if I will be able to get a GW there.

EDIT: It took as long as a FGRPSSE job, ~7 hours (24,400 s of CPU time)! I guess it should give almost as much credit as the FGRPSSE?

Also, it seems like my VM is now receiving only GW WUs (aside from 1-2 final FGRPSSE). All of them are AVX ones.
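For anyone comparing run times across tasks, here is a minimal Python sketch of the arithmetic behind the figures above. The 4400 s and 24,400 s values come from the post; the first task's time is only the approximate "two and a half hours" reported earlier, and the helper name `format_duration` is made up for illustration.

```python
def format_duration(seconds: float) -> str:
    """Render a run time given in seconds as hours and minutes."""
    total_minutes = round(seconds / 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


# Figures reported in the thread (the first one is approximate).
first_task_s = 2.5 * 3600   # "around two and a half hours"
second_task_s = 4400        # run time of the second GW task
third_task_cpu_s = 24400    # CPU time of the third GW task

print(format_duration(second_task_s))
print(format_duration(third_task_cpu_s))

# Ratio between the first and second task run times.
speedup = first_task_s / second_task_s
print(f"first/second run time ratio: {speedup:.2f}")
```

The ratio comes out slightly above 2, which matches the "over twice as fast" observation.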

archae86

Joined: 6 Dec 05

Posts: 3164

Credit: 7368721687

RAC: 2233206

25 Nov 2016 13:55:37 UTC

Message 152123 in response to message 152105

ku4eto wrote:

I guess it should give almost as much credit as the FGRPSSE?

Maybe. I imagine one use of the tuning run was to allow the estimates for relative unit execution time to be made pretty accurate. And I think in general the project strives to match credit award rates across the total user population by application.

But there is a huge range of size on this Multi work. I've personally observed individual validated credit unit awards ranging from 20 through 1000 in just the first day, so that may well not be the full range. Further, even if they succeed in matching the average credit rate across all hosts there is no guarantee that your particular host is matched. I had only one host that was running FGRPSSE work, and it has only one validated Multi unit. On that single credit rate comparison I saw the Multi award rate to be 78% of the FGRPSSE.
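A single-host comparison like the 78% figure above can be worked out as credit per hour for each application. This is only a sketch: the credit and run time numbers below are hypothetical placeholders chosen to reproduce a 78% ratio, not values from any actual host, and the function names are invented for illustration.

```python
def credit_per_hour(credit: float, run_time_s: float) -> float:
    """Credit earned per hour of run time for one validated task."""
    return credit / (run_time_s / 3600.0)


def relative_rate(new_rate: float, reference_rate: float) -> float:
    """Rate of the new application as a fraction of the reference one."""
    return new_rate / reference_rate


# Hypothetical example values (NOT taken from the thread).
fgrpsse_rate = credit_per_hour(credit=1000.0, run_time_s=36000.0)  # 10 h task
multi_rate = credit_per_hour(credit=390.0, run_time_s=18000.0)     # 5 h task

ratio = relative_rate(multi_rate, fgrpsse_rate)
print(f"Multi award rate is {ratio:.0%} of the FGRPSSE rate")
```

With only one validated task per application, a ratio like this says little about the average across the whole user population.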

I'm just glad this major effort is up and running now.

archae86

Joined: 6 Dec 05

Posts: 3164

Credit: 7368721687

RAC: 2233206

25 Nov 2016 16:44:41 UTC

Message 152124

archae86 wrote:

In my case I had most of my hosts not running CPU work. The newly running Multi-Directed Continuous GW work takes enough longer than the runtime estimated based on my GPU work that several of my hosts fell into "panic mode" shortly after they arrived. While this is not a dire problem, I've chosen to lower the requested work queue from 3.5 to 2.5 days on two machines, hoping to quell it.

An additional reason for more "panic mode" than expected is that some of the Multi work issued since the production start has a 7-day deadline, shorter than usual for Einstein production. Most of the rest seems to have been issued with a 14-day deadline. I don't know what the distinction is, nor whether 7-day deadline issuance continues at the current time.

Jim1348

Joined: 19 Jan 06

Posts: 463

Credit: 257957147

RAC: 0

25 Nov 2016 17:12:53 UTC

Message 152125

archae86 wrote:

In my case I had most of my hosts not running CPU work. The newly running Multi-Directed Continuous GW work takes enough longer than the runtime estimated based on my GPU work that several of my hosts fell into "panic mode" shortly after they arrived. While this is not a dire problem, I've chosen to lower the requested work queue from 3.5 to 2.5 days on two machines, hoping to quell it.

The recent versions of BOINC are supposed to calculate this separately for the GPU and the CPU. It has not been a problem for me with 7.6.33 (Win7 64-bit and Ubuntu 16.xx). I have a wide variety of CPU and GPU work, but the only time I see a panic mode is when there is an initial misestimation of the run times, combined with an unrealistically short deadline. I usually keep the default buffer though (0.1 + 0.5 days), and sometimes shorter.
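To illustrate how a misestimated run time combined with a short deadline can put queued work at risk, here is a simplified Python sketch. It is not BOINC's actual scheduling logic; it only processes a queue in earliest-deadline-first order and flags tasks whose cumulative finish time would pass their deadline. The `Task` class, the `tasks_at_risk` function, and all the numbers (84 one-hour tasks as a 3.5-day queue on one core, a 2.5x slowdown, a 7-day deadline) are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runtime_h: float   # run time in hours (estimated or actual)
    deadline_h: float  # deadline, in hours from now


def tasks_at_risk(tasks: list[Task], cores: int = 1) -> list[str]:
    """Names of tasks that would finish after their deadline when the
    queue runs earliest-deadline-first, with work spread evenly over cores.
    A deliberately crude model, not the BOINC client's simulation."""
    at_risk = []
    elapsed_h = 0.0
    for task in sorted(tasks, key=lambda t: t.deadline_h):
        elapsed_h += task.runtime_h / cores
        if elapsed_h > task.deadline_h:
            at_risk.append(task.name)
    return at_risk


DEADLINE_H = 7 * 24  # hypothetical 7-day deadline

# A 3.5-day queue sized from a 1 h estimate: nothing is at risk.
as_estimated = [Task(f"t{i}", 1.0, DEADLINE_H) for i in range(84)]
# The same queue if each task really takes 2.5 h: the tail misses.
as_observed = [Task(f"t{i}", 2.5, DEADLINE_H) for i in range(84)]
# A 2.5-day queue (60 tasks) at 2.5 h each fits again.
smaller_queue = [Task(f"t{i}", 2.5, DEADLINE_H) for i in range(60)]

print(len(tasks_at_risk(as_estimated)))
print(len(tasks_at_risk(as_observed)))
print(len(tasks_at_risk(smaller_queue)))
```

In this toy model, shrinking the queue removes the deadline pressure, which is the same idea as lowering the requested work buffer.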

ku4eto

Joined: 29 Oct 16

Posts: 25

Credit: 152116

RAC: 0

25 Nov 2016 19:39:32 UTC

Message 152129 in response to message 152124

archae86 wrote:

archae86 wrote:
In my case I had most of my hosts not running CPU work. The newly running Multi-Directed Continuous GW work takes enough longer than the runtime estimated based on my GPU work that several of my hosts fell into "panic mode" shortly after they arrived. While this is not a dire problem, I've chosen to lower the requested work queue from 3.5 to 2.5 days on two machines, hoping to quell it.

An additional reason for more "panic mode" than expected is that some of the Multi work issued since the production start has a 7-day deadline, shorter than usual for Einstein production. Most of the rest seems to have been issued with a 14-day deadline. I don't know what the distinction is, nor whether 7-day deadline issuance continues at the current time.

Only the very first GW I had came with a 7-day deadline; all the others are 14-day.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119335396139

RAC: 25739739

25 Nov 2016 20:55:17 UTC

Message 152132 in response to message 152123

archae86 wrote:

ku4eto wrote:
I guess it should give almost as much credit as the FGRPSSE?

Maybe. I imagine one use of the tuning run was to allow the estimates for relative unit execution time to be made pretty accurate. And I think in general the project strives to match credit award rates across the total user population by application.

Exactly! It's always a compromise and there will always be 'winners' and 'losers' in the credit stakes.

But if you stop to think for a moment about the enormity of what is being undertaken, credit issues will fade into the background. We know GWs exist and can be detected. We have the big 'Wow!' signal from BH/BH mergers to thank for that - something that E@H was never going to be the first to detect. But the E@H 'supercomputer' - you, the ordinary citizen scientist group - is ideally placed to find the even bigger 'Wow event', the first detection of the incredibly hard to detect continuous GW emission from spinning massive objects like neutron stars that aren't coalescing and giving the big, sudden 'spike' type event.

archae86 wrote:

I'm just glad this major effort is up and running now.

Absolutely!!

Cheers,
Gary.
