The O2-All Sky Gravitational Wave Search on GPUs - discussion thread.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117664406031

RAC: 35172157

2 Sep 2019 8:16:00 UTC

Message 173066 in response to message 173065

You can't see whether or not you have quorum partners. This is a deliberate anti-cheating measure since (at some point in the future when all is working properly) a proportion of tasks are intended to be distributed singly for trusted hosts. However, for the period of testing, there will always be a CPU task for each GPU task.

If you look through your tasks list you will see (right now) 3 inconclusives which must have a returned quorum partner for that status to be given. You also have 1 pending which must not yet have a returned partner task. That difference is how you know whether there is a returned task or not. For each inconclusive result, there will be a 3rd task (which must be a CPU task) that will decide what gets validated - and it's highly likely that 2 CPU tasks will 'gang up' on the unfortunate GPU task. That is why Archae86 can suggest that the likelihood is that your inconclusives will eventually become invalid. It sucks, but that's the way it is.

EDIT: Unfortunately, there is no separate filter/category for inconclusives. Without directly counting them over all pages of results, use the formula (after selecting the search of interest) :-

Inconclusives = All - In progress - Pending - Valid - Invalid - Error. Currently = 3.
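As a rough illustration of the formula above, here is a minimal Python sketch. The function name and the example counts are hypothetical (the counts are made up so that the result comes out at 3); substitute the numbers shown on your own tasks page after selecting the search of interest.

```python
def count_inconclusives(all_tasks, in_progress, pending, valid, invalid, error):
    """Inconclusives = All - In progress - Pending - Valid - Invalid - Error."""
    return all_tasks - in_progress - pending - valid - invalid - error


# Hypothetical counts for illustration only, not taken from any real host.
example = count_inconclusives(
    all_tasks=60, in_progress=20, pending=1, valid=30, invalid=4, error=2
)
print(example)
```

The subtraction only works if every count comes from the same search filter, since the "All" total otherwise includes tasks from other applications.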

Cheers,
Gary.

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7225014931

RAC: 1045496

2 Sep 2019 14:20:08 UTC

Message 173072 in response to message 173066

Gary Roberts wrote:

a proportion of tasks are intended to be distributed singly for trusted hosts. However, for the period of testing, there will always be a CPU task for each GPU task.

It is worth remembering that the current GW work has been going out to CPU hosts for some time. I believe it is already the case that some CPU hosts have earned the "trusted" rating, and that tasks are often sent to them with replication of 1.

As Gary says, this, in turn, triggers the otherwise undesirable "hiding" of one's quorum partners until validation work on the workunit has been wrapped up.

Unless something really radical changes to raise the GPU task validation rate and consistency, I don't imagine the project will enable replication of one for any GPU tasks any time soon.

By the way, my host which a couple of days ago suddenly changed from 99% to 0% validation rate on GW GPU tasks spent the night running dozens of GRP GPU tasks. It had no trouble with them--lots of validations, no invalid results, no inconclusives pending resolution.

Matt White

Joined: 9 Jul 19

Posts: 120

Credit: 280798376

RAC: 0

2 Sep 2019 15:26:41 UTC

Message 173075 in response to message 173066

Gary Roberts wrote:

Inconclusives = All - In progress - Pending - Valid - Invalid - Error. Currently = 3.

Using Gary's formula, I came up with a number of 2.

I'm seeing results similar to what others are seeing; that is, running at x2 appears to be a good compromise. I see more invalids with the Linux/AMD setup, but that particular box is doing twice the amount of GPU work as the NVIDIA/Win 7 box. Interestingly, the CPU side of the house (both boxes) is crunching almost all GW work, with the occasional gamma-ray #5 tossed in for good measure. The Linux box CPU is all LIBC215 work.

Not really much else to report.

Clear skies,

Matt

Keith Myers

Joined: 11 Feb 11

Posts: 4964

Credit: 18744357688

RAC: 7012286

2 Sep 2019 16:16:40 UTC

Message 173076

How is this something I have never noticed before on Einstein? Has it always been this way? I have always had a wingman for all my Gamma Ray Pulsar tasks, so have always been able to compare my results to someone else.

Is this a new policy at Einstein that there will not be replication for new apps? Or is this the normal policy for beta apps?

cecht

Joined: 7 Mar 18

Posts: 1535

Credit: 2907935434

RAC: 2146469

2 Sep 2019 16:37:44 UTC

Message 173077

From Gary's equation, I have 13 inconclusive tasks on host #1 (4% of All; 11 from the defunct v1.08 app) and 20 inconclusive on host #2 (2.6% of All; 15 from the v1.08 app). For the v1.07 app, host #1 has 142 valid with 0% invalid, host #2 has 382 valid with 2% invalid (8 tasks, all from 28 Aug).

So for whatever reason, invalid:valid rates on my systems are quite low, excluding that anomaly on 28 August. Both of these Linux/AMD hosts are running 3x tasks, though the tally for host #1 includes a short stint at 2x.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Keith Myers

Joined: 11 Feb 11

Posts: 4964

Credit: 18744357688

RAC: 7012286

2 Sep 2019 18:19:08 UTC

Message 173078

Has anybody had one of the new 1.07 beta tasks, running at 1X, take over 2 hours on the GPU? I have one that has accumulated two 60-minute crunch sessions so far and is around 86% complete with 18 minutes left to run. All the other tasks run so far have taken less than 30 minutes. Is this task an outlier? Do some of the tasks have a more complicated parameter set than the majority?

Computer: Numbskull
Project Einstein@Home
Name h1_0565.75_O2C02Cl1In0__O2AS20-500_565.90Hz_941_0
Application Continuous Gravitational Wave search O2 All-Sky 1.07 (GW-opencl-nvidia)
Workunit name h1_0565.75_O2C02Cl1In0__O2AS20-500_565.90Hz_941
State Waiting to run
Received 8/29/2019 4:47:48 PM
Report deadline 9/12/2019 4:46:29 PM
Estimated app speed 56.98 GFLOPs/sec
Estimated task size 144,000 GFLOPs
Resources 1 CPU + 1 NVIDIA GPU
CPU time at last checkpoint 02:00:13
CPU time 02:00:14
Elapsed time 02:00:29
Estimated time remaining 00:18:20
Fraction done 86.788%
Virtual memory size 21,770.23 MB
Working set size 250.64 MB
Directory slots/9
Process ID 45464
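
For anyone wanting to cross-check the figures in a listing like this, a little arithmetic helps. The sketch below is only a simple linear extrapolation from elapsed time and fraction done, plus the nominal size/speed ratio. It is an assumption about how such estimates could be formed, not a statement of how the BOINC client computes them, and the function names are made up for illustration.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(seconds):
    """Format a number of seconds as 'HH:MM:SS', dropping any fraction."""
    seconds = int(seconds)
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


def remaining_by_extrapolation(elapsed_s, fraction_done):
    """Assume progress is linear: total = elapsed / fraction, remaining = total - elapsed."""
    return elapsed_s / fraction_done - elapsed_s


# Values copied from the task properties listing.
elapsed = hms_to_seconds("02:00:29")
remaining = remaining_by_extrapolation(elapsed, 0.86788)
print(seconds_to_hms(remaining))

# Nominal run time implied by estimated task size / estimated app speed.
nominal = 144_000 / 56.98  # GFLOPs divided by GFLOPs/sec gives seconds
print(seconds_to_hms(nominal))
```

The extrapolation comes out at roughly 18 minutes, in line with the listed estimate, while the nominal size/speed figure works out to about 42 minutes, well short of the 2 hours already spent.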

Richie

Joined: 7 Mar 14

Posts: 656

Credit: 1702989778

RAC: 0

2 Sep 2019 19:44:32 UTC

Message 173079

Both of my hosts have stopped getting new tasks. Anyone else?

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7225014931

RAC: 1045496

2 Sep 2019 19:47:03 UTC

Message 173080 in response to message 173079

Richie wrote:

Both of my hosts have stopped getting new tasks. Anyone else?

I just got dozens of new GW tasks about a minute ago.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117664406031

RAC: 35172157

2 Sep 2019 22:08:00 UTC

Message 173081 in response to message 173076

Keith Myers wrote:

How is this something I have never noticed before on Einstein? Has it always been this way?

No, it hasn't always been this way but it has been around for quite a while for CPU-only searches. The aim appears to be to use it for GPU searches, if possible, when the app is mature.

It's a BOINC feature called 'Adaptive Replication'. Bernd commented about it last year when the CPU only version of O2AS was first being set up. I had forgotten that it has been used in FGRP5 (CPU only) as well, and for a longer period. Once an app becomes 'highly trusted' to give correct results, a lot of time can be saved. I'm not aware that it has been used previously for a GPU search.

Keith Myers wrote:

I have always had a wingman for all my Gamma Ray Pulsar tasks, so have always been able to compare my results to someone else.

I imagine you're referring to the FGRPB1G GPU search and not the FGRP5 CPU search. Unless you were running the CPU search, you wouldn't necessarily have noticed it :-).

Keith Myers wrote:

Is this a new policy at Einstein that there will not be replication for new apps? Or is this the normal policy for beta apps?

The policy seems to be that during the testing phase, tasks for a new GPU app should always be tested against a known 'good' CPU only version. Once confidence in the new app is established, adaptive replication will likely be used as a means to shorten the overall time that the search is likely to take. With new O3 data at higher sensitivity being generated, there's not much point dragging out the time to complete the O2 search if it's 'safe' to shorten that time through adaptive replication. I'm sure the scientists know what they're doing :-).

EDIT: In case anyone is interested, this thread seems to mark the introduction of adaptive replication to FGRP5. It was immediately noticed by an observant volunteer (and questioned) and the responses from Christian Beer and Richard Haselgrove are quite interesting.

In particular, Richard checked and then pointed out that whether or not the quorum is 'hidden' is actually configurable. In view of the fact that a testing phase is the very time when there are likely to be lots of validation issues and an extremely low likelihood of someone deciding to 'scam' the system, it might be a nice gesture for the project staff to allow the testers to 'see' partially completed quorums. Presumably, this could easily be reversed when the testing phase is complete.

Cheers,
Gary.

Keith Myers

Joined: 11 Feb 11

Posts: 4964

Credit: 18744357688

RAC: 7012286

2 Sep 2019 22:29:36 UTC

Message 173082

Thanks for the explanation, Gary. No, I don't do any CPU tasks at Einstein, only GPU.
