Hi,
This host (GTX 1050 on Linux) has produced only invalid and inconclusive results for the GPU GW application so far, regardless of whether 1x, 2x or 4x simultaneous jobs were running.
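(Side note on the 1x/2x/4x multiplicity, in case anyone asks: one common way to set it is an app_config.xml in the Einstein@Home project directory, roughly like the sketch below. The app name here is just a placeholder and would have to match the actual GW GPU app name shown in BOINC Manager; the project's own "GPU utilization factor" web preference achieves the same effect.)

    <app_config>
      <app>
        <!-- placeholder name: use the GW GPU app name reported by BOINC Manager -->
        <name>einstein_O2AS20-500</name>
        <gpu_versions>
          <!-- 1.0 = one task per GPU, 0.5 = two, 0.25 = four -->
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>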
Is there any point in keeping it running (i.e. helping to optimize the verification process) or should I disable the beta test?
Other projects run fine; FGRP had a long-term invalid rate of <1%, IIRC.
That host also has eight recently generated inconclusive results on V1.01 GW work, which is CPU-only. I don't know what fraction of those will eventually be found invalid (for GPU V1.07 work, an inconclusive result usually predicts an eventual invalid with high confidence, but CPU may be different).
While we have seen quite a range of host invalidation rates and patterns on V1.07 GPU work, I don't think the results on that machine for the CPU-only V1.01 work are in the expected range. Possibly the machine is unhealthy in some way relevant to both CPU and GPU GW work on the current applications.
Bernd has advised that invalid (and, on the way there, inconclusive) results are currently valuable to the GW GPU beta. I'm unclear on whether he values long-continued production of invalid results from a machine already shown to produce them. Personally, I've chosen to move a machine which produced 100% GPU GW invalids back to GRP, but I keep running GW on a machine which bursts on and off: a dozen or more valid tasks in a row, then a half dozen invalid in a row, and back again.
Hm. I never had an invalid CPU task, as far as I remember. I'll keep an eye on that. Unfortunately, I can't see the wingman's results.
Keep an eye on it, but also remember that any beta GPU task will be paired with a non-beta CPU task, and if the pair is declared inconclusive then both will show that status until the tiebreaker is in. So it might turn out that your CPU is still doing just fine.
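To picture why both copies sit in that state, here is a toy sketch of quorum-style validation (an illustration only, not Einstein@Home's actual validator, which compares full result files rather than single numbers):

    def quorum_status(results, tolerance=1e-5):
        # results: one summary number per returned task (a toy stand-in
        # for the real result files the validator compares)
        if len(results) < 2:
            return "pending"                  # quorum not yet filled
        a, b = results[0], results[1]
        if abs(a - b) <= tolerance:
            return "both valid"               # the initial pair agrees
        if len(results) == 2:
            return "inconclusive"             # disagreement: wait for a tiebreaker
        c = results[2]                        # tiebreaker task
        if abs(a - c) <= tolerance:
            return "first valid, second invalid"
        if abs(b - c) <= tolerance:
            return "first invalid, second valid"
        return "still inconclusive"           # no two results agree yet

    # GPU result disagrees with its CPU wingman: both sit as "inconclusive"
    print(quorum_status([0.9, 1.0]))          # -> inconclusive
    # the tiebreaker matches the CPU result, so the GPU copy ends up invalid
    print(quorum_status([0.9, 1.0, 1.0]))     # -> first invalid, second valid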
Holmis wrote: any beta-GPU task will be paired with a non-beta CPU task...
Good point. This is where the invisibility of quorum partners until final resolution impairs our ability as users to judge. For example, it could be that all of the CPU tasks from this machine currently showing inconclusive status have as quorum partner one single GPU machine that is producing 100% invalid results on GW V1.07 (such machines exist).
But because this GW search is configured to allow single-task quorums with known reliable CPU hosts (not GPU), the quorum invisibility is in effect.
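Roughly, the effect of that configuration can be pictured like this (a toy model only, not the real scheduler; the reliability criterion and threshold here are invented for illustration):

    def initial_copies(consecutive_valid_results, is_beta_gpu_task, reliable_threshold=10):
        # Toy model of "adaptive replication": beta GPU work always gets a
        # second copy for cross-checking, while a CPU host with a long run of
        # valid results may be trusted with a quorum of one (no visible wingman).
        if is_beta_gpu_task:
            return 2
        if consecutive_valid_results >= reliable_threshold:
            return 1
        return 2

    print(initial_copies(50, is_beta_gpu_task=False))   # -> 1 (single-task quorum)
    print(initial_copies(3, is_beta_gpu_task=False))    # -> 2 (paired with a wingman)
    print(initial_copies(50, is_beta_gpu_task=True))    # -> 2 (beta work is always checked)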
NVIDIA GeForce GTX 1050 (1999MB) driver: 430.40
You might try a different version of the driver. 375.39 shows as current on the NVIDIA website; I'm wondering if your driver is a beta release, as I don't remember seeing a production driver with that high a revision number. While I was running the GW beta tasks, my NVIDIA validation rate was between 40 and 60%; of course, the devs were still tweaking the validation process. Another note: my NVIDIA card is in a Windows box, so mileage will vary.
I'm using the NVIDIA driver supplied by Debian testing (non-free).
nvidia.com lists version 430.50 as the current driver release:
https://www.nvidia.com/Download/driverResults.aspx/151568/en-us
As of now, all 22 GPU-GW tasks have turned out invalid, while all 11 CPU-GW tasks have turned out valid.
I've disabled beta for now. There are still many inconclusive tasks yet to be verified.
For slightly more than a week I have been getting invalid results for almost all GPU-GW V1.07 tasks, whereas before that most of them validated successfully. CPU tasks are mostly okay. Has something happened to the data or to the GPU app? I've noticed that my GPU tasks don't validate against CPU results, but CPU-to-CPU validates okay.
Alexander Favorsky wrote: For slightly more than a week I have been getting invalid results for almost all GPU-GW V1.07 tasks...
This is a common issue, discussed at length in this link.
Was there any resolution to this?
Jacob Klein wrote: Was there any resolution to this?
If by "this", you are referring to lots of invalid results for previous versions of the GW GPU app, then the answer is "yes". The app is still under test but current versions seem to be working OK and giving valid results.
If you are referring to something else, please specify.
This thread that you have posted in refers to the V1.07 app for the O2AS (All Sky) search. V1.09 of the app was the one that finally achieved a high proportion of valid results. That search was abruptly terminated and a new search, O2MD1 (Multi Directed), was started, whose aim was to use the now successful GPU app to 'target' some known pulsars rather than do a much lengthier 'all sky' search (which I guess they'll come back to at some later point). The app is now at V2.0x and has had some more tweaks. Results seem to be valid, but there are some oddities with how the scheduler distributes work.
As conditions will change and problems will show up from time to time, if you want to participate when new apps are being tested, you really need to follow the discussion threads for each new search as it comes along. That way you will always know of problems and fixes as they happen.
BTW, the link in the comment by Matt White (immediately before yours) points to a non-existent thread. That is a deficiency in the forum software, which is not able to fix links to threads whose titles have subsequently changed. When it became apparent that GW searches could change quickly (e.g. from O2AS to O2MD), I changed the title of that thread so that there could be two discussion threads, one for each different search. So the link to the thread that Matt was actually pointing to at the time he posted should now be this link. This is ancient history for now, but it wouldn't surprise me to see O2AS resume once O2MD finishes, which probably won't be all that far away :-)
EDIT: After posting the above, I've just now caught up with the latest in the O2MD1 discussion thread. It seems there may be a significant problem with the very recently released V2.02 app, such that lots of results are giving validate errors when eventually presented for validation. Anyone reading this would be wise not to run this new version until Bernd has had time to comment on the problem.
I started running this version on a couple of hosts late yesterday, 14hrs ago. There are 3 attempted validations so far, all of which are validate errors, so I've suspended crunching of GW tasks and returned the machines to FGRPB1G until further advice from Bernd.
Cheers,
Gary.