O2 v1.07 (GW-opencl-nvidia) 100% invalid/inconclusive results

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110568193
RAC: 0
Topic 219623

Hi,

This host (GTX 1050 on Linux) has so far produced only invalid and inconclusive results for the GPU GW application, no matter whether 1x, 2x or 4x simultaneous jobs were running.

Is there any point in keeping it running (i.e. helping to optimize the verification process) or should I disable the beta test?
Other projects run fine, FGRP had a long term invalid rate of <1% IIRC.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7223874931
RAC: 1001563

That host also has eight recently generated inconclusive results on V1.01 GW work, which is CPU-only.  I don't know what fraction of those will eventually be found invalid (for GPU V1.07 work, inconclusive seems usually to predict eventual invalid with high confidence, but CPU may be different).

While we have seen quite a range of host invalidation rates and patterns on V1.07 GPU work, I don't think the results on that machine for CPU-only V1.01 work are in the expected range.  Possibly the machine is unhealthy in some way relevant to both CPU and GPU GW work on the current applications.

Bernd has advised that invalid (and, on the way there, inconclusive) results are currently valuable to the GW GPU beta.  I'm unclear on whether he values long-continued production of invalid results from a machine already shown to produce them.  Personally, I've chosen to move a machine which produced 100% GPU GW invalid results back to GRP, but keep running GW on a machine which bursts on and off--a dozen or more valid tasks in a row, then a half dozen invalid in a row, and back again.

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110568193
RAC: 0

Hm. I never had an invalid CPU task, as far as I remember. I will observe that. Unfortunately I can't see the wingman's results.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Keep an eye on it but also remember that any beta-GPU task will be paired with a non-beta CPU task and if it's declared inconclusive then both will show that status until the tiebreaker is in. So it might turn out that your CPU is still doing just fine.
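
To make the pairing effect concrete, here is a rough sketch in Python. It is only a toy model, not the project's validator: the function name `pair_status`, the status strings, and the use of exact equality between results are all assumptions made for illustration (a real validator presumably compares numerical results within some tolerance).

```python
from collections import Counter


def pair_status(results, min_quorum=2):
    """Return the status shown for each task in one workunit.

    `results` is a list of hashable result summaries, hypothetical
    stand-ins for whatever the real validator actually compares.
    """
    if len(results) < min_quorum:
        # Not enough results returned yet to compare anything.
        return ["pending"] * len(results)

    value, votes = Counter(results).most_common(1)[0]
    if votes < min_quorum:
        # No two results agree: every task shows inconclusive,
        # including the one that will later turn out to be right.
        return ["inconclusive"] * len(results)

    # Enough results agree: those are valid, the rest are invalid.
    return ["valid" if r == value else "invalid" for r in results]


# CPU task and beta GPU task disagree: both are flagged.
print(pair_status(["cpu_answer", "gpu_answer"]))
# A tiebreaker agrees with the CPU task: the statuses resolve.
print(pair_status(["cpu_answer", "gpu_answer", "cpu_answer"]))
```

In this model a perfectly healthy CPU task sits at "inconclusive" for exactly as long as its disagreeing partner does, which is why the interim status alone says little about the CPU side.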

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7223874931
RAC: 1001563

Holmis wrote:
any beta-GPU task will be paired with a non-beta CPU task and if it's declared inconclusive then both will show that status until the tiebreaker is in.

Good point.  This is where the invisibility of the quorum partners until final resolution impairs our ability as users to judge.  For example, it could be that all of the CPU tasks from this machine currently showing inconclusive status have as quorum partner one single GPU machine that is producing 100% invalid results on GW V1.07 (they exist).

But, because this GW search is configured to allow for single-task quorums with known reliable CPU hosts (not GPU), the quorum invisibility is in effect.

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

NVIDIA GeForce GTX 1050 (1999MB) driver: 430.40

Stef wrote:

Hi,

This host (GTX 1050 on Linux) has so far produced only invalid and inconclusive results for the GPU GW application, no matter whether 1x, 2x or 4x simultaneous jobs were running.

Is there any point in keeping it running (i.e. helping to optimize the verification process) or should I disable the beta test?
Other projects run fine, FGRP had a long term invalid rate of <1% IIRC.

You might try a different version of the driver. 375.39 is showing as current on the NVIDIA website; I'm wondering if your driver is a beta release? I don't remember seeing a production driver with that high a revision number. While I was running the GW beta tasks, my NVIDIA validation rate was between 40 and 60%, though the devs were still tweaking the validation process at the time. Another note: my NVIDIA card is in a Windows box, so your mileage may vary.

Clear skies,
Matt

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110568193
RAC: 0

I'm using the nvidia driver that is supplied by debian testing (non-free).

nvidia.com lists version 430.50 as current driver release:
https://www.nvidia.com/Download/driverResults.aspx/151568/en-us

As of now, all 22 GPU-GW tasks have turned out invalid, and all 11 CPU-GW tasks have turned out valid.

I've disabled beta for now. There are still many inconclusive tasks yet to verify.
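
For anyone wanting to do the same split on their own task list, a minimal sketch in Python follows. The function name `invalid_rate_by_app`, the app labels, and the outcome strings are made up for illustration and are not how the project website reports tasks; the counts in the example are the ones quoted above.

```python
from collections import defaultdict


def invalid_rate_by_app(tasks):
    """Return the invalid fraction per app among resolved tasks.

    `tasks` is an iterable of (app_name, outcome) pairs. Only 'valid'
    and 'invalid' outcomes are counted; anything still unresolved
    (e.g. 'inconclusive') is ignored.
    """
    counts = defaultdict(lambda: {"valid": 0, "invalid": 0})
    for app, outcome in tasks:
        if outcome in ("valid", "invalid"):
            counts[app][outcome] += 1

    return {
        app: c["invalid"] / (c["valid"] + c["invalid"])
        for app, c in counts.items()
    }


# Counts quoted above: 22 invalid GPU tasks, 11 valid CPU tasks.
tasks = [("GPU-GW", "invalid")] * 22 + [("CPU-GW", "valid")] * 11
print(invalid_rate_by_app(tasks))
```

Keeping the GPU and CPU apps apart like this makes it clearer whether the problem is the host as a whole or only the GPU application.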

Alexander Favorsky
Joined: 18 Jun 16
Posts: 36
Credit: 176383018
RAC: 74344

For slightly more than a week I have been getting invalid results for almost all GPU-GW v1.07 tasks, whereas before that most of them validated successfully. CPU tasks are mostly okay. Did something happen to the data or the GPU app? I've noticed that my GPU tasks don't validate against CPU versions, but CPU to CPU validates okay.

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

Alexander Favorsky wrote:
For slightly more than a week I have been getting invalid results for almost all GPU-GW v1.07 tasks, whereas before that most of them validated successfully. CPU tasks are mostly okay. Did something happen to the data or the GPU app? I've noticed that my GPU tasks don't validate against CPU versions, but CPU to CPU validates okay.

This is a common issue, discussed at length in this link.

Clear skies,
Matt

Jacob Klein
Joined: 22 Jun 11
Posts: 45
Credit: 114028547
RAC: 0

Was there any resolution to this?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117640352832
RAC: 35176907

Jacob Klein wrote:
Was there any resolution to this?

If by "this", you are referring to lots of invalid results for previous versions of the GW GPU app, then the answer is "yes".  The app is still under test but current versions seem to be working OK and giving valid results.

If you are referring to something else, please specify.

This thread that you have posted in refers to the V1.07 app for the O2AS (All Sky) search.  V1.09 of the app was the one that finally achieved a high proportion of valid results.  That search was abruptly terminated and a new search O2MD1 (Multi Directed) was started whose aim was to use the now successful GPU app to 'target' some known pulsars rather than do a much lengthier 'all sky' search (which I guess they'll come back to at some later point).  The app is now V2.0x and has had some more tweaks.  Results seem to be valid but there are some oddities with the scheduler distributing work correctly.

As conditions will change and problems will show up from time to time, if you want to participate when new apps are being tested, you really need to follow the discussion threads for each new search as it comes along.  That way you will always know of problems and fixes as they happen.

BTW, the link in the comment by Matt White (immediately before yours) points to a non-existent thread.  That would be a deficiency in the forum software, which is not able to fix links to thread titles where the title has subsequently changed.  When it became apparent that GW searches could quickly change (e.g. from O2AS to O2MD), I changed the title of that thread so that there could be two discussion threads, one for each different search.  So the link to the thread that Matt was actually pointing to at the time he posted should now be this link.  This is ancient history for now, but it wouldn't surprise me to see O2AS resume once O2MD finishes, which probably won't be all that far away :-)

EDIT:  After posting the above, I've just now caught up with the latest in the O2MD1 discussion thread.  It seems there may be a significant problem with the very recently released V2.02 app such that lots of results are giving validate errors when eventually being presented for validation.  Anyone reading this would be wise to not run this new version until Bernd has had time to comment about the problem.

I started running this version on a couple of hosts late yesterday, about 14 hours ago.  There have been 3 attempted validations so far, all of which are validate errors, so I've suspended crunching of GW tasks and returned the machines to FGRPB1G until further advice from Bernd.

Cheers,
Gary.
