The O2-All Sky Gravitational Wave Search on GPUs - discussion thread.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109399106660
RAC: 35681496

archae86 wrote:
I hope other users with 3X or 4X experience of multiple successes or multiple failures will report helping to build up this picture.  Reports contrasting 1X/2X/3X/4X success on the same system would be particularly helpful.

Let me say to start with that I'm dipping my toes rather cautiously into what appears to be a quite murky pond.   I'm a little reluctant to disturb what is currently a very stable and productive fleet :-).  I'd like to see a pattern of stable crunching with the new app before making a move.

I'm not worried about credit.  I'm also not worried about low GPU utilisation, particularly if running multiple concurrent tasks can improve that.  My main concern is about the prospect of lots of results failing validation.  Seemingly, most tasks will not be matched against the results of a quorum partner so how does anyone know if those results are correct or not?

It would be nice to have an explanation of why that doesn't seem to matter.  Maybe they just want an approximate candidate list for each task so that the best fraction of all those zillions of candidates can be re-processed in-house at higher accuracy/resolution.  After all, they just need a single verified detection of continuous GW to score the jackpot.

I have essentially two distinct groups of hosts.  The first group runs AMD Southern Islands (SI) GPUs.  The architecture is GCN 1st generation.  The second group runs mainly AMD Polaris GPUs.  These are GCN 4th gen.  The second group also contains one 2nd gen GPU and one 3rd gen GPU, running exactly the same driver mix.  Until now, the staple diet of both groups of hosts has been the FGRPB1G GPU tasks.

The first group runs a 2016 version of the OS and the proprietary fglrx/opencl driver mix - the final version before AMD deprecated that bundle in late 2016.  The second group uses a quite recent OS version with the open source amdgpu kernel module as the graphics driver with the OpenCL capability coming from the OpenCL libs from the Red Hat version of the AMDGPU-PRO package.  If I run hosts from the first group using the latest OS/amdgpu graphics/OpenCL from AMDGPU-PRO, crunching on FGRPB1G tasks will work but results do not validate.  I intend to continue trying but for the moment (until I can get validation) I intend to leave the first group back at the 2016 point.

The purpose of this message is to document some tests with the GW V1.07 app that I've performed over the last couple of days.  For the SI based hosts (1st group) I chose the machine with a HD 7850 GPU that I've been using to test the most recent amdgpu driver with OpenCL from AMDGPU-PRO.  I've run the new app in two separate tests using the 2016 OS and fglrx driver for one test and the recent OS with OpenCL from AMDGPU-PRO for the 2nd test.  In both these tests, tasks were able to start crunching (at a respectable rate initially) but soon degenerated markedly to a crawl.  I've described what I saw happening in this message.

The upshot of both those tests was that after reaching around 30% completed in around 30 mins, the progress counter reset to 0%  and then progressed for many hours at an extremely slow rate so that the projected finishing time was many days in the future and most certainly longer than an average CPU core would have taken to crunch the task.  In one test I let things run for about 10 hours and about 5 hours for the other.  In both cases the % completed was still in single digit territory.  There is a mechanism in place to fail tasks that are taking too long and I wasn't too keen to get a "time limit exceeded" failure message, so I aborted the task in each case.

The tests for Polaris GPUs showed no evidence of progress resetting to zero and restarting at a very slow rate.  I chose two hosts, both with the same brand of RX 570.  One had a relatively recent CPU, a G4560 Pentium (2 cores/4 threads 3.5GHz).  The GW tasks are here.  The other is a 2008 Q6600 quad core @ 2.4GHz.  The GW tasks are here.  I have summarised the results (so far) for both below.

Q6600 Crunch time  - 1x = 5880s per task ( 4 tasks)
                   - 2x = 3020s per task ( 2 tasks)
                   - 3x = 2590s per task (15 tasks)

            Results:  Total=21   Pending= 3   Inconclusive=12   Invalid= 6   Valid= 0

G4560 Crunch time  - 1x = ~4000s per task ( 1 task, but not fully crunched @ 1x)
                   - 2x = not measured
                   - 3x = ~1600s per task (>40 tasks)

            Results:  Total=45   Pending=16   Inconclusive= 0   Invalid= 1   Valid=28
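
To put the crunch times above on a common footing, here is a minimal Python sketch converting them to tasks per day. It assumes each figure is the effective time per task (elapsed time divided by the number of concurrent tasks) and that the GPU crunches around the clock; the function name `tasks_per_day` is purely illustrative.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(effective_seconds_per_task: float) -> float:
    """Daily throughput, assuming 24/7 crunching and that the input is
    elapsed time divided by the number of concurrent tasks."""
    return SECONDS_PER_DAY / effective_seconds_per_task


# Q6600 + RX 570 figures from the table above (seconds per task).
q6600 = {1: 5880, 2: 3020, 3: 2590}

for n, secs in q6600.items():
    gain = q6600[1] / secs  # throughput relative to running 1x
    print(f"Q6600 {n}x: {tasks_per_day(secs):.1f} tasks/day ({gain:.2f}x of 1x)")
```

On those assumptions the Q6600 box gains roughly 2.3x in throughput going from 1x to 3x, although extra throughput is only worth having if the results validate.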

The most disturbing thing is that the host with the old and slow CPU seems likely to have all invalid results, even at 1x.  The more modern CPU has no inconclusives and only 1 invalid.  That seems to indicate that validation itself (and not just the elapsed time) depends on CPU architecture/speed.  That seems strange.  I can understand longer crunch times but why the validation outcome?

I'll wait longer for the validation outcome for the G4560 host to be a bit more complete but so far it looks pretty good with no inconclusives.  I checked quite a few tasks as they became valid and those were all verified against a CPU task and were not just single task quorums.  I hope Bernd will find the time to make some sort of comment about all the invalid results and in particular about what sort of CPUs may be needed to process these tasks successfully.  As I've said before, I'm prepared to upgrade the CPU if that's needed.

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244927581
RAC: 16505

We updated the validator this morning. According to our internal tests, about 27.5% of last week's invalid GPU results would have passed validation with the new version.

It's pretty complicated to explain exactly what we did there. Basically we widened the validator's understanding of "close enough" for a particular type of result. This affects only the cases where the GPU results were indeed close to those of the CPUs. I'm currently looking into the remaining ~72.5% of cases, where the results are "further off".

At a higher level, I can see no specific platform or vendor that the "invalids" point to. The best validating app versions are Linux/AMD and OSX/NVIDIA (both 12% invalids), then a huge gap, then Linux and Windows/NVIDIA (23%) and finally Windows/AMD (26%). OSX/AMD had only 2 successful results, and both failed validation.

My suggestion would be that, while we are still struggling with validation, you don't run more than one task on a single GPU. It's just impossible to keep track of at our end.

BM

cecht
Joined: 7 Mar 18
Posts: 1421
Credit: 2445262974
RAC: 1494286

Bernd Machenschalk wrote:
My suggestion would be that, while we are still struggling with validation, you don't run more than one task on a single GPU. It's just impossible to keep track of at our end.

Check. Will do. Thanks for the update.

 

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

Interesting statistics.

Truth be told, I haven't done much tweaking to my very modest pair of machines, other than aborting tasks which do not play nice together. I'm still running two tasks per GPU with zero validation errors, other than the 3 or 4 tasks which tried to run concurrently with the FGRP tasks. I do have a few in the pending queue.

When I fixed my desktop issue, I purged all the drivers and reloaded the standard set (not the Pro) of drivers. The package version is AMDGPU-Pro 18.20-673703, which contains both the standard and Pro drivers. This machine is the Core2 Duo with the RX 460 GPU. It is running 2 of the 1.07 GPU tasks, along with an additional 1.01 CPU task.

The other box is my DL360 blade server, with two 6-core Xeon X5650 processors and a GT 1030 GPU (431.60 driver), running 2 GW 1.07 tasks and 12 1.01 tasks. This is with the throttle (CPU utilization) set at 55%. Interestingly, the Linux box outperforms the server, a testament to the importance of a good GPU.

Both boxes seem to be running the 1.07 tasks fine at x2, so I am reluctant to drop back to a single task. One thing I do find a bit puzzling is my crunch times compared to others here. I expected the RX 460 to perform better than it does, although if the RX 570 is 3 times as fast as the RX 460, the numbers make sense. I'm keeping a weather eye on the validation results; if I start seeing invalid results show up, I'll step back to x1.

Clear skies,
Matt
archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023904931
RAC: 1804451

Bernd Machenschalk wrote:
My suggestion would be that, while we are still struggling with validation, you don't run more than one task on a single GPU. It's just impossible to keep track of at our end.

I have complied on both my boxes.  I regret that my early decision to run my first box mostly at 3X and 4X generated many dozens of invalid results (at least by the standards the validator was using until today).

For general interest as a data point on validation, both my boxes are Windows 10/RX 570 machines.

The second one to get started has only run 2X (until a few minutes ago) and to date has 68 valid, 1 invalid. 

The first one to get started ran mostly at 3X and 4X in the early days, then switched to only 2X until it went down to 1X a few minutes ago.  To date it has logged 72 valid results (all on 2X work) and 47 invalid (all on 3X and 4X work).  An additional 20 inconclusive tasks from my 3X and 4X runs seem virtually certain to be scored as invalid.  A small number of my 4X and 3X tasks have not yet been compared to a quorum partner, so with no action on my part these few tasks should give an indication of whether today's adjustment of the validator strictness allows them to pass.

While my overall Windows/AMD score of 48 invalid with 141 valid is rather close to Bernd's early report of 26% invalid on Windows/AMD, that broad conclusion conceals a far more favorable sub-population when run at 2X.  While it takes longer for invalid findings to trickle in, I'm pretty confident that as things were for the last few days both of my boxes would have scored at far under 10%, and rather likely better than 5%.
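
For anyone wanting to reproduce that kind of split, here is a minimal Python sketch of the arithmetic. It assumes the invalid rate is taken over decided results only (valid + invalid, leaving pending and inconclusive tasks out); whether Bernd's percentages were computed the same way is an assumption. The counts are the ones quoted in this post, and `invalid_fraction` is just an illustrative name.

```python
def invalid_fraction(valid: int, invalid: int) -> float:
    """Share of decided results that were invalid; pending and
    inconclusive tasks are deliberately excluded."""
    decided = valid + invalid
    return invalid / decided if decided else 0.0


# (valid, invalid) counts as reported in this post.
groups = {
    "overall Windows/AMD": (141, 48),
    "second box, 2X only": (68, 1),
    "first box, 2X work": (72, 0),
    "first box, 3X/4X work": (0, 47),
}

for name, (valid, invalid) in groups.items():
    print(f"{name}: {100 * invalid_fraction(valid, invalid):.1f}% invalid")
```

The overall figure comes out near 25%, while the 2X-only subsets sit in the low single digits, which is the sub-population effect described above.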

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

archae86 wrote:
To date it has logged 72 valid results (all on 2X work) and 47 invalid (all on 3X and 4X work).

Based on this (and my own findings), I'm tempted to go out on a limb and say 2x work has a much higher success rate. (Knock wood.)

I do have a few tasks in the inconclusive category, it will be interesting to see where they fall.

Clear skies,
Matt
cecht
Joined: 7 Mar 18
Posts: 1421
Credit: 2445262974
RAC: 1494286

I'm in a quandary over how best to contribute to E@H. Assuming a primary goal of assisting with gravitational wave detection, would it be better to grind out as many beta testing results as possible to support GW GPU app development, or to split host resources between FGRPB1G and GW GPU tasks to maximize BOINC credits while still contributing to GW GPU testing, or just to go for the credits, assuming they mean something for big-picture E@H project goals? Put another way, is helping with binary pulsar detection more important than helping with GW GPU app development?
I have the impression that the BOINC credit system is a bit arbitrary, so maybe RAC can't be used as a metric to help out here.
Right now, my dual RX 570 host is running 6 concurrent FGRPB1G tasks along with 2 concurrent O2AS20-500 CPU tasks, while my RX 460 + RX 560 host is running 2 concurrent O2AS20-500 GPU v1.07 tasks (1x per GPU, as requested, yielding ~30 tasks/day).  I'm dithering over whether to stick with this 'balanced' resource partitioning or switch over to nothing but v1.07 tasks.


Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

cecht wrote:
I'm in a quandary over how best to contribute to E@H. Assuming a primary goal of assisting with gravitational wave detection, would it be better to grind out as many beta testing results as possible to support GW GPU app development, or to split host resources between FGRPB1G and GW GPU tasks to maximize BOINC credits while still contributing to GW GPU testing, or just to go for the credits, assuming they mean something for big-picture E@H project goals? Put another way, is helping with binary pulsar detection more important than helping with GW GPU app development?

Since I do not use a custom app config file, and my preferences in the project section are set to run any and all tasks (other than CPU apps where GPU work is available), I'm assuming (we all know how dangerous that can be) that the project is sending me what it considers to be most helpful. And right now, that is all GW work.

That being said, and in my humble opinion, nothing is unimportant, since there are enough volunteers out there that all of the bases should be covered. Participating in E@H and BOINC projects as a whole is very much a team sport, and sooner or later every piece of data must be analyzed, and every number crunched, in order to make the big "discovery". So even if that "honor" goes to only a few computers, separating the wheat from the chaff is a collective effort, and since we all share in the effort, we all share in the discovery.

Just my two cents. :)

Clear skies,
Matt
archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023904931
RAC: 1804451

Matt White wrote:
I do have a few tasks in the inconclusive category, it will be interesting to see where they fall.

I predict they will be found invalid.

Consider: by the rules of the beta, even though you can't see your first quorum partner, it had to be a CPU job.  Both your task and that one most likely passed the basic sanity checks, but when your results were compared they were insufficiently similar to declare both valid.  So a tie-breaker task was sent out to a CPU host.  A betting person would expect your initial CPU quorum partner to be more likely than you to agree with the tie-breaker.

After typing all that I got a hazy, maybe, memory.  There is something funny about the case where your partner fails the sanity checks that run only once the quorum is filled.  Your partner's task just fails but, irrationally, I think your task may be tagged inconclusive while the next task is in flight.

So maybe there is a bit more hope than I painted, but I still predict nearly all of those will be deemed invalid.  I, myself, still have 20 inconclusives, nearly all of them from 3X and 4X runs--all of which I expect to fail.

Mad_Max
Joined: 2 Jan 10
Posts: 153
Credit: 2134862969
RAC: 434193

archae86 wrote:

My six-core RX 570 Windows 10 host continues to have reasonable validation success on tasks it ran at 2X, with total failure on so far resolved 4X tasks (15 invalid so far, zero validations)

My new news is that 3X tasks comprehensively did not work either. I ran this system at 3X for a few hours late August 14 into early August 15. Of 15 such tasks, one is currently pending (no quorum partner for comparison yet), two are invalid (initial miscompare, with the follow-up finding agreement between two others who disagree with me), and 11 inconclusive (initial miscompare, no successful resolution among initial and tie-breaking quorum partners). I judge this as most likely zero success when the scoring eventually is complete, and at best an extremely high rate of failure.

This is interesting, as cecht appears to have enjoyed success running 3X on a dual RX 570 Linux box running extremely similar graphics cards to mine.

Possibly the 3X and 4X plague I've seen may be dependent on the OS for which the application is compiled, or under which it is running, or ...

I hope other users with 3X or 4X experience of multiple successes or multiple failures will report helping to build up this picture. Reports contrasting 1X/2X/3X/4X success on the same system would be particularly helpful.

I have seen very similar behavior with my RX 570, on both the GW app (previous version; I have not tested the latest) and the FGRP app (Windows 7).

The RX 570 ran 1x and 2x tasks fine, but 3x and more led to glitches and/or a close to ~100% failure rate at the validation stage.
Meanwhile, older AMD cards (like the HD 7850/7870) run 3x, 4x and 5x tasks fine on the same computer with the same drivers (it may actually be using different DLLs, but from the same installed driver pack).

So I consider the 2x concurrency ceiling to be some sort of hardware limitation or driver bug in the Polaris (RX 460 - RX 590) architecture...
