The O2-All Sky Gravitational Wave Search on GPUs - discussion thread.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119604694499

RAC: 24854975

25 Jul 2019 23:43:00 UTC

Message 172336 in response to message 172334

cecht wrote:

... results I reported are for O2AS20-500 tasks (Continuous Gravitational Wave search O2 All-Sky v1.06 () x86_64-pc-linux-gnu; GW-opencl-ati)

So were Richie's :-).

cecht wrote:

... the Server Status page shows that workunits for that application are still being sent out

The O2AS search has both a CPU app and the current test version of the GPU app. So, on the server status page, I presume the activity now applies to the CPU app. Bernd's cryptic comment about validation hints at immediate action to stop the further flow of GPU tasks until he works out what is going on. People still may have GPU tasks on board but I'd be quite surprised if you could get more of them after that comment. I suspect (when the problem is identified) that there could be a further app version needed so it doesn't make much sense to keep crunching any remaining tasks for the current app - at least until Bernd makes a further comment.

In the past during 'live' tests like this, people were granted credit (if possible) for the work done, even if an app failure caused the results to be 'junk'. Hence I'm not at all surprised at Archae86's comment that he had credit for an invalid result. At the moment, validation will have been suspended whilst the results are being looked at in an effort to characterise the true nature of the problem. Unless the problem is sorted quickly, results may remain in limbo for quite a while. Test results are never 'junk' since they always help to improve the app or other back end processes in the whole validation chain.

cecht wrote:

... the three work generators for the O1OD1 series of programs are disabled.

We are doing the Observation Run 2 All-Sky search. The Observation Run 1 Open Data 1 search is a different beast so it's not surprising those work generators are listed as disabled.

Cheers,
Gary.

cecht

Joined: 7 Mar 18

Posts: 1614

Credit: 3026353650

RAC: 1417833

26 Jul 2019 0:44:10 UTC

Message 172337 in response to message 172336

Gary Roberts wrote:

cecht wrote:
... results I reported are for O2AS20-500 tasks (Continuous Gravitational Wave search O2 All-Sky v1.06 () x86_64-pc-linux-gnu; GW-opencl-ati)

So were Richie's :-).

cecht wrote:
... the Server Status page shows that workunits for that application are still being sent out

The O2AS search has both a CPU app and the current test version of the GPU app. So, on the server status page, I presume the activity now applies to the CPU app. Bernd's cryptic comment about validation hints at immediate action to stop the further flow of GPU tasks until he works out what is going on. People still may have GPU tasks on board but I'd be quite surprised if you could get more of them after that comment. I suspect (when the problem is identified) that there could be a further app version needed so it doesn't make much sense to keep crunching any remaining tasks for the current app - at least until Bernd makes a further comment.

In the past during 'live' tests like this, people were granted credit (if possible) for the work done, even if an app failure caused the results to be 'junk'. Hence I'm not at all surprised at Archae86's comment that he had credit for an invalid result. At the moment, validation will have been suspended whilst the results are being looked at in an effort to characterise the true nature of the problem. Unless the problem is sorted quickly, results may remain in limbo for quite a while. Test results are never 'junk' since they always help to improve the app or other back end processes in the whole validation chain.

cecht wrote:
... the three work generators for the O1OD1 series of programs are disabled.

We are doing the Observation Run 2 All-Sky search. The Observation Run 1 Open Data 1 search is a different beast so it's not surprising those work generators are listed as disabled.

Thanks for straightening me out! Or as Gilda Radner's character, Emily Litella, used to say on Saturday Night Live, "Never mind."

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119604694499

RAC: 24854975

26 Jul 2019 1:31:49 UTC

Message 172340 in response to message 172337

cecht wrote:

Thanks for straightening me out!

Sorry for the unintentional collateral damage! I didn't realise you were bent! :-).

Cheers,
Gary.

Richie

Joined: 7 Mar 14

Posts: 656

Credit: 1702989778

RAC: 0

26 Jul 2019 12:46:25 UTC

Message 172350

I couldn't resist checking whether GPU tasks are available at the moment. Yes, they are... but they are the same version as before, v1.06.

Stef

Joined: 8 Mar 05

Posts: 206

Credit: 110568193

RAC: 0

26 Jul 2019 12:54:00 UTC

Message 172351

I also keep getting them, but not a single result has been awarded credit so far.

archae86

Joined: 6 Dec 05

Posts: 3165

Credit: 7389541687

RAC: 2012811

26 Jul 2019 14:21:08 UTC

Message 172352 in response to message 172329

cecht wrote:

This Linux host has a modest 2-core(4-thread) Pentium G5600 CPU @ 3.90GHz.

app_config GPU\CPU   task time, s   CPU time, s
     1 \ 0.9            3750           2370
     0.5 \ 0.8          2435           1463
     0.333 \ 0.5        2153           1081

Cecht. We share the same RX 570 GPU, but have very different CPUs, and I run Windows 10 vs. your Linux.

Somewhere in those or other differences there hides a wildly different CPU time relationship. At all levels from 1X up through 4X, my system reports slightly more CPU time than elapsed time, implying at least a little bit of simultaneous activity on the 5 or 6 reported threads. Mine thus reports much more CPU per task as one raises multiplicity. You, on the other hand, report less CPU time per task as the multiplicity level rises.
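The contrast described above can be checked against the figures in cecht's quoted table. The following is a minimal sketch that computes CPU time as a fraction of elapsed task time for each setting. The function and variable names are illustrative only, and the numbers are copied from the table rather than measured independently.

```python
# Figures copied from cecht's quoted table:
# (GPU share, CPU share, task time in s, CPU time in s).
runs = [
    (1.0, 0.9, 3750, 2370),
    (0.5, 0.8, 2435, 1463),
    (0.333, 0.5, 2153, 1081),
]


def cpu_fraction(task_time_s, cpu_time_s):
    """CPU time as a fraction of the task's elapsed time."""
    return cpu_time_s / task_time_s


for gpu_share, cpu_share, task_s, cpu_s in runs:
    ratio = cpu_fraction(task_s, cpu_s)
    print(f"GPU {gpu_share:<5} CPU {cpu_share:<3}  CPU/elapsed = {ratio:.2f}")
```

On these figures the ratio stays well below 1 at every setting on the Linux host, whereas the Windows host described above reports a ratio slightly above 1.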

I speculate two things:

1. The difference in our behaviors seems most likely to be a difference between the Linux application and the Windows application, or possibly a difference in the OS services requested by the applications.

2. Much, or perhaps most of the CPU work on this application is not in fact directly required computation on the target problem, but some kind of data-shuffling overhead. Either that, or for some reason when running at higher multiplicities the Windows version re-runs much of the work.

As others have reported, delivery of 1.06 GPU tasks did not in fact stop when Bernd said "I'm disabling the GPU versions for now" eighteen hours ago. I downloaded just a little more solely in order to allow a 4X trial matched to my 1X, 2X, 3X trials. The result was (seemingly) successful completions at somewhat improved productivity.

I'll summarize the apparent productivities using the metric of implied task completions per day:

1X 19.9
2X 32.5
3X 39.6
4X 42.9
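As a rough cross-check of these figures, here is a minimal sketch of how "implied task completions per day" relates to per-task elapsed time. It assumes the metric is computed as multiplicity × 86400 / elapsed seconds per task, which the post does not state explicitly, and the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400

# Tasks per day from the list above, keyed by multiplicity
# (the number of tasks sharing the GPU at once).
tasks_per_day = {1: 19.9, 2: 32.5, 3: 39.6, 4: 42.9}


def implied_elapsed_s(multiplicity, per_day):
    """Elapsed seconds per task implied by a tasks-per-day figure.

    Assumes per_day = multiplicity * SECONDS_PER_DAY / elapsed.
    """
    return multiplicity * SECONDS_PER_DAY / per_day


def gain_over_1x(per_day, baseline=tasks_per_day[1]):
    """Productivity relative to running one task at a time."""
    return per_day / baseline


for n, rate in tasks_per_day.items():
    print(f"{n}X: ~{implied_elapsed_s(n, rate):.0f} s per task, "
          f"{gain_over_1x(rate):.2f}x the 1X rate")
```

Under that assumption, each added concurrent task lengthens the individual task time but still raises total throughput, with the gain shrinking at every step.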

cecht's system is far more productive than mine at 1X, probably because his 3.9 GHz Coffee Lake processor shrugs off the marketing slur of "Pentium" and delivers single-core performance mostly attributable to clock rate and processor generation (his chip is listed as a 2-core Coffee Lake with a Q2 2018 launch date), probably coupled with considerably more efficient computation by the Linux version of this application than by the Windows version. My i5-9400F is a bit more recent, with a Q1 2019 launch date, but has just a 2.9 GHz clock rate. It is also Coffee Lake, so its computational performance at a given clock rate is likely very similar. Possibly the fact that mine has six physical cores compared to his two helps at higher multiplicity.

Of course, none of this matters unless an application is delivered that actually works. I've aborted my remaining GW tasks, and disabled acceptance of Beta Test work until I see some favorable indication.

(edited to add OS difference to CPU clock rate comment)

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119604694499

RAC: 24854975

27 Jul 2019 1:37:15 UTC

Message 172359 in response to message 172336

Gary Roberts wrote:

.... Bernd's cryptic comment about validation hints at immediate action to stop the further flow of GPU tasks until he works out what is going on. People still may have GPU tasks on board but I'd be quite surprised if you could get more of them after that comment.

All I can do is apologise for the incorrect assessment and reiterate that I'm genuinely 'quite surprised' :-).

However, it seems to be just a waste of your resources to continue downloading and running them. Bernd should already have a big enough sample size to inspect whilst he tries to isolate the problem. I guess we'll be doing them all again at some point once the problem is rectified.

People often comment about the lack of evidence that their completed tasks are being subjected to the validation process. It seems that test tasks like these are deliberately held back from validation until some sort of 'inspection' is done as to the efficacy of the results. If all looks good, they get passed to the validator. If there are problems, some sort of manual intervention quarantines them and a credit is manually applied to 'compensate' the volunteer for their contribution.

Cheers,
Gary.

Rolf

Joined: 7 Aug 17

Posts: 27

Credit: 135377187

RAC: 0

27 Jul 2019 6:27:12 UTC

Message 172360

About the validation, the pattern I have seen is that two CPUs mostly agree on the result (even if they happen to be AMD and Intel), and two GPUs also agree. The dispute starts when a CPU and a GPU compare their results: it always ends up with a referendum. Then the minority loses, so two GPUs will downvote a CPU and declare its result invalid, and vice versa.
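The "referendum" pattern described here can be illustrated with a toy majority vote. This is only a sketch of the behaviour as observed in the post, not the project's actual validator: real result comparison is presumably done within some numerical tolerance, while the hypothetical `settle` function below simply treats results as equal or not.

```python
from collections import Counter


def settle(results):
    """Decide validity by strict majority.

    results: list of (host_label, value) pairs with unique labels.
    Returns a dict mapping each label to 'valid', 'invalid' or
    'inconclusive' (no strict majority yet, so another result is needed).
    """
    counts = Counter(value for _, value in results)
    top_value, top_count = counts.most_common(1)[0]
    if top_count <= len(results) / 2:
        return {label: "inconclusive" for label, _ in results}
    return {
        label: "valid" if value == top_value else "invalid"
        for label, value in results
    }


# A CPU and a GPU disagree: nothing can be decided yet.
print(settle([("cpu1", "A"), ("gpu1", "B")]))

# A second GPU agrees with the first: the CPU result is outvoted.
print(settle([("cpu1", "A"), ("gpu1", "B"), ("gpu2", "B")]))
```

In this model the outcome depends only on which kind of host supplies the tie-breaking result, which matches the observation that the minority loses regardless of which side is actually correct.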

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4349

Credit: 253586515

RAC: 36341

13 Aug 2019 8:18:41 UTC

Message 172604

There is now version 1.07 which should validate much better with the CPU versions.

Arif Mert Kapicioglu

Joined: 16 Jul 09

Posts: 7

Credit: 823300983

RAC: 0

13 Aug 2019 11:53:52 UTC

Message 172606 in response to message 172604

Bernd Machenschalk wrote:

There is now version 1.07 which should validate much better with the CPU versions.

Currently running one, though the GPU load is fluctuating between 21% and 27%. Win 10 x64, Vega 64, GPU temp 47 °C.
