The O2-All Sky Gravitational Wave Search on GPUs - discussion thread.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3392566540
RAC: 2833333

Keith Myers wrote:

I had long forgotten those posts.  OK.  Swan_sync not applicable for OpenCL tasks.  So the cpu usage is totally dependent  on how the app developer wrote the app for cpu support.

For my tasks, it looks to be 1:1 cpu_time:run_time except for one task I was playing with ncpus counts.

Isn't the CUDA swan_sync virtually the same thing as NVIDIA's OpenCL spin? Both dedicate a CPU thread to waiting on the GPU to help keep it utilized. I've only heard of swan_sync being used for GPUGrid. Does it work for SETI?

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1588415735
RAC: 761004

I have a question regarding the invalid GPU GW results. Granted, they do not match the CPU results, but has anyone considered the possibility that they too are valid? Just finding something different does not mean it is wrong.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7220564931
RAC: 970958

Betreger wrote:
has anyone considered the possibility that they too are valid?

Yes.

Bernd reported on August 22 that the validator matching criteria were relaxed a little in a way that moved about a quarter of the available sample of invalid GPU GW tasks into validity.  He clearly is willing to consider the possibility that not all GPU results scored as invalid are necessarily wrong.  

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18713319314
RAC: 6381196

mmonnin wrote:
Keith Myers wrote:

I had long forgotten those posts.  OK.  Swan_sync not applicable for OpenCL tasks.  So the cpu usage is totally dependent  on how the app developer wrote the app for cpu support.

For my tasks, it looks to be 1:1 cpu_time:run_time except for one task I was playing with ncpus counts.

Isn't the CUDA swan_sync virtually the same thing as NVIDIA's OpenCL spin? Both dedicate a CPU thread to waiting on the GPU to help keep it utilized. I've only heard of swan_sync being used for GPUGrid. Does it work for SETI?

No, it doesn't work for SETI. The apps don't use it, or at least the stock apps don't. There are other ways to accomplish something similar by making sure each GPU task has enough CPU support to keep it well fed.

The CUDA special app, which is Linux-only, does allow you to override the stock blocking sync behavior. You set a flag named -nobs (no blocking sync) in the command line entry of either the app_info or app_config file. It shaves 5-10 seconds off a task, which is actually a large percentage when tasks only run for 30-60 seconds on the latest hardware.
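To show roughly where such a flag would sit, here is a minimal Python sketch that writes out an app_config.xml fragment with a `<cmdline>` entry. The app name `setiathome_v8` and plan class `cuda90` are placeholders I am assuming for illustration, not values confirmed in this thread; check the names your own client reports before using anything like this.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, cmdline, avg_ncpus=1.0, ngpus=1.0):
    """Return app_config.xml text containing a single <app_version> entry."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # CPU and GPU share reserved per task; 1.0 each is only an example.
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    ET.SubElement(version, "ngpus").text = str(ngpus)
    # Extra command line arguments handed to the science app.
    ET.SubElement(version, "cmdline").text = cmdline
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical app name and plan class; only the -nobs flag comes from the post.
xml_text = build_app_config("setiathome_v8", "cuda90", "-nobs")
print(xml_text)
```

The printed text would be saved as app_config.xml in the project's directory, after which the client needs to re-read its config files before the change applies.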

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110568193
RAC: 0

I'm running CPU GW only, and for the past few days I have had streaks with no GW tasks available, even though according to the server status page there should be tasks to send.

2019-09-05 11:19:47.5534 [PID=16291] [debug]   [HOST#12787227] MSG(high) No work sent
2019-09-05 11:19:47.5534 [PID=16291] [debug]   [HOST#12787227] MSG(high) see scheduler log messages on https://einsteinathome.org/host/12787227/log
Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1588415735
RAC: 761004

Yesterday I had a fair number of GW GPU tasks in my cache. They disappeared overnight. Methinks the project voided them.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7220564931
RAC: 970958

My remaining system running v1.07 GW GPU work has usually been reliable, producing days of valid results in a row at times, though it had a brief period of high invalid rates a week or so ago.

It has had a new "bad period".

Particulars:

As is my current practice, I rebooted the system a week or so after the previous reboot, hoping to avoid the "sloth mode" and "catatonic mode" behaviors I have seen on all three of my AMD GPU systems when running Einstein GRP work.  I had quit BOINC and other major applications before the reboot.

Quite shortly after the reboot, a current GW task terminated with a Computation error.  These are very rare on my fleet--I think it has been months since I last saw one.  After that, processing of GW tasks (currently running at 2X on that Windows 10 system) appeared normal.  However, in reviewing my task list on my account pages at Einstein it appears that tasks reported as returned between 17:48 UTC on September 5 and 23:51 UTC on that date were all, or nearly all, currently scored as computation error, invalid, or inconclusive. 

So of ten units processed in a row, nine are currently known to have failed (I'm counting the inconclusives), and one is still waiting for a quorum partner.

The first failure is actually the one returned before the computation error, but it was probably completed after the reboot, from a checkpoint saved beforehand.

The results just before and just after the bad stretch are very good.  For example, the next eight tasks in sequence have all already been declared valid.

So I score this as another case of the perplexing ability of systems running this application to switch from "very healthy" to "extremely bad" regarding GW GPU task validity from one moment to the next, and back again.  Somehow there is at least one unintended dependence on system state that affects the application result in a way the validator cares about.

Just to illustrate one of a limitless set of possibilities, one could see such a consequence of a dependency on an uninitialized variable.

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

archae86 wrote:
Quite shortly after the reboot, a current GW task terminated with a Computation error.  These are very rare on my fleet--I think it has been months since I last saw one.

The last time I saw that particular error, I had shut down BOINC without first suspending any pending tasks. After the system came back up from a reboot, the GPU task I was running failed. This was on my Windows box, with the NVIDIA GT1030. I have yet to see one on my AMD/Linux box.

When I first started, I had bought an inexpensive NVIDIA GT710, which failed with that error on every task. I suspect the card was defective, as it was an open box when I bought it.

I currently have 6 inconclusive GW tasks in queue, all of which I expect to lose in the CPU+CPU/GPU tug of war. :)

Clear skies,
Matt
archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7220564931
RAC: 970958

archae86 wrote:

My remaining system running v1.07 GW GPU work has usually been reliable, producing days of valid results in a row at times, though it had a brief period of high invalid rates a week or so ago.

It has had a new "bad period".

And now it has had yet another "bad period".  Six tasks reported between 3:26 and 9:23 UTC September 7 are currently inconclusive (including four consecutive), so I expect them to become invalid.

I've set the project preferences to change from running tasks on this machine at 2X to running at 1X.  That setting will not take effect until a new task downloads, and GW GPU work availability has been skimpy again lately.  At the recent invalid rate, running this host at 1X will probably reduce net output of valid work (yes, the nominal gain from running 2X on the GW GPU tasks on this machine is very large).  So this is intended to be an experiment to see whether the "bad periods" recur even when running 1X, not a way to raise real productivity.
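The trade-off can be put into rough numbers with a small Python sketch. Every figure below (elapsed times and invalid rate) is invented purely for illustration and is not a measurement from this host; the function name is likewise just a label for this example.

```python
SECONDS_PER_DAY = 86400


def valid_tasks_per_day(concurrency, elapsed_seconds, invalid_rate):
    """Estimate tasks per day that end up valid.

    concurrency     -- tasks running on the GPU at once (1 for 1X, 2 for 2X)
    elapsed_seconds -- wall-clock time per task at that concurrency
    invalid_rate    -- fraction of returned tasks that fail validation
    """
    if concurrency < 1 or elapsed_seconds <= 0 or not 0 <= invalid_rate <= 1:
        raise ValueError("arguments out of range")
    completed = concurrency * SECONDS_PER_DAY / elapsed_seconds
    return completed * (1 - invalid_rate)


# Hypothetical figures: 60 minutes per task at 2X, 45 minutes at 1X,
# and the same 10% invalid rate in both cases.
at_2x = valid_tasks_per_day(2, 3600, 0.10)
at_1x = valid_tasks_per_day(1, 2700, 0.10)
print(f"2X: {at_2x:.1f} valid/day, 1X: {at_1x:.1f} valid/day")
```

With made-up inputs like these, 1X only comes out ahead if it also lowers the invalid rate substantially, which is why the switch is useful as a diagnostic rather than as a productivity measure.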

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1588415735
RAC: 761004

I'm down to my last 2 pulsar tasks running 2X, which won't take long, and then it's on to GW tasks running 1X on the GTX1060. My only complaints are the high invalid rate (hopefully that's fixed) and how low they pay per hour. Oh well, GWs are my main interest in this project; that's the main reason I'm here.
