I had long forgotten those posts. OK, Swan_sync is not applicable for OpenCL tasks, so the CPU usage is totally dependent on how the app developer wrote the app for CPU support.
For my tasks, it looks to be 1:1 cpu_time:run_time, except for one task where I was playing with ncpus counts.
Isn't the CUDA swan_sync virtually the same thing as NV's OCL spin? Both dedicate a CPU thread to waiting for instructions from the GPU, to help keep it utilized. I've only heard of swan_sync being used for GPUGrid. Does it work for SETI?
I have a question regarding the invalid GPU GW results. Granted, they do not match up to the CPU results, but has anyone considered the possibility that they too are valid? Just finding something different does not mean it is wrong.
has anyone considered the possibility that they too are valid?
Yes.
Bernd reported on August 22 that the validator matching criteria were relaxed a little in a way that moved about a quarter of the available sample of invalid GPU GW tasks into validity. He clearly is willing to consider the possibility that not all GPU results scored as invalid are necessarily wrong.
Isn't the CUDA swan_sync virtually the same thing as NV's OCL spin? Both dedicate a CPU thread to waiting for instructions from the GPU, to help keep it utilized. I've only heard of swan_sync being used for GPUGrid. Does it work for SETI?
No, it doesn't work for SETI. The apps don't use it, or at least the stock apps don't. There are other ways to accomplish a similar thing, by making sure each GPU task has enough CPU support to keep it well fed.
The CUDA special app, which is Linux-only, does allow you to override the stock blocking-sync behavior. You set a flag named -nobs (no blocking sync) in your command-line variable in either the app_info or app_config file. It shaves 5-10 seconds off a task when used, which is actually a large percentage when the tasks only run for 30-60 seconds on the latest hardware.
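For anyone who wants to try it, here is a minimal app_config.xml sketch of where such a <cmdline> flag goes. The app_name and plan_class values are only placeholders for illustration; take the real ones from your client_state.xml or app_info.xml:

<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>   <!-- placeholder: copy the actual app name from client_state.xml -->
    <plan_class>cuda90</plan_class>      <!-- placeholder: match the plan class of the special app -->
    <cmdline>-nobs</cmdline>             <!-- skip blocking sync; a CPU thread busy-waits on the GPU instead -->
    <avg_ncpus>1</avg_ncpus>             <!-- reserve a full CPU thread to keep that GPU task fed -->
  </app_version>
</app_config>

BOINC only re-reads this file on a client restart or after Options -> Read config files in the Manager.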
I'm running CPU GW only, and for the past few days I have had streaks with no GW tasks available, even though, according to the server status page, there should be tasks to send.
2019-09-05 11:19:47.5534 [PID=16291] [debug] [HOST#12787227] MSG(high) No work sent
2019-09-05 11:19:47.5534 [PID=16291] [debug] [HOST#12787227] MSG(high) see scheduler log messages on https://einsteinathome.org/host/12787227/log
My remaining system running v1.07 GW GPU work has usually been reliable, producing days of valid results in a row at times, though it had a brief period of high invalid rates a week or so ago.
It has had a new "bad period".
Particulars:
As is my current practice, I rebooted the system a week or so after the previous reboot, hoping to avoid the "sloth mode" and "catatonic mode" behaviors I have seen on all three of my AMD GPU systems when running Einstein GRP work. I had quit BOINC and other major applications before the reboot.
Quite shortly after the reboot, a current GW task terminated with a Computation error. These are very rare on my fleet--I think it has been months since I last saw one. After that, processing of GW tasks (currently running at 2X on that Windows 10 system) appeared normal. However, in reviewing my task list on my account pages at Einstein, it appears that tasks reported as returned between 17:48 and 23:51 UTC on September 5 are all, or nearly all, currently scored as computation error, invalid, or inconclusive.
So of ten units processed in a row, nine are currently known to have failed (I'm counting the inconclusives), and one is still waiting for a quorum partner.
The first failure is actually the one returned before the computation failure, but it was probably completed after the reboot from a checkpoint saved beforehand.
The results just before and just after the bad stretch are very good. For example, the next eight tasks in sequence have all already been declared valid.
So I score this as another case of the perplexing ability of systems running this application to switch from "very healthy" to "extremely bad" regarding GW GPU task validity from one moment to the next, and back again. Somehow there is at least one unintended dependence on system state that affects the application result in a way the validator cares about.
Just to illustrate one of a limitless set of possibilities, one could see such a consequence of a dependency on an uninitialized variable.
Quite shortly after the reboot, a current GW task terminated with a Computation error. These are very rare on my fleet--I think it has been months since I last saw one.
The last time I saw that particular error, I had shut down BOINC without first suspending any pending tasks. After the system came back up on a reboot, the GPU task I was running failed. This was on my Windows box, with the NVIDIA GT1030. I have yet to see one on my AMD/Linux box.
When I first started, I had bought an inexpensive NVIDIA GT710, which failed with that error on every task. I suspect the card was defective, as it was an open box when I bought it.
I currently have 6 inconclusive GW tasks in queue, all of which I expect to lose in the CPU+CPU/GPU tug of war. :)
My remaining system running v1.07 GW GPU work has usually been reliable, producing days of valid results in a row at times, though it had a brief period of high invalid rates a week or so ago.
It has had a new "bad period".
And now it has had yet another "bad period". Six tasks reported between 3:26 and 9:23 UTC on September 7 are currently inconclusive (including four consecutive), which I expect to become invalid.
I've set the project preferences to change this machine from running tasks at 2X to running at 1X. That setting will not take effect until a new task downloads. GW GPU work availability has been skimpy again lately. At the recent rate of invalids, running this host at 1X will probably reduce the net output of valid work (yes, the nominal gain from running the GW GPU tasks at 2X on this machine is very large). So this is intended as an experiment to see whether the "bad periods" recur even when running 1X, not as a way to raise real productivity.
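For reference, the same 1X/2X switch can also be made locally with an app_config.xml in the Einstein project directory rather than through the web GPU utilization factor preference. A rough sketch, assuming the GW GPU app name is einstein_O2MD1 (verify the real name in client_state.xml):

<app_config>
  <app>
    <name>einstein_O2MD1</name>        <!-- assumed name for the GW GPU app; check client_state.xml -->
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>       <!-- 1.0 = one task per GPU (1X); 0.5 would allow 2X -->
      <cpu_usage>1.0</cpu_usage>       <!-- CPU budgeted per GPU task -->
    </gpu_versions>
  </app>
</app_config>

Unlike the web preference, this should take effect as soon as the client re-reads its config files, not only when a new task downloads.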
I'm down to my last 2 pulsar tasks running 2X, which won't take long, and then it's on to GW tasks running 1X on the GTX1060. My only complaints are the high invalid rate (hopefully that's fixed) and how low they pay per hour. Oh well, GWs are my main interest in this project; that's the main reason I'm here.
Yesterday I had a fair number of GW GPU tasks in my cache. They disappeared over night. Methinks the project voided them.