I tried to run 3 tasks in parallel on a GTX 960 2GB (driver 375.20) + Linux Mint 18 (4.9.0-040900rc8-generic). Allocated 0.33 GPU resources per task. All three 'Gamma-ray pulsar binary search #1 on GPUs v1.12 (FGRPopencl-nvidia) x86_64-pc-linux-gnu' tasks started at the same time.
Two of them continued running fine all the way to the end, but the third "line" in parallel kept erroring out, always after about 18 secs. Here's the error message for all those tasks:
[CRITICAL]: ERROR: MAIN() returned with error '-4'
FPU status flags:
Error in OpenCL context: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_NDRANGE_KERNEL on GeForce GTX 960 (Device 0).
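For reference, here is a rough sketch of how a per-task GPU fraction such as 0.33 translates into a number of simultaneous tasks. It assumes the client simply keeps starting tasks while the summed fractions stay at or below the number of GPUs; the function name `max_concurrent_tasks` is made up for illustration and is not part of BOINC.

```python
import math

def max_concurrent_tasks(gpu_usage, gpu_count=1):
    """Estimate how many tasks can run at once for a given GPU fraction per task.

    Assumption: tasks are started as long as the sum of their GPU fractions
    does not exceed the number of GPUs. This only models scheduling; it says
    nothing about whether the tasks fit into GPU memory.
    """
    if not 0 < gpu_usage <= 1:
        raise ValueError("gpu_usage must be in (0, 1]")
    # The small epsilon guards against floating-point round-off in the division.
    return math.floor(gpu_count / gpu_usage + 1e-9)

for usage in (1.0, 0.5, 0.33):
    print(f"gpu_usage={usage}: up to {max_concurrent_tasks(usage)} task(s) per GPU")
```

Being allowed to start three tasks is a separate question from having enough GPU memory for them, which is what the CL_MEM_OBJECT_ALLOCATION_FAILURE above points to.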
Holmis wrote: To get rid of ...
The first thing I did was look in my cc_config.xml, and didn't see it. This is what I have:
<cc_config>
<options>
<rec_half_life_days>1.000000</rec_half_life_days>
<use_all_gpus>1</use_all_gpus>
<ignore_nvidia_dev>1</ignore_nvidia_dev>
</options>
</cc_config>
You are undoubtedly right that it has nothing to do with why I am not getting the Betas; that must be something else. I will try again later.
EDIT: I added the "ignore one GTX 960" to just run on one card and avoid the problem with the two cards. That seemed to work for a while, but when I stopped getting work units, I thought maybe they had decided to restrict distribution to systems with just one card, and that mine was still considered two cards even if one was ignored.
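As a quick way to double-check which options a cc_config.xml like the one quoted above actually sets, here is a minimal sketch using Python's standard XML parser. The helper names `read_options` and `ignored_nvidia_devices` are invented for this example, and the XML text is copied from the post rather than read from a real BOINC data directory.

```python
import xml.etree.ElementTree as ET

# Config text as quoted in the post above.
CC_CONFIG = """<cc_config>
<options>
<rec_half_life_days>1.000000</rec_half_life_days>
<use_all_gpus>1</use_all_gpus>
<ignore_nvidia_dev>1</ignore_nvidia_dev>
</options>
</cc_config>"""

def read_options(xml_text):
    """Return the <options> children as a list of (tag, text) pairs.

    A list is used instead of a dict because a tag may appear more than once.
    """
    root = ET.fromstring(xml_text)
    options = root.find("options")
    if options is None:
        return []
    return [(child.tag, (child.text or "").strip()) for child in options]

def ignored_nvidia_devices(xml_text):
    """Collect the device numbers listed in <ignore_nvidia_dev> entries."""
    return [int(text) for tag, text in read_options(xml_text)
            if tag == "ignore_nvidia_dev"]

print(read_options(CC_CONFIG))
print("Ignored NVIDIA devices:", ignored_nvidia_devices(CC_CONFIG))
```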
Jim, the other message means that the WU the scheduler considered for you had already been processed by the beta application. They don't want a second beta result but a stable result to verify against, so that WU is not assigned to you. Whether there are no fresh WUs or the scheduler just doesn't find them and gives up, I can't tell.
OK, thanks. I will try again later. They seem to run well on the GTX 960, about 600 seconds, as compared to 1000 seconds that your GTX 750 Ti is getting, so the memory bandwidth does not seem to be as much of a handicap as for the BRP4Gs. That is what I was looking for.
Currently the scheduling array is filled with tasks that are bound to be run by the CPU version to validate results from the GPU version. However, the results of the GPU apps validate pretty well, so I'll drop this restriction later today.
BM
Thanks for the update and the work done on the new FGRP GPU application. I see my NVIDIA GPU in Linux has been validating some of the tasks successfully, and so far none of the completed tasks have been reported as invalid. The tasks are completing consistently at 221 seconds apiece.
About warnings in the logs.
Since BOINC does not report FP64 support, a dummy kernel compile check using FP64 is performed when the OpenCL device is opened. If FP64 is OK, we use the GPU for almost everything (even sorting results). If the device does not support FP64, all kernels requiring "double" support are run on the CPU (about 10x slower).
If you see "OpenCL device has FP64 support" in the logs, it means that the GPU has been recognized as supporting double-precision floating point. Don't worry about performance; double precision is not the major part of the processing.
On OSX, there are lots of warnings when compiling the FFT library, but these are harmless and can be ignored.
As Bernd said, we are still having issues with the Windows driver. I hope we will soon find what's causing the biggest OpenCL kernel to fail on Windows only.
Christophe
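To illustrate the FP64 check described in the post above, here is a minimal sketch of a "try to compile a dummy double kernel, fall back to the CPU if that fails" pattern. It is not the application's actual code: `try_compile` stands in for whatever OpenCL build call the application uses, and the probe kernel and function names are assumptions made for this example.

```python
# Hypothetical probe kernel: it only needs to use "double" somewhere.
FP64_PROBE_KERNEL = """
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void fp64_probe(__global double *out) { out[0] = 1.0; }
"""

def device_has_fp64(try_compile):
    """Return True if the probe kernel compiles on the device.

    try_compile is a hypothetical callable that takes OpenCL source text
    and raises an exception when the build fails.
    """
    try:
        try_compile(FP64_PROBE_KERNEL)
    except Exception:
        return False
    return True

def backend_for_double_kernels(has_fp64):
    """Kernels that need "double" run on the GPU only when FP64 is available."""
    return "gpu" if has_fp64 else "cpu"

# Stand-ins for a real OpenCL build step, used here only to exercise the logic.
def compiler_with_fp64(source):
    return None

def compiler_without_fp64(source):
    raise RuntimeError("build failed: type double is not supported")

for name, compiler in (("with FP64", compiler_with_fp64),
                       ("without FP64", compiler_without_fp64)):
    backend = backend_for_double_kernels(device_has_fp64(compiler))
    print(f"Device {name}: double kernels run on the {backend}")
```

Probing with a real compile rather than trusting a reported capability flag matches the reasoning in the post: BOINC does not report FP64 support, so the device itself is asked.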
Crunching these WUs on an HD7950 in Linux, no issues so far: around 315 seconds with only one, 460 with two and 610 with three simultaneous WUs.
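To see what those HD7950 timings mean for throughput, here is a small worked calculation. It assumes each figure is the wall-clock time for a whole batch of simultaneous WUs to finish; the function names are just for illustration.

```python
# Timings from the post above: {simultaneous WUs: seconds until the batch finishes}
timings = {1: 315, 2: 460, 3: 610}

def effective_seconds_per_task(concurrent, batch_seconds):
    """Average wall-clock seconds spent per WU when running several at once."""
    return batch_seconds / concurrent

def tasks_per_hour(concurrent, batch_seconds):
    """WUs completed per hour at the given concurrency."""
    return 3600 * concurrent / batch_seconds

baseline = tasks_per_hour(1, timings[1])
for n, secs in timings.items():
    per_task = effective_seconds_per_task(n, secs)
    rate = tasks_per_hour(n, secs)
    print(f"{n} at a time: {per_task:.1f} s per WU, "
          f"{rate:.2f} WUs/hour, {rate / baseline:.2f}x the single-task rate")
```

On these numbers, running two at once brings the effective time down to 230 seconds per WU and three at once to roughly 203 seconds, so most of the gain comes from the second task.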
I normally run just one at a time, but thought I would try two on my GTX 960 (Ubuntu 16.10, 367.57 drivers). This is a minimally factory overclocked card that runs at 58 C normally. But as soon as it hit two work units, it errored out. And all the remaining work units errored out too, after 11 seconds. This is like the problem that I initially had with two cards, except that I was running just one card. Apparently my machine does not like two work units at once in any manner.
@Jim: Looks like you have 2 GB of GPU RAM, which should be enough to run 2 tasks at the same time. Does it also occur if you run two BRP4G WUs? (Just quit the Beta to test it.)
What's clear is that the Nvidia driver went completely crazy after the first error, and computations have been done outside normal GPU memory, which leads to the FP exception you get.
Did you have to reboot your machine to run GPU tasks successfully again, or was restarting BOINC enough?
Thx