Gamma-ray pulsar binary search #1 on GPUs

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Holmis wrote:
To get rid of the error message edit your cc_config.xml and remove the tags.

The first thing I did was look in my cc_config.xml, and didn't see it.  This is what I have:

<cc_config>
  <options>
    <rec_half_life_days>1.000000</rec_half_life_days>
    <use_all_gpus>1</use_all_gpus>
    <ignore_nvidia_dev>1</ignore_nvidia_dev>
  </options>
</cc_config>

You are undoubtedly right that it has nothing to do with why I am not getting the Betas; that must be something else.  I will try again later.

EDIT: I added the "ignore one GTX 960" to just run on one card and avoid the problem with the two cards.  That seemed to work for a while, but when I stopped getting work units, I thought maybe they had decided to restrict distribution to systems with just one card, and that mine was still considered two cards even if one was ignored.

 

floyd
Joined: 12 Sep 11
Posts: 133
Credit: 186610495
RAC: 0

Jim, the other message means that the WU the scheduler considered for you was already processed by the beta application. They don't want a second beta result but a stable result to verify against, so that WU is not assigned to you. Whether there are no fresh WUs, or the scheduler doesn't find them and gives up, I can't tell.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

OK, thanks.  I will try again later.  They seem to run well on the GTX 960, about 600 seconds, as compared to 1000 seconds that your GTX 750 Ti is getting, so the memory bandwidth does not seem to be as much of a handicap as for the BRP4Gs.  That is what I was looking for.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250375812
RAC: 35108

Currently the scheduling array is filled with tasks that are bound to be run by the CPU version to validate results from the GPU version. However, the results of the GPU apps validate pretty well, so I'll drop this restriction later today.

BM

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

Thanks for the update and the work done on the new FGRP GPU application. I see my NVIDIA GPU in Linux has been validating some of the tasks successfully, and so far none of the completed tasks have been reported as invalid. The tasks are completing consistently at 221 seconds apiece.

choks
Joined: 24 Feb 05
Posts: 16
Credit: 145988379
RAC: 81277

About warnings in the logs.

Since BOINC does not report FP64 support, a dummy kernel compile check using FP64 is performed when the OpenCL device is opened. If FP64 is OK, we use the GPU for almost everything (even sorting results). If the device does not support FP64, all kernels requiring "double" support are performed by the CPU (about 10x slower).

If you see "OpenCL device has FP64 support" in the logs, it means that the GPU has been recognized to support double floating point. Don't worry about performance, double precision is not the major part of processing.
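For readers who want to see the shape of such a check, here is a minimal sketch. It is not the application's method: the app compiles a dummy FP64 kernel, while this sketch swaps in a simpler technique and looks for the standard `cl_khr_fp64` (or the older `cl_amd_fp64`) entry in the device's extension string, as returned by querying `CL_DEVICE_EXTENSIONS`. The function names `has_fp64` and `choose_double_backend` are hypothetical.

```python
def has_fp64(extensions: str) -> bool:
    """Return True if a space-separated OpenCL extension string
    advertises double-precision support."""
    names = set(extensions.split())
    return "cl_khr_fp64" in names or "cl_amd_fp64" in names


def choose_double_backend(extensions: str) -> str:
    """Pick where kernels needing 'double' would run (hypothetical helper):
    on the GPU when FP64 is advertised, otherwise on the CPU."""
    return "gpu" if has_fp64(extensions) else "cpu"


# Illustrative extension strings, not taken from any real device log.
print(choose_double_backend("cl_khr_icd cl_khr_fp64"))
print(choose_double_backend("cl_khr_icd cl_khr_byte_addressable_store"))
```

An extension-string check only reports what the driver advertises; a compile check like the one described above also catches drivers that advertise FP64 but fail to build such kernels.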

On OSX, there are lots of warnings when compiling the FFT library, but these are harmless and should be ignored.

As Bernd said, we are still having issues with the Windows driver. I hope we will soon find what's causing the biggest OpenCL kernel to fail on Windows only.

Christophe

Trotador
Joined: 2 May 13
Posts: 58
Credit: 2122643213
RAC: 0

Crunching these WUs on an HD7950 in Linux, no issues so far: around 315 seconds when running only one, 460 with two, and 610 with three simultaneous WUs.
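To make the comparison concrete, the runtimes quoted above can be converted into effective time per task and tasks per hour. This is just arithmetic on the three reported figures; the helper names are made up for illustration, and it assumes each figure is the wall-clock time for all concurrent tasks to finish.

```python
# Reported HD7950 runtimes: concurrent tasks -> wall-clock seconds.
runtimes = {1: 315, 2: 460, 3: 610}


def seconds_per_task(concurrent: int, wall_seconds: float) -> float:
    """Effective seconds per finished task when several run at once."""
    return wall_seconds / concurrent


def tasks_per_hour(concurrent: int, wall_seconds: float) -> float:
    """Throughput in finished tasks per hour."""
    return 3600 * concurrent / wall_seconds


for n, t in runtimes.items():
    print(f"{n} concurrent: {seconds_per_task(n, t):.1f} s per task, "
          f"{tasks_per_hour(n, t):.1f} tasks/hour")
```

On these numbers the effective time drops from 315 s to 230 s with two tasks and to about 203 s with three, so most of the gain comes from the second task.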

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

I normally run just one at a time, but thought I would try two on my GTX 960 (Ubuntu 16.10, 367.57 drivers).  This is a minimally factory overclocked card that runs at 58 C normally.  But as soon as it hit two work units, it errored out.  And all the remaining work units errored out too, after 11 seconds.  This is like the problem that I initially had with two cards, except that I was running just one card.  Apparently my machine does not like two work units at once in any manner.

choks
Joined: 24 Feb 05
Posts: 16
Credit: 145988379
RAC: 81277

@Jim: Looks like you have 2 GB of GPU RAM, which should be enough to run 2 tasks at the same time. Does it occur if you run two BRP4G WUs? (Just quit the Beta to test it.)

What's clear is that the Nvidia driver went completely crazy after the first error, and computations have been done outside normal GPU memory, which leads to the FP exception you get.

Did you have to reboot your machine to run GPU tasks successfully again, or did just restarting BOINC do it?

Thx

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

I tried to run 3 tasks in parallel on a GTX 960 2GB (driver 375.20) + Linux Mint 18 (4.9.0-040900rc8-generic). Allocated 0.33 GPU resources per task. All three 'Gamma-ray pulsar binary search #1 on GPUs v1.12 (FGRPopencl-nvidia) x86_64-pc-linux-gnu' tasks started at the same time.

Two of them continued running fine all the way to the end, but the third "line" in parallel kept erroring out, always after about 18 seconds. Here's the error message for all those tasks:

[CRITICAL]: ERROR: MAIN() returned with error '-4'
FPU status flags:
Error in OpenCL context: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_NDRANGE_KERNEL on GeForce GTX 960 (Device 0).

https://einsteinathome.org/host/12468219/tasks/error
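For anyone wanting to reproduce the "0.33 GPU per task" setup, one common way in BOINC is an app_config.xml file in the project directory. The sketch below only generates such a file and is not necessarily how the allocation was set in the post above. The app name `hsgamma_FGRPB1G` is an assumption (check the name in your own client_state.xml), and `build_app_config` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int,
                     cpu_usage: float = 1.0) -> str:
    """Build app_config.xml text that lets `tasks_per_gpu` tasks share a GPU."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/N of a GPU, e.g. 0.33 for three tasks.
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:.2f}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:.2f}"
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed app name; replace it with the one your client reports.
print(build_app_config("hsgamma_FGRPB1G", 3))
```

The setting only tells the client how many tasks to schedule on a GPU. It does not guarantee that the card has enough memory for all of them, which is consistent with the CL_MEM_OBJECT_ALLOCATION_FAILURE reported above.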
