EM searches, BRP Radiopulsar and FGRP Gamma-Ray Pulsar

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250444187
RAC: 35219

DAMMIT! It also fails with the same error on Windows (cuLaunchGrid() fails with error 1, although the host code and the call itself are identical to those of app version 0.03, which works).
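For readers decoding the report: cuLaunchGrid() is a CUDA driver API call that returns a CUresult, and code 1 in cuda.h is CUDA_ERROR_INVALID_VALUE. That usually points at an argument the driver rejected rather than a fault inside the kernel itself. The lookup below is only a sketch: the table is a small hand-copied subset of the cuda.h enum, and the helper name is hypothetical. Newer CUDA toolkits also provide cuGetErrorName() for this.

```python
# Hypothetical helper: a small, hand-copied subset of the CUresult enum
# from the CUDA driver API header cuda.h (not a complete list).
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    1: "CUDA_ERROR_INVALID_VALUE",
    2: "CUDA_ERROR_OUT_OF_MEMORY",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    4: "CUDA_ERROR_DEINITIALIZED",
    701: "CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES",
}


def cu_result_name(code: int) -> str:
    """Return the cuda.h enum name for a CUresult code, if it is in the table."""
    return CU_RESULT_NAMES.get(code, f"unknown CUresult {code}")


print(cu_result_name(1))
```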

BM

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3945
Credit: 46752002642
RAC: 64126461

v0.08 cuda55 works on Linux (except for the tasks that get "stuck"), but like v0.05 and v0.07, it's noticeably slower than v0.03.


Bernd Machenschalk

It seems with v. 0.11 we now have a working CUDA version on Windows. The first step is to get something working at all, then to improve validation, in the end we'll optimize for speed. The 0.03 version is all single precision, which is fast and good enough for the Arecibo data (BRP4), but it's not accurate enough for the MeerKAT (BRP7) data. 0.08/0.11 seems to offer the best compromise of speed and accuracy, let's see how validation goes.
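Why single precision can fall short is easy to see in isolation. The sketch below is a generic illustration, not code from the BRP app: it multiplies a hypothetical signal frequency by a long time span (a "phase in cycles" computation) in both precisions. float32 carries a 24-bit significand (roughly 7 decimal digits), so once the product reaches about a million cycles, the fractional part of the phase can only take steps of 1/8 cycle.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def phase_cycles_f32(freq_hz: float, t_s: float) -> float:
    # Operands are rounded to single precision first. Python multiplies in
    # double, where the product of two singles is exact, so rounding that
    # product back gives the true single-precision result.
    return to_f32(to_f32(freq_hz) * to_f32(t_s))


def phase_cycles_f64(freq_hz: float, t_s: float) -> float:
    # Same computation carried out entirely in double precision.
    return freq_hz * t_s


# Hypothetical values: a 123.456789 Hz signal followed for 10,000 s.
f, t = 123.456789, 10_000.0
frac32 = phase_cycles_f32(f, t) % 1.0
frac64 = phase_cycles_f64(f, t) % 1.0
print(f"fractional phase, float32: {frac32:.6f} cycles")
print(f"fractional phase, float64: {frac64:.6f} cycles")
```

With these inputs the single-precision phase is about 1.23 million cycles, where adjacent float32 values are 0.125 apart, so its fractional part comes out as 0.875 instead of roughly 0.89. Whether errors of that size matter depends on the data and the algorithm, which is the trade-off the post describes.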

BM

Ian&Steve C.

What percentage (roughly) of the calculation is double precision? 

tictoc
Joined: 1 Jan 13
Posts: 44
Credit: 7192568747
RAC: 7648694

Bernd Machenschalk wrote:

It seems with v. 0.11 we now have a working CUDA version on Windows. The first step is to get something working at all, then to improve validation, in the end we'll optimize for speed. The 0.03 version is all single precision, which is fast and good enough for the Arecibo data (BRP4), but it's not accurate enough for the MeerKAT (BRP7) data. 0.08/0.11 seems to offer the best compromise of speed and accuracy, let's see how validation goes.

Bernd, I had shut down my AMD machines due to the high invalid rate. I don't care about the points; I just wasn't sure if it was helpful to keep piling up invalid tasks. I'm happy to turn them back on and crunch away.

The fp64 calcs explain some of the additional performance on my VIIs.
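The point about the VIIs can be made concrete with the vendors' FP64:FP32 peak-rate ratios. This sketch assumes the commonly published ratios of 1:4 for the Radeon VII and 1:16 for the RX 6900 XT, and the FP32 figures in the example are hypothetical round numbers. It only shows how much of each card's FP32 peak is available for double precision; it says nothing about what share of the BRP7 app is actually FP64, which the thread does not state.

```python
from fractions import Fraction

# Assumed, commonly published FP64:FP32 peak-rate ratios; check your card's specs.
FP64_RATIO = {
    "Radeon VII": Fraction(1, 4),
    "RX 6900 XT": Fraction(1, 16),
}


def fp64_peak_tflops(fp32_peak_tflops: float, card: str) -> float:
    """Peak FP64 throughput implied by a card's FP32 peak and its ratio."""
    return fp32_peak_tflops * float(FP64_RATIO[card])


def relative_fp64_per_fp32(card_a: str, card_b: str) -> Fraction:
    """How many times more FP64 per unit of FP32 peak card_a has than card_b."""
    return FP64_RATIO[card_a] / FP64_RATIO[card_b]


print(relative_fp64_per_fp32("Radeon VII", "RX 6900 XT"))  # 4
print(fp64_peak_tflops(10.0, "Radeon VII"))  # hypothetical 10 TFLOPS FP32 -> 2.5
```

A factor of 4 per unit of FP32 peak is one plausible reason a card with a lower FP32 peak can pull ahead once part of the work moves to double precision.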

Ian&Steve C.

tictoc wrote:

Bernd Machenschalk wrote:

It seems with v. 0.11 we now have a working CUDA version on Windows. The first step is to get something working at all, then to improve validation, in the end we'll optimize for speed. The 0.03 version is all single precision, which is fast and good enough for the Arecibo data (BRP4), but it's not accurate enough for the MeerKAT (BRP7) data. 0.08/0.11 seems to offer the best compromise of speed and accuracy, let's see how validation goes.

Bernd, I had shut down my AMD machines due to the high invalid rate. I don't care about the points; I just wasn't sure if it was helpful to keep piling up invalid tasks. I'm happy to turn them back on and crunch away.

The fp64 calcs explain some of the additional performance on my VIIs.

Were you running your tasks at 1x per GPU, or some multiple?

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519268408
RAC: 15428

Bernd Machenschalk wrote:

It seems with v. 0.11 we now have a working CUDA version on Windows. The first step is to get something working at all, then to improve validation, in the end we'll optimize for speed. The 0.03 version is all single precision, which is fast and good enough for the Arecibo data (BRP4), but it's not accurate enough for the MeerKAT (BRP7) data. 0.08/0.11 seems to offer the best compromise of speed and accuracy, let's see how validation goes.

Throw as much double precision as you like into the AMD ones, I've got the non-deliberately-inhibited proper cards.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 240
Credit: 10553335586
RAC: 25437283

We will start attempting to run these again on the RTX A6000 GPUs, which should excel at these double-precision work units. We had some success with 0.05, but the results were mixed.

Edit: What is the main limiting factor for these work units? In the end, I will just have to try and see how many I can run concurrently, but I was curious.

tictoc

Ian&Steve C. wrote:

Were you running your tasks at 1x per GPU, or some multiple?

I only ran 122 tasks, and they were a mix of 1x and 2x.

Average runtime was 254 s at 1x and 440 s at 2x, on an overclocked Radeon VII. That is about 20-23% faster than the tasks that were run on the stock-clocked 6900 XT.
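These averages imply a throughput comparison worth spelling out. At 2x, each task takes 440 s of wall time, but two finish in that window, so the effective cost is 220 s per task versus 254 s at 1x. A small sketch of that arithmetic follows; the function names are illustrative, and averages over 122 mixed tasks are only a rough guide.

```python
def effective_seconds_per_task(runtime_s: float, concurrent: int) -> float:
    """Wall time per completed task when `concurrent` tasks share one GPU."""
    return runtime_s / concurrent


def throughput_gain(runtime_1x: float, runtime_nx: float, n: int) -> float:
    """Relative throughput of running n tasks at once versus one at a time."""
    return runtime_1x / effective_seconds_per_task(runtime_nx, n)


eff_2x = effective_seconds_per_task(440.0, 2)  # 220.0 s per task
gain = throughput_gain(254.0, 440.0, 2)  # about 1.15
print(f"2x: {eff_2x:.0f} s per task, {100 * (gain - 1):.1f}% more tasks per hour than 1x")
```

On these numbers, 2x completes roughly 15% more tasks per hour than 1x, even though each individual task takes longer.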

If I fire the beta tasks back up, I'll split the different GPUs into their own clients, so that the results page will be a bit less of a mess.

Mr P Hucker

Boca Raton Community HS wrote:

Edit: What is the main limiting factor for these work units? In the end, I will just have to try and see how many I can run concurrently, but I was curious.

With most projects I find the CPU stops the GPU from running at full power. Hence, if you run two tasks on the GPU, the GPU has access to two CPU cores and can be running one task while waiting for the CPU to feed the other. What I do is watch the GPU usage % in something like MSI Afterburner, and if it's not very high, I add more tasks until it is. Also watch that you don't run out of GPU RAM, as that really slows things down when it has to fall back to system RAM. If you're running CPU tasks as well, reduce the number of those so there's enough spare CPU time to feed the GPU, again watching the CPU usage graph in MSI Afterburner. For some reason you don't need an MSI card to use MSI Afterburner.
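The tuning loop described above (add GPU tasks until utilization is high, but stop before running out of GPU memory or free CPU cores) can be written down as a simple rule. This is a sketch under stated assumptions: the field names, the one-core-per-task default and the 95% utilization target are hypothetical, and the inputs would come from whatever monitor you use (for example MSI Afterburner). Nothing here reads the hardware.

```python
from dataclasses import dataclass


@dataclass
class HostLimits:
    """Hypothetical measurements read off a monitoring tool."""
    gpu_vram_mb: int          # total GPU memory
    vram_per_task_mb: int     # memory one task was observed to use
    free_cpu_cores: int       # cores not taken by CPU tasks
    cpu_cores_per_task: int = 1  # assumption: one core feeds each GPU task


def max_concurrent_tasks(h: HostLimits) -> int:
    """Upper bound on GPU tasks: stay inside VRAM and keep a CPU core per task."""
    by_vram = h.gpu_vram_mb // h.vram_per_task_mb
    by_cpu = h.free_cpu_cores // h.cpu_cores_per_task
    return max(1, min(by_vram, by_cpu))


def next_concurrency(current: int, gpu_util_pct: float, h: HostLimits,
                     target_pct: float = 95.0) -> int:
    """Add one task while utilization is below target and the limits allow it."""
    if gpu_util_pct < target_pct and current < max_concurrent_tasks(h):
        return current + 1
    return current


# Hypothetical host: 16 GB card, ~3 GB per task, 4 spare CPU cores.
h = HostLimits(gpu_vram_mb=16384, vram_per_task_mb=3000, free_cpu_cores=4)
print(max_concurrent_tasks(h))           # 4 (CPU-limited; VRAM would allow 5)
print(next_concurrency(2, 70.0, h))      # 3: GPU underused, room to add a task
print(next_concurrency(2, 98.0, h))      # 2: GPU already busy, leave it
```

Taking the minimum of the VRAM and CPU bounds mirrors the advice above: whichever resource runs out first sets the ceiling, and GPU utilization decides whether it is worth climbing toward it.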
