DAMMIT! Also fails with the same error on Windows (cuLaunchGrid() fails with error 1, although the host code and the call itself are identical to those in app version 0.03, which works).
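For anyone following along: error 1 from the driver API is CUDA_ERROR_INVALID_VALUE, and with the old launch path it's often something in the grid dimensions or the cuParamSet*/cuFuncSetBlockShape setup that cuLaunchGrid() ends up reporting, rather than the launch call itself. A minimal sketch of checking each step (hypothetical kernel handle and arguments, not the actual BRP host code):

/* Check every legacy driver-API call so the failing step is logged,
 * instead of only seeing the error at cuLaunchGrid(). */
#include <cuda.h>
#include <stdio.h>

#define CU_CHECK(call)                                                    \
    do {                                                                  \
        CUresult err_ = (call);                                           \
        if (err_ != CUDA_SUCCESS) {                                       \
            fprintf(stderr, "%s failed with error %d\n", #call, (int)err_); \
            return err_;                                                  \
        }                                                                 \
    } while (0)

static CUresult launch_demod_kernel(CUfunction kernel, CUdeviceptr in,
                                    CUdeviceptr out, int n)
{
    int offset = 0;

    /* Legacy (pre-CUDA-4.0) parameter setup: every offset and size here must
     * match the kernel signature, otherwise the launch reports error 1. */
    CU_CHECK(cuParamSetv(kernel, offset, &in, sizeof(in)));   offset += (int)sizeof(in);
    CU_CHECK(cuParamSetv(kernel, offset, &out, sizeof(out))); offset += (int)sizeof(out);
    CU_CHECK(cuParamSeti(kernel, offset, n));                 offset += (int)sizeof(n);
    CU_CHECK(cuParamSetSize(kernel, offset));

    CU_CHECK(cuFuncSetBlockShape(kernel, 256, 1, 1));
    CU_CHECK(cuLaunchGrid(kernel, (n + 255) / 256, 1));
    return CUDA_SUCCESS;
}

Logging each legacy call separately at least narrows down which step the new kernel build objects to.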
BM
v0.08 cuda55 works on Linux (except for the tasks that get "stuck"), but like v0.05 and v0.07 it's noticeably slower than v0.03.
_________________________________________________________________________
It seems with v. 0.11 we now have a working CUDA version on Windows. The first step is to get something working at all, then to improve validation; only then will we optimize for speed. The 0.03 version is all single precision, which is fast and good enough for the Arecibo data (BRP4), but not accurate enough for the MeerKAT (BRP7) data. 0.08/0.11 seems to offer the best compromise of speed and accuracy; let's see how validation goes.
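To illustrate the kind of compromise meant here (a generic sketch, not the actual BRP7 kernel): the bulk of the per-sample arithmetic can stay in fast fp32, while only the long, accuracy-sensitive accumulation is carried in fp64, which costs far less than making everything double precision.

// Generic mixed-precision sketch (illustrative only): per-element work in fp32,
// the running sum in fp64. Assumes a launch with a single block of 256 threads.
__global__ void power_sum_mixed(const float2 *samples, int n, double *out)
{
    __shared__ double partial[256];
    double acc = 0.0;                       // only the accumulator is double
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        float2 s = samples[i];
        float p = s.x * s.x + s.y * s.y;    // bulk arithmetic stays in fp32
        acc += (double)p;
    }
    partial[threadIdx.x] = acc;
    __syncthreads();
    // block-wide reduction in double (blockDim.x must be a power of two <= 256)
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        *out = partial[0];
}

Whether this is close to what 0.08/0.11 actually do I don't know, but it shows why the fp64 hit can stay small even on cards with weak double-precision throughput.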
BM
What percentage (roughly) of the calculation is double precision?
_________________________________________________________________________
Bernd Machenschalk wrote: It seems with v. 0.11 we now have a working CUDA version on Windows. [...]
Bernd, I had shut down my AMD machines due to the high invalid rate. I don't care about the points; I just wasn't sure if it was helpful to keep piling up invalid tasks. I'm happy to turn them back on and crunch away.
The fp64 calcs explain some of the additional performance on my VIIs.
tictoc wrote: Bernd, I had shut down my AMD machines [...]
Were you running your tasks at 1x per GPU? Or some multiple?
_________________________________________________________________________
Bernd Machenschalk wrote: It seems with v. 0.11 we now have a working CUDA version on Windows. [...]
Throw as much double precision as you like into the AMD ones; I've got the non-deliberately-inhibited proper cards.
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
We will start attempting to run these again on the RTX A6000 GPUs, which should excel at these double-precision work units. We had some success with 0.05, but mixed results.
Edit: What is the main limiting factor for these work units? In the end I will just have to try and see how many I can run concurrently, but I was curious.
Ian&Steve C. wrote: Were you running your tasks at 1x per GPU? Or some multiple?
I only ran 122 tasks, and they were a mix of 1x and 2x.
Average runtime at 1x was 254 s, and 440 s at 2x, on an overclocked Radeon VII. That is about 20-23% faster than the tasks run on the stock-clocked 6900 XT.
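Taking those numbers at face value (at 2x, each task takes 440 s but two run at once), 2x still comes out ahead on throughput:

    440 s / 2 tasks = 220 s of wall time per task at 2x, vs. 254 s at 1x  ->  roughly 15% more tasks per hour

so running two at a time looks like the better setting on the VII, at least for these tasks.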
If I fire the beta tasks back up, I'll split the different GPUs into their own clients, so that the results page will be a bit less of a mess.
Boca Raton Community HS wrote: What is the main limiting factor for these work units? [...]
With most projects I find the CPU limits the GPU from running at full power. Hence if you run two tasks on the GPU, the GPU now has access to two CPU cores, and can also be running one task while waiting for the CPU to feed the other. What I do is watch the GPU usage % in something like MSI Afterburner, and if it's not very high, I add more tasks until it is. Also watch that you don't run out of GPU RAM, as that really slows things down when it has to fall back to system RAM. If you're running CPU tasks as well, reduce the number of those so there's enough spare CPU time to help the GPU, again watching the CPU usage graph in MSI Afterburner. For some reason you don't need an MSI card to use MSI Afterburner.
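For anyone wanting to try this, task multiplicity is set with an app_config.xml in the project directory. Something like the following runs two tasks per GPU; the app name shown is only a guess at the BRP7 beta name, so check the name your BOINC event log or tasks page reports and adjust:

<!-- app_config.xml in the Einstein@Home project directory,
     e.g. .../projects/einstein.phys.uwm.edu/ -->
<app_config>
  <app>
    <name>einsteinbinary_BRP7</name>   <!-- guess: use the app name from your client log -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>   <!-- 0.5 = two tasks share one GPU -->
      <cpu_usage>1.0</cpu_usage>   <!-- reserve a full core per task to feed the GPU -->
    </gpu_versions>
  </app>
</app_config>

After saving it, Options -> Read config files in the BOINC Manager (or a client restart) picks it up.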
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.