EM searches, BRP Radio Pulsar and FGRP Gamma-Ray Pulsar

Boca Raton Community HS
Boca Raton Community HS
Joined: 4 Nov 15
Posts: 240
Credit: 10628855586
RAC: 18422122

Thanks for the insight. I

Thanks for the insight. I leave a few cores open to help prevent bottlenecking, which definitely helps. GPU memory shouldn't be an issue with our GPUs. I am still learning the details of GPUs, but do single precision tasks use the same cores of the GPU as double precision? I know that our Ampere cards have more active tensor cores, which are for double precision tasks? Am I off base with this?

Ian&Steve C.
Ian&Steve C.
Joined: 19 Jan 20
Posts: 3965
Credit: 47202112642
RAC: 65445477

Boca Raton Community HS

Boca Raton Community HS wrote:

Thanks for the insight. I leave a few cores open to help prevent bottlenecking, which definitely helps. GPU memory shouldn't be an issue with our GPUs. I am still learning the details of GPUs, but do single precision tasks use the same cores of the GPU as double precision? I know that our Ampere cards have more active tensor cores, which are for double precision tasks? Am I off base with this?

The tensor cores are not the same as the FP cores. Tensor cores are specialized hardware for inferencing workloads like ML and AI. No BOINC project (yet) uses this hardware.
 

The GA102 die, like in your A6000 or higher-end GeForce 30-series cards (or any GA10x really), doesn't really have dedicated FP64 hardware. Pretty sure they just double up FP32 cores for that. But the higher-end Nvidia cards based on the GA100 core, like the A100, do have dedicated FP64 cores.

edit, correction:

The GA10x (GeForce 30x0, Ax000 "Quadro", etc.) cards have only 2 FP64 cores per SM, but this is not depicted in most architecture diagrams, so I missed it; I had to dig into the whitepaper to find that. With 128 FP32 cores per SM, that explains why there's a 1:64 ratio in FP64 performance.

The GA100 (A100) cards, on the other hand, have 32 FP64 cores per SM and 64 FP32 cores per SM, for that nice 1:2 ratio in performance.

They basically swapped out the FP64 cores for the ray tracing cores on GA10x, which are not present on GA100.
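To put rough numbers on those ratios, here is a quick back-of-the-envelope sketch (not from any project code; the per-SM core counts are the ones quoted above and in the whitepaper):

```cuda
/* Peak FP64:FP32 throughput ratio implied by the per-SM core counts
   quoted above for Ampere. Plain host-side arithmetic, nothing more. */
#include <stdio.h>

int main(void) {
    int ga10x_fp32 = 128, ga10x_fp64 = 2;   /* GA10x: GeForce 30x0, A6000, ... */
    int ga100_fp32 = 64,  ga100_fp64 = 32;  /* GA100: A100 */

    printf("GA10x FP64:FP32 = 1:%d\n", ga10x_fp32 / ga10x_fp64);  /* 1:64 */
    printf("GA100 FP64:FP32 = 1:%d\n", ga100_fp32 / ga100_fp64);  /* 1:2  */
    return 0;
}
```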
 

https://images.nvidia.com/aem-dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf


Keith Myers
Keith Myers
Joined: 11 Feb 11
Posts: 4969
Credit: 18770719988
RAC: 7253339

Unless an application is

Unless an application is coded specifically to use Tensor cores, they won't be used.
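For what that means concretely, here is a minimal, hypothetical sketch (not taken from any Einstein@Home application) of the kind of code a developer would have to write to touch the tensor cores at all, going through CUDA's WMMA API (or a library such as cuBLAS with tensor ops enabled). Ordinary FP32/FP64 arithmetic in a kernel never uses them:

```cuda
// Hypothetical sketch: one warp multiplies a 16x16x16 half-precision tile
// on the tensor cores via the WMMA API. It only illustrates that
// tensor-core use has to be explicit in the code.
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

__global__ void wmma_tile(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);                // C = 0
    wmma::load_matrix_sync(a_frag, a, 16);            // load 16x16 A tile
    wmma::load_matrix_sync(b_frag, b, 16);            // load 16x16 B tile
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);   // C += A*B on tensor cores
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

// Launched with a single warp, e.g.: wmma_tile<<<1, 32>>>(dA, dB, dC);
```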

 

archae86
archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7229791531
RAC: 1156083

Bernd Machenschalk wrote: It

Bernd Machenschalk wrote:

It seems with v. 0.11 we now have a working CUDA version on Windows. The first step is to get something working at all, then to improve validation,...

let's see how validation goes.

I've run dozens of tasks on the Windows/AMD v0.11 code. Most of these WUs are old ones with multiple tasks run on various hosts with older versions of the application, and I've gotten no validations from them.

But I recently got a validation, and the fabulous news is that my quorum partner was Windows/Cuda55 (so Nvidia), also running v0.11.

As I had seen zero previous cases of successful validation for my AMD tasks against Nvidia quorum partners of any description, this is a ray of hope on the validation front.

https://einsteinathome.org/workunit/667201533

Time will tell if this is a false dawn, or a harbinger of better times ahead.

Bernd Machenschalk
Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250672854
RAC: 34831

Actually the only validation

Actually, the only validation we're interested in is the one among the four latest app versions (0.08 on Linux, 0.11 on Windows). All other app versions from which results may still be around (including, unfortunately, the Windows 0.08) have slightly different computation code that would likely prevent successful validation.

But it seems we're getting closer: so far we have 361 valid and only one invalid from comparing these app versions. By far the best validation rate so far.

BM

Alexander Favorsky
Alexander Favorsky
Joined: 18 Jun 16
Posts: 36
Credit: 176714206
RAC: 76318

Hi! I'm doing v0.11 on

Hi! I'm doing v0.11 on Windows right now and it has been stuck at 99.998% for almost an hour. I guess I'm gonna abort it if computation time exceeds 3 hours.

Ereignishorizont
Ereignishorizont
Joined: 17 May 21
Posts: 19
Credit: 3025782861
RAC: 1451347

Nearly all of the v0.08-WUs

Nearly all of the v0.08 WUs have massive problems on my computers. They start, but the GPU doesn't compute anything from the beginning. The load is between 1% and 10%, mostly at 1%. Maybe that's just the regular load caused by Linux.

 

Even though they still show very slow progress, I aborted them because they get stuck sooner or later.

 

 

Alexander Favorsky
Alexander Favorsky
Joined: 18 Jun 16
Posts: 36
Credit: 176714206
RAC: 76318

Alexander Favorsky

Alexander Favorsky wrote:

Hi! I'm doing v0.11 on Windows right now and it has been stuck at 99.998% for almost an hour. I guess I'm gonna abort it if computation time exceeds 3 hours.

I aborted the task at 4:13:12 of run time, still at 99.998%.

Maybe there's something wrong with v0.11 because v0.03 worked fine.

Bernd Machenschalk
Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250672854
RAC: 34831

0.12 is out, it should fix

0.12 is out; it should fix the memory issue reported by Petri33 (Thanks!).

Hopefully this is the only reason for the "hang" problem.

BM

Bernd Machenschalk
Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250672854
RAC: 34831

FWIW The memory issue that

FWIW, the memory issue that probably caused the "hang" problem has been in the BRP7 code all along, since version 0.01. Whether and when it gets triggered is a matter of the data (i.e. the workunit), not so much the app version.

BM
