32-bit constraint: is GPU memory detected modulo 4 GiB?

yuanchen huang
Joined: 23 Feb 21
Posts: 3
Credit: 1412811
RAC: 0
Topic 226283

Hi colleagues,

I just swapped my GPU from an 8 GiB card to an 11 GiB one, and the detected memory changed from 4 GiB to 3 GiB. Is the core of the GW computation here still 32-bit?

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3963
Credit: 47180552642
RAC: 65402884


Detection of GPU memory happens via BOINC and has nothing to do with Einstein tasks.

But yes, BOINC still uses an old memory detection method for Nvidia GPUs that is limited to 32 bits, so it can only read a maximum of 4095 MB (4 GB) for Nvidia. The memory detection used for AMD doesn't have this limitation.
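To make the 32-bit limit concrete, here is a minimal Python sketch of two ways a byte count larger than 4 GiB could end up in a 32-bit field: clamping to the maximum, or wrapping modulo 4 GiB. This is only an illustration, not BOINC's code; the helper names `saturate_u32` and `wrap_u32` are hypothetical, and which behavior a given driver call actually shows is an assumption here.

```python
# Illustration only: two ways a large byte count can be squeezed into 32 bits.
MIB = 1024 ** 2
GIB = 1024 ** 3
U32_MAX = 2 ** 32 - 1  # largest value an unsigned 32-bit field can hold


def saturate_u32(n_bytes: int) -> int:
    """Clamp the value to the 32-bit maximum (hypothetical helper)."""
    return min(n_bytes, U32_MAX)


def wrap_u32(n_bytes: int) -> int:
    """Keep only the low 32 bits, i.e. n_bytes mod 4 GiB (hypothetical helper)."""
    return n_bytes & U32_MAX


for size_gib in (8, 11):
    total = size_gib * GIB
    print(
        f"{size_gib} GiB card:",
        saturate_u32(total) // MIB, "MB if clamped,",
        wrap_u32(total) // MIB, "MB if wrapped",
    )
```

Clamping gives 4095 MB for both cards, which matches the figure reported in this thread. Wrapping would instead give 0 MB for an 8 GiB card and 3072 MB for an 11 GiB card.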

This limitation has been known for a long time, and the BOINC devs have long blamed the issue on Nvidia, saying there was nothing they could do about it. But a member of our team discovered that BOINC was simply using an old detection method and that a newer method does exist. He modified the BOINC code, recompiled the BOINC client with this new method, and proved that it works. This information has been shared with the BOINC devs, but they don't seem interested in fixing it, since it hasn't been implemented. It's been about a year already.
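To check what the driver itself reports, independently of BOINC, `nvidia-smi` can be queried directly. The sketch below assumes `nvidia-smi` is installed and on the PATH; the function names are hypothetical, and this is not the patch described above, just a way to compare numbers.

```python
import subprocess
from typing import List


def parse_memory_total(csv_text: str) -> List[int]:
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_gpu_memory_mib() -> List[int]:
    """Ask nvidia-smi for each GPU's total memory in MiB (requires an Nvidia driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_total(result.stdout)

# Usage on a machine with an Nvidia GPU:
#   print(query_gpu_memory_mib())
```

If this value is well above 4095 while BOINC shows 4095 MB, the card is fine and the difference comes from the client's detection.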


yuanchen huang
Joined: 23 Feb 21
Posts: 3
Credit: 1412811
RAC: 0


Yes, 4095 MB is detected as the maximum memory for Nvidia, on both the P4000 card and the 2080 Ti card. I misread the report for the 2080 Ti (3105 is my free disk space). I had just gotten an nvidia-smi warning about my P4000 card, which may be a memory error (a trust problem?). This is my first CUDA device, and I panicked at the time. Thanks.

Joseph Stateson
Joined: 7 May 07
Posts: 174
Credit: 3072267561
RAC: 529410


yuanchen huang wrote:

Yes, 4095 MB is detected as the maximum memory for Nvidia, on both the P4000 card and the 2080 Ti card. I misread the report for the 2080 Ti (3105 is my free disk space). I had just gotten an nvidia-smi warning about my P4000 card, which may be a memory error (a trust problem?). This is my first CUDA device, and I panicked at the time. Thanks.

What was the error message?

The free program MemtestCL uses OpenCL to test GPU memory. I have used it on a GTX 1070 that I thought had a problem. I ended up junking the motherboard, and the card has been working fine on another mobo.

There is a companion version for CUDA, MemtestG80, but I have not used it.

yuanchen huang
Joined: 23 Feb 21
Posts: 3
Credit: 1412811
RAC: 0


Thanks. The error appeared when I installed my P4000 card. I tested the same board (a Dell 7910 case with X99) with my 2080 Ti, and no error has appeared so far. The warning reported by `nvidia-smi` was:

WARNING: infoROM is corrupted at gpu 0000:03:00.0


(This is the same warning for my P4000 card as the one reported for a Titan X on the Nvidia forum.)

Nvidia's official reply said it is an error in the non-volatile memory (ROM). I will try your suggestion, but I suspect something was wrong with my purchase of this card, since I found that its warranty location is Australia.
