Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3965
Credit: 47236132642
RAC: 65380097

Not sure what our Lord and savior has to do with this.

I agree with you about the 27% thing (or rather, you agree with me, since you replied to my post about it), but I was addressing your comment about what BOINC detects; I assume you wouldn't have mentioned it if you didn't think it was relevant? The driver doesn't "tell" anything: whatever app wants the info has to probe/query for it, and the driver then responds.

I'm sure the apps here at Einstein are working fine, but that specific information isn't printed into the stderr file the way the SETI apps did it.

Bottom line: cards with 3 GB or less simply don't have enough memory for GW tasks these days, and it's not because of any artificial OpenCL limit.

 

 

_________________________________________________________________________

TBar
Joined: 3 Apr 20
Posts: 24
Credit: 891961726
RAC: 0

Stderr output

<core_client_version>7.16.5</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 8 CUDA device(s):
  Device 1: GeForce RTX 2070, 7982 MiB, regsPerBlock 65536
     computeCap 7.5, multiProcs 36 
     pciBusID = 3, pciSlotID = 0
  Device 2: GeForce GTX 1080 Ti, 11176 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 28 
     pciBusID = 1, pciSlotID = 0
  Device 3: GeForce GTX 1080 Ti, 11178 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 28 
     pciBusID = 2, pciSlotID = 0
  Device 4: GeForce GTX 1070, 8119 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 15 
     pciBusID = 4, pciSlotID = 0
  Device 5: GeForce GTX 1070, 8119 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 15 
     pciBusID = 6, pciSlotID = 0
  Device 6: GeForce GTX 1070, 8119 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 15 
     pciBusID = 7, pciSlotID = 0
  Device 7: GeForce GTX 1070, 8119 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 15 
     pciBusID = 8, pciSlotID = 0
  Device 8: GeForce GTX 1070, 8119 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 15 
     pciBusID = 9, pciSlotID = 0

That's what happens when the App listens to the driver query correctly. Most Apps use it, and have for a very long time.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3965
Credit: 47236132642
RAC: 65380097

Again, it's not about "listening" correctly; it's about ASKING correctly in the first place. The query used in Petri's code is not the same as the query used in BOINC, nor the same as the fix Ville implemented. BOINC right now says "what is the available memory? give me the answer as a 32-bit number", while Ville's fix says "what is the available memory? give me the answer as a 64-bit number". That's a generalization of the different API calls the two implement.

--edit---

this is what the Developer says:

Ville wrote:
Boinc uses a function named cuDeviceTotalMem() to get the memory size. That function is ancient and clamps the size to 32 bits even in modern versions of the CUDA library, probably to maintain compatibility.

I checked what Special Sauce does to get the memory, but it was using a completely different api so doing that in boinc client would have required huge changes.

Then I snooped through the symbol names contained in the libcuda.so that cuDeviceTotalMem comes from. There was a mysterious cuDeviceTotalMem_v2, so I tried what would happen if I made Boinc call that instead. It worked!

I made the change so that if the symbol is not found, it uses the old version. So if the client is running on a system where the NVIDIA drivers are so ancient that the _v2 function doesn't exist, it will just show the wrong memory size instead of crashing.
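The clamping Ville describes can be illustrated with a few lines of arithmetic. This is only a model of the failure mode as described above, not BOINC's actual code; that the old entry point saturates (rather than wraps) is an assumption based on his word "clamps".

```python
GIB = 1024 ** 3

def report_total_mem_32bit(total_bytes: int) -> int:
    """Model of the legacy cuDeviceTotalMem() behavior as described:
    the true size is clamped to what fits in an unsigned 32-bit value
    (assumed to saturate, per "clamps the size to 32 bits")."""
    return min(total_bytes, 2**32 - 1)

def report_total_mem_64bit(total_bytes: int) -> int:
    """Model of the _v2 entry point: a full 64-bit size, so the
    true value comes through unchanged."""
    return total_bytes

big_card = 11 * GIB    # e.g. a GTX 1080 Ti with ~11 GB of VRAM
small_card = 3 * GIB   # a 3 GB card

print(report_total_mem_32bit(big_card) // 2**20)    # 4095  MiB -- capped just under 4 GiB
print(report_total_mem_64bit(big_card) // 2**20)    # 11264 MiB -- correct
print(report_total_mem_32bit(small_card) // 2**20)  # 3072  MiB -- small cards report fine
```

The last line matches the point made later in the thread: for cards with 4 GB or less, the 32-bit path happens to give the right answer anyway.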

Ville wrote:
Boinc coders had blamed Nvidia for the problem but it really was a Boinc problem.

The library keeps the old 32-bit version in it to stay binary compatible with code compiled against the old version of the library. Any new 64-bit code normally linked with the library would be compiled using 64-bit header files, which add that _v2 to the symbol name resolved from the library when the code calls the function, and everything works correctly.

But Boinc isn't using the library with normal linking. It wants to avoid a dependency on the CUDA development packages, so it doesn't use any headers and accesses the library 'the hard way': the running code finds the library file, extracts individual symbol names from it, and casts them to function pointers. If you do 'manual' linking this way, then it is your responsibility to handle the different function versions too. Boinc didn't do this, so the bug was entirely on their end.
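The 'hard way' loading plus Ville's fix boils down to: look up the _v2 symbol first, and fall back to the old name if it isn't there. The same pattern can be sketched in Python with ctypes; here libm and cos stand in for libcuda and cuDeviceTotalMem (and "cos_v2" is a made-up symbol name), since the shape of the fix doesn't depend on CUDA being installed.

```python
import ctypes
import ctypes.util

# Stand-in for dlopen()'ing libcuda.so by hand.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

def resolve(lib, *names):
    """Try each symbol name in order and return the first one the
    library actually exports -- the versioned-symbol fallback Ville
    describes (look for foo_v2 first, fall back to plain foo)."""
    for name in names:
        try:
            return getattr(lib, name)  # raises AttributeError if the symbol is absent
        except AttributeError:
            continue
    raise OSError(f"none of {names!r} found")

# "cos_v2" does not exist in libm, so we silently fall back to "cos" --
# analogous to old drivers that lack cuDeviceTotalMem_v2.
cos = resolve(libm, "cos_v2", "cos")
cos.restype = ctypes.c_double
cos.argtypes = [ctypes.c_double]
print(cos(0.0))  # 1.0
```

The point of the pattern is graceful degradation: on a system where only the old symbol exists, the program still runs, just with the old behavior, which is exactly what Ville's change gives BOINC.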

 

Reading a little more about the limit, it seems it's still there (I can find recent posts about it), but the limit only applies to a SINGLE buffer. I'm sure most science apps get around it simply by allocating multiple smaller buffers that each use less memory, which makes sense for a highly parallelized application running on a processor with thousands of cores available.

 

It's just not a factor in why the tasks are failing here on cards with 3 GB or less.

_________________________________________________________________________

TBar
Joined: 3 Apr 20
Posts: 24
Credit: 891961726
RAC: 0

Which is exactly what I said. 32-bits only sees 4 GB.

SBS, Single Buffer Size, Is Not Total Memory. You might be able to use the environment variable to raise it; it used to work fine back when it mattered. I don't think it has mattered in a very long time.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3965
Credit: 47236132642
RAC: 65380097

I think we're talking about two different issues here. I initially only questioned why you mentioned what BOINC detects, as it has nothing to do with the OpenCL 27% limit. BOINC incorrectly reporting memory size on NVIDIA has nothing to do with the OpenCL limit for single buffers.

you said

TBar wrote:
The driver tells BOINC how much vram can be used at startup, that is the working number.

and I merely commented that BOINC, while getting the wrong value, has nothing to do with it. If it did, the incorrect value it pulls might limit you to using only 4 GB, but it doesn't, so there's no point mentioning what BOINC does or does not see. It's all up to the science app at that point, and there is no problem there.

_________________________________________________________________________

TBar
Joined: 3 Apr 20
Posts: 24
Credit: 891961726
RAC: 0

Again, Single Buffer Size is Not Total memory. There was a time when BOINC reported a lower number, but that was very long ago. The Single Buffer Size is still in play, but it is Not Total memory. This is what the SETI OpenCL App reports:



OpenCL Platform Name:					 NVIDIA CUDA
Number of devices:				 1
  Max compute units:				 30
  Max work group size:				 1024
  Max clock frequency:				 1770Mhz
  Max memory allocation:			 1610612736
  Cache type:					 Read/Write
  Cache line size:				 128
  Cache size:					 491520
  Global memory size:				 6442450944
  Constant buffer size:				 65536
  Max number of constant args:			 9
  Local memory type:				 Scratchpad
  Local memory size:				 49152
  Queue properties:				 
    Out-of-Order:				 Yes
  Name:						 GeForce RTX 2060
  Vendor:					 NVIDIA Corporation
  Driver version:				 432.00
  Version:					 OpenCL 1.2 CUDA

Note: Max memory allocation (1610612736 bytes, i.e. 1.5 GiB) versus Global memory size (6442450944 bytes, i.e. 6 GiB).
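The two numbers in that listing line up exactly with the single-buffer limit being argued about in this thread; a quick check on the values from the log above shows the per-buffer cap is exactly one quarter of the card's global memory.

```python
# Values copied verbatim from the SETI OpenCL device listing above (RTX 2060).
max_alloc = 1610612736    # "Max memory allocation"
global_mem = 6442450944   # "Global memory size"

GIB = 1024 ** 3
print(max_alloc / global_mem)   # 0.25 -> the single-buffer cap is 25% of VRAM
print(max_alloc / GIB)          # 1.5  -> largest single OpenCL buffer: 1.5 GiB
print(global_mem / GIB)         # 6.0  -> total VRAM on the card
```

So on this card no single buffer may exceed 1.5 GiB, even though 6 GiB of VRAM is available in total, which is the distinction both sides of the thread keep circling.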

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3965
Credit: 47236132642
RAC: 65380097

TBar wrote:

Again, Single Buffer Size is Not Total memory.

I'm not saying that it is. I'm explicitly saying that's not the case; I don't know how you could interpret it any differently. You're the one who brought up BOINC's total-memory detection. But the fact is that whatever BOINC detects has nothing to do with what the science app is doing; they are separate entities. Anything printed in the stderr file is put there by the science app, NOT by BOINC. It's a simple concept.

 

As far as I see it, if you try to use a single buffer in your code, you can't use more than 25% of the memory, BUT if you instead parallelize with multiple buffers running simultaneously, you can use 100% of your memory, provided each single buffer is under the limit. This is most likely where the 3 GB problem comes into play at Einstein: the sum of all the individual buffers exceeds the 3 GB total available on the card. Nothing to do with detection, everything to do with simply running out of space.
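The workaround described above, staying under the per-buffer cap while still using most of the card, can be sketched as a simple chunking calculation. The 25% cap and the card sizes come from this thread; the splitting logic itself is an illustration, not the Einstein app's actual allocation strategy.

```python
GIB = 1024 ** 3

def split_into_buffers(total_needed: int, global_mem: int, cap_fraction: float = 0.25):
    """Split a working set into buffers that each stay under the
    single-buffer cap (~25% of global memory on the NVIDIA OpenCL
    setups discussed here). Returns the list of buffer sizes."""
    max_alloc = int(global_mem * cap_fraction)
    buffers = []
    remaining = total_needed
    while remaining > 0:
        chunk = min(remaining, max_alloc)
        buffers.append(chunk)
        remaining -= chunk
    return buffers

# A hypothetical 5 GiB working set on a 6 GiB card: one 5 GiB buffer
# would be rejected, but four buffers under the 1.5 GiB cap fit fine.
sizes = split_into_buffers(5 * GIB, 6 * GIB)
print([s / GIB for s in sizes])   # [1.5, 1.5, 1.5, 0.5]

# On a 3 GiB card the same working set exceeds total VRAM outright,
# matching the failures in this thread: the card runs out of space;
# it never even gets to the per-buffer limit.
print(5 * GIB > 3 * GIB)          # True
```

The design point is that the cap constrains the *shape* of the allocations, not their *sum*, so total VRAM remains the binding constraint on small cards.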

_________________________________________________________________________

TBar
Joined: 3 Apr 20
Posts: 24
Credit: 891961726
RAC: 0

Ian&Steve C. wrote:

TBar wrote:

Again, Single Buffer Size is Not Total memory.

I'm not saying that it is. I'm explicitly saying that's not the case; I don't know how you could interpret it any differently. You're the one who brought up BOINC's total-memory detection. But the fact is that whatever BOINC detects has nothing to do with what the science app is doing; they are separate entities. Anything printed in the stderr file is put there by the science app, NOT by BOINC. It's a simple concept.

 

As far as I see it, if you try to use a single buffer in your code, you can't use more than 25% of the memory, BUT if you instead parallelize with multiple buffers running simultaneously, you can use 100% of your memory, provided each single buffer is under the limit. This is most likely where the 3 GB problem comes into play at Einstein: the sum of all the individual buffers exceeds the 3 GB total available on the card. Nothing to do with detection, everything to do with simply running out of space.

Actually, BOINC reports the numbers correctly for NV cards with 4 GB or less, which are the numbers that matter in this case. This is precisely why no developers have bothered to correct the 27% myth. There are people like You who latch onto one point and keep repeating it over and over. The people who matter have known about BOINC reporting 32-bit NV numbers forever. It simply doesn't matter, as the science Apps do the work. So why do you keep bringing it up? Don't answer, I know the reason.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3965
Credit: 47236132642
RAC: 65380097

TBar wrote:

Actually BOINC reports the numbers correctly for NV cards with 4 GB or less

I never said that it didn't. You seem stuck on arguing a point that no one is making. The fact that BOINC reports correctly for cards under 4 GB is immaterial to the issue at hand. It literally has nothing to do with anything, except as a diversion tactic, I guess?

 

The point I made was that whatever BOINC reports in NO WAY affects what the science app is doing. BOINC could report 0 GB of GPU RAM and the science app would still work just fine. YOU mentioned the numbers BOINC reported first. I said they don't matter, and you proceeded to deflect and argue with me until you came full circle to claim they don't matter to the science app, which is exactly what I said in the first place. LOL, can't make this stuff up.

I only brought up BOINC reporting the incorrect VRAM for NVIDIA cards as a footnote, to show I know the issue, since you seemed to be implying every app gets the data the same way and that BOINC somehow just "listens wrong", which is objectively incorrect. There are multiple methods that can be used. Fixing BOINC's issue is a nice-to-have, not a necessity, which also highlights the point that whatever BOINC detects doesn't matter in actual processing (because if it did, people running 2x GW tasks on <6 GB cards would also see failures, but they don't, so that must also be incorrect).

 

The developers CAN'T correct the 27% issue even if they wanted to; NVIDIA themselves seem to control it. It's not so much a myth as it is misunderstood, with people interpreting it as a limit on total use when the limit actually only applies to a single buffer. You get around the single-buffer limit by using multiple smaller buffers, allowing you to use all of the available memory if you wish.

_________________________________________________________________________

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519371204
RAC: 15292

Ian&Steve C. wrote:

Not sure what our Lord and savior has to do with this.

Do you have to bring religion into a scientific discussion?  I thought anyone running Einstein was clever enough to be beyond religion.  What next?  The earth is flat?

Maybe there should be a project that searches for god like SETI did for aliens?

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
