Cross referencing with actual GPU memory use in nvidia-smi, it looks like that "GPU RAM calculated: min:" value is about how much memory the WU will use on the GPU (I see about 1GB and 1.8GB used on the GPUs running GW tasks on that system). It doesn't look like it's referencing what the card actually has available at all, though. At least not for the nvidia app.
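For anyone wanting to cross-reference the same way, a query along these lines prints per-GPU memory use (standard nvidia-smi options):

    nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv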
So it's just some kind of internal calculation for making a request to the GPU, and not going to prevent you getting tasks that are too big? How come it didn't send any GW to Tbar's HD7750? Something made it realise that card couldn't run them.
Like I mentioned in my post, it doesn't look like it's doing the memory check for nvidia cards. I don't see that line under the check for CUDA devices in mine or Tbar's logs.
Maybe something else prevents his HD7750 from getting them, compute capabilities etc. Not sure about GPUs, but BOINC, when starting, lists all the SSE2 etc. your CPU can do.
Oh well, unless they can fix it, I'm on Gamma and Milkyway only.
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
It's also possible that it only spits out that error if it sees a violation.
I made my test bench with a 4GB GTX 1650 report to BOINC that it only has 1GB of memory. This did not stop the Science App from using more and loading 1.8GB of data onto the GPU, as I said would happen, since what BOINC and the Science App are doing are totally separate from each other.
I'll wait until the next scheduler request for a new task to see if it will send me another one or not, and see if it's doing the check for nvidia cards. The only way I can think that the Scheduler would know what the GPU memory is, is by passing along what BOINC detected. So we'll see.
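A rough way to put the two numbers side by side, for anyone curious (a sketch: the coproc_info.xml path varies by install, and the file isn't one rooted XML document, hence the regex rather than an XML parser):

    import re, subprocess

    # BOINC's data directory is /var/lib/boinc-client on Debian/Ubuntu; adjust for your install
    text = open("/var/lib/boinc-client/coproc_info.xml").read()
    # global_mem_size is recorded in bytes; convert to MB for easy reading
    sizes = [int(s) for s in re.findall(r"<global_mem_size>(\d+)</global_mem_size>", text)]
    print("BOINC-reported global_mem_size (MB):", [s // 2**20 for s in sizes])
    # versus what the cards are actually using right now, per nvidia-smi
    print(subprocess.run(["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv"],
                         capture_output=True, text=True).stdout)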
It looks pretty straightforward to me. The Server calculates the amount of vram needed for the WU it's considering sending, compares it to what's available on the GPU, and doesn't send the task if it won't fit.
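In other words, the server-side check presumably boils down to something like this (a Python sketch, not the project's actual scheduler code; the function and parameter names are made up for illustration):

    # the comparison described above: estimated VRAM need vs. what the host reported
    def scheduler_would_send(wu_vram_needed_mb, reported_global_mem_mb):
        # the reported figure comes from the host's BOINC detection,
        # not from a live query of free memory on the card
        return wu_vram_needed_mb <= reported_global_mem_mb

    print(scheduler_would_send(1800, 3017))  # True  -> task gets sent
    print(scheduler_would_send(1800, 1024))  # False -> "don't have enough memory"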
The CPU on that machine is a 6th gen Intel; it has up to AVX2.
My test bench, which is only reporting 1GB of RAM, still gets sent work >1GB.
https://einsteinathome.org/host/12830576
Looks like it's not doing the RAM comparison for nvidia cards. Must be only for ATI/AMD cards.
update:
I was able to get the scheduler to finally deny me a task. In my previous attempt to trick the scheduler, I edited the available RAM metrics (this is what shows on your host page). It wasn't until I reduced the global_mem_size in the coproc file that the scheduler now says I don't have enough memory. The Science App still runs the task fine if you already have it, though, since what BOINC says doesn't matter to the Science App.
So it looks like the check happens on nvidia cards too, but it doesn't flag anything in the log unless you actually violate the limit. And it's checking the global_mem_size as detected by BOINC (not the available size) when deciding whether to send you a task.
That doesn't explain why 3GB cards are being sent tasks that are too large, though, unless the scheduler is underestimating the amount of GPU RAM the WU needs or something. I'll have to watch it a little more closely to see what happens.
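For reference, the edit in question targets a block of coproc_info.xml shaped roughly like this (a sketch: tag names are as I believe BOINC writes them, the value is in bytes, 1073741824 is just the 1GB example, and the other detected fields are elided):

    <coproc_opencl>
       <name>GeForce GTX 1650</name>
       <!-- ...other detected properties... -->
       <global_mem_size>1073741824</global_mem_size>
       <!-- ...other detected properties... -->
    </coproc_opencl>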
Oh, the Server is checking the NV cards, and when it doesn't find a problem it gives a "plan class ok" instead of a comparison. You don't see that OK in my HD7750 log; I see it on my NV Hosts, though. The question is why is your BOINC giving that false 1650 RAM reading? Did you hack something? Are you still running that Highly Edited version of BOINC? You do realize any time you post some problem with BOINC, people are going to ask which BOINC you are running and whether you changed it.
Considering the highest vram estimate I've seen for a GW task is under 2 GB, I'd say it's being underestimated by the tool.
Reading comprehension usually helps in these situations ;)
It should have been clear that I purposefully edited the coproc_info.xml file (using information that YOU posted in the past, no less) as a test to try to trigger the issue (since I do not have any GPUs on hand with less than 3GB of VRAM) and find out EXACTLY what the scheduler is looking at. Which I did: the scheduler is looking at the global_mem_size parameter under the opencl section.
The first issue I see is that it's checking global memory size instead of available memory size, since cards running a monitor or desktop environment will have some of their GPU memory unavailable.
The second issue is that even a task taking up 3200+MB of GPU RAM should trigger this conflict and not get sent, since these 3GB cards only show a global_mem_size of about 3017MB. Which is why I think the scheduler might be underestimating the WU size. I need to catch one in the act so I can check the log.
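Spelling out the arithmetic behind that second point, with the numbers quoted above (a tiny Python check, nothing more):

    reported_mb = 3017       # global_mem_size a "3GB" card shows to BOINC
    observed_task_mb = 3200  # GPU RAM some GW tasks have been seen to occupy
    # False means the check should have refused the task,
    # so the server's own estimate must be coming in below 3017MB
    print(observed_task_mb <= reported_mb)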
The first problem I see is that you are trying to troubleshoot something while running Highly Edited versions of everything. That doesn't fly with anyone knowledgeable about such things. If you want to troubleshoot the Project, then use what the Project is designed to use. That's what I'm doing, and I made the call by just looking at the vram estimates of a number of tasks the Server was trying to send. None of the estimates were over 2 GB when I know they should be higher.