Is there a possibility to exclude these "special" WUs without withdrawing from all GPU jobs in general?
There aren't any "special" workunits. All of them are likely to fail on a GPU with only 4GB VRAM, particularly if some of that VRAM is reserved by the OS to run your display or for other things you are doing.
All you need to do is change your project preferences to exclude the All Sky search from your list of allowed searches. Just remove the tick mark for that search and make sure to scroll to the bottom and "Save Changes".
If you have set up different "Locations" (previously known as venues) make the change for the location your computer is set to. If you don't know what "Locations" are and you haven't previously used them, just ignore this last bit.
Machine has a 2 TB SSD dedicated to BOINC, and has been processing tasks successfully before and since - I think it's the tasks, rather than my machine.
Thanks for pointing this out! The disk limit certainly needs an update.
I noticed that the checkpoint alone takes a whopping 150 MB, and it only covers the first GPU processing stage, which should take <10 min. I wonder whether we should turn that off altogether? Setting the "write to disk at most ..." preference to a time longer than the GPU part takes to run should do that for you, but you may not want to do this for all apps of all projects...
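As a rough illustration of the preference change described above, here is a minimal Python sketch that builds a local override fragment with a checkpoint interval longer than the GPU stage. It assumes the BOINC client reads a `global_prefs_override.xml` file whose `<disk_interval>` element holds the "write to disk at most every N seconds" value; the function name `build_override` and the safety margin are hypothetical, and the 10 minute figure is only the upper bound mentioned in the post. As noted above, this setting applies to all apps of all projects on that machine.

```python
import xml.etree.ElementTree as ET


def build_override(gpu_stage_minutes: float, margin: float = 1.5) -> str:
    """Build an override fragment with a checkpoint interval above the GPU stage time.

    gpu_stage_minutes: estimated duration of the GPU stage (assumption, not measured).
    margin: hypothetical safety factor so the interval clearly exceeds that duration.
    """
    interval_seconds = int(gpu_stage_minutes * 60 * margin)
    root = ET.Element("global_preferences")
    ET.SubElement(root, "disk_interval").text = str(interval_seconds)
    return ET.tostring(root, encoding="unicode")


# GPU stage assumed to take about 10 minutes, as estimated in the post.
print(build_override(10))
```

The printed fragment would go into the BOINC data directory, after which the client has to re-read its local preferences.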
So...I am getting only "computation error" from any work units for the application
All-Sky Gravitational Wave search on O3 1.04 (GW-opencl-nvidia)
Are you aware of this? I think I get this on my 9th-gen I9, and am now getting it on my 12th-gen I9. This machine has an Nvidia RTX-A2000 in it.
The first thing to do is to review the stderr output.
For yours you'll find the text string:
CL_MEM_OBJECT_ALLOCATION_FAILURE
That is a big clue that trying to run these big tasks on your 4 GB laptop GPU is a mismatch. You could also look around the forums and notice that this is among the most frequently mentioned recent difficulties.
My suggestion is to disallow this application on that machine. It will be better for you, and better for the project.
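The stderr check described above can be sketched in a few lines of Python. This is only an illustration: the function name `find_memory_errors` and the sample text are hypothetical, and the list contains standard OpenCL error names that commonly indicate the device or host ran out of memory, of which only `CL_MEM_OBJECT_ALLOCATION_FAILURE` is mentioned in this thread.

```python
from typing import List

# Standard OpenCL error names that commonly indicate memory exhaustion.
OPENCL_MEMORY_ERRORS = (
    "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    "CL_OUT_OF_RESOURCES",
    "CL_OUT_OF_HOST_MEMORY",
)


def find_memory_errors(stderr_text: str) -> List[str]:
    """Return the memory-related OpenCL error names found in a task's stderr text."""
    return [name for name in OPENCL_MEMORY_ERRORS if name in stderr_text]


# Hypothetical stderr excerpt, pasted from a failed task's result page.
sample = "Error allocating device memory: CL_MEM_OBJECT_ALLOCATION_FAILURE"
print(find_memory_errors(sample))
```

A non-empty result suggests the GPU simply does not have enough free VRAM for these tasks, which matches the advice to disallow the application on that machine.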
Bernd, is it possible to release the VRAM when the recalc phase begins? I was trying to run 3 O3AS WUs on a 12 GB GPU, but they fail even though one or two of them are in the recalc phase and barely even using the GPU.
Guess they must be staggered for that to work. What's the BOINC command :-?
Actually that's an optimization that we already thought of, though we didn't finish the implementation yet. In principle the BOINC client should start only as many tasks in parallel as fit in the available GPU memory, so by adjusting the memory size (and free cores) one should be able to convince the client to start another GPU task when the memory is freed by the one still running. However, I think the client only performs this check every five minutes, which might not be fine-grained enough. I would also change the app such that a memory allocation failure becomes a "transient" error, so the client would start the same task again after some time. For now, though, these are just thoughts; we never tried that, and there are a few other more urgent problems to solve. But we'll keep that in mind.
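The idea of starting another task only when enough GPU memory has been freed can be sketched as a simple check. This is a hedged illustration, not how the BOINC client or the app works today: the function `can_start_task` is hypothetical, and the per-stage VRAM figures are invented numbers chosen only to show the arithmetic for a 12 GB card. It also presumes the app releases VRAM when it enters the recalc phase, which per the post above is not implemented yet.

```python
from typing import Sequence


def can_start_task(total_vram_mb: int,
                   running_usage_mb: Sequence[int],
                   new_task_peak_mb: int,
                   reserved_mb: int = 0) -> bool:
    """Return True if a new task's peak VRAM need fits next to the running tasks."""
    in_use = sum(running_usage_mb) + reserved_mb
    return in_use + new_task_peak_mb <= total_vram_mb


TOTAL_MB = 12 * 1024   # 12 GB GPU, as in the question
GPU_STAGE_MB = 5000    # hypothetical VRAM use during the GPU stage
RECALC_MB = 500        # hypothetical VRAM use during recalc, if memory were released

# Two tasks still in the GPU stage: a third does not fit.
print(can_start_task(TOTAL_MB, [GPU_STAGE_MB, GPU_STAGE_MB], GPU_STAGE_MB))
# One task has moved on to recalc and released its memory: a new one fits.
print(can_start_task(TOTAL_MB, [GPU_STAGE_MB, RECALC_MB], GPU_STAGE_MB))
```

With these made-up figures the first call prints `False` and the second `True`, which is the staggering effect the question asks about.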
I don't think BOINC has any mechanism to monitor actual GPU VRAM use, though it does for system memory use. The "available memory" metric seems to be checked only at startup of the client, when it collects coproc info. This is subsequently transmitted to the projects under that metric, but it never changes.
MyrCu wrote: Is there a possibility to exclude these "special" WUs without withdrawing from all GPU jobs in general?
There aren't any "special" workunits. All of them are likely to fail on a GPU with only 4GB VRAM, particularly if some of that VRAM is reserved by the OS to run your display or for other things you are doing.
All you need to do is change your project preferences to exclude the All Sky search from your list of allowed searches. Just remove the tick mark for that search and make sure to scroll to the bottom and "Save Changes".
If you have set up different "Locations" (previously known as venues) make the change for the location your computer is set to. If you don't know what "Locations" are and you haven't previously used them, just ignore this last bit.
Cheers,
Gary.
Thank you very much for that hint. I had not discovered that before.
Richard Haselgrove
Thanks for pointing this out! The disk limit certainly needs an update.
I noticed that the checkpoint alone takes a whopping 150 MB, and it only covers the first GPU processing stage, which should take <10 min. I wonder whether we should turn that off altogether? Setting the "write to disk at most ..." preference to a time longer than the GPU part takes to run should do that for you, but you may not want to do this for all apps of all projects...
BM
So...I am getting only "computation error" from any work units for the application
All-Sky Gravitational Wave search on O3 1.04 (GW-opencl-nvidia)
Are you aware of this? I think I get this on my 9th-gen I9, and am now getting it on my 12th-gen I9. This machine has an Nvidia RTX-A2000 in it.
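Allnight wrote: So...I am getting only "computation error" from any work units for the application All-Sky Gravitational Wave search on O3 1.04 (GW-opencl-nvidia)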
The first thing to do is to review the stderr output.
For yours you'll find the text string:
CL_MEM_OBJECT_ALLOCATION_FAILURE
That is a big clue that trying to run these big tasks on your 4 GB laptop GPU is a mismatch. You could also look around the forums and notice that this is among the most frequently mentioned recent difficulties.
My suggestion is to disallow this application on that machine. It will be better for you, and better for the project.
... back to basics ...
Bernd, is it possible to release the VRAM when the recalc phase begins? I was trying to run 3 O3AS WUs on a 12 GB GPU, but they fail even though one or two of them are in the recalc phase and barely even using the GPU.
Guess they must be staggered for that to work. What's the BOINC command :-?
<stagger>1</stagger>
Aurum wrote: Bernd, is it possible to release the VRAM when the recalc phase begins?
There is no stagger command like that.
_________________________________________________________________________
Actually that's an optimization that we already thought of, though we didn't finish the implementation yet. In principle the BOINC client should start only as many tasks in parallel as fit in the available GPU memory, so by adjusting the memory size (and free cores) one should be able to convince the client to start another GPU task when the memory is freed by the one still running. However, I think the client only performs this check every five minutes, which might not be fine-grained enough. I would also change the app such that a memory allocation failure becomes a "transient" error, so the client would start the same task again after some time. For now, though, these are just thoughts; we never tried that, and there are a few other more urgent problems to solve. But we'll keep that in mind.
BM
I don't think BOINC has any mechanism to monitor actual GPU VRAM use, though it does for system memory use. The "available memory" metric seems to be checked only at startup of the client, when it collects coproc info. This is subsequently transmitted to the projects under that metric, but it never changes.
_________________________________________________________________________