They are both getting 5000 credits, but the claimed computation size is 5 times higher.
1000 to 5000 is 5x
144000 to 720000 is also 5x ;)
it's no coincidence
_________________________________________________________________________
With the change in GPU memory requirements, I started a 4GB RX 570 GPU on v1.06 tasks. It had no trouble running them at x1 and even (as an experiment) at x2, with a significant improvement in output and no problems at all.
Having depleted the initial 1 day cache of work, and with the new v1.07 app expected to be used, I've been trying for some hours to get further tasks. No new tasks are supplied, just the message to look at the scheduler log. Here is what seems to be the relevant line:-
2023-11-10 09:20:17.4552 [PID=404245] [version] OpenCL GPU RAM required min: 4718592000.000000, supplied: 4172214272
It appears that the minimum RAM is now closer to 5GB. Did this get changed for the new v1.07 app?
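For anyone wanting to check the numbers, here is a minimal sketch that converts the two byte counts from the scheduler log line into more familiar units. The helper name `describe` is just made up for illustration; only the two values come from the log.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def describe(n_bytes):
    """Return the same byte count expressed in MiB, GiB and decimal GB."""
    return {"MiB": n_bytes / MIB, "GiB": n_bytes / GIB, "GB": n_bytes / 1e9}

required = 4718592000   # "required min" value from the scheduler log line
supplied = 4172214272   # "supplied" value from the scheduler log line

shortfall_mib = (required - supplied) / MIB

print("required:", describe(required))
print("supplied:", describe(supplied))
print(f"shortfall: {shortfall_mib:.1f} MiB")
```

The required figure works out to exactly 4500 MiB, so the card falls short by roughly half a gigabyte.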
EDIT:
Rather than have the host doing nothing, I edited coproc_info.xml and set the GPU RAM to slightly more than the minimum value listed above. After saving, I set the immutable bit on the file. Stopping and restarting BOINC immediately allowed a bunch of new tasks for the previous v1.06 app. That was also somewhat confusing, since I was expecting to see a v1.07 app.
However, tasks are crunching as per normal again so I'll see what happens next.
Cheers,
Gary.
The GPU RAM requirement is that of the old-style workunits for the old GW-opencl-* plan classes (without "-2"). It could be that the SFT files your computer has drag the scheduler towards the old workunits (which come with the old apps and RAM requirements). If this is only a single computer, resetting the project might help to clear out old files and start with new workunits.
BM
I just run one at a time. Is there a good benefit to running multiple?
Your system is very similar to some of ours. Run multiple. Two minimum, perhaps up to four. Try running multiple, watch the times, and see which one gives you the best results. Right now, I am running 4x on our Threadripper Pro systems with single A4500 GPUs.
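A rough way to do the "watch the times" comparison is to turn the elapsed time per task at each multiplicity into tasks per day and pick the highest. This is only a sketch: the function names are invented here and the timings in `example` are hypothetical placeholders, not measurements from any real host.

```python
def tasks_per_day(concurrent, seconds_per_task):
    """Throughput when `concurrent` tasks share one GPU and each takes `seconds_per_task`."""
    return concurrent * 86400 / seconds_per_task

def best_multiplicity(measured):
    """measured: dict mapping number of concurrent tasks -> average elapsed seconds per task."""
    return max(measured, key=lambda n: tasks_per_day(n, measured[n]))

# Hypothetical timings for illustration only; substitute your own averages.
example = {1: 1800, 2: 3000, 3: 4800, 4: 7000}

for n, secs in example.items():
    print(f"{n}x: {tasks_per_day(n, secs):.1f} tasks/day")
print("best:", best_multiplicity(example))
```

Running more tasks at once always makes each individual task slower, so the per-task time alone is misleading. It is the tasks-per-day figure that tells you whether the extra concurrency pays off.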
I originally posted in the wrong forum and was asked to post here again:
One question, since there is still quite some GPU downtime with the GW tasks, especially with older CPUs. Even when I run 2 tasks at the same time with offset starting times, the GPU drops to 50% of its performance while the CPU is calculating one task. A single task does not use all of the GPU on either of my systems, whether I run them at 1x or 2x.
CPUs now have many more cores than a system has GPUs.
Is there a way to load the next task onto the GPU while the current task is being finished by the CPU?
And also to pre-calculate the next task on the CPU? On a system with a fast GPU but a slow CPU, for example, you would have 2-3 cores finishing a task, the GPU calculating with the support of two cores, and 2-3 cores preparing a task. That would be full utilisation of an 8-core system for GW tasks. If tasks could be handled like that within the BOINC app, wouldn't this be a way to improve GPU efficiency and get a lot more tasks done faster?
B.I.G wrote: I originally posted in the wrong forum and was asked to post here again:
You were advised that you had posted in the wrong thread (the O3MD1 thread), not necessarily the wrong forum. If you had looked at the last message in that thread before you posted, you would have seen Bernd's comment about that O3MD1 CPU search having completely finished. He even mentioned the replacement search (O3AS) and that it would be GPU only. Perhaps that might help you understand that running extra calculations on a whole bunch of CPU cores (as you seem to be enquiring about) won't be possible.
Please understand the purpose of announcement threads in the Tech News Forum. They allow the Devs to advise of new searches, changes to existing searches, or completion of older searches, along with other more general announcements. If you contribute to a particular search you should stay up-to-date with any information being provided by the Devs concerning that search.
The other function is to allow skilled volunteers to post back performance information, or any details of bugs or other problems being experienced. It's not meant to be a general help forum. If you're not really familiar with bug testing new releases, you should really be posting questions about how best to run the app in either the Problems Forum or Cruncher's Corner.
If you take the time to read through all the commentary since the O3AS run started back in June, you may find a lot of your queries have already been answered.
I had a quick look at your hosts list - all three are running the O3AS app. If your query relates to the oldest host that is taking about 12 hours per task, you should really stop running O3AS and try BRP7 instead. My guess is that it will be able to run BRP7 tasks in around 30 mins each, even with an old and slow CPU since BRP7 needs very little CPU support. O3AS needs a *lot* of fast CPU support since certain things aren't (yet) able to be done efficiently on the GPU. Please read the thread to see comments about this if you are interested.
Cheers,
Gary.
Thank you for your reply. Yes, I intend this as a thought for further development, not as a way to optimise current crunching. I am fully aware that this cannot be changed in the client, and it might not be possible to program at all within the BOINC parameters.
I followed the discussions in the Cruncher's Corner forum and some pages of this thread, though maybe not enough here, which is also due to me being overworked at the moment. Sorry if that was not appropriate. Still, I think this is a tech-development related suggestion.
Yes, exactly: O3AS needs a lot of CPU time, but it also has quite a few minutes where it needs the CPU only. Hence my thought: would it be possible to modify the project so that the next task can already start on a CPU core before the GPU-intensive part of the current one is done? There would then be 2 GPU tasks running, although only 1 of them is on the GPU.
Once the GPU part of the first task is done, the CPU finishes the calculation, and the GPU, no longer needed, can jump straight into the GPU part of the next task, whose initial CPU part has already been calculated.
Even when I run the app at 2x on my fastest machine, which has a modern, fast CPU and an entry-level GPU, the GPU drops to 50% load and power draw whenever one task is in its CPU-only part. Only when both tasks are in their GPU part is the GPU used to the fullest.
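To put rough numbers on this, here is a toy model of the overlap, nothing more. It assumes each task alternates a GPU phase and a CPU-only phase of fixed length, that the tasks are started evenly offset, and that one task on its own loads the GPU to 50% (the figure observed above). The phase lengths in the usage lines are invented, and the real app certainly does not behave this neatly.

```python
def average_gpu_load(gpu_s, cpu_s, concurrent, per_task_share=0.5, step=1):
    """Average GPU load over one task period in a simple staggered-start model.

    gpu_s / cpu_s:   assumed length of the GPU phase and CPU-only phase (seconds)
    concurrent:      number of tasks running at once, started evenly offset
    per_task_share:  assumed GPU load contributed by one task in its GPU phase
    """
    period = gpu_s + cpu_s
    total = 0.0
    samples = 0
    for t in range(0, period, step):
        # Count the tasks that are currently inside their GPU phase.
        in_gpu_phase = sum(
            1 for i in range(concurrent)
            if (t + i * period // concurrent) % period < gpu_s
        )
        total += min(1.0, in_gpu_phase * per_task_share)
        samples += 1
    return total / samples

# Invented phase lengths: 600 s on the GPU, then 200 s on the CPU only.
for n in (1, 2, 4):
    print(f"{n}x: {average_gpu_load(600, 200, n):.3f}")
```

In this model the GPU only stays fully loaded once enough tasks overlap that some task is always in its GPU phase, which is the effect the suggestion is trying to get without having to hold several full tasks in GPU memory.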
I of course don't know if it's possible or worth the effort, but I keep observing how much of the time the GPU is not used to the fullest, and as I understand it these calculations can't be done by a GPU. Edit: I missed the "yet" you wrote, so maybe that makes my post irrelevant.
BRP7 tasks do not run on my oldest machine; this was already discussed in another forum. But thank you for the suggestion.
So are the old work units that need 4 GB gone completely now? I would like to run more at the same time, but only if I won't get a mix of new, small work units and old, big ones that won't fit.
Thank you.