Hello!
At my new PC I have an unusually high failure rate of 27%, but only for O3 (GPU) tasks; for other GPU tasks it is close to zero. From the server status page for O3MDF I learned that (tasks failed) / (tasks valid + tasks failed) = 4.9%, which is also high, but not as high as on my side. O3MD1 even shows a failure rate of 43% on the server status page, and the average failure rate over all applications I calculated to be 8.7%. What a waste of crunching power!
Can I reduce this by adjustments at my side?
Kind regards and happy crunching
Martin
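For reference, the failure rate quoted above is simply failed / (valid + failed). Below is a minimal sketch of that calculation; the counts are made-up placeholders, not the actual numbers from the server status page.

```c
#include <stdio.h>

int main(void)
{
    /* Placeholder counts -- NOT the actual server status numbers. */
    long tasks_valid  = 95100;
    long tasks_failed =  4900;

    double failure_rate =
        (double)tasks_failed / (double)(tasks_valid + tasks_failed);

    printf("failure rate: %.1f%%\n", failure_rate * 100.0); /* 4.9% */
    return 0;
}
```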
I think that currently O3MD1 CPU workunits are generated with an erroneously small value for "memory bound". The workunits state ONLY ~1.9 GiB as the upper memory limit but actually allocate about 3.2 GiB.
This hinders the correct management of these tasks by BOINC and is possibly a reason for the currently high rate of 44% failed tasks for O3MD1 CPU, while this error rate for O3MDF GPU tasks is only about 4%.
I discussed this finding in more detail with example tasks in this thread in the 'problems' section.
See also server status page: https://einsteinathome.org/server_status.php
[updated 17 Apr 2023, 10:25:01 UTC]
"O3MD1" (CPU):
Tasks...
"O3MDF" (GPU):
This is true. While investigating, I stopped workunit generation for O3MD1.
BM
The latest O3MD1 CPU workunits now have NEGATIVE memory bound values.
Uuuh... an overflowing (signed) INT32 set to ~3.5*10^9 (~3.5 G bytes), which wraps around to ~ -796 M?
...
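For anyone curious, a minimal sketch of the suspected wrap-around (an illustration, not the project's actual code): converting a value near 3.5*10^9 to a signed 32-bit integer is implementation-defined in C, but on common two's-complement platforms it wraps modulo 2^32 and comes out negative.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* A memory bound of roughly 3.5e9 bytes does not fit into a signed
     * 32-bit integer (max 2147483647). */
    int64_t bound_bytes = 3500000000LL;

    /* Narrowing to int32_t is implementation-defined in C; on typical
     * two's-complement platforms it wraps modulo 2^32. */
    int32_t truncated = (int32_t)bound_bytes;

    printf("original value : %lld bytes\n", (long long)bound_bytes);
    printf("as signed int32: %d bytes\n", truncated);
    /* Prints -794967296, i.e. about -795 M -- the same ballpark as the
     * ~ -796 M seen in the workunits. */
    return 0;
}
```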
Problem solved. Reissues of the previously failed tasks now have their memory bound set to ~3.2 GB. The problem will be out of the pipeline in a few days.
task name: h1_1413.60_O3aC01Cl0In0__O3MD1V2a_VelaJr1_1414.00Hz_537_1
WOO HOO!!! Thank you for your hard work in identifying the problem and for getting the right people at Einstein to fix it!!
Hi, a quick question:
The status page shows negative values for the O3MDF tasks; does this mean that those WUs are 100% complete?
[screenshot]
This happens on the first day when we add a new "sub-search". The one we just added ("C2") is the last of the current "O3MD1" search (series).
This one seems trickier to get started than the ones before. We are still struggling with memory requirement predictions that seem to be way off for this search in particular, as it was originally designed to run on CPUs and has not been put on the GPU app.
This will be the most demanding sub-search in terms of memory. I manually raised the requirement to 4.5GB, but I'm still not sure that this will be sufficient.
When this is done, we will certainly revise the model our memory predictions are based on.
BM
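As a purely illustrative back-of-the-envelope check (hypothetical host numbers, not BOINC's scheduling logic), this sketch shows how many tasks with a 4.5 GB memory bound fit into a given amount of RAM:

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical host values -- adjust to your own machine. */
    double host_ram_gb       = 16.0; /* RAM available to BOINC          */
    double bound_per_task_gb = 4.5;  /* raised memory bound per C2 task */

    int max_concurrent = (int)(host_ram_gb / bound_per_task_gb);

    printf("At %.1f GB per task, %.1f GB of RAM fits at most %d concurrent "
           "O3MD1 C2 tasks.\n",
           bound_per_task_gb, host_ram_gb, max_concurrent);
    return 0;
}
```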
Thanks for the explanation, you're the best, Bernd!
Yay, time to install that spare stick of RAM!! Thanks for the explanation sir.