Bernd Machenschalk wrote: FWIW
Strangely enough, it's actually worth quite a lot! :-). It's always helpful to have a brief idea of why things change!
Congratulations and sincere thanks for a very welcome speed increase. I think your 20% estimate is a little conservative, since some people in this thread now report close to a doubling in speed, i.e. a halving in crunch time. Perhaps it varies with different operating systems or processor architectures. I'm seeing the same, and a 0.06 result of mine has already validated against a 0.03.
I'm using two machines with well over 200 completed tasks between them (mainly 0.03) and no invalids so far that I've noticed. The current 0.06 tasks are estimated by the project at over 20 hours but are taking only 4.5 hours; the 0.03s were taking 8.5 hours. If this improvement is here to stay, would it be possible to refine the estimate, please?
Cheers,
Gary.
I don't know where to put this, but my RAC has been plummeting lately and completed tasks waiting for validation are piling up in big numbers. I now have at least 12 pages of completed tasks waiting, and the oldest tasks are almost 1 month old. Is this normal, or is there some ongoing problem with validation?
Tasks are mostly:
- Gamma-ray pulsar binary search #1 on GPUs v1.20
- Gravitational Wave Engineering run on LIGO O1 Open Data v0.04
Eskomorko wrote: I don't know where to put this [...]
A good way to work that out is to think about the purpose of the various forums and how that relates to exactly what is troubling you. In your case you probably had three choices:-
Technical News - A place where the staff start threads to make announcements and give ongoing information about things of a technical nature. Volunteers make comments directly related to the announcement.
Cruncher's Corner - A good place to discuss all sorts of performance observations and issues. Unless your comment is directly related to an ongoing discussion, it's best to start a new thread rather than take an existing discussion off topic in a new direction.
Problems & Bug Reports - A place for getting help with problems you are having or bugs you think may exist in the way your host is interacting with the project servers.
Your concern is about the number of tasks you have that are 'pending validation'. Having tasks in the pending category is a normal, everyday fact of life. It's not news, it's not a problem, it's just the way things have always been. Pendings increase in two particular cases. Firstly, if you have a fast GPU, you can churn out lots of results before your partners can catch up. Secondly, for new searches where locality scheduling is used to control the distribution of tasks, you can end up without quorum partners in a timely manner. Locality scheduling is necessary to minimise the bandwidth needed to efficiently deploy the large numbers of large data files to volunteer computers.
You actually have (at the time I looked) 179 pendings - 9 pages - so if you had 12 pages earlier, things are definitely improving. There were 99 pendings for FGRPB1G and 34 for FGRP5. There were only 46 for the new O1OD1E engineering run. None of these seem particularly excessive. You have a fast, modern GPU, so pendings for the FGRPB1G search are to be expected.
Just realise that the number of pendings is entirely due to factors beyond the control of the project. If you are unlucky enough to be partnered with lots of other hosts that don't return valid work promptly (but you do) you will end up with lots of pendings. Your only recourse is to write a stern letter to all your quorum partners, telling them to hurry up and get their fingers out :-).
Cheers,
Gary.
Gary Roberts wrote: [...]
Thank you for your answer.
I'm not here to blame anyone, I just wondered how things have been lately. I switched from a GTX 1060 to an RTX 2070 roughly 2 months ago, so that might explain something.
Sometimes I just don't get how RAC can go down that fast when my computer keeps crunching numbers all the time.
Mad_Max wrote: Yeah, I know about such utilities, but I don't use them, as there is a built-in "native" BOINC option to "tune" this, via the options section of cc_config.xml:
<process_priority>N</process_priority>, <process_priority_special>N</process_priority_special>
This is the OS process priority at which tasks are run. Values are 0 (lowest priority, the default), 1 (below normal), 2 (normal), 3 (above normal), 4 (high) and 5 (real-time - not recommended). The 'special' process priority is used for coprocessor (GPU) applications, wrapper applications and non-compute-intensive applications, 'process_priority' for all others. The two options can be used independently.
But you are missing the point: there are many possibilities to control process priority from the user side, if a particular user pays attention to it and knows how to tune it - i.e. for some geeks only. We were talking about the default behaviour for ALL users, which can be set from the project side.
Windows will still move the load around and the GPU exe will still end up waiting. Even with AMD cards, which use low CPU utilisation for much of the task run time, GPU utilisation will drop unless the GPU exe is given its own free CPU thread.
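For reference, here is a minimal sketch of how the two cc_config.xml options quoted above sit in the file. The structure is the standard BOINC cc_config layout; the values are purely illustrative, not a recommendation:

<cc_config>
  <options>
    <!-- priority for ordinary CPU tasks: 2 = normal -->
    <process_priority>2</process_priority>
    <!-- priority for GPU / wrapper / non-compute-intensive tasks: 3 = above normal -->
    <process_priority_special>3</process_priority_special>
  </options>
</cc_config>

The file lives in the BOINC data directory and is re-read with "Options / Read config files" in BOINC Manager or by restarting the client.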
mmonnin, yes, you are right - this is a very old, well-known problem with almost all E@H GPU apps, while it is usually not a problem for other BOINC GPU projects.
Some volunteers even pointed to the root cause of this problem long ago, but for unknown reasons the issue keeps popping up again and again.
A quick reminder of what causes the need to reserve a full CPU core and/or run multiple GPU tasks to avoid a significant loss of GPU computing performance:
It is not the low priority of the E@H app (process), but the low priority of the main thread inside the app process.
For example, the main thread of the current app for Gamma-ray pulsar binary search #1 on GPUs v1.18 (FGRPopencl1K-ati) is set to 1 (one) - the very lowest possible value - independent of the app process priority. Even if I assign high priority (=13) to the FGRP app, the main thread still remains at the lowest priority (=1).
For other GPU apps I have seen it work quite differently: threads inherit their priority from the process priority by default. E.g. if normal priority (=8) is set on the process, the thread also gets normal (8) priority; if the process gets high priority (=13), its threads get high priority too.
Here, for example, are screenshots of the current GPU apps from E@H and MW@H running on the same Windows machine. Both apps are set to run at normal priority, but this is what is happening with the thread priorities inside them:
FGRPB - https://yadi.sk/i/_DwVUckqo0iUhA
MW - https://yadi.sk/i/tcQ5n3RuMU_WXA
This is the reason why FGRPB1G runs at maximum speed only if there is no competition at all for CPU resources - even from other BOINC CPU apps running at the lowest priority - i.e. only when the app has a whole CPU core to itself.
The same applies to the current GW GPU test app: its main thread priority is always the lowest possible (1), regardless of the priority of the process, so ANY other thread/process can take resources away from it and thus slow down the GPU computations.
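To make the mechanism concrete, here is a small, hypothetical Windows sketch (not E@H source code) showing how a thread explicitly pinned to the idle level stays at the bottom no matter what priority class its process is given, while a thread left at the normal level follows the class:

/* Hypothetical sketch: Windows thread priority vs. process priority class.
 * Compile with MSVC or MinGW. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE thr = GetCurrentThread();

    /* Give the whole process the "high" priority class (its normal-priority
     * threads then show base priority 13 in Process Explorer). */
    SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);

    /* A thread left at the normal relative level follows the class... */
    SetThreadPriority(thr, THREAD_PRIORITY_NORMAL);
    printf("relative level: %d (THREAD_PRIORITY_NORMAL)\n", GetThreadPriority(thr));

    /* ...whereas a thread pinned to the idle level shows base priority 1 in
     * Task Manager / Process Explorer regardless of the class, which matches
     * the behaviour described in the post above. */
    SetThreadPriority(thr, THREAD_PRIORITY_IDLE);
    printf("relative level: %d (THREAD_PRIORITY_IDLE)\n", GetThreadPriority(thr));

    return 0;
}

Note that GetThreadPriority() returns the relative level (0 for normal, -15 for idle); the absolute base-priority figures quoted in the post (1, 8, 13) are what monitoring tools display after combining this with the process class. If an app pins its main thread to the idle level, raising the process priority via cc_config.xml cannot help.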
Sorry, this complaint hadn't got through to me yet, or I had been too busy with other things to listen carefully enough. App version 0.12 is in the pipeline, which should have this fixed. If so, I hope to get another FGRPB1G app version out this week.
BM
Hm, 0.12 didn't work. 0.13 is out.
BM
Thanks Bernd. I have got a few GW WUs of the new 0.13 version. Now it inherits the thread priority from the process priority, as expected: https://yadi.sk/i/1jdbcRjgqebyCg
This should greatly increase performance for users who do not pay attention to things like reserving a CPU core for GPU apps.
Meanwhile I have started testing 4 GW WUs in parallel on one GPU on one of my computers. Looks good so far:
- GPU load at least tripled (from ~20% to 60-65%)
- GPU RAM consumption quadrupled, as expected, but that is not a problem - it's only ~500 MB
- average runtimes and validation results are pending...
Also, I have noticed that the main load in the CPU part of the computation for the current GW app is created by the OpenCL library (amdocl64.dll in my case). Especially at the start of each computation cycle (as captured in the screenshot above), the OpenCL DLL consumes a whole CPU core while the GPU load is ~0%. The rest of the time amdocl64.dll still generates about 60-70% of the app's total CPU load.
Is this expected behaviour? Is the part of the computation that has not yet been ported to GPU code done via calls to the OpenCL DLL? Or has something gone wrong, and some functions that should run on the GPU are actually running on the CPU in emulation mode?
I have seen such errors a few times on other projects - a wrong OpenCL call can lead to emulation instead of actual GPU computation.
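For anyone wanting to try the same multi-task-per-GPU setup described in the post above, one common way in BOINC is an app_config.xml in the Einstein@Home project directory. This is only a sketch: the app name below is a placeholder and must be replaced with the short app name reported in BOINC's event log or client_state.xml:

<app_config>
  <app>
    <!-- placeholder name: use the GW GPU app's short name from client_state.xml -->
    <name>einstein_O1OD1E</name>
    <gpu_versions>
      <!-- 0.25 = run 4 tasks per GPU, as in the test above -->
      <gpu_usage>0.25</gpu_usage>
      <!-- budget one CPU core per GPU task -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Einstein@Home also offers a "GPU utilization factor" web preference that achieves much the same thing without a local file.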
We essentially stopped the "O1 Engineering run", i.e. we aren't generating new workunits anymore.
Instead we will continue the previously suspended "O2AS" run as a GW run. The current GPU App will not give much benefit in that setup, so "O2AS" will (for now) continue to be CPU-only.
If all goes as planned we will start with the "O1OD1 injection run" on GPUs.
BM