- There is a new "Beta Test" app version 1.18 with a few improvements Christophe developed over the holidays. It should be significantly faster than 1.17, in particular on GPUs that support double precision.
If this speedup holds for all FGRP-GPU tasks, then I suspect some adjustments to the estimates will need to be made again, or my DCF will start swinging more wildly. My DCF right now is at 0.84 and the FGRP-GPU tasks, both 1.17 and 1.18, are estimated at about 54 minutes.
Even before this hit, the spread in the DCF targets between CPU and GPU tasks has meant that maintaining a several-day queue on my Haswell + GTX980 and GTX1080 hosts has occasionally forced me to abort blocks of >100 CPU tasks that were downloaded after a large batch of GPU ones pushed the DCF down very low and massively over-allocated my CPU.
After running a few of these 1.18 tasks on my GTX660, it seems times are down to 2.5 hrs from a bit over 4 hrs, running 2 at a time. The only thing that would make me happier would be a cuda 50 app so I could get my 2 CPU cores back.
FP64 code consumes more power, so if you see the GPU clock falling, you might need to raise the GPU power limit.
Undervolting might also help in many cases, but that depends on the particular GPU design.
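If you want to check whether the card is actually power-throttling, one way is to watch the SM clock alongside the reported power draw and power limit while a task runs. A minimal monitoring sketch, assuming an NVIDIA card with nvidia-smi on the PATH (the sampling interval and count are arbitrary):

# Minimal sketch: poll nvidia-smi so you can see whether the SM clock drops
# while power.draw sits at power.limit (a sign of power throttling).
import subprocess
import time

query = [
    "nvidia-smi",
    "--query-gpu=clocks.sm,power.draw,power.limit",
    "--format=csv,noheader",
]

for _ in range(6):  # sample once every 10 seconds for about a minute
    print(subprocess.run(query, capture_output=True, text=True).stdout.strip())
    time.sleep(10)

If it is throttling, raising the limit (on cards and drivers that allow it) is typically done with "nvidia-smi -pl <watts>", which needs administrator rights.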
DanNeely wrote:
Even before this hit, the spread in the DCF targets between CPU and GPU tasks has meant that maintaining a several-day queue on my Haswell + GTX980 and GTX1080 hosts has occasionally forced me to abort blocks of >100 CPU tasks that were downloaded after a large batch of GPU ones pushed the DCF down very low and massively over-allocated my CPU.
It may not just be the DCF mismatch. Here is an example of what can happen. It's not meant to represent any of your hosts because I want readers to understand the general principle and then work out for themselves if any of their hosts are affected.
Imagine a quad-core host with a GPU running 2 concurrent tasks. Each of these will require a CPU core for support duties, leaving just 2 cores available for crunching CPU tasks. If BOINC is set to use 100% of the CPU cores, it will fetch work for all 4 cores even if only 2 are being used for crunching. So the problem is really compounded if you keep a multi-day cache.
As a simple example, let's say the cache setting is 4 days. Let's also say that the fast GPU tasks have lowered the DCF to the point that CPU tasks are estimated at half the time they will actually take. So, because of just the DCF behaviour, the 4 day cache has become 8 days of real work. That's bad enough but shouldn't actually cause you to have to abort CPU tasks. However on top of this, only 2 cores are doing the crunching so the tasks will actually take 16 days to crunch.
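To make that arithmetic explicit, here is a tiny sketch using only the hypothetical numbers from the example above (nothing here is measured from a real host):

# Hypothetical numbers from the example above, not from any real host.
cache_days = 4         # "store at least X days of work" setting
dcf_error = 2          # CPU tasks actually take twice their DCF-lowered estimate
cores_fetched_for = 4  # BOINC fetches CPU work for all 4 cores
cores_crunching = 2    # only 2 cores are free once 2 GPU tasks each take a support core

real_work_days = cache_days * dcf_error                               # 8 days of real CPU work on board
days_to_clear = real_work_days * cores_fetched_for / cores_crunching  # 16 days to actually crunch it

print(real_work_days, days_to_clear)  # -> 8 16.0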
Can this problem be avoided? Yes, it can, by making sure BOINC only fetches work for 2 cores rather than 4. You need an app_config.xml to achieve this, and you need to do two things:
Set BOINC prefs to use the appropriate number of cores - 50% in this example.
Create an app_config.xml with <gpu_usage> of 0.5 and <cpu_usage> of less than 0.5, say 0.4. This overrides the default value of 1 for <cpu_usage>. The key is to make sure this value times the number of GPU tasks being crunched doesn't equal or exceed a full CPU core. (A minimal sketch of such a file follows below.)
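For reference, a minimal sketch of such a file is below. The application short name is an assumption - check the <name> entries in client_state.xml for the FGRP GPU search and substitute whatever your client shows. The file goes in the Einstein@Home project directory and is picked up after "Read config files" in BOINC Manager, or a client restart.

<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>   <!-- assumed short name - verify in client_state.xml -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>   <!-- 2 tasks per GPU -->
      <cpu_usage>0.4</cpu_usage>   <!-- 2 x 0.4 = 0.8, stays below one full core -->
    </gpu_versions>
  </app>
</app_config>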
With these settings this example host would still be crunching 2 GPU tasks and 2 CPU tasks exactly as before. The only difference would be that the CPU work fetch would be the proper amount rather than double the proper amount. It would still take 8 days to crunch the tasks rather than 4 but the work queue wouldn't grow any larger. Obviously the DCF mismatch needs to be addressed as well but at least the problem is minimised whilst waiting for that.
Betreger wrote:
After running a few of these 1.18 tasks on my GTX660, it seems times are down to 2.5 hrs from a bit over 4 hrs, running 2 at a time. The only thing that would make me happier would be a cuda 50 app so I could get my 2 CPU cores back.
(I agree about the cuda 50 app as you know)
OK, I decided to test the new v1.18 and just finished 8 tasks.
This is on a SC'd 660Ti on a not-so-new Athlon II X4 630 with 12 GB RAM... and Windows 10.
They took an average of 2 hrs 2 mins each, running two at a time.
Watching them run, the GPU was at 100% and the CPU was at 100% (which made it slow to try to do anything else).
One thing I am wondering (if it was mentioned, I didn't see it): what is the difference between tasks that are both running Gamma-ray pulsar binary search #1 on GPUs v1.18 (FGRPopencl-Beta-nvidia), yet even though they run for basically the same amount of time, the granted credits are different (3,465.00 and 1,365.00)?
The one with 3,465 credits I am looking at right now did actually run about 1000 seconds faster.
How do you know what the difference is between these tasks other than the credits?
https://einsteinathome.org/task/602928802
https://einsteinathome.org/task/602928116
I woke up for some reason (PCs talking to me in my sleep) at 2 am.
Good thing, since my 8-core did a Windows 10 update and rebooted, and it takes my PIN to start it back up again (I have to get rid of that so it will just start back up on its own).
So I went to take a look at the 660Ti to see if it was close to finishing one. One task had just finished and started a new one, and another was almost at 2 hrs; when it hit 89.xxx% it just froze up and I thought GREAT, I check on one and now it will crash on me... Well, I checked everything, as you may imagine, while it sat there, and as I brought up Task Manager it went to 100% and started a new one. I checked what it said in the stats and that was one of the tasks that got 1,365 credits... no idea if that happens all the time or not.
OK, it's almost 3:30 am, so I will see if the Science Channel will put me back to sleep, and I will see how the 660Ti is doing later today.
Gary Roberts wrote:
With these settings this example host would still be crunching 2 GPU tasks and 2 CPU tasks exactly as before. The only difference would be that the CPU work fetch would be the proper amount rather than double the proper amount.
Very clear example, thanks.
Additionally, limiting the number of CPU cores used will keep VirtualBox multi-core tasks from taking all 4 cores even when the GPU needs one of them, which otherwise is very limiting for the GPU.
MAGIC Quantum Mechanic wrote:
... I checked what it said in the stats and that was one of the tasks that got 1,365 credits... no idea if that happens all the time or not.
I haven't checked but I'll bet the task that got 1,365 credits was a resend task for a failed task that was in a quorum when the award was 1,365. You can easily spot a resend task. It will have an extension in the task name higher than _1. The two primary copies have _0 and _1 extensions. The credit is locked in at the time the original WU was generated so if you get a resend for one of those it will be carrying the former credit value.
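Purely to illustrate that naming rule, here is a tiny sketch; the helper and the task names are made-up examples, not project code:

# Hypothetical helper: flag resends by the trailing _N copy number in the task name.
# _0 and _1 are the two primary copies; anything higher is a resend.
def is_resend(task_name):
    try:
        copy_number = int(task_name.rsplit("_", 1)[1])
    except (IndexError, ValueError):
        return False  # name doesn't end in _N, so we can't tell
    return copy_number > 1

# Made-up names in the LATeah... style discussed in this thread:
print(is_resend("LATeah0010L_100.0_0_0.0_1234_1"))  # False - primary copy
print(is_resend("LATeah0010L_100.0_0_0.0_1234_2"))  # True  - a resend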
Also, all tasks get to 89.997% and then the progress display stops advancing for a while. Crunching is over and the follow-up stage (sorting out the top candidates, I think) is happening. When that is finished, progress will suddenly jump to 100% and the results will be uploaded. This is not anything to worry about.
Thanks Gary,
You are right about it doing that at 89.997%
I also checked that part of the task name: the 3,465-credit tasks had _0, _1 and _2, and the ones with 1,365 had either _2 or _3.
So of the 15 tasks so far 8 are the 3,465 version and 7 are the 1,365 version
I loaded those the first time I saw we switched to v1.18
Next I am going to try my 560Ti OC, but it is with my older 3-core Phenom - Win 10 - 8 GB RAM.
If I make the drive to the post office in this rain storm, I have four sticks of 8GB waiting for me!!
I still hope we can get back to GPU-only tasks so I can run all 7 of mine, since this is year 13 for me here. I also run those VB tasks for CERN (I started there 3 months before that), but no other projects, and they work great together with the pure GPU-only tasks.
MAGIC Quantum Mechanic wrote:
... So of the 15 tasks so far 8 are the 3,465 version and 7 are the 1,365 version
I loaded those the first time I saw we switched to v1.18
If you have a resend of a task originally generated with 1,365 as the award (these will be tasks whose name starts with LATeah0010L...) then that's what you'll get. Tasks whose name starts with LATeah0009L... are the latest ones so resends of those will most likely be getting 3,465. I don't remember if any 0009L tasks were generated before the change to 3,465 but maybe there were.
MAGIC Quantum Mechanic wrote:
Next I am going to try my 560Ti OC but it is with my older 3-core Phenom- Win 10 - 8GB ram
The speed of the CPU and having huge amounts of RAM don't seem to be all that critical. I get pretty much the same GPU task elapsed times for a given GPU even with 2009-vintage Core 2 Quads with only 4GB RAM. A lot of my more modern hosts only have 8GB. However, make sure you use at least dual-channel mode - 2 matched sticks. Maybe you need more RAM with Win10 :-).
I have a dual core host (2x2GB RAM) with a 1GB 550Ti and I started it up again just before the switch to 1.18. It's running 1 GPU task and 1 CPU task (default settings). With 1.17 the GPU elapsed times were around 9,700 secs. With 1.18 this has dropped to around 6,200 secs. Your 560Ti should do OK as is - 2x4GB RAM I presume? :-).
MAGIC Quantum Mechanic wrote:
I still hope we can get back to the GPU only ...
You can do GPU-only right now, can't you? Just disable the CPU version of the search in your project prefs. Sure, you need a core per GPU task for support, but the remaining cores on each host would be free to be used elsewhere.
MAGIC Quantum Mechanic wrote:
Thanks again Gary and hope all is well.
You're welcome, thank you, and all is very well at the moment. I'm even setting up a couple of machines with dual GPUs for the first time and discovering bugs in the automated procedure my distro uses for multi-card configuration. That procedure looks like it's handling the driver selection properly, but only the first card gets the selected proprietary driver; the second card is left with a different (open source) driver, which causes the X server to crash and the whole machine to lock up during startup with a black screen, the reset button being the only viable option.
Luckily I have a basic understanding of xorg.conf configuration so I quickly noticed the wrong driver (and other stuff not quite right as well). I've manually adjusted the config file and I now have a dual 2GB HD7850 setup that's running very nicely. It's a quad core host with 2x4GB RAM running 4 concurrent GPU tasks and 2 x O1MD tasks on CPU cores.
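For anyone who hits the same driver mix-up, the relevant part of a hand-written xorg.conf is simply one "Device" section per card, each naming the proprietary driver explicitly. A minimal sketch for the fglrx era; the Identifier strings are arbitrary and the BusID values are placeholders you would take from lspci:

Section "Device"
    Identifier "GPU0"
    Driver     "fglrx"       # proprietary driver for the first card
    BusID      "PCI:1:0:0"   # placeholder - use the bus ID lspci reports
EndSection

Section "Device"
    Identifier "GPU1"
    Driver     "fglrx"       # same driver for the second card
    BusID      "PCI:2:0:0"   # placeholder
EndSection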
I'm quite pleased at how close the crunch times are for each GPU seeing as the second PCI-e slot is only x4. I was expecting at least some sort of a slowdown but at most it seems like around 25 seconds only. The x16 slot card takes around 2275 secs for a pair and the x4 slot card around 2300 secs (rough figures only but seems consistent). I'll check properly when I get some time :-).
To fill in all my spare time at the moment, I think I'll go buy another dual slot board and a couple more AMD cards. Might be time to try out the new stuff that needs AMDGPU-PRO drivers rather than fglrx. Should be fun getting that to work on a non- 'buntu style of Linux :-). I can always bother AgentB who has a RX 480 running on Ubuntu if I really get stuck. I imagine he's the local expert on AMDGPU-PRO stuff.
Hi Bernd,
Feeling courageous again after my machines settled back down to 1.17s for a couple of days, I wanted to try these 1.19s again. So I switched the Beta setting back on in prefs, but I keep getting only 1.17s now. Has 1.19 been pulled again?
Thanks for your support!
Kai.