BTW, I am getting much better utilization of my GTX 1060, from ~15% to ~86%, by running 4 instances at once. That's not to say 4 is the sweet spot, though; that's just a guess for now, and the number will of course vary by GPU and possibly even more by platform/CPU, since these tasks seem highly dependent on the CPU.
I see the same. Running only 1 task at a time isn't enough to max out the GPU core clock. It runs at 1569 MHz (with this MSI GTX 1060 card). The mem clock already runs at max speed with 1 task, though. Running 2 or more tasks pushes the core clock to max as well (1936 MHz).
The GTX 960 also runs at least 4 tasks nicely. Completion times are somewhere around 3 hours at 86% GPU load, vs. 13% load with only one task running, and much more variation in completion times in that case. I guess 5 tasks at a time could be optimal with these cards at this point, but I haven't tried that.
Quote:
I guess 5 tasks at a time could be optimal with these cards at this point...
No, it isn't.
Nvidia GTX 960 (Asus Strix 2GB) , driver 430.00
Xeon X56.. @ 4GHz , Windows 10 (17763)
GW GPU tasks v0.11 only, no CPU tasks
1 CPU core per 1 GPU task
4x
run time about 11900 s = 2975 s / task
GPU-Z:
avg power consumption 52.0 W
avg GPU load 80 %
avg mem controller load 18 %
mem used 733 MB
5x
run time about 14350 s = 2870 s / task
GPU-Z:
avg power consumption 52.2 W
avg GPU load 81 %
avg mem controller load 18 %
mem used 886 MB
It looks like going from 4x to 5x doesn't change the overall GPU load or GPU wattage at all. The gain in run time is also small: 2975 s vs. 2870 s per task is only about a 3.5% improvement. I think the fifth CPU core is wasted this way; it would be wiser to put it to use on something else.
* I learned that if you want to run 5 tasks at a time, "0.20" doesn't necessarily work well for the 'GPU usage' value in app_config.xml.
I had trouble starting the fifth task. 6 cores were available for BOINC to use for whatever it needed, and 4 tasks were running. I changed the GPU usage value in app_config.xml from 0.25 to 0.20 (I tried also 0.2) and clicked 'Read config files'. Nothing happened. I triple-checked everything but don't know what prevented the fifth task from starting.
Then I changed that value to 0.19 and clicked 'Read config files'. The fifth task started right away.
I changed the value back to 0.20 and clicked 'Read config files'. All five tasks kept running.
They ran for about 40 minutes. Then I noticed one task was "waiting to run".
I changed the value in app_config.xml to 0.19 again and clicked 'Read config files'. The fifth task continued running.
?? There was nothing else running, only these GW GPU tasks. Everything else was suspended and not in RAM in any way. BOINC seems to think occasionally that 5 x 0.20 is more than 1, probably some rounding or comparison edge case right at 1.0, so 0.19 leaves unambiguous headroom.
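For reference, here is roughly what that section of app_config.xml looks like (a minimal sketch; the <name> value below is just a placeholder, since the exact GW app name has to be checked in client_state.xml):

   <app_config>
      <app>
         <name>einstein_O1OD1E</name>    <!-- placeholder; the real app name is in client_state.xml -->
         <gpu_versions>
            <gpu_usage>0.19</gpu_usage>  <!-- 0.19 rather than 0.20, so five tasks reliably fit on one GPU -->
            <cpu_usage>1.0</cpu_usage>   <!-- reserve one full CPU core per GPU task -->
         </gpu_versions>
      </app>
   </app_config>

After saving, 'Options > Read config files' in BOINC Manager applies it without a restart.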
----------------------------------------------------------------------------------
AMD R9 390 (MSI Gaming 8G), driver 19.4.1
Xeon X56.. @ 4GHz , Windows 10 (18362)
4x
run time about 6660 s = 1665 s / task
Sapphire Trixx:
power consumption about 65 W (2x FGRPB1G is about 110 W with same speed settings... board power limit -20 %)
GPU load avg 61 %
GPU temp... low
AMD RX 580 (MSI Gaming X+ 8G) , driver 19.4.1
Xeon X56.. @ 4GHz , Windows 10 (17763)
4x
run time about 7270 s = 1818 s / task
Sapphire Trixx:
power consumption about 65 W (2x FGRPB1G is about 100 W with same speed settings... board power limit -20 %)
GPU load avg 51 %
GPU temp... low
AMD RX 580 (MSI Gaming X 8G) , driver 19.4.1
Xeon X56.. @ 4GHz , Windows 10 (18865)
4x
run time about 6450 s = 1613 s / task
Sapphire Trixx:
power consumption about 65 W (2x FGRPB1G is about 100 W with same speed settings... board power limit -20 %)
GPU load avg 61 %
GPU temp... low
(I don't really know what makes that GPU load difference between the 580's. The platforms are almost identical and those hosts are running only these tasks. Those two cards should be almost identical brothers, but I've noticed they seem to behave somewhat differently (in a way that should be the other way around).)
EDIT: The load difference between those cards isn't that big. It's 51 vs 61 (not 78). I must have looked at the wrong value earlier.
One conclusion at this point (app v0.11): this mid-range Nvidia is sort of faster than it should be. Those AMD cards are 3-4x as fast as the GTX 960 when crunching FGRPB1G. Here they aren't even 2x as fast. It will be interesting to see how things stand in the end.
AMD R9 270X (Gigabyte 4GB)
Xeon X56.. @ 4GHz , Windows 10 (18865)
4x
run time about 7320 s = 1830 s / task
Sapphire Trixx:
GPU load avg 69 %
GPU temp... low
This was interesting to me. With FGRPB1G this card is only half as fast as the 390/580.
But now this card is able to run these GW tasks basically as fast as those two cards (running 4x and with identical CPU support).
https://einsteinathome.org/host/12684310
I don't know what that one is doing, but she has a problem with these tasks.
She's the only one with Windows 7; the other similar sisters run Windows 10. She's the only one having problems now.
The GPU driver is 19.4.1, the same as on the others (just a different OS variant).
I tried to run these (4x, 3x) and tasks seem to start running. But...
For the first 4 minutes the GPU core clock jumps between 300 MHz and max, and the mem clock between 150 MHz and max. A pure sawtooth graph for both. GPU usage is 0.
Then after 4 minutes the clock speeds stabilize at max, but... GPU usage jumps straight to 100% and stays there, only rarely dipping a few notches to 98 or so. At this point the progress is developing well and at good speed.
After 8 minutes the progress resets to 0 and starts from the beginning.
That is normal, I think. I see it happening on the other hosts too during the first few minutes.
BUT... from this point on, this host is probably making only pseudo progress.
The CPU is doing something, I can see it from the temperature, even though the temp is still low. GPU usage is still at 100% and the GPU is doing something, but its temp is low.
The problem is, progress at this point advances only about '0.004' per second. That of course means a task would take something like 10 hours to complete. The whole thing doesn't look normal.
This host is able to run FGRPB1G tasks quite well, but not these GW GPU tasks, it seems.
AMD R9 390 (MSI Gaming 8G), driver 19.4.1
Xeon X56.. @ 4GHz , Windows 10 (18362)
6x
run time about 8760 s = 1460 s / task (4x was 1665)
Sapphire Trixx:
power consumption about 65 W ... barely any difference to 4x
GPU load avg 65 % (4x was 61 %)
GPU temp... low
Run time per task shortened by about 200 s, which is about 12% quicker.
I'm pretty sure a similar result would also apply to the RX 570/580's.
I'm going to try 8x if this machine will be stable with that.
Richie,
Thank you very much for going to the trouble of documenting what you are seeing in such detail. I certainly appreciate the effort, and I'm sure others do too.
I hope others will be trying a range of different GPUs to probe the limits for concurrent tasks. I'll be doing the same once there is a Linux app.
Once again, thanks very much for providing such useful information.
Cheers,
Gary.
Richie wrote:
This host is able to run FGRPB1G tasks quite well, but not these GW GPU tasks, it seems.
Cut back to 1x or 2x. If run times explode and more than triple going from 1x to 3x, then it's what others have seen here with parallel tasks. I can only run 2x on my 580 at E@H; 3x will end up taking forever. On some other projects I can only run a single task on it.
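(To put illustrative numbers on that rule of thumb: if a task takes, say, 2000 s at 1x, then 3x is a net win only if all three tasks finish in under 3 x 2000 = 6000 s of wall time. Anything beyond that and the parallelism is losing throughput. The 2000 s is just an example figure.)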
My i5-4690 PC consumes 78 W when all 4 cores are running GW engineering tasks.
But the CPU is undervolted by 0.1 V, and there is no GPU.
Preliminary test for mine as follows. (I need a GPU onboard so I can see it's up and running.)
i7 6950X @ 4.0 GHz, vcore 1.2792: a single GWE task on the CPU draws 183-184 watts.
8 GWE tasks on 8 cores draw 258-260 watts.
Just FYI: 8 cores with 8 work units plus 3 1080 Tis (with 1 task per card) sits at 899 watts.
For the GWE on GPU I won't get to test until Thursday, as that MS machine is at a different location.
Ok, finally got some data from the MS7 machine for Gravity Wave Engineering on GPU.
1 GPU, 1 work unit: 3255 seconds, 195 W, 31% utilization
1 GPU, 2 work units: 5040 seconds total (2520 seconds each), 245 W
2 GPUs, 1 work unit each: 3255 seconds, 281 W, 70% utilization
2 GPUs, 2 work units each: 5040 seconds total (2520 seconds each), 372 W
2 GPUs, 3 work units each: 6420 seconds total (2140 seconds each), 398 W, 81% utilization
Can't go any higher, as there are only 6 physical cores, which I directly link to 1 work unit each.
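Put in per-GPU throughput terms, those numbers work out as follows: 1 work unit is 3600/3255 ≈ 1.1 tasks/hour, 2 work units are 7200/5040 ≈ 1.4 tasks/hour, and 3 work units are 10800/6420 ≈ 1.7 tasks/hour. So 2 per GPU yields about 29% more throughput than 1, and 3 per GPU about 18% more again.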
Gary, thank you very much for your encouraging words. My anxious tests are again far from scientific precision… but I'll be happy if I succeed in providing some basic picture of where these cards sit in the larger performance ballpark.
I think these different hosts can't be compared very strictly, because they seem to have some performance variation even when the platforms are almost identical. I see my other RX 580 has lately been even somewhat slower than the RX 570. Both have been running 4x. I don't know why that is; it's weird. I suspect the much larger system RAM and different memory settings on the 580 host could be making it slower.
Another reason could be that I have overclocked the memory on both cards. They have seemingly the same mem speed, but maybe the GPU mem and GPU clock values aren't syncing as well on the 580. I'll have to look into that more… and I could start with stock settings, of course.
But for now I also think that if I manage to provide enough examples, then a host can at least be compared with itself… in different situations. I wish to provide at least an overall, approximate picture of how this hardware runs these tasks.
Quote:
I'm going to try 8x if this machine will be stable with that.
Yep, that didn't work at all. Tasks ran for hours, but then they started crashing one by one. The speed per task was extremely slow anyway (the total run time was going to be something like 9 hours).
I tried 7x as well, but it led to tasks crashing too.
Like Zalster mentioned about the full CPU core per task… I believe the lack of real physical cores was at play on my system (6c / 12t). Maybe this system had to start dividing CPU resources so thinly that it became a serious bottleneck and at some point caused additional problems. Even if that was the case, threads might still work well on more modern platforms or with fewer concurrent tasks.
After tasks started crashing I tried to change back to 6x on the fly, but even that wasn't stable anymore. I think something was too messed up at that point. A bit of background info… I had tried to update Windows to a new version yesterday (after successfully running a few tasks at 6x). Windows failed to complete that update (unknown reason) and reverted back to where it had been. I'm not sure everything is running well right now. I'll keep an eye on how this machine responds to different stress levels.
I decided to start from scratch with this 390 (MSI) and run 1x. I reset the GPU settings and lowered the GPU clock from the default 1040 MHz to 960 MHz (which I had used for a long time when running 2x FGRPB1G). No power limiting of any kind this time.
Well, I found out that running these tasks 1x is not an option with this particular host. Again, an exception that probably can't be generalized to other hosts. The task started running just fine, but after about 20 minutes the card got bored and dropped the GPU clock from about 960 to 300-400 MHz. Siesta time… and that meant progress was ticking along at about '0.004' per second, so the task would have run for hours. GPU load was about 30%, though. I watched that half-sleeping thing for some time, but it wasn't meaningful to continue.
Maybe one task doesn't provide enough load for this particular host. I don't know.
That is confusing also because…
mmonnin wrote:
Cut back to 1x or 2x. If run times explode and more than triple going from 1x to 3x, then it's what others have seen here with parallel tasks. I can only run 2x on my 580 at E@H; 3x will end up taking forever.
Thanks mmonnin! I went back to basics on that host, and it actually seems to be able to crunch these GW GPU tasks at 1x. Two tasks completed, averaging 4673 s, and unlike the other host with the same GPU type, this one doesn't seem to have the problem of the GPU falling half asleep while crunching 1x.
EDIT: Early observations of 2x: it doesn't look right. Too high GPU usage, too slow progress.
It seems the other host can't run 1x and this one can't run more than 1x. I guess that's one kind of balance.
I'm happy to see at this point that the completed "4x" tasks haven't started accumulating invalids. 5 so far, but that looks quite alright. The final app version might change this strange situation completely. Then the cards will run hotter and… heat death will soon be behind my door again, even without that.
A quick note about total wattage. I don't have a Kill A Watt type of tool, but three of my hosts get their AC from a power conditioner, which shows amperes.
I don't see individual readings, but I know how much current these hosts draw in total while crunching nothing but 2x FGRPB1G per host (1 CPU + 0.5 GPU per task). 1 Nvidia and 2 AMD GPUs are involved.
The same triplet crunching these GW GPU v0.11 tasks 4x per host (1 CPU + 0.25 GPU per task) draws pretty much an identical amount of juice. I would say the difference in wattage from the wall is at most 50 W in total. That can't be much per host. If there is a small difference in total load, I would say the GW task scenario takes the lower end of that short scale.
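(For converting those readings: wall power is roughly volts x amperes. Assuming 230 V mains here, which is just my assumption, a difference of about 0.2 A on the conditioner corresponds to 230 V x 0.2 A ≈ 46 W, in line with that max 50 W estimate.)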