My times are significantly different from the times you are getting. I am using the median run time over the last 7 days to compare the relative performance of my computers. Is the elapsed time you are using the same as the run time I am using? I am pulling the run time from the task webpage.
I am using the average rather than the median, but after discarding the short-running units. As the distribution looks pretty well-behaved, I think the average and median in this case should be close to the same.
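To make that calculation concrete, here is a minimal C sketch of the method being described: discard the short-running units, then take the mean (and, for comparison, the median) of what remains. The task times and the 600-second cutoff are illustrative stand-ins, not the actual values used.

/* Mean and median of task run times after discarding short-running
 * units. The times and the 600 s cutoff are illustrative stand-ins,
 * not the actual values used in the comparison above. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    double times[] = { 1744.2, 1745.8, 1750.1, 1738.9, 312.0, 1741.5 };
    size_t n = sizeof times / sizeof times[0];
    const double cutoff = 600.0;       /* discard short-running units */

    double kept[64];
    size_t k = 0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (times[i] >= cutoff) {
            kept[k++] = times[i];
            sum += times[i];
        }
    }
    if (k == 0)
        return 1;

    qsort(kept, k, sizeof kept[0], cmp_double);
    double mean = sum / k;
    double median = (k % 2) ? kept[k / 2]
                            : 0.5 * (kept[k / 2 - 1] + kept[k / 2]);
    printf("kept %zu of %zu tasks: mean %.2f s, median %.2f s\n",
           k, n, mean, median);
    return 0;
}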
As to data source, I am using the history tab of BOINCTasks, selecting the column labelled "Elapsed Time". I believe this to be the same value that appears as "Run time" on the task web page; in a single spot check the two agreed to the second.
It appears your configuration and host are getting more use out of your 1070 than I am getting out of mine. Given the considerable host burden posed by the Einstein application, part of the answer is probably that I am running two GPU cards on the same PC. If the project releases an improved Windows application with a substantially lower CPU support requirement (I believe Bernd suggested this was likely) and a lower bus requirement, the two-card disadvantage can reasonably be expected to shrink a good bit. Not so many years ago the two-card disadvantage was quite substantial, but for the two previous major Einstein GPU applications it was very modest.
I agree. In the last 7 days computer 12462786 completed 694 tasks with an average runtime of 1744.20 seconds and a median runtime of 1745.85 seconds. I think the comparison between our computers is valid and agree the dual GPU configuration is part of the difference. For past E@H applications, it didn't matter what CPU/MB host the GPU was in. For this application, good CPU support is critical. I will probably be reshuffling my GPU/CPU setup next week in an attempt to maximize output.
Dualboot Dell T20, ECC Memory, NVidia 750Ti, no overclocking, no CPU Tasks, 2x FGRP 1.17:
Card   CPU        Mult  ET (h:mm:ss)  Driver  OS               Host ID
750Ti  E3-1225v3  2X    ~2:18:45 (!)  367.57  Linux Mint 17.3  12241921
750Ti  E3-1225v3  2X    ~1:57:30      376.57  Win 7/64-bit     12247194
Interesting results - thanks very much for posting. Really does show that there's nothing sub-standard about the Windows app for that particular card, in comparison to the Linux app.
I have two 750Tis in separate hosts running on Linux (PCLinuxOS) with older driver versions than either of the above. The run times are averaging pretty much what your Mint system is getting. One is about 2 mins faster and the other about 2 mins slower. Both hosts have Pentium dual core processors and the faster run times come from the machine with a slightly faster processor speed.
Both sets of results show huge CPU time components - not much less than the run time in each case. The task mix on both hosts is 2x for GPU tasks with a single CPU task. This leaves just one core 'free'. As an experiment, I recently suspended all CPU tasks on both hosts for about 12 hours, so as to have both CPU cores available for GPU task support. It made very little difference - perhaps a very slight reduction - to the GPU task run/CPU times as reported. I didn't leave it long enough to be really sure. To me, this seems to indicate that the reported CPU times for GPU tasks are fictitious in some way. If the GPU tasks really are consuming that much CPU time, shouldn't running a CPU task alongside the 2 GPU tasks really have a big impact???
I don't think we have seen any new revelations about the nature of OpenCL apps. What I see here at Einstein is exactly the same behavior as the OpenCL apps over at SETI. OpenCL by its nature demands much more CPU support than CUDA apps. Look back at the last run of BRP4G Beta55 CUDA tasks, which ran at ~1850 secs, fairly close to the runtimes of the new FGRPB1G tasks under OpenCL. The difference is that the old CUDA app needed only ~300 secs or so of CPU support, whereas the FGRPB1G tasks report CPU times almost identical to their GPU run times.
As for why the OpenCL tasks run at 2X apparently don't need as much CPU support as expected, I think it has a lot to do with how much spin-waiting the apps have to do while waiting for a support CPU slice. Even though my SIV monitor shows 100% CPU usage on all eight cores when running 4x Einstein tasks, I have not experienced any of the sluggishness in response that a CPU-starved desktop should present. Over at SETI, you have to pay attention to how you configure your CPU and GPU task usage on lesser hardware with the current mix of CUDA and OpenCL apps to avoid system lags. The SETI app developers are making bigger strides in app development for Linux systems than for Windows systems simply because the Linux architecture is more efficient at timeslice sharing than the legacy Windows architecture.
I don't think the CPU times reported here at Einstein are fictitious, just of a different nature.
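To illustrate what spin-waiting means here, below is a minimal C sketch, assuming the app or driver really does poll rather than block (the actual internals aren't shown in this thread). Both threads wait for the same flag, but the spinning one is billed a full core of CPU time while doing no useful work. Since BOINC runs its apps at low priority, that spinning still yields readily to interactive work, which fits the responsive-desktop observation.

/* Spin-waiting vs blocking, as a minimal sketch (illustrative only,
 * not the actual Einstein or driver code). Both threads wait for the
 * same flag; watch them in 'top': the spinner is billed ~100% of a
 * core while waiting, the blocker almost nothing. */
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

static atomic_int done;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;

static void *spin_waiter(void *arg)
{
    (void)arg;
    while (!atomic_load(&done))
        ;                          /* busy poll: counted as CPU time */
    return NULL;
}

static void *blocking_waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    while (!atomic_load(&done))
        pthread_cond_wait(&c, &m); /* sleeps: almost no CPU time */
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, spin_waiter, NULL);
    pthread_create(&t2, NULL, blocking_waiter, NULL);

    sleep(10);                     /* stand-in for "GPU kernel running" */

    pthread_mutex_lock(&m);
    atomic_store(&done, 1);
    pthread_cond_broadcast(&c);
    pthread_mutex_unlock(&m);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}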
Maybe "spin-waiting" is what is going on. It should not be. I occasionally see an installer that hogs a whole CPU core to itself during run time without really using it most of the time, but it is not behavior I expect from a proper application.
Now that I think about it, on a 4-core (no HT) host supporting two cards, a 1060 and a 1070 both running at 2X, I see that same near-equality of reported CPU time to elapsed time that others are reporting. The fun part is that this means that on the same host, a job running on the (slower) 1060 is reported as consuming considerably more CPU time than one running on the 1070.
Maybe fixing this is what Bernd was foreshadowing. On December 18 he wrote:
"There is indication, however, that the clFinish()s in the current code cause a lot of CPU load via the driver. I'm really not sure how much data transfer this involves, it really depends on the implementation in the driver. We'll work on a way to get rid of these clFinish()s as much as possible, which should also reduce the CPU utilization."
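For anyone unfamiliar with the pattern Bernd is describing, here is a hedged C sketch (hypothetical kernel and buffer names; the real app's source isn't shown here). Calling clFinish() after every enqueued kernel forces the host to wait each time, and if the driver implements that wait as a spin loop, all of it is billed as CPU time; enqueueing the batch and synchronising once avoids the per-kernel waits.

/* Sketch of the clFinish() pattern under discussion, with hypothetical
 * kernel/buffer names -- not the actual Einstein app code. On POSIX,
 * clock() reports process CPU time, so the two printed figures show
 * roughly how much host CPU each synchronisation pattern burns. */
#include <CL/cl.h>
#include <stdio.h>
#include <time.h>

static const char *src =
    "__kernel void busy(__global float *a) {\n"
    "    size_t i = get_global_id(0);\n"
    "    for (int k = 0; k < 1000; k++) a[i] = a[i] * 1.0001f + 0.0001f;\n"
    "}\n";

int main(void)
{
    cl_platform_id plat;
    cl_device_id dev;
    if (clGetPlatformIDs(1, &plat, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL GPU found\n");
        return 1;
    }
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "busy", NULL);
    size_t gsize = 1 << 20;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                gsize * sizeof(float), NULL, NULL);
    clSetKernelArg(k, 0, sizeof buf, &buf);

    /* Pattern A: synchronise after every kernel. Each clFinish() makes
     * the host wait; if the driver spins, the wait is billed as CPU time. */
    clock_t c0 = clock();
    for (int i = 0; i < 200; i++) {
        clEnqueueNDRangeKernel(q, k, 1, NULL, &gsize, NULL, 0, NULL, NULL);
        clFinish(q);
    }
    printf("finish-per-kernel: %.2f s CPU\n",
           (double)(clock() - c0) / CLOCKS_PER_SEC);

    /* Pattern B: enqueue the whole batch, synchronise once. The in-order
     * queue runs on its own; how much the single final wait spins depends
     * on the driver, but the 200 per-kernel waits are gone. */
    c0 = clock();
    for (int i = 0; i < 200; i++)
        clEnqueueNDRangeKernel(q, k, 1, NULL, &gsize, NULL, 0, NULL, NULL);
    clFinish(q);
    printf("one-finish-batch:  %.2f s CPU\n",
           (double)(clock() - c0) / CLOCKS_PER_SEC);
    return 0;
}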
.... To me, this seems to indicate that the reported CPU times for GPU tasks are fictitious in some way. If the GPU tasks really are consuming that much CPU time, shouldn't running a CPU task alongside the 2 GPU tasks really have a big impact???
I need to apologise for a certain lack of attention to detail in making the above statement.
The two hosts in question were being restarted after a big storm caused a power outage that took out the entire farm earlier this month. I have had very limited time to fix problems and get hosts back online, so it has been a very slow process. The transition from BRP4G to FGRPB1G has made things more difficult, so I concentrated on getting hosts with better-performing GPUs (HD7850, etc.) running again first. The best NVIDIA GPUs I have are the 750Tis, and I've only just brought them back online. I've got lots of GTX 650s (quite old now) and they were good performers with the BRP6 app. They are woeful with FGRPB1G.
In deciding what to do with my NVIDIA GPUs, I just started playing with the two 750Tis. They are 2GB cards, so running two tasks seemed appropriate. Running a CPU task as well seemed to make very little difference, which puzzled me, but only because I wasn't looking at what was happening to the CPU task. It turns out it was running extremely slowly and I completely overlooked that. So there is no fictitious CPU time. The CPU task is just getting small chunks of CPU cycles whenever a GPU task releases them, building up a huge run time in the process.
On further experimenting, running only a single GPU task alongside a CPU task causes the GPU run time to approximately halve and the CPU task time to return to a more normal value. It gives about the same credit output as running 2 GPU tasks and no CPU tasks. It's quite a woeful output compared with what these two used to produce running the BRP apps.
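The "same credit output" follows directly from the arithmetic; a tiny sketch with round stand-in numbers (not the actual measured times) shows why halving the per-task time at half the concurrency leaves throughput unchanged.

/* Throughput arithmetic behind the 1x-vs-2x observation, using round
 * stand-in numbers rather than the actual measured times. */
#include <stdio.h>

int main(void)
{
    double t_2x = 7200.0; /* sec per task with two tasks in flight (illustrative) */
    double t_1x = 3600.0; /* sec per task with one in flight (~half, as observed) */

    double per_hour_2x = 2.0 * 3600.0 / t_2x; /* two finish every t_2x */
    double per_hour_1x = 1.0 * 3600.0 / t_1x; /* one finishes every t_1x */

    printf("2x: %.2f tasks/hour, 1x: %.2f tasks/hour\n",
           per_hour_2x, per_hour_1x);
    /* Both print 1.00: if going to 1x really halves the per-task time,
     * total output is unchanged, matching the credit observation. */
    return 0;
}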
I'm considering replacing all my GTX650s with R7 370s. I can buy these for little more than US$100 and the output for FGRPB1G is very good. I converted one machine today. It's a G640 Pentium dual core and it's running 2x GPU tasks and a single CPU task. The first pair of GPU tasks just finished in around 46 mins. My older HD7850s (basically the same architecture) are taking about 52-54 mins at the same settings. That seems to be a nice benefit for the R7 370.
It will be interesting to see if the Devs can get Kepler series GPUs like the GTX650 to perform better in the future.
My oldest host, an i7-930, does very poorly with the new Fermi GPU tasks. BRP4/6 tasks ran fine on it using a GT560, getting about half the performance of my i7-4790K/4770K hosts with GTX 980/770s. After switching to the new Fermi tasks it was only getting 1/3 to 1/4 the throughput of my faster machines. Initially I assumed it was just the 560 finally showing its age and that it would speed up with a newer card. Since I bought myself a GTX 1080 for Christmas and was able to shuffle all my cards downstream, that was easy enough to try. It barely helped though: I put the GTX 770 (which was doing solo WUs in ~1800 seconds on my i7-4770K) into my i7-930, but it only nudged the runtime on that host from 6600 to 6200 seconds/task.
The relative performance gap got larger when I shifted both hosts to 2:1 and runtimes (for a pair of WUs) went to 9500 vs 2300 seconds.
It's got a current GPU driver installed (I had to install a new one when swapping the card), and the CPU appears to be functioning normally (about 50% longer for CPU tasks compared to my newer machines, which is in line with the innate performance gap). At this point, I'm not sure what else to try on my end.
The only thing I can think of is that the CPU part of the app has only AVX and x87 code paths but not SSE (the 930 is too old to have AVX), in which case I'm stuck unless a new app version is released. If that's the case, or it's something else I can't identify or fix, I'll probably swap that host over to run something else on the GPU that it can run more efficiently, but I wanted to ask here first.
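The missing-SSE-codepath theory can't be confirmed from outside the app, but the host side of it is easy to check; here is a minimal C sketch using the GCC/Clang feature-test builtins (the app's actual dispatch logic is only a guess in this thread).

/* Check which vector ISAs this host's CPU offers, to test the theory.
 * Uses GCC/Clang builtins; the app's real dispatch logic is unknown
 * and only guessed at here. */
#include <stdio.h>

int main(void)
{
    __builtin_cpu_init();
    printf("SSE2:   %s\n", __builtin_cpu_supports("sse2")   ? "yes" : "no");
    printf("SSE4.2: %s\n", __builtin_cpu_supports("sse4.2") ? "yes" : "no");
    printf("AVX:    %s\n", __builtin_cpu_supports("avx")    ? "yes" : "no");
    /* An i7-930 (Nehalem) should report SSE4.2 yes, AVX no -- so an app
     * with only AVX and x87 paths would indeed fall back to x87 on it. */
    return 0;
}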
I get this log message:
[version] NVidia device (or driver) doesn't support OpenCL
For this GPU:
NVIDIA GeForce GTX 750 Ti (2048MB) driver: 364.72
The NVIDIA website for this GPU says:
GTX 750 Ti Support: OpenGL 4.4
Does this GPU not support OpenCL?
Thanks!
OpenGL is not the same as OpenCL.
Try reinstalling or updating the graphics driver; if the one installed was supplied by Windows Update, it probably doesn't include OpenCL support. A GTX 750Ti does support OpenCL and should have no problems running the new FGRP GPU app.
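One way to check what the installed driver actually exposes, independently of BOINC, is to enumerate the OpenCL platforms directly; a minimal C sketch follows (the standalone clinfo utility reports the same thing in more detail).

/* Verify, independently of BOINC, whether the installed driver exposes
 * OpenCL at all. (The standalone 'clinfo' utility reports the same
 * information in more detail.) */
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint nplat = 0;

    if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS || nplat == 0) {
        printf("no OpenCL platforms found: driver lacks OpenCL support\n");
        return 1;
    }
    for (cl_uint i = 0; i < nplat; i++) {
        char name[256] = "";
        cl_uint ndev = 0;
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                          sizeof name, name, NULL);
        clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, 0, NULL, &ndev);
        printf("platform \"%s\": %u GPU device(s)\n", name, (unsigned)ndev);
    }
    return 0;
}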
One of the two GPUs on this PC is a 750Ti. It is running this work successfully (alongside a 970 on the same machine).
Other than having two GPUs on one PC, which is not likely to help, the most obvious difference is that I am running a more modern driver. When this set of applications first came out, one of my four GPU machines initially would not get tasks, and started to get them as soon as I updated the driver (which was not very old at all).
The 750Ti can run this work. I suggest you update the driver.
[edited after posting to add details: the exact error message I received before updating the driver was
"[version] NVidia device (or driver) doesn't support OpenCL"
The driver version I had when NOT getting work was 372.54.]