In reporting times here, I suggest mentioning the host CPU, not just the GPU. I currently believe that, although some of the work is sent to the GPU, the GPU finishes that work so very fast that total elapsed time is dominated by the CPU contribution. As it happens I have three hosts with moderately dissimilar GPUs. Which of the two GPUs happens to run a particular one of these tasks has little, if any, systematic influence on the task elapsed time.
My fastest machine running this work is now taking about 2 hours 36 minutes elapsed time. It has an i5-4690K CPU running at stock clocks. The GPUs on that host are a GTX 1070 and a 6 GB GTX 1060. I don't yet have stabilized values for the other machines.
Not only is the relationship between reported completion and actual completion non-linear, it also differs drastically from machine to machine within my set. While some of the machines creep slowly up the last few percent toward 100, and either reach it or get very close, the fast machine actually finishes just after reporting slightly under 90% completion. While this is annoying, I'm mentioning it as a warning to other users, not as an improvement request to the developer(s). I think development effort would, at the moment, be far better spent on moving more of the calculation from CPU to GPU than on perfecting completion reporting.
I now see spikes even though I have closed the programs that would cause activity. You are right that the GPU is being used, but it is not being used effectively. I thought the spikes were due to my own activity.
This does not bode well. I look forward to getting onto them after I run out of BRPs.
Strange, I get heavy run-time variation between my hosts despite the fact that they have the same GPUs (Radeon 7850/7870) and similar processors (two AMD FX-8320s and an AMD Phenom II X6).
And also very nonlinear progress of the WUs - the longer they run, the slower they become:
~50% after 1 hour of work
~75% after 2 hours
~90% after 3 hours
~95% after 4 hours
~98% after 5 hours
now @ 6 hours and not completed yet
Lol, it reminds me of the "paradox" of Achilles and the tortoise! (Achilles can never overtake the tortoise.)
And the app does NOT freeze or stop working. CPU load is about the same the whole time, jumping in the 30-100% range of 1 CPU core per task. GPU load stays at the same level too: 20-40% for one task and 40-70% for 2 concurrent tasks on one GPU.
A restart does nothing - the WU successfully resumes from its checkpoint but continues to work very slowly.
An unacceptably inefficient version!
My hosts can do such WUs in 7-8 hours on 1 CPU core only (without using the GPU at all) by running the Gamma-ray pulsar binary search #1 on GPUs v1.05 (FGRPSSE) app instead.
So there is almost no speed-up from GPU use - only a pure waste of resources: the speed is about the same, while electricity consumption (CPU+GPU vs CPU only) is approximately 3 times higher.
This new app is insane!
Very low GPU usage (3 WUs at a time, <40%) and a very long time to complete: > 4 hrs.
Looks like it doesn't really use the GPU.
Any clues how to fix that?
Since I'm about out of BRPs and going to switch over to these, I have a few questions.
On my GTX 660, how many instances of this new app do I want to run? I have been very happy running 3 on BRPs.
Do I want to free up a core to feed the GPU? Currently I don't.
All 3 of my boxes are Windoz.
archae86 wrote: I think Gary Roberts is well-known to advocate lowering your requested work queue settings with new work types or other major configuration changes. I, personally, have lowered my requested amount from a bit over two days to about a half day. Initial indications on my machines are that the first estimate for completion time is well under the truth, so without this precaution I'd have considerably over-fetched.
Firstly, thank you very much for starting this thread. It's so helpful to have a separate place for people to report experiences and get tips on all sorts of things without polluting the real news content of the technical news announcements. Hopefully, people will continue the discussion here rather than there :-).
And yes, I know from bitter experience that it's much safer to ask for just a few tasks of a new run, for a number of reasons. It's much easier to safely get more once you know how the first few are going.
Secondly, I don't run any Windows machines so I'll be interested in following your reports. Hopefully, the Windows version will eventually become as efficient as the Linux version, but in the meantime, for those who want to experiment with running multiple concurrent GPU tasks without consuming all the available CPU cores, I would suggest using app_config.xml to tweak the requirement of 1 CPU core per GPU task to something less. That way you can override the default settings provided by the GPU utilization factor mechanism by making changes to app_config.xml and then using the 're-read config files' option in BOINC Manager to put them into effect immediately.
As an example, if you had at least a 2 GB card (based on Linux experience - it may be different for Windows), you could try running 2 GPU tasks supported by 1 CPU core (instead of the default 2) by using the following app_config.xml file, placed in the Einstein project directory. Note that I'm posting this from a borrowed computer in a hospital room, so it's from (possibly faulty) memory. Check the BOINC documentation on app_config.xml for the full details.
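From memory, the file should look something like the sketch below. The app name is my best guess for the FGRP GPU search, so treat it as an assumption and check it against the <app> entries in the client_state.xml file in your BOINC data directory before relying on it:

<app_config>
   <app>
      <!-- app name is a guess for the FGRP GPU search; confirm it in client_state.xml -->
      <name>hsgamma_FGRPB1G</name>
      <gpu_versions>
         <!-- each task claims 0.5 of a GPU, so two tasks run per card -->
         <gpu_usage>0.5</gpu_usage>
         <!-- BOINC budgets 0.5 of a CPU core for each GPU task -->
         <cpu_usage>0.5</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

The <gpu_usage> and <cpu_usage> values are the only two knobs you need to touch for the variations discussed below.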
If 0.5 CPUs per GPU task is not enough CPU support, you could try (with sufficient GPU memory) running 3 GPU tasks supported by 2 CPU cores. That would effectively reserve 0.67 CPUs per GPU task. To do this, just set gpu_usage to 0.33 and cpu_usage to 0.67 in the above example, followed by a 're-read config files' in BOINC Manager. Please realise that I don't have access to a Windows machine to try any of this so anyone following this will need to experiment to find the optimum conditions. It may well be that one core per GPU task really is the best option anyway.
With only some of the calculation being done on the GPU, more tasks running concurrently may make better use of the GPU but at the expense of interfering with the CPU support for each of those tasks. Hopefully the current inefficiency of the Windows app is just a temporary problem.
Cheers,
Gary.
Betreger wrote: On my GTX 660, how many instances of this new app do I want to run? I have been very happy running 3 on BRPs.
For the duration (I hope short) of this specific 1.15 application, I think we Windows sufferers should switch our mindset.
Don't think of the CPU task as support for the GPU. Think of the GPU as support for a CPU task.
In practice this means forgetting about "reserving a core" (which is a misnomer anyway). You'll get less done, not more, if you leave any CPU cycles unused with this application (that is not true for lots of other applications). The GPU keeps up with the CPU easily enough that the difference between, say, a 1060 and a 1070, or between a 750 Ti and a 970, is not noticeable.
To simplify things I'll assume you want to run only 1.15 "GPU" tasks. Say you have 4 total cores and two GPU cards (this is real for me on my fastest machine). Then, if you allow "Use at most 100% of CPU time" in Compute preferences and set the Einstein project preference, for the machine's location/venue, for GPU utilization factor of FGRP apps to 0.5, you'll find that once boincmgr on your machine catches on to your desire it will start four 1.15 tasks in total, two assigned to each GPU, each consuming 100% of a core (or whatever is left after any overhead on your machine).
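If you would rather set this multiplicity locally than through the web preference, an app_config.xml along the lines of Gary's sketch above should, as far as I understand it (I have not tested this myself), produce the same arrangement. The app name is again an assumption to be checked against client_state.xml:

<app_config>
   <app>
      <!-- same caveat as above: confirm the app name in client_state.xml -->
      <name>hsgamma_FGRPB1G</name>
      <gpu_versions>
         <!-- two 1.15 tasks per GPU card -->
         <gpu_usage>0.5</gpu_usage>
         <!-- budget a full core for each task -->
         <cpu_usage>1.0</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

With 4 cores, two cards and those values, BOINC budgets all four allowed cores for the four 1.15 tasks, which is exactly the situation described next.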
In this specific case, if you have pure CPU tasks in your queue, they will go to a "waiting to run" state until BOINC thinks you are in schedule trouble, at which point it will suspend some 1.15 "GPU" work and run one or more CPU tasks in "high priority" mode.
So if you choose to run this way you'll want to turn off the setting at Project Preferences|your location|Resource Settings|Use CPU|Request CPU-only tasks from this project.
On the other hand, if you like to run a mix of CPU and "GPU" work, just set a low enough multiplicity that BOINC does not start so many 1.15 tasks as to consume all allowed cores.
Gary's advice may become useful if some future version of the application is not so nearly a 100% CPU user, so that it pays to trick the scheduler into starting more tasks. For the current version I'm pretty sure doing so will prove at least slightly counter-productive, though I'd be happy for someone to post actual experimental results.
Windows 8.1 running an NVIDIA 980 Ti: I'm now over 3.5 hours. Not worth it.
Hi Raymondo,
Totally agree, it's simply not worth it! A GPU task should use the GPU, not try to swamp the CPU.
The BRP4G setup, recently stopped for lack of work, used 0.02 CPU per WU, which was fine here and would also have been OK at 0.1. But running 1 CPU + 1 GPU with the GPU heavily underused, leading to excessive run times, is a waste of resources.
As is the server's habit of providing more work than can be done before the time limit expires - I had to abort several WUs that I couldn't possibly complete before the deadline.
The server estimate of 30-odd minutes to complete a task that can actually take over 4 hours is unworkable.
[edit] Just checked: the WU did validate, but the timings were daft:
13,838.86
13,822.67
The above are total run time and CPU time used (in seconds), which quite clearly demonstrates the lack of GPU usage.
So I've set E@H to NNT until either more BRP4G work becomes available or the project can provide a more GPU-dedicated app.
Cliff,
Been there, done that, still no damn T-shirt.