I have been testing how to optimize my computer's processing performance on Einstein tasks. I am having a problem with my computer 4213062.
The computer is:
Intel Core 2 Quad CPU Q8400 @ 2.66GHz [Family 6 Model 23 Stepping 10]
RAM: 8190 MB
GPU: one NVIDIA GeForce GTX 1070 (4095MB) driver: 390.65
Note: Einstein shows the GPU has 4095 MB, but it is supposed to have 8192 MB, and GPU-Z shows 8192 MB.
Windows 10 Professional x64 Edition (10.00.18362.00)
BOINC Ver: 7.14.2
I have a problem using app_config.xml to change the number of simultaneous tasks on the GPU.
It works fine if I try to run one or two tasks at a time, but attempts to run three or more at once only run two tasks simultaneously.
I do save the changes to my app_config.xml and restart BOINC each time I make a change.
Here is an example of the code I am using to attempt to run two O2MDF tasks or three FGRP tasks.
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einstein_O2MDF</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
Am I doing something wrong? If so, what should I change?
Also, are there any other app names I should add to ensure I am running all of Einstein's task types?
Thanks!
Hoping to have understood your question ...
This is how I work it:
<app_config>
  <project_max_concurrent>8</project_max_concurrent>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einstein_O2MDF</name>
    <max_concurrent>1</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
The above lets me run a maximum of 8 WUs, consisting of:
max 2 FGRPB1G GPU WUs,
max 1 O2MDF GPU WU,
plus CPU WUs: the project maximum of 8 minus up to 3 GPU WUs = 5 CPU WUs.
The max_concurrent part might be of interest to you.
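For example, and only as a sketch I have not tried on your card, combining a fractional gpu_usage with max_concurrent for three FGRP tasks sharing one GPU would look something like this (the 0.33 and the cap of 3 are just example values):
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <!-- example cap: never more than 3 FGRP tasks running at once -->
    <max_concurrent>3</max_concurrent>
    <gpu_versions>
      <!-- each task claims a third of the GPU, so up to 3 can share one card -->
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>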
I would also update the NVIDIA driver, perhaps to 445.87; yours is very old.
Thank you, San Fernando.
I tried your suggestions. Unfortunately, they didn't work on my computer. Nothing would run.
I think I didn't do a good job of explaining what I have, and what I am trying to do.
The computer in question has one processor with four CPU cores built into that chip. It's an old Intel Core2Quad chip. I have only one GPU, the GTX 1070 I mentioned. At this time I wish to run only GPU tasks.

The app_config file I pasted into my original post will only run two tasks at a time on the one GPU, regardless of whether they are two gamma-ray tasks, two gravitational-wave tasks, or one of each. I would like to be able to run up to three tasks at a time on my one GPU. For some reason I can't get that third task to run.

The setup I had worked okay in SETI: I could run up to four tasks simultaneously on my one GPU by setting the GPU usage to 0.25 and the CPU usage to 1. For Einstein, however, setting the GPU and CPU usages to 1 allows one GPU task to run, and setting the GPU usage to 0.5 and the CPU usage to 1 allows two Einstein GPU tasks to run simultaneously on my one GPU. But setting the GPU usage to 0.33 or smaller (with the CPU usage at 1) does not allow any additional Einstein tasks to run simultaneously on my one GPU.
That is my problem.
Is it possible to get that one GPU to run three or more Einstein GPU tasks simultaneously, and, if it is possible, how do I set it up?
Well, I guess I misunderstood you.
Sorry.
Have you checked the Event Log and the stderr log of the tasks?
I guess you know about these.
Did you update your gpu driver?
One gpu WU uses about 900MB ram on my GTX 1650.
So maybe running three is too much if the system thinks your card only has about 4000 MB of GPU RAM.
If it is a RAM problem, you should be able to see the appropriate message in the stderr log.
That is about all I can help you with. Good luck!
What were your times with 1 WU to a GPU vs 2 WUs to the GPU? Was there any significant decrease, or was it just twice as long? What was your GPU utilization with 2 work units? I'm trying to remember, but I don't think there was any benefit to running more than 1 WU per GPU. Unlike SETI, the OpenCL apps are rooted in fairly old programming. SETI had the benefit of Petri's special CUDA or Raistmer's OpenCL, both of which had been highly modified over the years to run leaner and quicker.
Z
Thanks for documenting everything so well. I don't use nvidia GPUs at all but I do have lots of experience with the GPU apps on a whole variety of AMD GPUs. Also, since I have about 20 machines with Q8400 CPUs, all with discrete GPUs, I have some experience that might be relevant to your CPU.
There is nothing wrong with the syntax of your app_config.xml file. However, trying to run 3 GRP tasks concurrently will most likely be of no benefit to you. The GRP app puts enough load on the GPU that running just 2 of these will pretty much be the best you can do. I get about a 7-10% improvement in throughput from running 2x. When I try to run 3x, there is no gain in throughput and crunch times suddenly become variable. I interpret this as some resources being over-committed: tasks have to wait for resources to be freed, and sometimes tasks even crash.
In earlier times there was a very nice benefit for me in running GW GPU tasks at 2x, 3x, and even 4x. This is because the algorithm is apparently not amenable to running everything in a parallel fashion. There are sections of code that need the sequential processing capabilities of a CPU and so would probably run much more slowly if forced to run on a GPU. By running multiple concurrent tasks, with enough CPU cores available to handle the CPU-only bits, there was considerable extra throughput to be gained.
The biggest problem with that was that the CPU architecture turned out to be extremely important. I spent a lot of time testing older architecture like the Q6600 and Q8400 quads and found that both crunch times and error rates were much higher with the older CPUs compared with the same GPU being supported by a more modern CPU. Admittedly, that was in much earlier times when there were more bugs in the app, so I don't really know if that has now changed. What I do know is that the most recent GW GPU tasks seem to be a lot more memory hungry and people are now reporting problems running even single tasks on 3GB nvidia GPUs. I'm still running 3x on 4GB AMD RX 570 GPUs but I'm about to reduce to 2x to make sure I avoid problems with the very latest tasks.
BOINC detects your hardware during startup. Check your event log to see what BOINC says about your GPU. If BOINC is mis-detecting, you could try upgrading to the newly released 7.16.5 to see if that fixes things. If not, perhaps report it on the BOINC boards. Einstein just uses what BOINC says.
If BOINC is already detecting the 8192 MB, let us know and I'll report it to the Einstein staff to see why it's being listed incorrectly. Your GPU should be able to handle 2x. Because of the nature of the app, that should give a reasonable improvement. You still have the problem of CPU architecture, though.
I don't know why that would be. The only thing I can think of is the setting for how many cores BOINC is allowed to use. You don't have that set to 50% by any chance?
Another point you need to consider here is the way nvidia implements OpenCL in its drivers. Nvidia derives a lot of revenue from professional series cards running CUDA for compute. Whilst it supports OpenCL, it stands to reason that it's not going to damage its revenue stream by allowing OpenCL to compete on a level playing field. You should draw your own conclusions about that; I'm not technically qualified to comment in detail. I will note that nvidia GPUs consistently require more than a full CPU core to support GW GPU tasks, and this is quite different from the way AMD GPUs behave.
No, you're not doing anything 'wrong' but I'll make some suggestions about some 'difficulties' you might like to avoid.
For your hardware (both machines) there are just the two GPU searches you are currently discussing. The GRP search has estimates that are significantly longer than the true crunch time. It's the exact opposite for the GW search. Einstein needs to use Locality Scheduling to handle GW tasks (both CPU and GPU), and that search is its main 'business' - the first detection of continuous GW. To support Locality Scheduling the project runs its own highly customised version of the BOINC server code, complete with the older DCF (duration correction factor) mechanism for adjusting task estimates.
Since there is a single DCF for all searches, badly calibrated task estimates can play havoc with large work caches when alternating types of tasks try to impose radically different 'corrections' to the estimates. This is not really a problem at all if you keep the cache size small and ignore the incorrect estimates. BOINC will be able to manage if you don't exceed a cache size of something like 0.5 - 1.0 days.
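To illustrate the mechanism with made-up numbers (a rough sketch of how the client treats DCF, not exact values): the client scales every estimate by the single project-wide DCF,
\[
\text{shown estimate} \;=\; \mathrm{DCF} \times \frac{\texttt{rsc\_fpops\_est}}{\text{host speed (flops)}}
\]
so if finished GRP tasks have dragged DCF down to around 0.5 while GW tasks really need about twice their raw estimate, GW work gets shown at roughly a quarter of its true runtime and a 1-day cache can fill with something like 4 days of GW tasks. The first GW task to finish then yanks DCF back up and the GRP estimates balloon instead.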
A better way (which could work for you) is to run just one search on each host. To use different preferences for each host you would need to use two separate 'locations'. For example you could put one in 'home' and one in 'work' and set the preferences for each separately.
Each host would have vastly different DCF values stored in the state file but the estimates would be correct and the work on hand would agree with what the cache size was set to. You have 2 machines. You could use the one with the GTX 960 to run just the GRP tasks - you could try 2x - and the one with the 1070 to run just the GW tasks. 2x should be fine on that. You might also get a benefit from 3x. The only way to find out is to try :-).
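If you go that way, the app_config.xml on the 1070 host would only need the GW entry. As a sketch only (values are just a starting point, not something I have tested on a 1070):
<app_config>
  <app>
    <name>einstein_O2MDF</name>
    <gpu_versions>
      <!-- two GW tasks sharing the single GPU; try 0.33 if 3x turns out to help -->
      <gpu_usage>0.5</gpu_usage>
      <!-- reserve a full CPU core per task; nvidia GW tasks tend to need it -->
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>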
Please ask further questions if there is anything that's not clear. I hope some of the above will be useful to you.
Cheers,
Gary.
In the case of Nvidia cards, BOINC is unable to correctly read the amount of VRAM on the card because it uses a 32-bit API from Nvidia. The maximum reported will always be 4096 MB, the most you can enumerate with 32 bits. You can't do anything about that until the BOINC developers correct the problem.
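That also explains the 4095 MB figure in the original post: a full 32 bits covers 4096 MB, but the largest value that actually fits is one byte short of that,
\[
\left\lfloor \frac{2^{32}-1}{1024^{2}} \right\rfloor \ \text{MB} \;=\; 4095\ \text{MB},
\]
so any card with 4 GB or more of VRAM gets reported at that ceiling until the client switches to a 64-bit query.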
The GPUUG developers figured out where the BOINC code is wrong and developed our own client which reads Nvidia card RAM capacities correctly. We informed the BOINC developers what code changes need to be done to fix the issue. It is just a simple matter of changing what API structure is used to probe the card. The client needs to be updated to incorporate the change.
You can look at my hosts and see that my card RAM capacities are read correctly.
OK, thanks very much for the explanation. I don't have nvidia cards so I didn't realise that BOINC had that problem.
Have the BOINC Devs given any reason why they don't adopt the fix you mention? Seems very strange that they wouldn't immediately adopt something handed to them on a plate.
Cheers,
Gary.
My most recent testing was a few years ago, but I saw a net RAC gain of several percent running 3x GRP tasks on 1070/1080 GPUs under Windows. It was much smaller than the gain from going from 1 to 2 tasks, because during normal operation 2 tasks were enough to fully max out the GPU. However, there was a startup (shutdown?) period in each task that put minimal load on the GPU, and a single task in its main crunching phase couldn't max the card out on its own; that is what let 3x give a modest gain over 2x - bigger than what running another CPU task instead would have added.
Haha, LOL. You obviously have never interacted with the devs or followed the GitHub conversations. They will take their time studying the issue IF they even put it at the top of their TODO list. Then they have to pass the code through their validators and code profilers and finally submit the change for merge consensus.
This bug has existed since the beginning of gpu use and they never considered it worthy of attention. This was the comment from our GPUUG developer.
Supposedly one of the upcoming versions will be 64-bit only and will stay that way from then on, with the current version still staying around to support 32-bit stuff. On the email list I get, they haven't said which version it will be or how long it will take.