Optimising GPU-usage

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 500247903
RAC: 211609
Topic 195281

Hi community,

I started this thread because some weeks ago I posted a request for help. I found that my GPU is more or less idle when running the E@H CUDA app.
I got one answer saying that this is not possible because of the app's memory usage. MSI Afterburner and GPU-Z are excellent tools for finding out information about your GPU.
The result for my GPU (GTX260-195) is 130 MB of memory and 9% load. Other users posted different numbers, but also asked for changes to the app to achieve better efficiency, perhaps in the form of an app_info.xml.
Speaking for myself, I accept that there are 'required' apps and 'optional' apps. I do not want to override these requirements; I simply don't want my GPU to sit idle. Hardware costs a lot of money and energy is not free. The goal is to run at least two CUDA apps at a time, or simply not to tell BOINC that a whole GPU is required. That would make it possible to run another app simultaneously, for example a SETI WU.

A word to everyone who does not accept this or has different interests: please use another thread.

In response to a question in the Milkyway forum, I posted an answer about E@H and app_info.xml. A former forum moderator sent me a PM, started a thread in the SETI forum and did a great deal to help.
http://setiathome.berkeley.edu/forum_thread.php?id=61215
If anyone else feels that an app_info.xml or some equivalent approach would be useful, or can explain this better (English is not my native language), please post here.
If someone has a solution, please share it with us!
If an app developer reads this, please leave a clear statement of HOW this can be done, or WHY it cannot.
I believe this is not against the interests of E@H, so a statement from a responsible person would be worthwhile.

Kind regards,

Alexander

Alex

Hi,

One hour has passed since I started the thread. Meanwhile, someone from Denmark has posted a solution in the SETI forum that might work.
I'm currently running tests; memory usage and any problems that might arise are the points of interest.

Alexander

Alex

This is how it should look

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109408054562
RAC: 35264159

Hi Alex,

Notionally, I'm replying to your first message but I'm really replying to all three and, in addition, to comments elsewhere (often inaccurate and ill-informed) that imply that there is something sinister about the use of the AP (anonymous platform) mechanism. If you are successful in your quest to make more efficient use of your GPU, and if you need to use AP to do that, you should be congratulated for your persistence in finding the improvement in efficiency.

If there are no downsides such as trashing of tasks in the cache or damage to returned results, etc., I would assume the Devs would be more than interested in the details of your experiments. I'm not trying to be a wet blanket but I suspect that if it really were possible to run multiple GPU tasks simultaneously, with improved performance and no associated problems, the Devs would already have set that up. Ultimately, if you can achieve a performance gain, I would see this progressing to the point where server-side changes facilitate this without having to use AP or edits to client_state.xml. I'm just a volunteer here like everybody else so I'm not speaking in any 'official' capacity. I do correspond with the Devs from time to time to bring issues to their attention so when you feel you have something that really is fully workable, I'll make sure your efforts don't go unnoticed :-).

I gather from recent exchanges you have participated in elsewhere that there may be an issue with the value being reset to 1 some time after it has been edited. I could imagine that each time your client contacts the server and exchanges information, the edited value of this parameter might clash with what the server expects. If you had to keep editing to restore the value, that would make things rather tedious and pretty much impossible to manage for routine crunching.

As I have no CUDA capable GPUs, I have no experience with using the E@H CUDA app. I can't see myself being tempted any time soon into buying an NVIDIA GPU so that inexperience is bound to continue. Those who do have the hardware tend to use it on other projects, particularly if they are at all worried about efficient use of the hardware. This is probably why your previous request for help didn't elicit the sort of responses you were looking for. Hopefully more 'power' users will be attracted if/when the Devs are able to port more of the code to the massively parallel world of the GPU. Perhaps it may not be possible to ever get to that stage. I'm not a programmer so I have no real idea.

Good luck with your experiments, though. Please keep us informed of what happens to the three tasks you show crunching simultaneously. Is there any adverse effect on the performance of the MW task? Does anything adverse happen when a MW task finishes? And what about when an E@H task finishes?

Cheers,
Gary.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6534
Credit: 284730859
RAC: 105773

Message 99214 in response to message 99213

Quote:
.... the massively parallel world of the GPU ....


Yes. It's worth pointing out that GPUs are not general purpose in the sense that CPUs are. They are designed for pipelined graphics behaviour. The success in using GPUs for non-graphics work depends, ultimately, on whether the problem space can be viewed in some parallel fashion. For E@H I believe that's the FFT ( Fast Fourier Transform ) part of the job, and so the premium for using GPUs depends on the fraction of the total problem that is FFT. Plus the GPU advantage only significantly appears when the number of threads is in the hundreds to thousands.
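The point that the GPU premium depends on the FFT fraction is essentially Amdahl's law. The Python sketch below uses that standard textbook model (it is not taken from the E@H code) to compute the overall speedup when only a fraction of a task's runtime is accelerated. The fractions and the 50x GPU speedup are hypothetical illustration values; the thread does not give the real FFT share of an E@H task.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only `parallel_fraction` of the runtime is accelerated.

    parallel_fraction: share of the original runtime that can be offloaded (0..1)
    parallel_speedup:  how many times faster that share runs on the GPU
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0:
        raise ValueError("parallel_speedup must be positive")
    serial = 1.0 - parallel_fraction  # part that still runs on the CPU
    return 1.0 / (serial + parallel_fraction / parallel_speedup)


if __name__ == "__main__":
    # Hypothetical FFT shares; the real share for an E@H task is not known here.
    for fraction in (0.5, 0.8, 0.95):
        print(f"FFT share {fraction:.0%}: x{amdahl_speedup(fraction, 50.0):.2f} overall")
```

Under these assumptions a task that is half FFT gains less than 2x overall (1 / (0.5 + 0.5/50) ≈ 1.96), however fast the GPU part becomes, and the GPU sits idle while the serial part runs.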

I remember a group ( in the late 1980s, possibly at MIT ) that was doing very long-term predictions of planetary orbits in the solar system. They wound up designing and building their own digital orrery, which did such calculations phenomenally well. They did that because it would have taken ages for general-purpose machines to achieve the same results for their research. It would be nice to have hardware that targets the problem better and/or software that slots a given problem nicely into the hardware too .....

Plus, if you want top speed from GPUs, you'd try to maximise the use of memory on the card while minimising transfers to/from general memory, but such GPU memory has its own specific rules of use ( which can be traced to its graphics-oriented design ).

In short I don't think the E@H computational problems are as well served by massive parallelism compared with other projects.

Alex, the numbers on the screen image you mention - do they refer to threads or time fractions or memory fractions or what ?

[ I'd thought - please correct me if I'm wrong - that 1.00 CPU meant 1 CPU thread ( which is timesliced into a physical core's activity in the usual way according to the OS's scheduling algorithm ) - so I'm not sure what 0.55 NVIDIA GPU or 0.05 CPU means ]

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ... Blaise Pascal

... and my other CPU is a Ryzen 5950X :-)

tolafoph
Joined: 14 Sep 07
Posts: 122
Credit: 74659937
RAC: 0

Hi,
I edited client_state.xml and set the values to 0.22. At first I also had the problem of the value resetting to 1.00 after the WU was restarted, but then it worked. My three CUDA WUs started with 1 CPU + 0.22 GPU. The WUs finished and are waiting for validation.
WU1, WU2, WU3

I have a dual-core CPU, and BOINC runs four WUs with the 1 CPU + 0.22 GPU setting. According to Windows Task Manager, CPU usage is about 24% for each CPU. GPU-Z shows a GPU load between 28% and 32%, double the 14-16% seen with only one CUDA WU. 551 MB of memory is in use.
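Why a 0.22 setting lets BOINC start four CUDA WUs on one GPU is plain arithmetic, if one assumes (as a simplification) that the client treats each GPU as a budget of 1.0 and only starts another task while the claimed shares still fit: 4 × 0.22 = 0.88 fits, 5 × 0.22 = 1.10 does not. A minimal Python sketch of that assumption follows; `max_identical_tasks` is a hypothetical helper, not part of BOINC.

```python
import math


def max_identical_tasks(gpu_share: float, capacity: float = 1.0) -> int:
    """Number of tasks, each claiming `gpu_share` of a GPU, that fit in `capacity`.

    Simplified model (assumption): shares are summed and must not exceed capacity.
    """
    if gpu_share <= 0:
        raise ValueError("gpu_share must be positive")
    # The tiny epsilon absorbs float rounding, e.g. 0.3 / 0.1 == 2.9999999999999996.
    return math.floor(capacity / gpu_share + 1e-9)


if __name__ == "__main__":
    for share in (1.0, 0.55, 0.22):
        n = max_identical_tasks(share)
        print(f"share {share}: {n} task(s), {n * share:.2f} of the GPU claimed")
```

The claimed 0.88 is only a bookkeeping figure; the load GPU-Z actually measured with four WUs was 28-32%.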

But as Gary said:

Quote:
I could imagine that each time your client contacts the server and exchanges information, the edited value of this parameter might clash with what the server expects.


the edited value was reset after downloading the new WUs. I had to change it again to run the four WUs.
Another problem is that GPU usage depends on how fast your CPU is. Alex wrote that his GTX 260 has a load of 9%, whereas I always had more than 13% when I overclocked my dual-core from 2.66 GHz to 3.2 GHz.

Sascha

Alex

Message 99216 in response to message 99214

Hi Mike, Hi Gary.

Right now I'm trying different setups to get everything working.
At the moment I can only test it with MW as the 'user' of the remaining GPU cycles, since SETI is down for project-internal reasons. MW WUs finish correctly in a 1 Einstein / 1 MW CUDA WU setup, but time out (restart) when running 2 Einstein / 1 MW. For now I have disabled internet access to avoid remote changes to the settings. In three or four hours I'll reconnect and upload the completed work; I'll keep you informed.

... to comments elsewhere (often inaccurate and ill-informed) that imply ...
This could easily be changed if the responsible people posted accurate information. If you take a look at Milkyway, you'll find that the responsible people are 'live' there. And look at the performance of that project!

[ I'd thought - please correct me if I'm wrong - that 1.00 CPU meant 1 CPU thread ( which is timesliced into a physical core's activity in the usual way according to the OS's scheduling algorithm ) - so I'm not sure what 0.55 NVIDIA GPU or 0.05 CPU means ]
Using a setting of 1 CPU / 1 GPU is pretty much like booking a room for 20 people for a meeting of two.
BOINC (or Windows, I don't really know which) has a function to switch tasks even on a GPU. AFAIK a setting of 0.05 CPUs means that the CPU only handles the 'overhead'. BOINC can assign a task like this in addition to a 'normal' CPU task.
The screen shows my second system, a 4-core Intel with one GTX260.
I'll try to get a better explanation for that; please give me some time.

Kind regards,

Alexander

Alex

Message 99217 in response to message 99215

Quote:
Alex wrote that his GTX 260 has a load of 9%, whereas I always had more than 13% when I overclocked my dual-core from 2.66 GHz to 3.2 GHz.

There is a difference between overclocking the CPU and overclocking the GPU.
Many threads in the Milkyway and GPUGRID forums give information about that.

About the GPU usage of 9%: it depends on the graphics card you use. Some people posted a usage of only 5%. Running two, three or four Einstein WUs at a time does not really help; usage stays below 40%. This is why I want to combine it with SETI or Milkyway.
At the moment my GPU usage is very good (according to MSI Afterburner).

Sorry, so far I have no solution for the personal settings being reset.

Alexander

tolafoph

Hi Alex,
I also have a GTX 260, but with 216 shaders. The CUDA app uses one CPU core, and the GPU has to wait until the CPU has finished its part of the calculation. I now run the CPU at its stock 2.66 GHz, and the GPU load is between 24 and 26%. With the CPU at 3.2 GHz the GPU load was between 28 and 32%. I saw that your GTX 260 is paired with a quad-core at a stock clock of 2.5 GHz. I know that with a GTX 480 the GPU load would be lower and with a GT 240 much higher. So the problem is that there can't be a fixed GPU-usage value built into the app.
But I really like your solution for manually changing the client_state.xml to run more than 1 CUDA app on one GPU.
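The explanation that the GPU waits while the CPU finishes its part can be turned into a rough model: if CPU and GPU phases strictly alternate, GPU load is GPU time divided by total time, and a faster CPU shortens only the CPU phases. The Python sketch below assumes that strict alternation and that CPU time scales inversely with clock speed; the 1 s GPU / 3 s CPU split is a hypothetical value picked to give 25% load at 2.66 GHz, not a measurement of the E@H app.

```python
def gpu_load(gpu_seconds: float, cpu_seconds: float) -> float:
    """Fraction of wall time the GPU is busy if CPU and GPU phases strictly alternate."""
    return gpu_seconds / (gpu_seconds + cpu_seconds)


def rescale_cpu_time(cpu_seconds: float, old_ghz: float, new_ghz: float) -> float:
    """Assume CPU-bound phases scale inversely with clock speed (a simplification)."""
    return cpu_seconds * old_ghz / new_ghz


if __name__ == "__main__":
    # Hypothetical split: 1 s of GPU work per 3 s of CPU work at the stock clock.
    gpu, cpu = 1.0, 3.0
    print(f"2.66 GHz: {gpu_load(gpu, cpu):.0%}")
    print(f"3.20 GHz: {gpu_load(gpu, rescale_cpu_time(cpu, 2.66, 3.2)):.0%}")
```

Under these assumptions the model gives about 29% at 3.2 GHz, inside the 28-32% range reported above. That agreement only suggests the CPU-bound explanation is plausible; it does not prove it.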

Sascha

Mike Hewson

Message 99219 in response to message 99216

Quote:
Using a setting of 1 CPU / 1 GPU is pretty much like booking a room for 20 people for a meeting of two.
BOINC (or Windows, I don't really know which) has a function to switch tasks even on a GPU. AFAIK a setting of 0.05 CPUs means that the CPU only handles the 'overhead'. BOINC can assign a task like this in addition to a 'normal' CPU task.


Perhaps I should be clearer .... what I meant was: how were the 0.22, 0.55 and 0.05 figures arrived at, and what does it mean/imply to vary them from the 'standard' defaults??

Cheers, Mike

Alex

Message 99220 in response to message 99219

Quote:

Perhaps I should be clearer .... what I meant was: how were the 0.22, 0.55 and 0.05 figures arrived at, and what does it mean/imply to vary them from the 'standard' defaults??

Hi Mike,

I cannot tell you how MW arrived at the setting of 0.05 CPUs. This is how they deliver their WUs.
But I can tell you why I chose the values 0.22 and 0.55:
On my system one Einstein CUDA WU uses 9% of the GPU (OK, some WUs use up to 12%, but not the whole time). This would make it possible to run four tasks of that kind at a time, assuming a CPU like mine (Intel 4-core).
But this makes no sense: the overall GPU load would still be below 50%. This is why I want to run two Einstein WUs and fill the remaining free GPU cycles with another app such as MW or SETI.
What I've tested: running a single SETI CUDA WU with a setting of 0.55 GPUs results in a GPU usage of ~90%, so no performance decrease is evident.
Starting a second CUDA task with a setting of 0.22 takes some cycles away from SETI. The runtime of the SETI WU increases, but the second task runs as fast as it would alone.
So the setting of 55% for SETI or MW ensures that only one task of that kind will be loaded, leaving room to load one or two Einstein CUDA tasks. If none are available, no problem: MW or SETI runs at full speed.
Setting the limits to 0.22 + 0.55 always leaves two cores free for non-CUDA apps like GCs or whatever.
As I understand it, a setting of 55% allows a task a maximum of 55% usage IF another task is also running.
For me, the only remaining problem is my personal setting being reset. Let's see what comes up.
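The reasoning about 0.55 and 0.22 can be checked with a small packing model. The Python sketch below assumes, as a simplification rather than a description of BOINC's real scheduler, that the client walks its queue in order and starts each GPU task only if the summed shares stay at or below 1.0. `admit_tasks` and the task names are hypothetical.

```python
from typing import List, Tuple


def admit_tasks(queue: List[Tuple[str, float]], capacity: float = 1.0) -> List[str]:
    """Start tasks from `queue` in order while the summed GPU shares fit in `capacity`.

    Simplified greedy model (assumption), not BOINC's actual scheduling code.
    """
    running: List[str] = []
    used = 0.0
    for name, share in queue:
        # Small epsilon so shares meant to sum to exactly 1.0 are not rejected by float rounding.
        if used + share <= capacity + 1e-9:
            running.append(name)
            used += share
    return running


if __name__ == "__main__":
    queue = [
        ("SETI-1", 0.55),
        ("SETI-2", 0.55),  # 0.55 + 0.55 = 1.10, so it has to wait
        ("EAH-1", 0.22),
        ("EAH-2", 0.22),
        ("EAH-3", 0.22),   # 0.55 + 3 * 0.22 = 1.21, so it has to wait
    ]
    print(admit_tasks(queue))  # ['SETI-1', 'EAH-1', 'EAH-2']
```

In this model the 0.55 share only decides which tasks may start together (one SETI or MW task plus at most two Einstein tasks); it does not cap how much of the GPU a running task actually uses, which fits the observation that a lone SETI WU at 0.55 still drives the GPU to ~90%.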

Regards,

Alexander
