Letting cuda WUs use a full thread/core

Dirk
Joined: 4 Jun 08
Posts: 35
Credit: 88,264,743
RAC: 0

RE: Then do what was told

Quote:

Then do what was told earlier in this thread and tell BOINC to use all but one CPU. With one less CPU doing science, that one CPU will be able to feed the GPU exclusively. One CPU is all you need: you do not need as many CPUs as you have GPUs, nor as many CPUs as you have tasks feeding the one GPU that you have.

When you tell BOINC to use all but one CPU, the free CPU will automatically be used by intensive programs such as the GPU science programs, which run at slightly higher than low priority, to keep the GPU fed.

Yes, it's quite possible that if you let your anti-virus check the system constantly, this CPU will be used for that as well. Or that other very CPU-intensive programs on your system, such as the multiple Windows indexing programs, find it a good time to start using this CPU. But that's for you to figure out before you can do a complete run the way you want.

Do know that you're using both BOINC and the GPU in a way that neither was intended or programmed for. Any weird artifacts are very probably due to quirks in your own system or to your strange use of it.

Man... I'm sorry but I'm starting to get annoyed right now. It's almost like you didn't read any of my posts. I'm not going to explain again.

FrankHagen
Joined: 13 Feb 08
Posts: 102
Credit: 250,075
RAC: 0

RE: Man... I'm sorry but

Quote:
Man... I'm sorry but I'm starting to get annoyed right now. It's almost like you didn't read any of my posts. I'm not going to explain again.

sorry if we try to help you..

you got a HT-CPU there - so in fact you only got 4 real cores. and you want to feed more than one GPU-task. go figure yourself..

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2,557,091
RAC: 0

RE: Man... I'm sorry but

Quote:
Man... I'm sorry but I'm starting to get annoyed right now. It's almost like you didn't read any of my posts. I'm not going to explain again.

If you want to increase GPU usage, then get hold of the source code and rewrite the app to use more GPU. If you optimise the CPU portions so less time is used by the CPU, i.e. so that portion runs quicker, the GPU will be fed more often and GPU usage will go up.

Or overclock your CPU to the max, so less time is spent on the CPU portions of the code. This will lead to increased GPU load, as the GPU will be fed more often.

It's just how the app code is written: if it isn't coded to use a full core, then it won't. You'll just have to tweak your settings to try and maximise what you've got already, as everyone else has been trying to tell you, or wait for the next app.
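Claggy's point can be put as a back-of-the-envelope model. This is a minimal sketch, assuming (hypothetically) that each cycle of the app is a CPU portion followed by a GPU portion and that the two never overlap; the millisecond figures are made up for illustration, not measured from the Einstein@Home app.

```python
def gpu_load(cpu_ms: float, gpu_ms: float) -> float:
    """Fraction of each cycle the GPU is busy, assuming (hypothetically) the
    CPU portion and the GPU portion strictly alternate and never overlap."""
    return gpu_ms / (cpu_ms + gpu_ms)


gpu_ms = 40.0  # hypothetical GPU time per cycle
# Shrinking the CPU portion (optimised code or an overclock) raises GPU load.
for cpu_ms in (20.0, 15.0, 10.0):  # hypothetical CPU times per cycle
    print(f"CPU portion {cpu_ms:4.1f} ms -> GPU load {gpu_load(cpu_ms, gpu_ms):.0%}")
```

With these invented numbers the GPU load rises from about 67% to 80% as the CPU portion halves, without the GPU code changing at all.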

Claggy

There is a way to probably make it use a full core, but since it'll bring your system to its knees, and make my name Mud, I won't post it......

Jord
Joined: 26 Jan 05
Posts: 2,952
Credit: 5,766,850
RAC: 352

I read all your posts, and

I read all your posts, and thought I'd chime in to explain how the stuff works, in the hope that a light would go on for you, telling you that what you want isn't going to happen. Ever.
The problem is that the science software isn't written to do what you want. It's written in such a way that it will use all the shader processors inside your GPU at the same time.

See, you seem to think that all the tasks run at the same time on your GPU. They don't. They run in sequence.
The software is written to use all the shader processors of the GPU that BOINC detects, and to use all of them at the same time. So even if you run 3 tasks on the GPU, they don't run at the same time. They will not divide the number of shader processors amongst themselves and work in parallel, because the science software isn't written that way.

So these tasks run in sequence, just not in normal sequence where they wait for one task to end before the next is run. No, it'll be more like this:
- task 1: 5% of the task is converted by the CPU into GPU kernels that the GPU can read. The CPU uses 5% of its load to make this conversion and transport the kernels to the GPU, then goes into waiting mode. The GPU runs the kernels on all its shader processors, then asks the CPU to come pick up the results, which the CPU does before converting them back into data humans can read and writing that to disk. Then the CPU turns its attention to task 2.
- task 2: 5% of the task is converted by the CPU into GPU kernels that the GPU can read. The CPU uses 5% of its load to make this conversion and transport the kernels to the GPU, then goes into waiting mode. The GPU runs the kernels on all its shader processors, then asks the CPU to come pick up the results, which the CPU does before converting them back into data humans can read and writing that to disk. Then the CPU turns its attention to task 3.
- task 3: 5% of the task is converted by the CPU into GPU kernels that the GPU can read. The CPU uses 5% of its load to make this conversion and transport the kernels to the GPU, then goes into waiting mode. The GPU runs the kernels on all its shader processors, then asks the CPU to come pick up the results, which the CPU does before converting them back into data humans can read and writing that to disk. Then the CPU turns its attention back to task 1.
- task 1: A further 5% of the task is converted by the CPU into kernels the GPU will understand... etc.
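The loop described in the list above can be sketched as a simple round-robin. This is a toy model of the post's description, not the app's actual code: it assumes three tasks and the post's example 5% chunk, and it leaves out starting task 4 when one finishes.

```python
from collections import deque
from typing import Iterator, Tuple


def round_robin(num_tasks: int, chunk_pct: int = 5) -> Iterator[Tuple[int, int]]:
    """Yield (task, progress_pct) each time the CPU prepares one chunk for a task.

    Tasks are served strictly in turn, one at a time, as in the list above.
    chunk_pct=5 is the post's example figure, not a measured value.
    """
    queue = deque((task, 0) for task in range(1, num_tasks + 1))
    while queue:
        task, done = queue.popleft()
        # One turn: CPU builds kernels, GPU runs them, CPU collects the results.
        done = min(done + chunk_pct, 100)
        yield task, done
        if done < 100:
            queue.append((task, done))  # wait for this task's next turn


steps = list(round_robin(3))
print(steps[:4])  # [(1, 5), (2, 5), (3, 5), (1, 10)]
print([task for task, pct in steps if pct == 100])  # [1, 2, 3]
```

Task 1 reaches 100% first because it was served first, which matches the post's description of one task finishing and being replaced while the others are still in progress.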

This goes on in a loop until one of the tasks reaches 100%, after which it is written to disk and ended, so it is no longer a task the CPU & GPU attend to. That science application ends. A new science application running task 4 is started, and its first conversion turns 5% of that task into kernels the GPU will understand. That all three tasks appear to run at the same time on the GPU is because it runs so damn fast. That's the speed of parallel processing on many shader processors. And yet that's all it is, an appearance...

If you understand this loop-de-loop, then you will also understand why the load on the CPU is only 5%. This is because at any time, the CPU is handling only one science application. Not 2, not 3, not 50. Just one. So even running 50 tasks at the same time, if that were possible, would only give the CPU a load of 5%.

What about the GPU load figure, then? The GPU load shouldn't be read as meaning that at 100% all your shader processors are in use or, conversely, that below 100% only some of your shader processors are in use. All of the shader processors are used whenever there are kernels to be processed. But while the CPU is unloading data that has just been crunched and loading new data onto the GPU, the shader processors are waiting. It isn't a constant load that they're under. You only reach 100% GPU load when software is constantly pumping data into the shader processors, which happens with heavy 3D (OpenGL) gaming, where each pixel's colour and sharpness needs to be recalculated many times per second. Einstein's science application just doesn't do that.
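To make the distinction concrete: a load percentage is a share of time, not a share of shader processors. A minimal sketch, assuming a monitoring tool reports the fraction of a sampling window during which kernels were running (how any particular tool computes its figure is an assumption here); the interval timings are invented.

```python
from typing import List, Tuple


def gpu_load_over_window(busy: List[Tuple[float, float]], window_ms: float) -> float:
    """Percentage of a sampling window during which kernels were running.

    busy: non-overlapping (start_ms, end_ms) intervals inside the window.
    During each interval all shader processors are assumed busy.
    """
    busy_ms = sum(end - start for start, end in busy)
    return 100.0 * busy_ms / window_ms


# Hypothetical 100 ms window: bursts of kernel work separated by gaps while the
# CPU unloads results and loads new data onto the GPU.
bursts = [(0, 25), (30, 55), (60, 85), (90, 100)]
print(gpu_load_over_window(bursts, 100))  # 85.0
```

Every burst uses all the shader processors, yet the reported load is 85% because of the gaps between bursts.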

Dirk
Joined: 4 Jun 08
Posts: 35
Credit: 88,264,743
RAC: 0

Ok maybe it's the heat but

Ok, maybe it's the heat, but from what I understand you are saying it only runs one app at a time, meaning it should only be using 5% for one cuda task. But it does in fact use 5% for each of the 4 running tasks at the same time, giving a total CPU load of 20%.

I also realize that the apps aren't written to my desires, but here's the thing. While crunching other CPU tasks, the cuda apps sometimes have to wait for CPU cycles; this is clearly visible not just in GPU usage but also in completion times. Now, adjusting the priority of the cuda apps does help a bit, but they're still being slowed down. I still think it would help avoid such slowdowns if the apps asked for a full thread or core. I know they won't fully use all those CPU cycles, but if the CPU is sitting there waiting for the GPU, there should be less chance of other processes causing delays.
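The disagreement comes down to two readings of the same 5% figure. A minimal sketch of both, using only the numbers from the posts (5% per task, 4 tasks); which reading matches the real app is exactly what the thread is arguing about, so neither branch should be taken as describing the app's actual behaviour.

```python
def total_cpu_load(per_task_pct: float, tasks: int, serialized: bool) -> float:
    """Total CPU load of several cuda tasks under two readings of the posts.

    serialized=True:  only one science app is served by the CPU at any moment,
                      so the load does not add up across tasks.
    serialized=False: each task's process contributes its own load, which is
                      what Dirk reports seeing (4 x 5%).
    """
    return per_task_pct if serialized else per_task_pct * tasks


print(total_cpu_load(5, 4, serialized=True))   # 5
print(total_cpu_load(5, 4, serialized=False))  # 20
```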

@Frank

Quote:
Quote:
Man... I'm sorry but I'm starting to get annoyed right now. It's almost like you didn't read any of my posts. I'm not going to explain again.

sorry if we try to help you..

you got a HT-CPU there - so in fact you only got 4 real cores. and you want to feed more than one GPU-task. go figure yourself..

Of course I know I've only got 4 physical cores, but it can run 4 GPU tasks + 4 CPU tasks perfectly fine. Right now, with some fiddling around, my cuda tasks complete in about 92 mins while I am also running 4 CPU tasks. Total CPU load is 71%.

The reason I got annoyed with the post I responded to was, first of all, the "do as you're told" part. Secondly, the thing he wants me to do is configure BOINC to keep a thread free, even though I explained several times that I only let BOINC run 4 CPU tasks alongside my 4 cuda ones (right now I do this by telling BOINC in the app_info that each cuda task takes 1.00 CPU).
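The arithmetic behind this setup can be sketched as follows. This assumes a simplified model in which BOINC subtracts the CPU share declared for each GPU task (here the 1.00 CPU set in app_info) from the logical CPUs available; BOINC's real scheduler has more rules than this, so treat it as an illustration only. The function name is hypothetical.

```python
def cpu_task_slots(logical_cpus: int, gpu_tasks: int, cpus_per_gpu_task: float) -> int:
    """CPU-only tasks that fit after GPU tasks reserve their declared CPU share.

    Simplified model (assumption): whole leftover CPUs become CPU-task slots.
    """
    reserved = gpu_tasks * cpus_per_gpu_task
    return max(0, int(logical_cpus - reserved))


# 8 logical CPUs: 4 physical cores with Hyper-Threading, as on Dirk's machine.
print(cpu_task_slots(8, 4, 1.00))  # 4 -> 4 cuda tasks + 4 CPU tasks
```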

I'd just like this to be an interesting discussion so I'm sorry for that post, I shouldn't have posted that the way I did.

Happy crunching people!

Jord
Joined: 26 Jan 05
Posts: 2,952
Credit: 5,766,850
RAC: 352

Oh heck, it's not worth it. I

Oh heck, it's not worth it. I was trying to teach you how BOINC does things, you want to do things your own way. Just ignore everything I said and say in this thread. I was wrong to try to teach you how BOINC works, to suggest that BOINC & the project's science apps need to be able to run on a wide variety of systems without interference. There's no gain to be had from this. It'll only end in a fight.

Dirk
Joined: 4 Jun 08
Posts: 35
Credit: 88,264,743
RAC: 0

RE: RE: Ok maybe it's the

Quote:
Quote:
Ok maybe it's the heat but from what I understand you are saying it only runs one app at a time, meaning it should only be using 5% for one cuda task. But, it does in fact use 5% for all 4 running tasks at the same time, giving a total CPU load of 20%.

The 5% I gave was an example. But other than that, it's correct. I also said that the process of transporting data to & from the videocard and translating it into kernels and back is fast. Even the percentage of the task that gets translated at once is an example, a guess. I never checked in what chunks the app translates the tasks into kernels that the GPU can understand.

Quote:
I still think it would help avoid such slowdowns if the apps asked for a full thread or core. I know they won't fully use all those CPU cycles but if the CPU is sitting there waiting for the GPU there should be less chance of other processes causing delays.

Which is exactly what the other helpful people in this thread have been trying to tell you to do: Tell BOINC to use 1 CPU less. Set "On multiprocessors, use at most 87.5% of the processors" and BOINC will only use 7 CPU cores (real and virtual) leaving one CPU core free for lots of other things, including the OS and those GPU apps.
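The 87.5% figure is just 7 of 8 logical CPUs. A small helper makes the arithmetic explicit for other CPU counts; the function name is hypothetical, and picking a value that lands exactly on a whole number of CPUs avoids depending on how BOINC rounds the result.

```python
def use_at_most_pct(logical_cpus: int, cpus_to_leave_free: int = 1) -> float:
    """Percentage for 'On multiprocessors, use at most N% of the processors'
    that leaves the given number of logical CPUs unused by BOINC."""
    if not 0 <= cpus_to_leave_free < logical_cpus:
        raise ValueError("must leave between 0 and logical_cpus - 1 CPUs free")
    return 100.0 * (logical_cpus - cpus_to_leave_free) / logical_cpus


print(use_at_most_pct(8))     # 87.5 -> 7 of 8 threads crunch, 1 stays free
print(use_at_most_pct(8, 4))  # 50.0 -> 4 of 8 threads left free
```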

Yes, it sounds counter-intuitive to tell BOINC to use one less CPU so that CPU can be used by the GPU apps running under BOINC, but it works. One CPU core is all you need to run all those GPU apps; even if you had 4 GPUs it would be enough. Reserving 4 CPUs for this is overkill. It won't speed anything up; rather, it slows you down, as you do less science in the same time with 3 cores doing nothing. It will save you on electricity, though there's enough of that in the air as I type this... ;-)

But if you want to continue reserving 4 cores for your GPU apps, be my guest. I can just tell you that this is never going to be a feature around here, just as running multiple tasks on the same GPU isn't something either piece of software was written for. You're not running multiple tasks at the same time per CPU either, are you?

Well, I'm sorry, but you're wrong: freeing just one thread slows the cuda tasks down way, way too much. Btw, my GPU does a ton of science this way. Just look at the completion times of BRPcuda compared to the standard BRP tasks running on the CPU. My GPU runs 4 of those at a time, completing them (if I let it crunch without me using the PC) in about 92 minutes. This means my GPU could complete about 62 BRPcuda tasks in 24 hours. Of course I do use my PC, so it can't quite reach that number, but still.
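The 62-per-day estimate follows from the batch time. A minimal sketch, assuming the 4 tasks finish together every 92 minutes around the clock with no interruptions, which the post itself notes won't hold while the PC is in use:

```python
MINUTES_PER_DAY = 24 * 60


def tasks_per_day(batch_minutes: float, tasks_per_batch: int) -> float:
    """Tasks completed per day if `tasks_per_batch` tasks finish together
    every `batch_minutes`, running around the clock."""
    return MINUTES_PER_DAY / batch_minutes * tasks_per_batch


print(round(tasks_per_day(92, 4), 1))  # 62.6
```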

edit to add an example

Here's an example of one of my BRPcuda's compared to a CPU BRP wu.
http://einsteinathome.org/workunit/99728713

That CPU wu was run by an i7-2600 and it still took 12.5 hours to complete.
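A per-task comparison using only the two numbers above. This compares one task with one task; the i7-2600 may well have been running several tasks side by side, so a whole-machine comparison would likely show a smaller gap.

```python
def effective_minutes_per_task(batch_minutes: float, tasks_per_batch: int) -> float:
    """Average GPU time per task when several tasks share the GPU."""
    return batch_minutes / tasks_per_batch


gpu_min = effective_minutes_per_task(92, 4)  # 4 BRPcuda tasks in ~92 min
cpu_min = 12.5 * 60                          # one CPU BRP task on the i7-2600
print(gpu_min)                      # 23.0
print(round(cpu_min / gpu_min, 1))  # 32.6
```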

Dirk
Joined: 4 Jun 08
Posts: 35
Credit: 88,264,743
RAC: 0

RE: Oh heck, it's not worth

Quote:
Oh heck, it's not worth it. I was trying to teach you how BOINC does things, you want to do things your own way. I give up.

Well, I'm sorry, but if I were to do it your way my tasks would take way too long. I'll repost my GPU usage graph; it's a pretty good indicator of completion times. If I were to free only 1 core, the GPU would be starved badly. Just look at the area of the graph where it is running 7 CPU tasks, leaving just 1 thread free to feed the GPU.

Perhaps I'll make a new graph soon where I let it run a bit longer in the beginning.

FrankHagen
Joined: 13 Feb 08
Posts: 102
Credit: 250,075
RAC: 0

RE: My GPU runs 4 of those

Quote:
My GPU runs 4 of those at a time, completing them (if I let it crunch without me using the PC) in about 92 minutes. This means my GPU could complete about 62 BRPcuda tasks in 24 hours. Of course I do use my PC, so it can't quite reach that number, but still.

well - this looks like it works exactly for your configuration. the i7-2600 seems to be fast enough to do this. things might even change if you run other apps as CPU-tasks. HT-machines usually improve if you mix 2 or 3 apps with different demands.

anyway it would be interesting to see what happens if you disable HT via bios..

mikey
Joined: 22 Jan 05
Posts: 7,341
Credit: 614,925,434
RAC: 11,355

RE: Well I'm sorry but

Quote:
Well I'm sorry but you're wrong

Dirk, have you signed up to the BOINC Developers Mailing List yet? It is maintained by Dr. David Anderson of Seti, the original writer and maintainer of the BOINC software. Perhaps you and he, and his team, can get into this discussion. I am sure someone can give you a link to it if you don't already have it.
