Interesting. So I need to understand this: when you run the CPU at 0.5, that means you're allowing 50% utilization, correct? And when you set the GPU utilization factor of BRP apps to 0.5, it's doing the same thing, right? I'm pretty sure I have the settings set to 100%, as I'm at work most of the day, so I just let it run as fast as it wants.
No, the GPU setting tells Boinc how much of the GPU one task will use. When you set it to 0.5 you tell Boinc that a task will use half of the GPU, so Boinc will start 2 tasks to fully utilize the GPU. Following the same logic, you can set 0.33 for 3 tasks, 0.25 for 4 tasks, and so on...
The setting exists because one GPU task often can't make full use of the GPU's power, and running 2 in parallel is often more efficient.
Remember to watch the memory usage when increasing the number of parallel tasks so you don't run out of video RAM. GPU-Z is a good program for checking the status of a GPU. When GPU-Z shows >90% GPU load, the card is fully utilized and adding more parallel tasks won't gain anything further.
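If it helps to see the arithmetic, here is a minimal Python sketch (not BOINC's actual scheduler code, just an illustration) of how the utilization factor maps to concurrent tasks per GPU, with a rough video-RAM headroom check; the per-task memory and VRAM figures are placeholders, not measured values.

```python
# Minimal sketch (not BOINC's actual scheduler code) of how the GPU
# utilization factor maps to concurrent tasks per GPU, plus a rough
# VRAM headroom check. Per-task memory and VRAM sizes are placeholders.
import math

def tasks_per_gpu(utilization_factor: float) -> int:
    """0.5 -> 2 tasks, 0.33 -> 3 tasks, 0.25 -> 4 tasks, ..."""
    return max(1, math.floor(1.0 / utilization_factor + 1e-6))

def fits_in_vram(n_tasks: int, mb_per_task: int, vram_mb: int,
                 reserve_mb: int = 256) -> bool:
    """Leave some VRAM free for the driver and display."""
    return n_tasks * mb_per_task + reserve_mb <= vram_mb

for factor in (1.0, 0.5, 0.33, 0.25):
    n = tasks_per_gpu(factor)
    ok = fits_in_vram(n, mb_per_task=300, vram_mb=2048)  # placeholder figures
    print(f"factor {factor}: {n} task(s) per GPU, fits in 2 GB VRAM: {ok}")
```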
And to answer your first question from the initial post:
SLI won't help at all. But adding another GPU will definitely help; if it's another 670 you should expect about double the performance of running only one. That is, if your CPU and other system components are up to the task of feeding two 670s.
Finally, thank you, that makes perfect sense. I was looking at the numbers completely backwards.
Could someone explain, or give a clue how to configure things, why the GTX 690 is so inefficient in E@H? I have a few 690/590s running 2 WUs (4 on each GPU, CPU usage >85%), and compared to a 2x560 host the difference against the 690 is incredible. The same does not happen on SETI, GPUGrid or Collatz, where a 690 is equivalent to almost 2x580 (I have a few of those too) in processing power.
To obtain a little gain I was forced to split the 2x690 host into two different 690+670 hosts.
The Einstein BRP4 CUDA application uses significant PCI-E bandwidth and is a hybrid type application that does processing on both the CPU and GPU. The GTX 690 is equipped with a PLX PEX 8747 which is a 48-lane PCI-E 3.0 switch. The PEX provides 16 PCI-E 3.0 lanes to each of the two GPUs. This will give the GPUs ample bandwidth to the PEX but the pipe between the PEX and CPU or NB PCI-E controller is also important. Since the two GPUs have to share the same pipe to the CPU or NB PCI-E controller, ideally you would install the cards in dedicated slots of at least PCI-E 2.0 x16. PCI-E 3.0 x16 would likely be the most optimal solution and will give roughly double the bandwidth of a PCI-E 2.0 x16 slot.
If I were to set up two GTX 690s for Einstein@Home, I would go with an X79 board and an Intel 3820 processor. This particular CPU has 40 PCI-E 3.0 lanes available, so that each 690 can have its own PCI-E 3.0 x16 slot.
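As a rough illustration of why the slot matters, here is a quick Python sketch of the theoretical per-direction bandwidth each half of a 690 sees once the two GPUs share the card's uplink (line-rate numbers only; real throughput is somewhat lower):

```python
# Theoretical per-direction PCI-E bandwidth per GPU of a GTX 690, assuming
# the two GPUs split the card's upstream link evenly (line rate only:
# 5 GT/s with 8b/10b encoding for 2.0, 8 GT/s with 128b/130b for 3.0).
PER_LANE_MB_S = {"2.0": 500, "3.0": 985}

def per_gpu_gb_s(gen: str, upstream_lanes: int, gpus_sharing: int = 2) -> float:
    """GB/s each GPU sees through the shared slot."""
    return PER_LANE_MB_S[gen] * upstream_lanes / gpus_sharing / 1000.0

for gen, lanes in (("2.0", 8), ("2.0", 16), ("3.0", 16)):
    print(f"PCIe {gen} x{lanes} slot: ~{per_gpu_gb_s(gen, lanes):.1f} GB/s per GPU")
```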
I'm not sure I understand it all, but my MB has PCI-E 3.0 slots (x16 if I'm not wrong, and 2x x8 in a dual-card setup), so I don't expect that kind of low performance.
Your 690 is really two 680s on a single card, so each GPU is only getting the equivalent of an x8 slot. Most of the time this doesn't matter because an x8 3.0 slot (or even an x8 2.0 slot) gives a lot more bandwidth than is needed. The E@H app is one of the rare exceptions (because it swaps a lot of data back and forth between CPU and GPU parts of the app); and only having 8 lanes/GPU does hurt it.
Thanks, so the conclusion is that the 690 is not good for E@H; that is bad...
What calls my attention is that a few 690 hosts have very good performance (go to the top hosts and you can easily see that), so there is at least someone (a few) who knows how to bypass the problem or at least make it smaller.
I have 3x 690 + a 590 running at 85% GPU usage, with an entire CPU free to feed each one of them.
Or maybe what E@H needs is a new optimized app for the new generation of dual-GPU cards, like the ones SETI/Collatz have.
If you look at the CPUs of the top hosts, you will notice these are all i7 3820 or above. These CPUs are for the Intel X79 boards and have sufficient bandwidth/lanes (40 PCIe 3.0 lanes, as Jeroen mentioned). Your CPUs, the i5/i7 for Intel P67, have only 16 PCIe 2.0 lanes.
Plug two of your 690s into an X79 board and each 690 is connected to the CPU with 16 PCIe 3.0 lanes.
Two 690s in a P67 board get only 8 PCIe 2.0 lanes per card. So in the end your GPUs have only 1/4 of the bandwidth compared to the X79 setup.
And Einstein loves bandwidth ;)
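The "1/4" comes straight from the per-lane rates; here is the arithmetic as a quick Python sketch, using theoretical numbers only:

```python
# Quick check of the "1/4" figure using theoretical per-lane rates
# (real-world throughput is lower, but the ratio holds roughly).
per_lane_gb_s = {"2.0": 0.5, "3.0": 0.985}

p67_per_card = 8 * per_lane_gb_s["2.0"]    # two cards share 16 PCIe 2.0 lanes -> x8 each
x79_per_card = 16 * per_lane_gb_s["3.0"]   # each card gets its own PCIe 3.0 x16 slot

print(f"P67: ~{p67_per_card:.1f} GB/s per 690, X79: ~{x79_per_card:.1f} GB/s per 690")
print(f"ratio: {p67_per_card / x79_per_card:.2f}")   # roughly 0.25
```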
The E@H app is one of the rare exceptions (because it swaps a lot of data back and forth between CPU and GPU parts of the app); and only having 8 lanes/GPU does hurt it.
I also saw a performance drop with my new GTX 580 when I installed my old GTX 260 in the second PCI-E slot. The first slot changed from 2.0 x16 to x8 and the runtime increased by about 10 to 20%. With the additional energy needed to run the 2nd card it wasn't worth it. In games I saw no change in the FPS with the GTX 580 in the x8 slot and the GTX 260 as a PhysX card.
Now I finally understand, but that is bad. I can't change the MB (the X79 boards are too expensive here - I hate tax-hungry countries), so my only option is to switch the 590/690 GPUs back to SETI and live with their WU limits and download problems (GPUGrid literally burns these GPUs, pulling too much power from them - something dangerous in our hot tropical environment). By doing that they will work better and I can really squeeze more of their power out. Too bad, since I like to contribute to E@H - the people here always welcome you well and plenty of work is always available - but against a hardware limitation I can't do anything (and we are even changing our 580s for the new 690s due to the high electricity bill and the lack of PCI-E slots we have here).
Anyway, thanks for helping me finally understand what is really happening; I was thinking I had done something wrong in the config. I understand what you say, but my MBs are PCI-E 3.0 (at least according to the manufacturer's site specs), so I didn't expect only 1/4 of the bandwidth to be available - at least 1/2 at PCI-E 3.0, not much but not a total mess.
Something still bugs my mind: why is a 2x560 host (not Ti) faster than a 2x670 host if they use the same MB/CPU? I no longer have a 2x560 host because I'm in the process of changing the GPUs for the new Keplers, but that is what I saw when this host (now with 2x670) still had 2x560s about two weeks ago. It seems the limitation is bigger on the Kepler GPUs than on the old Fermi models.
Thanks again for your help and time. I really believe the admins could put together a guide about this, to help us choose the right MB/GPU for our hosts, or at least know the limitations of our equipment; that would help a lot to avoid wasting time trying to fix something "unfixable".
By all accounts Kepler is aimed at best results in games, and has resulted in little - if any - advance over Fermi for computing.
Here's a thread that discusses the issues from a compute-oriented point of view. The top-rated answer describes how on Fermi the cores run at 2x the speed of the logic, whereas on Kepler they run at the same speed (as well as some other factors).
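To put rough numbers on the hot-clock point, here is a small Python sketch using the usual reference spec-sheet clocks (assumed values; factory cards vary), and peak single-precision figures only, which say little about real compute throughput:

```python
# Peak single-precision FLOPS = cores * clock * 2 (one FMA per core per cycle),
# using reference spec-sheet clocks (assumed values, factory cards vary).
cards = {
    # name: (CUDA cores, clock the cores actually run at, in GHz)
    "GTX 580 (Fermi)":  (512,  1.544),   # shader domain runs at 2x the core clock
    "GTX 680 (Kepler)": (1536, 1.006),   # cores run at the same clock as the rest of the chip
}

for name, (cores, ghz) in cards.items():
    gflops = cores * ghz * 2
    print(f"{name}: {cores} cores @ {ghz:.3f} GHz -> ~{gflops:.0f} GFLOPS peak SP")
```

Kepler needs roughly three times the cores to get about twice the paper FLOPS of Fermi, which hints at why per-core compute efficiency dropped.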
Very, very interesting reading, thanks. I'm really thinking of changing some of my MBs to the X79 models, but the price of those boards here "scares" me (the cheapest one I can get is about US$ 600 just for the MB), so I need some time to get the authorizations. Another option is to split the 690s across different hosts so each 690 gets at least a PCI-E 3.0 x16 slot (changing a 2x690 host and a 2x670 host into two (690 + 670) hosts), but doing that will require some internal negotiations.
I believe something else is happening too: the Kepler code path is quite different from the Fermi one. I don't know if you are familiar with the optimized SETI apps built by Jason.
Until the arrival of the Kepler version, the Kepler GPUs were slow compared to the Fermis, but after he delivered Kepler-oriented code the equation changed: now the Keplers (especially the 690) are faster than the equivalent 580/590 while needing a lot less power. An increase in output of more than 20% came just from the first set of "optimizations" to the code.
So it's not just the PCI-E lanes (that is crystal clear to me now); as new generations of Keplers and their successors arrive, the code needs work to make it more Kepler-friendly.
Now I finally understand, but that is bad. I can't change the MB (the X79 boards are too expensive here - I hate tax-hungry countries), so my only option is to switch the 590/690 GPUs back to SETI ...
Actually you have more options - for example, just spread your 690/670 cards between relatively slow computers. Leave one video card per motherboard - that way you get full PCI-E speed. Of course, it will increase the size of your zoo, but you can use cheap old-fashioned MBs instead of expensive modern ones. I'm sure you have some in mothballs.