Thought that was going to happen. Saw that 295.33 was the release driver, same as how 300.83 worked for PCIe 3 on x79.
Wish you the best of luck.
NVIDIA hasn't yet posted on their forums about how the "certification" is going, so it would appear we will be in the dark for quite some time. The next driver release isn't scheduled until the end of June, though NVIDIA has released off-schedule before. Sometimes "leaked" builds pop up on Guru3D, but I don't usually see too many Linux ones.
I find this PCIe limitation with NVIDIA's latest drivers to be frustrating considering that PCIe 3 works just fine on the x79 board. There is a considerable performance drop with the slots locked at PCIe 2 via the new Linux driver.
On the upside, at least the source code is available for the Linux driver. I can diff the two versions to see what has changed and hopefully fix this limitation.
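Something along these lines should do it (just a sketch in Python; the directory names are assumptions based on NVIDIA's installer naming, so adjust to wherever the two packages are unpacked):

import filecmp

# Compare the kernel interface source shipped with two driver releases.
old = "NVIDIA-Linux-x86_64-295.33/kernel"
new = "NVIDIA-Linux-x86_64-304.22/kernel"

# Recursively report files added, removed, or changed between the trees.
filecmp.dircmp(old, new).report_full_closure()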
I am hoping the certification will come back positive for x79.
I ran into something a bit weird. I have been looking at adding a 3rd card to my 680 rig for this project. At first, I added a 580 in the middle x8 slot to check temperatures and power draw. As soon as I installed this 3rd card, the execution time on the two 680s in the x16 slots went from 1945 seconds to 1680 seconds while running two tasks at once. This only happens when the 580 is installed in the 3rd x8 slot, regardless of whether or not the 580 is actually crunching. The 580 shows as configured at x8 5.0 GT/s, as expected. When I install a 3rd 680 in place of the 580, the execution time on the other two cards is back to the normal 1945 seconds, and the 680 shows up as x8 8.0 GT/s, as expected. I have not been able to explain why there is a performance improvement on the other two cards with the 580 installed.
First of all, thanks for your support for the project. We appreciate every contribution, big or small, of course, but running 3 GPUs in a single host definitely shows a lot of support for the project.
As for your "problem", this is really interesting. Please let us know anything you find'll find out about this phenomenon, this might even be interesting for NVIDIA.
What does the configuration of your 680s look like when the 580 is plugged in? The first and second PCIe x8 slots should share the same logical lanes and, together with the first PCIe x16 slot, come from the CPU through a multiplexer. The second PCIe x16 slot is connected directly to the CPU (according to AnandTech).
Putting the 580 in the third PCIe x8 slot should activate the split of the second PCIe x16 (?). I'm not quite sure whether mixing PCIe 2 and 3 cards results in only PCIe 2. The main difference between PCIe 2 and 3 is the encoding scheme: PCIe 2 uses 8b/10b encoding with 20% overhead, while PCIe 3 uses 128b/130b with much lower overhead.
edit: Your speed increased by ~16%, which is typical for doubling the bandwidth, as we have seen earlier (quick check below). So maybe there are still some problems with the PCIe 3 implementation?
Just my 2ct
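As a quick sanity check on that 16% figure (just a sketch; the per-lane rates and encodings are from the PCIe 2.0/3.0 specs, and the run times are the ones reported above):

# Usable bytes per second per lane after encoding overhead.
gen2 = 5e9 * (8 / 10) / 8      # PCIe 2.0: 5 GT/s, 8b/10b   -> 500 MB/s
gen3 = 8e9 * (128 / 130) / 8   # PCIe 3.0: 8 GT/s, 128b/130b -> ~985 MB/s

print(f"x16 gen2: {16 * gen2 / 1e9:.2f} GB/s")  # ~8.00 GB/s
print(f"x16 gen3: {16 * gen3 / 1e9:.2f} GB/s")  # ~15.75 GB/s

# Observed task times: 1945 s without the 580, 1680 s with it.
print(f"observed speedup: {1945 / 1680 - 1:.1%}")  # ~15.8%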
Quote:
First of all, thanks for your support for the project. We appreciate every contribution, big or small, of course, but running 3 GPUs in a single host definitely shows a lot of support for the project.
You're welcome and thanks for the message. I very much enjoy the science of this project and I am more than glad to contribute however I can. I should have the 3rd GPU up and running in the coming days but wanted to do some more testing first.
Quote:
As for your "problem", this is really interesting. Please let us know anything you find'll find out about this phenomenon, this might even be interesting for NVIDIA.
Cheers
HB
Quote:
What does the configuration of your 680s look like when the 580 is plugged in? The first and second PCIe x8 slots should share the same logical lanes and, together with the first PCIe x16 slot, come from the CPU through a multiplexer. The second PCIe x16 slot is connected directly to the CPU (according to AnandTech).
Putting the 580 in the third PCIe x8 slot should activate the split of the second PCIe x16 (?). I'm not quite sure whether mixing PCIe 2 and 3 cards results in only PCIe 2. The main difference between PCIe 2 and 3 is the encoding scheme: PCIe 2 uses 8b/10b encoding with 20% overhead, while PCIe 3 uses 128b/130b with much lower overhead.
I have done some more testing and found that only when a PCI-E 2.0 card is installed in the x8 slot does the processing time on the two 680s drop from ~1945 to ~1680 seconds. I tested this with both a 580 and an 8800GT. When I install an 8600GT PCI-E 1.1 card in that slot, the processing time is 1945 seconds. I also tried running the extra 680 with the slot forced to PCI-E 2.0 in BIOS, and the processing time was ~1945 seconds on the other two cards. It seems that only native PCI-E 2.0 cards have a positive performance effect on the other two cards.
Looking in my Asus manual, the slot configuration is set up like this:
Slot 1 - PCI-E 3.0 x16_1 -> 680 installed and set at x16 8.0 GT/s
Slot 2 - PCI-E 3.0 x8_2A -> PCI-E 2.0 card installed here and set at x8 5.0 GT/s
Slot 3 - PCI-E 3.0 x8_2B
Slot 4 - PCI-E 3.0 x16/8_3 -> 680 installed and set at x16 8.0 GT/s
Slot 5 - PCI-E 2.0 x1_1 -> PCH
Slot 6 - PCI-E 3.0 x8_4 -> Slot 4 switches to x8 if this slot is occupied
Here are some additional performance numbers with the 680 via different slot configurations:
PCI-E 2.0 x8: ~2922 sec
PCI-E 3.0 x8: ~2280 sec
PCI-E 3.0 x16 with slot 2 unoccupied: ~1945 sec
PCI-E 3.0 x16 with slot 2 containing PCI-E 2.0 card: ~1680 sec
This could be an interesting test for anyone else running an Asus x79 board or perhaps any x79 board supporting x16/x8/x16. I would be curious to see if this performance change is something specific to Linux and the NVIDIA driver or to the hardware.
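To put those numbers in perspective, a small sketch converting the times above into relative speedups (the values are the ones I reported):

# Seconds per task for each slot configuration, from the table above.
configs = {
    "PCI-E 2.0 x8":                   2922,
    "PCI-E 3.0 x8":                   2280,
    "PCI-E 3.0 x16, slot 2 empty":    1945,
    "PCI-E 3.0 x16 + PCI-E 2.0 card": 1680,
}

baseline = configs["PCI-E 2.0 x8"]
for name, secs in configs.items():
    print(f"{name}: {baseline / secs - 1:+.1%} vs PCI-E 2.0 x8")
# PCI-E 3.0 x8 comes out ~28% faster, x16 ~50%, x16 with the 2.0 card ~74%.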
As a follow-up to my previous posts, I have tested a similar PCI-E setup on my EVGA x79 FTW board. I am seeing the same performance gains with this board by having a GTX 680 installed in slot 1 at x16 8 GT/s and a PCI-E 2.0 card installed in slot 3 (x4) or slot 4 (x16). This performance gain does not appear to be specific to the Asus board. In fact, I am seeing even better results with the EVGA board even though its CPU overclock is set lower than on the Asus board.
2-tasks: 1520 seconds per task
3-tasks: 2097 seconds per task averaged
In the case of running 3 tasks at once, that comes out to nearly 124 tasks per day out of a single GPU! I have not had time to test 4-tasks yet.
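For reference, the arithmetic behind that figure as a quick sketch:

# tasks/day for one GPU = concurrent tasks * 86400 / seconds per task
for n, secs in [(2, 1520), (3, 2097)]:
    print(f"{n} concurrent: {n * 86400 / secs:.1f} tasks/day")
# 2 concurrent: 113.7 tasks/day
# 3 concurrent: 123.6 tasks/day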
Unfortunately, the latest drivers up to 304.22 in Linux continue to be capped at 5 GT/s. I have a support ticket open with NVIDIA on this for several weeks now but there is no resolution yet. Driver 295.33 continues to be the only option if you want to take full advantage of Kepler in Linux.
I wanted to pass this information along so that other x79 / Kepler users can get maximum production out of their hardware with this project. Performance-wise, Linux is the way to go with PCI-E 3.0 hardware from what I have seen so far.
Thanks for the update, amazing performance.
BTW, we are currently close to finishing a new round of optimization of the BRP app (mostly for the OpenCL app, but the CUDA app will get some speedup as well). If all goes well in terms of validation, the new app will reduce the amount of data per task that has to be transferred across PCIe.
Stay tuned
HB
That is very good news. I am looking forward to trying out the new applications. Thanks.
Huge productivity, Jeroen! I tried to repeat your experiment, but I could not find the effect :( A possible reason for the failure is slightly different hardware and software. I would like to find the key difference.
I have tested a similar PCI-E setup via my ASUS RIVE motherboard. Three BIOS versions were checked: 1101, 1101 and 1404. Two additional PCI-E 2.0 cards were tested: a GTX 285 and a GTX 240. The effect was not detected.
Computer - http://einsteinathome.org/host/5427033
Hardware:
Motherboard - http://www.asus.com/Motherboards/Intel_Socket_2011/Rampage_IV_Extreme/
CPU - i7-3820, overclocked to 4.4 GHz
RAM - 4x4GB DDR3-2133
GPU - 2 x Palit GTX 680 in PCIe - http://www.palit.biz/palit/vgapro.php?id=1864
+ GTX 285 PCI-E 2.0 or GTX 240 PCI-E 2.0
Slot 1 - PCI-E 3.0 x16_1 -> 680 installed and set at x16 8.0 GT/s
Slot 2 - PCI-E 3.0 x8_2A -> PCI-E 2.0 card installed here, but NVIDIA X Server Settings can't detect this card!
Slot 4 - PCI-E 3.0 x16/8_3 -> 680 installed and set at x16 8.0 GT/s
Software:
Linux Mint 12 KDE 64-bit 3.0.0-20-generic
NVIDIA Driver 295.33
BOINC Manager 6.12.33 (x86)
3-tasks (0.2 CPUs + 0.33 NVIDIA GPUs): 4500 seconds per task averaged - Very slow!
Jeroen, a few questions:
Which Linux distribution do you use?
What type of video cards do you use?
Are you using app_info or the "GPU utilization factor of BRP apps" project setting?
Which BIOS version? Any special BIOS settings?
Please show your log file for "nvidia-settings -q all".
Hello Grey,
I was looking at your host details and noticed that you have CPU projects running, like Gravitational Wave and Gamma-Ray Search. Is your CPU at 100% load? If so, I would suggest setting the maximum CPU usage to 37.5 or 50%. This will free up CPU time for the GPU tasks and should significantly lower the CPU time for each task.
Driver 295.33 should be able to detect your card in slot 2a. Can you run nvidia-smi to see if the card shows up there? Also, run lspci -vv while the GPUs are under load to make sure that your 680 cards are running at 8.0 GT/s.
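In case it helps, here is a rough Python sketch of that lspci check (it assumes lspci is installed and the script runs as root so the full capability list is readable):

import subprocess

# Dump all devices verbosely; run this while the GPUs are loaded,
# since the cards drop the link speed at idle.
out = subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout

device = None
for line in out.splitlines():
    if line and not line[0].isspace():   # unindented line = new device header
        device = line
    elif "LnkSta:" in line and device and "NVIDIA" in device:
        print(device)
        print(line.strip())  # e.g. "LnkSta: Speed 8GT/s, Width x16, ..."

The LnkSta line shows the currently negotiated link, so "Speed 8GT/s" there means the card is actually running at PCI-E 3.0 rates.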
Regarding my hardware and software setup, here is what I am running presently:
Board: Asus Rampage IV Extreme
Linux: This particular host runs a custom 64-bit Linux system that I built a few years ago. It is a very small system based on BusyBox that boots over the network with PXE and is built for running BOINC. My second host, also x79 based, is running Slackware64 13.37.
Video Cards: 3x EVGA 680 2GB, 1x Zotac ZT-98GES5P-FDL 9800GT via slot 2a
GPU Utilization Factor of BRP apps: 0.5 - I run 0.5 on this host because one of my cards intermittently errors out when running at 0.33.
BIOS version: 1005, which has the PCI-E 3.0 fixes. Most settings are on Auto; the only ones I have changed are multipliers, voltages, memory timings, and setting all PCI-E slots to GEN3. I can post my full BIOS setup if you like.
Log file: Since this system does not have X11, I am not able to run nvidia-settings. However, my second Slackware host does have nvidia-settings (295.33), and that system sees similar performance gains from the PCI-E 2.0 card, like the Asus system. I put the NVIDIA log up on my site here.
I think with the hardware and software configuration that you have and with GPU utilization factor set to 0.33, that your overall GPU runtime should be around 2000-2300 seconds per task.