Well, the second result on
)
Well, the second result on the VM has been completed and reported: 48,212 seconds. The slowdown continued for quite some time, and I don't know whether that is normal. The estimate went from 42,502 seconds back up to an astonishing 49,800-ish seconds before it finally turned around and started coming back down gradually. One change I made right before it started decreasing again was to switch the VM setting for "input ungrabbed" (meaning the host OS or something other than VMware Workstation has focus) from "normal" to "low". Once I did that, the performance turned around. I was also able to work in Windows and have it behave as if BOINC were running natively in Windows, so that appears to be the way to give the VM "idle priority".
Also, per the system logs, the memory module (vmmemctl) and the faster network driver (vmxnet) are indeed loading, so vmtools is working as well as it can, I think. The version of vmtools is "not supported" for this particular release / kernel, but I think it's OK...
I'm starting up the 3rd result now...
Oh, and in case anyone is looking through the text output of the result, the "signal 15" entries are where I shut down the VM. I've learned several things in this venture, one of which is that when you shut down the manager in Linux, it doesn't shut down the client. The client and the application remain loaded in memory and will still be running. This is totally different from Windows, where if you unload the manager, the client unloads with it...
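On Linux you can confirm that the client is still alive after closing the Manager by looking for its process. A minimal Python sketch, assuming a Linux `/proc` filesystem and that the core client runs under the name `boinc` or `boinc_client` (both names are assumptions; check what your package actually installs). It reads each process's command line from `/proc/<pid>/cmdline`:

```python
import os

# Hypothetical names for the BOINC core client binary; adjust to match your install.
CLIENT_NAMES = {"boinc", "boinc_client"}


def list_processes(proc_root="/proc"):
    """Return (pid, program name) pairs read from /proc/<pid>/cmdline on Linux."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv0 = f.read().split(b"\0", 1)[0].decode(errors="replace")
        except OSError:
            continue  # the process exited, or we may not read it
        if argv0:  # kernel threads have an empty cmdline
            procs.append((int(entry), os.path.basename(argv0)))
    return procs


def client_pids(processes, names=CLIENT_NAMES):
    """Return the sorted PIDs whose program name matches one of the client names."""
    return sorted(pid for pid, name in processes if name in names)


if __name__ == "__main__":
    if not os.path.isdir("/proc"):
        print("No /proc filesystem here; this sketch only works on Linux.")
    else:
        pids = client_pids(list_processes())
        if pids:
            print("BOINC client still running, PIDs:", pids)
        else:
            print("No BOINC client process found.")
```

If it still reports PIDs after the Manager window is gone, the client (and any science application it started) is still crunching in the background.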
OK, well I should've stated
)
OK, well, I should have said that it's the SINGLE-USER installation on Windows that ends the client along with the manager. I was just using my Intel system, which has a service installation, and shutting down the manager in the Administrator account didn't stop BOINC or SETI. So apparently the Linux installation is akin to a "Service Installation" on Windows. I'm still irked at the people who put together the 5.10.8 installation package for Ubuntu, though, as all they provided was the manager itself, no client...
Estimation times are continuing to drop on the 3rd result. I'll note if times start going back up. Current estimation is at 49,339 seconds.
RE: Estimation times are
)
Well, there are definitely "phases" of performance in the application. After getting down as low as an estimated 46,800-ish seconds, I'm now back up to around 49,200. Since I have an AMD processor, and the VM sees the actual processor, I can't run VTune, and I don't know enough about Linux to know whether another app can take a look at it.
Unless something dramatic changes when I switch to the VM that I built myself, either my original assumption was false or the overhead of the VM is eating up any gain I might have seen and/or causing a loss in performance, since I estimate I'm 3-5% slower than running natively in Windows.
I attached the VM that I
)
I attached the VM that I created just a little while ago. It looks like it is not going to be any faster.
So, this makes me wonder:
Was my original assumption that the Linux app is faster than the Windows app on the same AMD hardware an incorrect assumption?
I don't know. Just an "off the cuff" look at the performance of the system in general seems to me to indicate that performance is slower overall inside the VM, which is somewhat interesting because of the huge amount of memory it is using (572,244K right now per Task Manager).
The way virtualization works, as I understand it, is that it tries to pass native instructions to the hardware as much as it can, letting the hypervisor handle only what it has to. The big question I don't know how to answer is how different it would be running natively...?
I think I'm within 5-8% of native Windows performance of the application, but if the VM is costing me a 10-15% performance drop, then I simply can't draw any conclusion at all, because the virtualization muddies the performance picture.
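The reasoning above can be made concrete with a little arithmetic. A minimal Python sketch, assuming both percentages are read as extra runtime (the post doesn't say whether "performance drop" means longer runtime or lower throughput, and the numbers shift slightly depending on which): if the VM run takes (1 + s) times the Windows runtime, and virtualization alone inflates runtime by a factor of (1 + o), then native Linux would take (1 + s)/(1 + o) times the Windows runtime.

```python
def implied_native_ratio(vm_vs_windows, vm_overhead):
    """Estimate native-Linux runtime as a multiple of native-Windows runtime.

    vm_vs_windows: extra runtime of the Linux-in-VM run versus native Windows
                   (0.05 means the VM run takes 5% longer).
    vm_overhead:   assumed extra runtime caused by virtualization alone
                   (0.10 means the same work takes 10% longer inside the VM).
    A result below 1.0 would mean native Linux beats native Windows.
    """
    return (1 + vm_vs_windows) / (1 + vm_overhead)


# The ranges guessed in the post above; none of these numbers are measured.
for slower in (0.05, 0.08):
    for overhead in (0.10, 0.15):
        ratio = implied_native_ratio(slower, overhead)
        print(f"VM run {slower:.0%} slower, VM overhead {overhead:.0%}: "
              f"native Linux ~{ratio:.3f}x the Windows runtime")
```

With those guesses every combination lands between about 0.913 and 0.982, which would put native Linux roughly 2-9% ahead of Windows. If the real overhead were smaller than the 5-8% gap, though, the ratio would climb above 1.0 and the Linux app would simply be slower, so the whole conclusion hinges on actually measuring the VM overhead.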
I'll try doing some things to benchmark inside of the VM and see how much they differ with the same benchmark in Windows.
Edit: A good test of this would be something like SuperPI, perhaps. If I show substantial differences in how long it takes to calculate, say, 4 million digits, then I'd need to check into paravirtualization again perhaps...
Time for more Crystal Light, then I'll start working on figuring out if I can do it (mainly if SuperPI can be run in Linux!)
OK. Very bad performance for
)
OK. Very bad performance for the 1M SuperPI calculation. The same calculation in the version of SuperPI that I have for Windows takes only 30 seconds:
Version 2.0 of the super_pi for Linux OS
Fortran source program was translated into C program with version 19981204 of
f2c, then generated C source program was optimized manually.
pgcc 3.2-3 with compile option of "-fast -tp px -Mbuiltin -Minline=size:1000 -Mnoframe -Mnobounds -Mcache_align -Mdalign -Mnoreentrant" was used for the
compilation.
------ Started super_pi run : Tue Dec 25 21:27:55 EST 2007
Start of PI calculation up to 1048576 decimal digits
End of initialization. Time= 1.708 Sec.
I= 1 L= 0 Time= 3.664 Sec.
I= 2 L= 0 Time= 3.708 Sec.
I= 3 L= 1 Time= 3.736 Sec.
I= 4 L= 2 Time= 3.728 Sec.
I= 5 L= 5 Time= 3.716 Sec.
I= 6 L= 10 Time= 3.656 Sec.
I= 7 L= 21 Time= 3.652 Sec.
I= 8 L= 43 Time= 3.688 Sec.
I= 9 L= 87 Time= 3.656 Sec.
I=10 L= 174 Time= 3.684 Sec.
I=11 L= 349 Time= 3.668 Sec.
I=12 L= 698 Time= 3.648 Sec.
I=13 L= 1396 Time= 3.644 Sec.
I=14 L= 2794 Time= 3.644 Sec.
I=15 L= 5588 Time= 3.604 Sec.
I=16 L= 11176 Time= 3.652 Sec.
I=17 L= 22353 Time= 3.656 Sec.
I=18 L= 44707 Time= 3.532 Sec.
I=19 L= 89415 Time= 3.344 Sec.
End of main loop
End of calculation. Time= 74.061 Sec.
End of data output. Time= 0.288 Sec.
Total calculation(I/O) time= 74.349( 41.391) Sec.
------ Ended super_pi run : Tue Dec 25 21:29:15 EST 2007
I have no idea where to go
)
I have no idea where to go from here. Most of the benchmarking tools I can find for Linux date from the mid-to-late 1990s... Hardinfo, a tool you can get through Ubuntu and/or other Linux channels, has some benchmarking capability, but the results are mixed. I seem to be mostly OK on the "CPU" benchmarks, but the FPU Raytracing benchmark is not doing so well.
FPU Raytracing
This Machine 20.130
4x Intel(R) Xeon(R) CPU 5160 @ 3.00GHz 7.166
8x Intel(R) Xeon(R) CPU X5365 @ 3.00GHz 7.415
4x Intel(R) Xeon(R) CPU X5355 @ 2.66GHz 8.072
2x AMD Athlon(tm) 64 X2 Dual Core Processor 6400+ 8.765
AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ 9.023
AMD Athlon(tm) 64 FX-55 Processor 10.014
2x AMD Athlon(tm) 64 X2 Dual Core Processor 5400+ 10.327
4x AMD Phenom(tm) X4 Quad-Core Processor GP-9500 10.860
Mobile AMD Athlon(tm) 64 Processor 4000+ 10.990
4x Dual Core AMD Opteron(tm) Processor 275 10.992
4x Intel(R) Core(TM)2 Quad CPU @ 2.40GHz 11.706
2x Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz 11.915
2x Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz 12.170
2x Dual-Core AMD Opteron(tm) Processor 1216 12.189
2x AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ 12.191
2x AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ 12.586
2x Intel(R) Core(TM)2 CPU T7600 @ 2.33GHz 12.776
2x AMD Athlon(tm) X2 Dual Core Processor BE-2400 12.858
2x AMD Opteron(tm) Processor 248 12.902
2x Dual Core AMD Opteron(tm) Processor 180 12.903
AMD Turion(tm) 64 Mobile Technology MK-38 13.168
2x Dual Core AMD Opteron(tm) Processor 185 13.244
AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ 13.388
2x AMD Turion(tm) 64 X2 Mobile Technology TL-64 13.619
2x Intel(R) Xeon(TM) CPU 3.40GHz 13.648
2x AMD Opteron(tm) Processor 246 13.757
2x AMD Processor model unknown 13.844
2x Intel(R) Xeon(R) CPU 3070 @ 2.66GHz 13.914
2x AMD Athlon(tm)64 X2 Dual Core Processor 3800+ 13.960
4x Intel(R) Core(TM)2 Quad CPU @ 2.66GHz 14.056
2x Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz 14.290
2x Dual Core AMD Opteron(tm) Processor 170 14.692
2x Dual Core AMD Opteron(tm) Processor 165 14.810
2x Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz 15.008
Genuine Intel(R) CPU 2160 @ 1.80GHz 15.238
2x Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz 15.300
2x AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ 15.335
AMD Sempron(tm) Processor 3500+ 15.391
8x Dual-Core AMD Opteron(tm) Processor 8212 15.439
4x Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz 15.502
2x AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ 15.639
2x Intel(R) Core(TM)2 Duo CPU E4400 @ 2.00GHz 15.653
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz 15.719
2x AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ 16.139
2x Intel(R) Core(TM)2 Extreme CPU X7900 @ 2.80GHz 16.310
2x Dual-Core AMD Opteron(tm) Processor 2210 16.518
2x AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ 16.579
2x AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ 16.724
AMD Processor model unknown 16.946
2x Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz 17.385
Larger values are slower. I should be approximately equal to the FX-55 that's in the list...
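To put the FPU Raytracing numbers next to the SuperPI result, the ratios can be computed directly. A small Python sketch, assuming (as the post does) that the FX-55 row is a fair stand-in for this CPU running natively, and taking the roughly 30-second Windows SuperPI time at face value:

```python
def slowdown(measured, reference):
    """Compare two timings where lower is better.

    Returns (ratio, percent_slower); a ratio above 1 means `measured` is slower.
    """
    ratio = measured / reference
    return ratio, (ratio - 1) * 100


# Numbers quoted in this thread. The 30 s Windows SuperPI time is approximate,
# and the Windows SuperPI build is not the same binary as the Linux super_pi.
comparisons = {
    "HardInfo FPU Raytracing, this VM vs. FX-55": (20.130, 10.014),
    "SuperPI 1M, Linux VM vs. native Windows": (74.349, 30.0),
}

for label, (measured, reference) in comparisons.items():
    ratio, pct = slowdown(measured, reference)
    print(f"{label}: {ratio:.2f}x ({pct:.0f}% slower)")
```

This prints about 2.01x for the raytracing test and 2.48x for SuperPI. Both FPU-heavy tests show the VM taking roughly twice as long or more, which is far bigger than the few-percent gap seen with the BOINC application itself.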
I don't know what all this means though...and, frankly, I'm getting a headache trying to deal with reading through all this stuff...
My gut feeling is that there is a substantial penalty based on certain CPU/FPU and/or I/O functions, but not all of them.
If anyone running a slower processor, say a 3000+, 3200+, 3400+, or 3500+, wouldn't mind running Hardinfo, that would be great... Be forewarned that it crashes sometimes. I don't know if that is because of the VM or the application itself...
Brian (aka "frustrated")
RE: OK. Very bad
)
Man, my head hurts. I'm surprised I had this thought sitting here with this pounding of a headache... LOL.
So, is that trying to say that the total runtime was 74.349 seconds and that 41.391 seconds of that was I/O? That's what it seems to indicate to me. If so, then I/O means disk or what? 74.4 - 41.4 = 33.0, so that would bring it into some sort of reasonable proximity to native Windows performance... I need to figure out what those figures mean.
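One way to see how these figures fit together is to add up the timings the log actually reports. A short Python sketch; the regular expressions are tailored to the super_pi output posted above, and other builds may format their output differently:

```python
import re

# The relevant part of the super_pi output posted above, copied verbatim.
LOG = """\
End of initialization. Time= 1.708 Sec.
I= 1 L= 0 Time= 3.664 Sec.
I= 2 L= 0 Time= 3.708 Sec.
I= 3 L= 1 Time= 3.736 Sec.
I= 4 L= 2 Time= 3.728 Sec.
I= 5 L= 5 Time= 3.716 Sec.
I= 6 L= 10 Time= 3.656 Sec.
I= 7 L= 21 Time= 3.652 Sec.
I= 8 L= 43 Time= 3.688 Sec.
I= 9 L= 87 Time= 3.656 Sec.
I=10 L= 174 Time= 3.684 Sec.
I=11 L= 349 Time= 3.668 Sec.
I=12 L= 698 Time= 3.648 Sec.
I=13 L= 1396 Time= 3.644 Sec.
I=14 L= 2794 Time= 3.644 Sec.
I=15 L= 5588 Time= 3.604 Sec.
I=16 L= 11176 Time= 3.652 Sec.
I=17 L= 22353 Time= 3.656 Sec.
I=18 L= 44707 Time= 3.532 Sec.
I=19 L= 89415 Time= 3.344 Sec.
End of main loop
End of calculation. Time= 74.061 Sec.
End of data output. Time= 0.288 Sec.
Total calculation(I/O) time= 74.349( 41.391) Sec.
"""

NUM = r"([\d.]+)"  # captures a decimal number such as 3.664


def parse_super_pi(log):
    """Pull the timing figures out of a super_pi log (format as in the post)."""
    init = float(re.search(r"End of initialization\. Time=\s*" + NUM, log).group(1))
    steps = [float(t) for t in
             re.findall(r"^I=\s*\d+\s+L=\s*\d+\s+Time=\s*" + NUM, log, re.MULTILINE)]
    calc = float(re.search(r"End of calculation\. Time=\s*" + NUM, log).group(1))
    output = float(re.search(r"End of data output\. Time=\s*" + NUM, log).group(1))
    total, paren = (float(x) for x in re.search(
        r"Total calculation\(I/O\) time=\s*" + NUM + r"\(\s*" + NUM + r"\)", log).groups())
    return init, steps, calc, output, total, paren


init, steps, calc, output, total, paren = parse_super_pi(LOG)
print(f"{len(steps)} loop steps add up to {sum(steps):.3f} s")
print(f"initialization + loop steps = {init + sum(steps):.3f} s")
print(f"end of calculation + data output = {calc + output:.3f} s "
      f"(reported total: {total:.3f} s)")
print(f"reported total minus the parenthesized figure = {total - paren:.3f} s")
```

Against the posted log, the 19 loop steps add up to 69.280 s, initialization plus the loop comes to 70.988 s, and 74.061 + 0.288 reproduces the 74.349 s total exactly. So the "End of calculation" time appears to be cumulative, and 74.349 s appears to cover the whole run; the start/end timestamps (21:27:55 to 21:29:15, 80 s of wall-clock time) are consistent with that. The 41.391 s figure doesn't match any of these sums, so the log alone doesn't say what it measures, and reading 74.4 - 41.4 = 33.0 s as "pure compute" time remains a guess until that is pinned down.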
Grrr...
Bernd, I at least want a power-users app for all this trouble!!!!! :-P
Brian, I've followed most
)
Brian,
I've followed most of your trials and tribulations with the Linux testing, but I know I haven't caught all of it...
Have you considered doing a dual-boot install of Ubuntu on your system? The install partition-manager can slice off a minimal piece of one of your partitions (assuming you don't have a spare one to contribute to the cause) to load and configure Ubuntu. When I did a basic dual-boot install, it found enough usable drivers to run my hardware without searching the web for hours on end.
Seti Classic Final Total: 11446 WU.
RE: Brian, I've followed
)
The problem is, I don't trust the partitioning tools. I simply cannot afford to have my hard drive fouled up. Also, with the VM, there is no "searching for drivers". You get a fixed set of virtualized hardware that is roughly equivalent to a Pentium II / early Pentium III class system with a virtual SCSI disk. The abstraction layer (hypervisor and/or binary translation) then takes care of things.
Some questions I have: how in the world can it take advantage of HyperTransport? Is the virtual SCSI anywhere close to the performance of native SATA/DMA? And is the virtual memory driver (vmmemctl) adding large amounts of latency that wouldn't normally be there?
The thought I had really early this morning was to create another VM, but with Windows this time. Microsoft has a trial VM of Windows Server 2003 R2 that you can download. If I can't get that downloaded, I can always get my Windows XP media and install a virtualized XP. I only need a basic installation, enough to get SuperPI running. Then I compare the times...
The only other way is to take the chance and do a dual boot, which I'm very hesitant to do, given my lack of income and my need to have this system for school, which starts on January 7th...
RE: The problem is, I
)
For VMware Server I can say this: a VM will use all your installed hardware and then some. It runs like any other program. If your VM needs memory or hard disk access, the VM manager claims memory from your host system, and the same goes for the hard disk. But a really cool thing is that you can pass ordinary hardware through into your VM (not all of it, and it also depends on which VM program you use).
For example, a dedicated SCSI-scanning VM with your favorite picture manager, or a work VM with your favorite OS and its office and web programs on a USB stick/disk...
But remember: as long as you don't try to use extended CPU features like SSE or 3DNow, you'll only ever see a small performance impact and stay independent of the host system.
This host is a dedicated crunching VM with 5 active projects out of 9 total. You should find enough non-virtualized hosts to compare it to.