From the Desk of Michael Lazar, CTO Veloxum

Technical information related to Veloxum's Active Continuous Optimization (ACO) process

Wednesday, November 23, 2011

So, you want to optimize iSCSI? Things to think about...

(Learn more about automatic optimization and Veloxum's optimization software.)

As with all optimization discussions, how you optimize iSCSI and what your assumptions are will affect the optimization’s success. Theory aside, with iSCSI you can gain higher throughput with Jumbo frames, but you need to take the following into consideration:
The number of spindles available and the RAID level in use may be more of a factor than anything else. If you have only a few spindles, or have created RAID (disk) groups that use only a few spindles, your limiting factor will likely have nothing to do with frame size. I have seen companies create RAID-6 groups with four spindles and then wonder why performance is so poor (see the quick arithmetic sketch after these considerations).

Do your switches properly support Jumbo frames (do they have large enough buffers)? Some switches officially support Jumbo frames (and VLANs) but the buffers are simply not adequate – there is “support” and then there are products designed for Jumbo frames.
TCP and SCSI considerations: What type of TCP congestion control is your SAN vendor using vs. what is ESX using? (It is not as straightforward as you may think.)
How many NICs are connected to your iSCSI SAN? Are you using MPIO? Has MPIO been set up to change the IOPS per round-robin path? There are some excellent articles on iSCSI (and MPIO) that discuss these issues (the older posts are worth reading too, as the diagrams and explanations are helpful).
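To put the spindle-count point in perspective, here is a rough back-of-the-envelope sketch. The per-spindle IOPS figure and the RAID write penalties are illustrative assumptions (roughly what you would expect from 10K RPM SAS drives), not measurements from any particular array.

    # Rough estimate of usable random-write IOPS for a small RAID group.
    # Assumptions (illustrative only): ~150 random IOPS per spindle and the
    # classic write penalties of 2 for RAID-10, 4 for RAID-5, 6 for RAID-6.
    SPINDLE_IOPS = 150
    WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4, "RAID-6": 6}

    def usable_write_iops(spindles, raid_level):
        """Aggregate back-end IOPS divided by the RAID write penalty."""
        return spindles * SPINDLE_IOPS / WRITE_PENALTY[raid_level]

    for level in ("RAID-10", "RAID-5", "RAID-6"):
        print(level, "with 4 spindles:", int(usable_write_iops(4, level)), "write IOPS")

    # RAID-6 with 4 spindles comes out around 100 write IOPS -- no frame size will fix that.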
Now, assuming you have many spindles available and your switches properly handle a larger MTU, what difference can it make?
It potentially reduces packet-processing operations by a factor of six. An MTU of 9000 is large enough to accommodate 8K of data plus overhead. If your VM is a database server, this can make a big difference (a quick calculation follows below).
Keep in mind that on top of the pure CPU overhead of packet assembly and disassembly, there is also the latency introduced by the operation itself.
Depending on your iSCSI vendor, delayed TCP ACKs may not be a good setting. Additionally, consider that “something” (the CPU) needs to keep track of the delayed ACKs. You should check with your SAN vendor for their recommended setting.
Vendors also confirm that a Jumbo MTU has a large impact. Here is a reference: http://www.netapp.com/us/library/technical-reports/tr-3409.html (it claims up to a 30% throughput improvement; a bit dated but still relevant).
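Here is the quick calculation behind the factor-of-six claim. The MSS values are simply the MTU minus 40 bytes of IP and TCP headers; iSCSI and TCP option overhead is ignored to keep the arithmetic simple.

    import math

    # Approximate TCP payload per frame: MTU minus 20-byte IP and 20-byte TCP headers.
    MSS_STANDARD = 1500 - 40   # 1460 bytes
    MSS_JUMBO    = 9000 - 40   # 8960 bytes

    io_size = 8 * 1024         # one 8K database I/O

    packets_standard = math.ceil(io_size / MSS_STANDARD)   # 6 packets
    packets_jumbo    = math.ceil(io_size / MSS_JUMBO)      # 1 packet

    print(packets_standard, packets_jumbo)   # 6 1 -> roughly a 6x reduction in packets to process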
A seminal paper by Matthew Mathis, Jeffrey Semke, Jamshid Mahdavi, and Teunis Ott, found here: http://www.psc.edu/networking/papers/model_abstract.html, shows that TCP throughput follows this formula:
Throughput <= ~0.7 * MSS / (rtt * sqrt(packet_loss))
The equation above tells us that, everything else being equal, you can double your throughput by doubling the packet size. Of course, you need enough spindles and aggregate bandwidth available to handle it.
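As a worked example of the Mathis et al. formula, here is the same calculation for a standard and a jumbo MSS. The RTT and loss figures are assumed values chosen only to be plausible for an iSCSI LAN; the point is that the bound scales linearly with MSS.

    from math import sqrt

    def mathis_throughput(mss_bytes, rtt_s, loss):
        """Upper bound on TCP throughput in bytes/sec: ~0.7 * MSS / (RTT * sqrt(loss))."""
        return 0.7 * mss_bytes / (rtt_s * sqrt(loss))

    rtt = 0.001      # 1 ms round trip (assumed)
    loss = 1e-4      # 0.01% packet loss (assumed)

    for mss in (1460, 8960):   # standard vs. jumbo-frame MSS
        mbps = mathis_throughput(mss, rtt, loss) * 8 / 1e6
        print(f"MSS {mss}: ~{mbps:.0f} Mbit/s upper bound")

    # The jumbo-MSS bound is about 6x the standard one -- the same ratio as the MSS itself.
    # (These are protocol upper bounds; your NICs and spindles still cap the real number.)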
I happen to be helping a client this weekend convert to a Jumbo MTU. They are on ESX 4.1 using Broadcom NICs and Dell MD3200 storage systems. Assuming they do not mind, I will post the results next week.

Monday, November 7, 2011

Optimizing Server Performance and Capacity: 5 Free Methods for both Physical and Virtual (VMware) Servers

Since Veloxum makes its business around server optimization to increase the performance and capacity of physical and virtual servers, people often ask us, “Do you have any free advice on how we can increase the performance and capacity of our servers?” For these people I provide the following five often-overlooked free methods.

1.      Make sure your VMware tools are up-to-date on all guests. VMware’s engineers, like many others, continually improve VMware’s performance release-to-release. You cannot take advantage of those improvements if you run older versions.
2.      Place identical operating systems on the same host.  This simple tip can increase memory sharing significantly.
3.      Using iSCSI?  Have you set up your environment for Jumbo MTUs?  If your environment supports it, you can expect a 40% or greater improvement in sustainable I/O.
4.      Using shared storage?  Use vStorage APIs for Array Integration (VAAI) if possible. Review http://kb.vmware.com/kb/1005009 and http://kb.vmware.com/kb/1021976 for more details.
5.      Another shared storage tip: perform regular virtual machine snapshot maintenance. Keep only the snapshots that you need and remove the rest if at all possible (the short script below can help you audit both tools status and snapshots).
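For tips 1 and 5, a small script makes the audit painless. This is a minimal sketch using pyVmomi (VMware's Python SDK for the vSphere API); the vCenter host name and credentials are placeholders, and the property names reflect my understanding of that API rather than anything specific to the tips above.

    import atexit, ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder vCenter and credentials -- replace with your own.
    si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="secret",
                      sslContext=ssl._create_unverified_context())
    atexit.register(Disconnect, si)

    content = si.RetrieveContent()
    vms = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True).view

    def count_snapshots(snapshot_list):
        """Walk the snapshot tree and count every snapshot."""
        return sum(1 + count_snapshots(s.childSnapshotList) for s in snapshot_list)

    for vm in vms:
        tools = vm.guest.toolsVersionStatus      # e.g. guestToolsCurrent / guestToolsNeedUpgrade
        snaps = count_snapshots(vm.snapshot.rootSnapshotList) if vm.snapshot else 0
        if tools != "guestToolsCurrent" or snaps:
            print(f"{vm.name}: tools={tools}, snapshots={snaps}")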

Lastly, one monitoring tip I always recommend: enable statistics level 3 in vCenter. Several latency (delay) metrics are not available unless you do this. As you search for your own ways to increase performance, these metrics will help you spot trouble areas.
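If you would rather pull those latency counters programmatically than read the vSphere Client charts, here is a sketch that reuses the pyVmomi session from the previous example. The counter name (disk.deviceLatency.average) and the 20-second real-time interval are my choices for illustration; substitute whichever counters you care about.

    # Reuses the "si" / "content" session from the earlier sketch.
    perf = content.perfManager

    # Map "group.name.rollup" strings to counter ids so a counter can be looked up by name.
    counter_ids = {f"{c.groupInfo.key}.{c.nameInfo.key}.{c.rollupType}": c.key
                   for c in perf.perfCounter}

    # Physical device latency in milliseconds.
    dev_latency = counter_ids["disk.deviceLatency.average"]

    hosts = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True).view

    for host in hosts:
        spec = vim.PerformanceManager.QuerySpec(
            entity=host,
            metricId=[vim.PerformanceManager.MetricId(counterId=dev_latency, instance="*")],
            intervalId=20,      # the real-time (20-second) interval
            maxSample=3)
        for result in perf.QueryPerf(querySpec=[spec]):
            for series in result.value:
                print(host.name, series.id.instance, series.value)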