Friday, December 17, 2010

Enhanced vMotion Compatibility

Enhanced vMotion Compatibility (EVC) has been available for a while now, but adoption seems slow. Recently VMguru.nl featured the article “Challenge: vCenter, EVC and dvSwitches”, which illustrates another case where the customer did not enable EVC when creating the cluster. There still seems to be a lot of misunderstanding about EVC and the impact it has on a cluster when enabled.

What is EVC?
VMware Enhanced VMotion Compatibility (EVC) facilitates VMotion between different CPU generations through the use of Intel VT FlexMigration and AMD-V Extended Migration technologies. When enabled for a cluster, EVC ensures that all CPUs within the cluster are VMotion compatible.

What is the benefit of EVC?
Because EVC allows you to migrate virtual machines between different generations of CPUs, you can mix older and newer server generations in the same cluster and still migrate virtual machines between these hosts with VMotion. This makes adding new hardware to your existing infrastructure easier and helps extend the value of your existing hosts. It is often said that EVC forces newer processors to behave like older processors, but this is not entirely true; EVC creates a baseline so that all the hosts in the cluster advertise the same feature set. The EVC baseline does not disable the features, it merely indicates to the virtual machines that a specific feature is not available.
Now it is crucial to understand that EVC only focuses on CPU features, such as SSE or AMD 3DNow! instructions, and not on CPU speed or cache levels. Hardware virtualization optimization features such as Intel VT FlexMigration or AMD-V Extended Migration and Memory Management Unit virtualization features such as Intel EPT or AMD RVI will still be available to the VMkernel even if EVC is enabled. As mentioned before, EVC only focuses on the availability of features and instructions of the CPUs in the cluster, for example SIMD instructions such as the SSE instruction set.

Let’s take a closer look. When you select an EVC baseline, it applies the baseline feature set of the selected CPU generation and exposes only those features. When an ESX host joins the cluster, only the CPU instructions that are new and unique to that specific CPU generation are hidden from the virtual machines. For example, if the cluster is configured with an Intel Xeon Core i7 baseline, the standard Intel Xeon Core 2 feature set plus SSE4.1, SSE4.2, POPCNT and RDTSCP is made available to all the virtual machines; when an ESX host with a Westmere (32nm) CPU joins the cluster, the additional CPU instruction sets such as AES/AES-NI and PCLMULQDQ are suppressed.

As mentioned in the various VMware KB articles, it is possible, but unlikely, that an application running in a virtual machine would benefit from these features, and that its performance would be lower as a result of using an EVC mode that does not include them.
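A quick way to see which of these instructions actually reach a guest is to check the CPU flags from inside the virtual machine. A minimal sketch, assuming a Linux guest (the flag names are the ones the Linux kernel reports in /proc/cpuinfo); on a cluster with the Intel Xeon Core i7 baseline you should see sse4_1, sse4_2, popcnt and rdtscp, but not aes or pclmulqdq:

# run inside a Linux guest: show which of the relevant CPU flags the VM can see
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse4_1|sse4_2|popcnt|aes|pclmulqdq|rdtscp)$' | sort -u

Run the same one-liner in a VM on a Westmere host in a cluster without EVC (or with the Westmere baseline) and aes and pclmulqdq should show up as well.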

DRS-FT integration and building block approach
When EVC is enabled in vSphere 4.1, DRS is able to select an appropriate ESX host for placing FT-enabled virtual machines and to load-balance these virtual machines, resulting in a more balanced cluster, which likely has a positive effect on the performance of the virtual machines. More info can be found in the article “DRS-FT integration”.

Equally interesting is the building block approach: by enabling EVC, architects can use a predefined set of hosts and resources and gradually expand the ESX clusters. Not every company buys compute power by the truckload; with EVC enabled, clusters can grow by adding ESX hosts with new(er) processor versions.

One potential caveat is mixing hardware of different major generations in the same cluster; as Irfan Ahmad so eloquently put it, “not all MHz are created equal”. Newer major generations offer better performance per CPU clock cycle, which can create a situation where a virtual machine getting 500 MHz on one ESX host is migrated to another ESX host where that 500 MHz is equivalent to only 300 MHz of the original host in terms of application-visible performance. This increases the complexity of troubleshooting performance problems.

Recommendations?
A performance loss is unlikely when enabling EVC. With EVC enabled, DRS-FT integration is supported and organizations are more flexible in expanding clusters over longer periods of time; I therefore recommend enabling EVC on clusters. But will it be a panacea for the stream of new major CPU generation releases? Unfortunately not! One possibility is to treat the newest hardware (major releases) as a higher tier of service than the older hardware and create new clusters for it.

vSphere cluster: max 4 ESX hosts per “location” because of HA limitations?


Duncan Epping has a couple of extremely interesting posts on Yellow-Bricks.com concerning HA, even down to selecting or promoting the HA status of ESX nodes (a must read!), but I want more…
Let’s start with what I assume to know about HA:
- HA works with primary and secondary HA nodes
- The primary nodes are aware of the states and configs of all nodes in an HA cluster
- The secondary nodes depend on the primary nodes
- There is a supported limit of 5 primary HA nodes per cluster
- The first 5 ESX hosts that are added to an HA cluster are initially defined as primary HA nodes
- All the other hosts that are added to the HA cluster are configured as secondary HA nodes
- There’s a way to configure an HA node as primary or secondary; however, it’s not possible to configure an ESX host as a “fixed” primary HA node (a way to list the current roles is sketched right after this list):

  • /opt/vmware/aam/bin/Cli
    AAM> promotenode   (configure the host as a primary HA node)
    AAM> demotenode    (configure the host as a secondary HA node)

- One primary HA node is the Active Primary HA node; this node coordinates the restarts of the VM’s that went down with the “crashed” host.
- When the Active Primary HA node goes down, another primary is (s)elected as Active Primary HA node and takes over the coordinating role.
- A new primary is chosen when another primary is disconnected from the cluster in one of these situations:
  •  (Re)configuring HA on a host
  • Disconnecting a host from the cluster (manually or by failure)
  • Removing a host from the cluster
  • In case of a HA failure
  • Putting a host into maintenance mode
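By the way, the same Cli can also show you which hosts currently hold which role, which is handy for checking where the primaries ended up. A rough sketch, assuming the listnodes (ln) command Duncan describes for the vSphere 4.x AAM agent (treat the exact command name and output columns as an assumption and check his HA deep dive):

  • /opt/vmware/aam/bin/Cli
    AAM> ln    (list all nodes with their current role, Primary or Secondary)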
Especially the last bullet in the list of situations above shows that HA roles are really dynamic in a VI/vSphere environment. This means you have no control over the physical location of the primary and secondary roles.

And this is what my post is about:
This situation freaks me out, because when you have a larger environment with a couple of possible failure domains, as I’d like to call them (represented by any physically separated group of hosts within an HA cluster, like different blade chassis or different server rooms), you want to have control over the placement of these HA roles.

And as I stated earlier, Duncan Epping has some interesting articles, like the HA deep dive and “Primary and Secondary nodes, pick one!”, which describe how to select a role for a host, but this selection is not static; whenever a primary host is disconnected (maintenance mode, Reconfigure HA and so on) there is a re-election and you lose control over the role placement.

So what if all 5 primary HA nodes are in the same “possible failure domain” (say a blade chassis) and that chassis goes down? Well, you just lost all the HA nodes that know what to do in case of a host failure, so HA won’t work!
We have to nuance the drama a bit: if 5 hosts of a 10-ESX-host cluster go down, you have a major issue anyway, whether HA works or not, because you have lost half of your capacity.

But you do have to realize that if HA is configured correctly (the 5 remaining hosts have some resources available, your primaries are separated over the 2 locations and you have defined the start-up rules for the most important VM’s), those important VM’s will be booted up.

If you have the same situation as above but with all 5 primary HA nodes down because they were physically grouped, HA won’t work and none of the crashed VM’s will be booted up automatically!

During VMworld 2009, Marc Sevigny from VMware explained that they were looking into an option that would enable you to pick your primary hosts. This would solve the problem, but until then the only workaround is to keep your clusters limited to a total of 8 ESX hosts, 4 ESX hosts per “possible failure domain”: with at most 4 hosts per domain, the 5 primary nodes can never all end up in the same domain.
I’m curious if I’m the only one running into this challenge; please let me know!

P.S. Special kudos go to Remon Lam from vminfo.nl, who discovered this “feature” and reviewed the article.

Thursday, August 26, 2010

DSR Direct Routing Loopback Adapters in Windows Server 2008

Recently I ran across an interesting networking issue with a DSR load balancing setup while converting from Windows Server 2003 to Windows Server 2008. It seems that Microsoft has changed the way the TCP/IP stack functions, so when we went to configure the loopback adapters and set up IIS to run our websites, the server was unreachable. Worse, it had a negative effect on routing to our production servers, since the server in question runs as an auxiliary server for the production sites.


Anyhow, we finally found a great article that chronicles the issues and how to correct them. Windows Server 2008 uses the strong host model by default, which means an interface will not accept or send packets for IP addresses that are not assigned to that interface; the fix boils down to switching the production adapter (named “net” here) and the loopback adapter (named “loopback” here) to the weak host model with a couple of commands on the command line:

netsh interface ipv4 set interface "net" weakhostreceive=enabled


netsh interface ipv4 set interface "loopback" weakhostreceive=enabled


netsh interface ipv4 set interface "loopback" weakhostsend=enabled