Enhanced vMotion Compatibility – Why your identical hardware may not be the same.

In my previous post I alluded to the possibility that vMotion might fail even between hosts with identical hardware.  VMware says that vMotion should work between identical hardware, but… there’s always a but, isn’t there?  If your hardware was:

1.  Made on different assembly lines

2.  Made in different factories

3.  Manufactured several months apart

then you could be affected by slight changes in Intel’s or AMD’s manufacturing process that prevent vMotion from completing!  See the following KB article:

http://kb.vmware.com/kb/1029785
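
To make the idea concrete, here is a minimal sketch of why two hosts of the "same" model can still fail a compatibility check. It assumes a simplified model in which migration is only allowed when the destination host exposes every CPU feature the source host exposes. The function names and the feature sets are hypothetical and purely for illustration; they are not taken from the KB article or from any VMware API.

```python
def missing_features(source_features, dest_features):
    """Return the CPU features the source exposes that the destination lacks."""
    return sorted(set(source_features) - set(dest_features))


def can_vmotion(source_features, dest_features):
    """Simplified check: the destination must expose every source feature."""
    return not missing_features(source_features, dest_features)


# Hypothetical feature sets for two hosts bought as "identical hardware".
# host_b is assumed to expose one feature fewer (for example, because of a
# different processor revision or an older BIOS).
host_a = {"sse4.1", "sse4.2", "aes", "nx"}
host_b = {"sse4.1", "sse4.2", "nx"}

print(can_vmotion(host_a, host_b))       # False
print(missing_features(host_a, host_b))  # ['aes']
print(can_vmotion(host_b, host_a))       # True
```

In this simplified model a single missing feature is enough to block the migration in one direction, which is the kind of mismatch EVC hides by presenting the same baseline on every host in the cluster.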

In some cases a BIOS update will resolve this issue, but what good does that do once you have running workloads on these hosts?  You still need to bring down the host, and if no other host can accept its VMs through vMotion, your workloads will need to come down too.  Not a good day.

OK. So the soapbox is out, I’m standing on it, and I’m saying: always enable EVC at the highest level possible for your cluster.  You won’t regret it.

vSysEng

Enhanced vMotion Compatibility – Time to break out the soapbox.

OK, the resounding response to my first blog post may not be what I had hoped.  Maybe I need to go into a bit more detail on EVC and why using this feature is important.  The soapbox is out.

EVC seems to be a somewhat overlooked feature of vSphere, but in the managed hosting arena it is important to use it.  Skipping it can cause problems for your deployments down the road.

For those not familiar, EVC allows the vMotion of virtual machines between hosts with different generations of processors.  So, for instance, a cluster can contain a mix of Intel processor generations or a mix of AMD processor generations (but not Intel and AMD together).

Take a cluster with three hosts using the following processor models:

Cluster A

Host1:  Xeon E5450

Host2:  Xeon E3120

Host3:  Xeon E5-2440

Typically, a cluster with mixed processor models and no EVC will throw errors when you attempt to vMotion VMs between hosts.  This is no secret and should be understood by even the most vanilla vSphere admins.
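
As a rough illustration of what EVC does for Cluster A, the sketch below picks the highest baseline that every host can support, which is simply the baseline of the oldest processor generation in the cluster. Treat the details as assumptions: the list of Intel baselines is partial, the mapping of each Xeon model to a baseline is my reading of the processor generations and should be checked against VMware's compatibility documentation, and the function name is hypothetical.

```python
# Intel EVC baselines ordered from oldest to newest (partial list, assumed).
INTEL_EVC_ORDER = [
    "intel-merom",
    "intel-penryn",
    "intel-nehalem",
    "intel-westmere",
    "intel-sandybridge",
    "intel-ivybridge",
]

# Assumed highest baseline each host in Cluster A can support.
CLUSTER_A = {
    "Host1 (Xeon E5450)": "intel-penryn",
    "Host2 (Xeon E3120)": "intel-penryn",
    "Host3 (Xeon E5-2440)": "intel-sandybridge",
}


def highest_common_evc(host_modes, order=INTEL_EVC_ORDER):
    """Return the newest baseline that every host in the cluster supports."""
    if not host_modes:
        raise ValueError("cluster has no hosts")
    # The cluster is limited by its oldest host, so take the lowest index.
    lowest_index = min(order.index(mode) for mode in host_modes.values())
    return order[lowest_index]


print(highest_common_evc(CLUSTER_A))  # intel-penryn
```

Under these assumptions the newer E5-2440 host would present itself to VMs as a Penryn-class processor, so a VM started on any of the three hosts only ever sees features that the other two can also provide.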

Anyone who has used EVC before should also know that it is critical to enable it during the initial deployment of the cluster.  If it is not enabled when the first host is added to the cluster, you will need to take an outage against all VMs in the cluster in order to enable it later.  Imagine that you are trying to add one new host to an existing cluster of 16 older hosts with 160+ running VMs, and you find out that to use that host all 160+ VMs need to take an outage.  Bad news.

I hear it already though: “I don’t need to use EVC because I always buy the same type of hosts with identical hardware.  EVC doesn’t matter in those types of deployments.”

R’uh R’oh – you may need EVC when you add that host with “identical hardware” to the cluster.  Yes, you may need EVC.

Like I said, the soapbox is out, but I haven’t stood on it yet.  Get ready for that in a later post.