Thursday, August 20, 2015

Nexus 1000v - will the network fail if the VSMs fail?

At my current client there has been some concern about the robustness of the network in relation to Nexus 1000v. It's a vSphere 5.5 environment running in Vblock with Nexus 1000v switches (bundled with Vblock).

The question was whether the network on the ESXi hosts depends on the two management 1KV VMs, and whether the network will fail entirely if these two VMs are down.

Furthermore, there was a question of whether all ESXi traffic flows through these two VMs, since the management VMs were being perceived as actual switches.

The answer is, for most, probably pretty straightforward, but I decided to verify anyway.

Two notes first:

1) By adding Nexus 1000v to your environment you may receive some benefits. But you also add complexity. Through the looking glass of vCenter, it is simply easier to understand and manage a virtual distributed switch (vDS). Some network admins may disagree of course.

2) From Googling a bit and also from general experience, it doesn't seem like that many people are actually using the 1KVs. There is not much info to be found online, and what there is seems a bit outdated.

That said, let's get to it:

The Nexus 1000v infrastructure consists of two parts.

1) Virtual Supervisor Module (VSM). This is a small virtual appliance for management and configuration. You can have one or two. With two VMs, they run in active/passive mode.

2) Virtual Ethernet Modules (VEM). These modules are installed/pushed to each of the ESXi hosts in the environment.

All configuration of networks/VLANs is done in the VSMs (NX-OS interface) and then pushed to the VEMs. From vCenter it looks like a regular vDS, but you can see in the description that it is a 1KV, see below:

Even if both VSMs should fail, the network will continue to work as before on all ESXi hosts. The network state and configuration are kept locally on each ESXi host, in the VEM. However, control is lost and no changes can be made until the VSMs are up and synced again.

For the other question: no, the VM traffic does not flow through the VSMs; they are only for management. The VM Ethernet traffic flows through the pNICs in the ESXi hosts and on to the network infrastructure, the same as with standard virtual switches and vDSes. This means that the VSMs cannot be a bandwidth bottleneck or a single point of failure in that sense.
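To make the split between management and traffic a bit more concrete, here is a small toy model in Python. It is only a sketch of the behaviour described above, not anything from Cisco: the class names `Vsm` and `Vem`, the host names and the VLAN numbers are all made up for illustration. The point is that forwarding only looks at the configuration cached on each host, while configuration changes need a VSM that is up.

```python
from dataclasses import dataclass, field


@dataclass
class Vem:
    """Toy stand-in for the VEM on one ESXi host (hypothetical model)."""
    host: str
    vlans: set = field(default_factory=set)  # configuration cached on the host

    def forward(self, vlan: int) -> bool:
        # Forwarding uses only the local cache; no VSM is consulted.
        return vlan in self.vlans


@dataclass
class Vsm:
    """Toy stand-in for the VSM pair, management only (hypothetical model)."""
    vems: list
    online: bool = True

    def add_vlan(self, vlan: int) -> None:
        if not self.online:
            raise RuntimeError("VSM down: configuration changes are not possible")
        for vem in self.vems:
            vem.vlans.add(vlan)  # push the configuration to every host


vems = [Vem("esxi01"), Vem("esxi02")]
vsm = Vsm(vems)
vsm.add_vlan(100)

vsm.online = False  # simulate both VSM VMs failing

# VLANs that were already pushed keep forwarding on every host.
still_forwarding = all(vem.forward(100) for vem in vems)

# A new VLAN cannot be pushed until a VSM is back.
try:
    vsm.add_vlan(200)
    change_accepted = True
except RuntimeError:
    change_accepted = False

print(still_forwarding, change_accepted)
```

In this model the `forward` method never touches the `Vsm` object at all, which is the same reason the real VSMs cannot become a bandwidth bottleneck.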

For documentation: see 45 seconds of this video, from 14:45 to 15:30.

Below are two diagrams that show the overview of VSM and VEM: