Thursday, November 5, 2009

Configuration notes for HA

A while back, we experienced a number of inconvenient HA failover false positives where several hundred VMs were powered down even though there was nothing wrong with the hosts. The cause of these incidents was apparently a hiccup in the network lasting more than 15 seconds. To avoid such issues, we decided to disable HA until we were absolutely certain that we had a proper HA configuration.

Below is a quick guide to the HA settings we use. They correspond to current best practice.

For reference, we have used the HA deepdive article from Yellow-bricks and an article by Scott Lowe with notes on HA configuration.

das.failuredetectiontime
The default failure detection timeout for HA is 15 seconds. Best practice is to increase this to 60 seconds, i.e. 60,000 milliseconds. To do this, add the following entry under VMware HA -> Advanced options:

Option: das.failuredetectiontime
Value: 60000

The input is validated, so if you spell it wrong you will be prompted with an error.

das.isolationaddress
The default isolation address is the default gateway, which a host pings when it no longer receives heartbeats from the other hosts in the cluster. However, the default gateway can sit at some arbitrary point in the network, so it can sometimes be useful to add one or more extra isolation addresses. It makes sense to add an IP as close to the host as possible, e.g. a virtual IP on a switch.

Option: das.isolationaddressX (X=1,2,3,...9)
Value: IP address
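
As an example, a cluster could be given two extra isolation addresses in addition to the default gateway. The addresses below are placeholders only and should be replaced with IPs close to the hosts in your own network, e.g. virtual IPs on the access switches:

Option: das.isolationaddress1
Value: 10.1.1.1

Option: das.isolationaddress2
Value: 10.1.2.1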

Host isolation response
For fibre channel storage, we choose "Leave powered on". In an HA failover situation, the active primary node in the cluster will try to restart the VMs from the failed host on the remaining hosts. However, if the host is not actually down but only isolated, the VMFS file locks on the VM files are still held, so the VMs cannot be restarted elsewhere. HA will try to restart the VMs five times. The worst case scenario is that the VMs on an isolated host lose network connection... (in vSphere, the default response has been changed to "Shut down").
For iSCSI and other IP-based storage, the best practice isolation response is "Power off" to avoid split brain situations (two hosts having write access to the same vmdk at the same time).

Cisco switches and PortFast
In a Cisco network environment, make sure that 'spanning-tree portfast trunk' is configured on all physical switch ports connected to the ESX hosts. This ensures that the ports are never in 'listening' or 'learning' state - only in 'forwarding' state. That way, if e.g. one of the uplinks to the service console (COS) goes down, you do not risk triggering an isolation response because the standby port/uplink takes longer to reach forwarding state than the isolation timeout allows.

Example of a configured interface on a Catalyst IOS-based switch:

interface GigabitEthernet0/1
description #VMWare ESX trunk port#
no ip address
switchport
switchport trunk encapsulation dot1q
switchport trunk allowed vlan
switchport mode trunk
switchport nonegotiate
spanning-tree portfast trunk
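
Note that the 'switchport trunk allowed vlan' line must be completed with the VLAN list that is relevant in your own environment. To verify that PortFast is active on the trunk port, the state can be checked with the command below (the exact output varies with IOS version):

show spanning-tree interface GigabitEthernet0/1 portfast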

HP Blade enclosures - primary and secondary nodes
Since there can be no more than five primary nodes in a cluster, a basic design rule is to place a maximum of four hosts per cluster in any one blade enclosure. If five or more hosts from the same cluster are located in one enclosure, they could all happen to be primary nodes, and if that enclosure then fails (which happens...), no VMs will be restarted. For example, an eight-host cluster split four and four across two enclosures will always have at least one primary node left after an enclosure failure. This matter is explained well in the Yellow-bricks article mentioned above. Furthermore, clusters should be spread over a minimum of two enclosures.
