Thursday, July 8, 2010

Site redundancy with manual breaking of storage mirror

We have just installed a site-redundant cluster for a customer. The cluster consists of two ESX 4 hosts on replicated EMC CLARiiON storage. The ESX servers and the storage reside at different locations. (We would have preferred storage virtualisation with seamless storage failover, à la DataCore or SVC, but this was not an option.)

The site redundancy is enabled by using replicated storage. Should the site with the active LUNs fail, then the storage mirror can be broken manually and operation can be resumed on the remaining site.

One thing we discovered was that resignaturing the LUNs after a mirror has been broken is no longer necessary, as it was in previous versions. This means that the LUNs can be remounted directly without modification; see the Fibre Channel SAN Configuration Guide, pp. 74-76.
Earlier, you had to break the mirror first, then resignature your LUNs with the advanced setting LVM.EnableResignature, and then add the LUNs. This changed the UUID (and the label on the LUNs, for that matter), which meant that all VMs had to be manually re-registered in VirtualCenter. This is time-consuming and not something you want to spend your time on in a disaster scenario.

In vCenter, you can use the "Add Storage" wizard to remount the LUNs. However, a known bug in the software prevents this from working. Instead, it has to be done from the command line with the following commands (rescan the HBAs first; if the rescan hangs, reboot the host):

# esxcfg-volume -l (to list the volumes detected as snapshots/replicas)
# esxcfg-volume -M <VMFS UUID|label> (to persistently mount a volume, using the UUID or label from the list output)
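
The sequence above (rescan, list, persistent mount) could be wrapped in a small script along these lines. This is only a sketch under assumptions: the adapter names vmhba1/vmhba2 and the label datastore1 are placeholders, and the DRY_RUN switch and the run/remount_volume helpers are made up for illustration. By default it only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: remount a replicated VMFS volume after the storage mirror is broken.
# Placeholders (assumptions, not from the post): vmhba1, vmhba2, datastore1.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: remount_volume <UUID-or-label> <hba> [<hba> ...]
remount_volume() {
    volume="$1"
    shift
    # Rescan each HBA so the promoted LUNs become visible to the host.
    for hba in "$@"; do
        run esxcfg-rescan "$hba"
    done
    # List the volumes detected as snapshots/replicas.
    run esxcfg-volume -l
    # Mount persistently, keeping the existing signature (no resignaturing).
    run esxcfg-volume -M "$volume"
}

remount_volume "datastore1" vmhba1 vmhba2
```

Run it once in dry-run mode to check the command list, then set DRY_RUN=0 on the ESX service console to execute it. If the rescan hangs, the script will hang with it, and the reboot advice above still applies.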

See this post for an example site recovery procedure.
