Thursday, April 2, 2015

Dead paths in ESXi 5.5 on LUN 0

At a client recently, going over the ESXi logs, I found that a certain entry was spamming the /var/log/vmkwarning logs. This was not just on one host but on all hosts. The entry was:

Warning: NMP: nmpPathClaimEnd:1192: Device, seen through path vmhba1:C0:T1:L0 is not registered (no active paths)


As it was on all hosts, the indication was that the error or misconfiguration is not in the ESXi hosts but probably at the storage layer.

In vCenter, two dead paths for LUN 0 were shown on each host under Storage Adapters. However, it didn't seem to affect any LUNs actually in use:


The environment is running Vblock with Cisco UCS hardware and VNX7500 storage. ESXi hosts boots from LUN. UIM is used to deploy both LUNs and hosts. VPLEX is used for active-active between sites (Metro cluster)

The ESXi boot LUN has id 0 and is provisioned directly via VNX. The LUNs for virtual machines are provisioned via the VPLEX and their id's starts from 1.

However, ESXi still expects a LUN with id 0 from the VPLEX. If not, the above error will show.

Fix

To fix the issue, present a small "dummy" LUN to all the hosts via the VPLEX with LUN id 0. It can be a thin provisioned 100 MB LUN. Rescan the hosts. But don't add the datastore to the hosts, just leave it presented to the hosts but not visible/usable in vCenter. This will make the error go away.

When storage later has to be added, the dummy LUN will show as an available 100 MB LUN and likely operations guys will know not to add this particular LUN.

From a storage perspective the steps are the following:


  • Manually create a small thin lun on the VNX array 
  • Present to VPLEX SG on the VNX
  • Claim  the device on VPLEX
  • Create virtual volume
  • Present to Storage-views with LUN ID 0
  • Note.  Don’t create datastore on the lun.
Update 2015.07.21; 
According to VCE, adding this LUN 0 is not supported with UIM(P (provisioning tool for Vblock). We started seeing issues with the re-adapt function for UIM/P and storage issues after that. So we had to remove the LUN 0. So far, there is no fix if using UIM/P.