Thursday, April 2, 2015

Dead paths in ESXi 5.5 on LUN 0

Recently, while going over the ESXi logs at a client, I found that a certain entry was spamming /var/log/vmkwarning.log. This was not just on one host but on all hosts. The entry was:

Warning: NMP: nmpPathClaimEnd:1192: Device, seen through path vmhba1:C0:T1:L0 is not registered (no active paths)
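To see how much the entry is spamming the log on a given host, a small sketch like the one below counts the warnings per path. It assumes the ESXi 5.x default log location /var/log/vmkwarning.log and the exact message text shown above. The function name count_unregistered_paths is just an illustrative name.

```shell
# Count the NMP "not registered (no active paths)" warnings per path.
# Assumption: ESXi 5.x default log location; pass another file as the first argument.
count_unregistered_paths() {
    log="${1:-/var/log/vmkwarning.log}"
    # Keep only the matching warnings, extract the runtime name (vmhbaN:Cn:Tn:Ln),
    # then count the occurrences of each path, most frequent first.
    grep 'nmpPathClaimEnd.*is not registered (no active paths)' "$log" \
        | sed -n 's/.*seen through path \(vmhba[0-9]*:C[0-9]*:T[0-9]*:L[0-9]*\).*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Usage on the host:
#   count_unregistered_paths
```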


Since it appeared on all hosts, the error or misconfiguration was most likely not in the ESXi hosts but at the storage layer.

In vCenter, each host showed two dead paths for LUN 0 under Storage Adapters. However, this didn't seem to affect any of the LUNs actually in use.


The environment is a Vblock with Cisco UCS hardware and VNX7500 storage. The ESXi hosts boot from SAN. UIM is used to deploy both LUNs and hosts. VPLEX provides active-active storage between the sites (Metro cluster).

The ESXi boot LUN has ID 0 and is provisioned directly from the VNX. The LUNs for virtual machines are provisioned via the VPLEX, and their IDs start at 1.

However, ESXi still expects a LUN with ID 0 from the VPLEX. If there isn't one, the error above is logged.

Fix

To fix the issue, present a small "dummy" LUN with LUN ID 0 to all the hosts via the VPLEX. A thin-provisioned 100 MB LUN is enough. Rescan the hosts, but don't create a datastore on the LUN: just leave it presented to the hosts without making it a usable datastore in vCenter. This makes the error go away.

When storage has to be added later, the dummy LUN will show up as an available 100 MB LUN, and the operations staff will most likely know not to use this particular LUN.

From a storage perspective, the steps are the following:


  • Manually create a small thin LUN on the VNX array
  • Present it to the VPLEX storage group (SG) on the VNX
  • Claim the device on the VPLEX
  • Create a virtual volume
  • Present it to the storage views with LUN ID 0
  • Note: don't create a datastore on the LUN.
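After the dummy LUN is presented and the hosts are rescanned, the paths to LUN 0 can be checked from the console instead of in vCenter. The sketch below is a hedged example: it assumes the block layout of esxcli storage core path list in ESXi 5.5 ("Runtime Name:", "LUN:" and "State:" lines, in that order within each path), and lun_paths is a hypothetical helper name. Note that the VNX boot LUN also has ID 0, so its paths will be listed too.

```shell
# Print "<runtime name> <state>" for every path to the given LUN ID (default 0).
# Assumption: input is 'esxcli storage core path list' output, in which each path
# block contains "Runtime Name:", "LUN:" and "State:" lines in that order.
lun_paths() {
    want="${1:-0}"
    awk -F': ' -v want="$want" '
        /^[[:space:]]*Runtime Name:/ { name = $2 }
        /^[[:space:]]*LUN:/          { lun = $2 }
        /^[[:space:]]*State:/ && lun == want { print name, $2 }
    '
}

# Usage on each host:
#   esxcli storage core adapter rescan --all
#   esxcli storage core path list | lun_paths 0
```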

Thursday, March 26, 2015

Moving EMC VPLEX Witness server to the cloud - vCloud Air

To have a fully active-active storage setup with EMC VPLEX (with e.g. VNX or VMAX below) that fails over live in case of a site failure, a Witness server is required. This is a small Linux appliance (based on SLES) delivered as an OVF. The Witness server must be placed in a third failure domain, i.e. a third physical site.

If this is not done, manual intervention is required to activate the remaining site. This is described here (EMC documentation) and here (VMware documentation).

At multiple clients, I have seen that no third site is available, so the Witness server ends up on one of the two sites.

I looked into whether the Witness server can be moved to a cloud provider. Apparently it cannot be moved to Amazon AWS due to a specific kernel parameter set in the appliance (SLES) that doesn't match the underlying AWS hypervisor, which is based on Xen (this is what I've been told).

My thought was that the new VMware vCloud Air IaaS solution could be used, as it is based on VMware ESXi and the Witness server normally runs on an ESXi host. Contacting EMC in both Denmark and Sweden did not give a result: they didn't know whether this could be done, and the official VPLEX documentation (link above) doesn't say anything in this regard.

However, after a bit of digging, I found an EMC white paper that describes this exact situation (it is from 2015, so it is quite new).

It is technically possible and supported by EMC. The white paper includes documentation, installation steps, and security details. EMC professional services can assist with install/config if required.

Link to white paper





Saturday, August 30, 2014

VMworld account - change email and update account status (partner, alumni, VCP, and VMUG member)

Changing email

For the VMworld account, it is possible to change most information via 'Edit Profile', but not the email address.
To change the email address, you have to open a ticket with VMworld support.

Here is the info and the link:


"Due to security protocol, the registration team are unable to make changes to VMworld.com account details.  The VMworld.com account support team will be able to action this for you.  They can be contacted via an online line found here http://www.vmworld.com/community/contact/access/"

Updating Partner, Alumni, VCP status

After changing my email address, all the Partner, Alumni, VCP, and VMUG status information connected to the VMworld account was gone (this is required to obtain e.g. the Alumni discount). To update this info, another ticket has to be created. It took about one working day.

Here is the info and the link (same link as above):

"In order to validate your Alumni, VCP, Partner and VMug you need to ensure that your previously used e-mail addresses are all linked.

Please contact the VMworld.com account support team quoting your Partner ID and years you have previously attended on the link below who will be able to assist."


Sunday, December 22, 2013

VMFS heap depletion

Over the past couple of days, we've had a VM that has crashed a number of times. When you try to open the VM console, you get a black screen and a yellow MKS error at the top of the console. Strangely enough, the VM can still be pinged. After powering it off and on again, it boots, but before long the same thing happens again. Also, vMotion did not work for a number of VMs and failed with the following error:

"The operation is not allowed in the current state"

In the vmkwarning log there are the following entries:

"WARNING: HBX: 1889: Failed to initialize VMFS3 distributed locking on volume 50d136be-62d92875-869a-10604bace2cc: Out of memory"

"WARNING: Fil3: 2034: Failed to reserve volume f530 28 1 50d136be 62d92875 6010869a cce2ac4b 0 0 0 0 0 0 0"

"WARNING: Heap: 2525: Heap vmfs3 already at its maximum size. Cannot expand."

"WARNING: Heap: 2900: Heap_Align(vmfs3, 6160/6160 bytes, 8 align) failed.  caller: 0x41800d8e84e9"

I found a KB article and a post from Cormac Hogan that explain the issue.

In ESXi 5.0 U1, the default VMFS heap size is 80 MB, which means that the maximum total size of open VMDK files on a host is 8 TB. When that limit is reached, VMs can no longer access their disks.
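To check whether other hosts are also hitting the heap limit, the heap-related messages quoted above can be counted in the log. A minimal sketch, assuming the ESXi 5.x default path /var/log/vmkwarning.log; count_vmfs_heap_warnings is a hypothetical name.

```shell
# Count lines matching the VMFS heap depletion warnings quoted above.
# Assumption: ESXi 5.x default log location; pass another file as the first argument.
count_vmfs_heap_warnings() {
    log="${1:-/var/log/vmkwarning.log}"
    # grep -c counts lines that match any of the -e patterns (prints 0 if none).
    grep -c \
        -e 'Heap vmfs3 already at its maximum size' \
        -e 'Heap_Align(vmfs3' \
        -e 'distributed locking on volume .*: Out of memory' \
        "$log"
}

# Usage on the host:
#   count_vmfs_heap_warnings
```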

There are two ways to fix this:

  • Upgrade to ESXi 5.1 U1 (or ESXi 5.0 Patch 5)
  • Increase VMFS3.MaxHeapSizeMB to 256 (default is 80) under Configuration -> Advanced Settings and reboot the host

Upgrading to ESXi 5.1 U1 raises the maximum total size of open VMDK files from 8 TB to 60 TB.

Increasing the heap size to 256 MB raises the maximum to 25 TB.
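The same setting can also be read and changed from the console instead of the vSphere Client. The sketch below assumes the ESXi 5.x esxcli advanced-settings namespace and that its list output contains an "Int Value:" line; current_heap_mb is a hypothetical helper. The host still needs a reboot after the change.

```shell
# Extract the current value from the output of:
#   esxcli system settings advanced list -o /VMFS3/MaxHeapSizeMB
# Assumption: the output contains a line such as "   Int Value: 80".
current_heap_mb() {
    awk -F': ' '/^[[:space:]]*Int Value:/ { print $2 }'
}

# Usage on the host:
#   esxcli system settings advanced list -o /VMFS3/MaxHeapSizeMB | current_heap_mb
#   esxcli system settings advanced set -o /VMFS3/MaxHeapSizeMB -i 256
#   (then reboot the host)
```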




Monday, July 22, 2013

Practical commands for the ESXi console

When troubleshooting ESX(i), I always end up in the console, even on ESXi 5.x. There are a number of practical commands that I can usually remember, but not always. So here they are for reference, in no particular order:

# tail -n 50 'file name'
Shows the last 50 lines of a file

# tail -f 'file name'
Continuously outputs whatever is being written to the file

# grep -C 10 'search string' 'file name'
Outputs every line containing the search string, including 10 lines of context on each side of the entry.

# less 'file name'
A good alternative to 'more' and 'cat'. Lets you navigate back and forth with the arrow keys. Use 'w' for page up and 'z' for page down.

# find / -name 'search string'
Searches the file system for files whose name matches the search string. Further described in this post.

# find . -iname "*-flat.vmdk" -mtime +7 | xargs ls -alh
Finds -flat.vmdk files (case-insensitive) under the current directory that were last modified more than 7 days ago and lists them, including when they were last changed. See this post for more info on xargs.
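Piping find into xargs ls breaks on file names that contain spaces, which is common for VM folders. A variant that hands the names straight to ls is sketched below. It assumes the find on your ESXi build supports the -exec ... {} + form (if it doesn't, {} \; works but runs ls once per file), and old_flat_vmdks is a hypothetical name.

```shell
# List *-flat.vmdk files (case-insensitive) under a directory that were last
# modified more than N days ago. Defaults: current directory, 7 days.
old_flat_vmdks() {
    dir="${1:-.}"
    days="${2:-7}"
    # -exec ... {} + passes the names directly to ls, so spaces in names are safe.
    find "$dir" -iname '*-flat.vmdk' -mtime +"$days" -exec ls -alh {} +
}

# Usage, e.g. on a datastore:
#   old_flat_vmdks /vmfs/volumes/datastore1 7
```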

Using the vi text editor. See this post.

Type characters by ASCII code (hold the Alt key while entering the number on the numeric keypad). See more here.
@ - Alt+64
| - Alt+124

# esxcfg-scsidevs -a
# esxcli storage core adapter list
Both commands show info on the SCSI controller type, HBA type, and WWNs. Here's more info.

# esxcfg-scsidevs -l
# esxcli storage core device list
Both commands show various info on LUNs, including their exact size.

# esxcfg-mpath -l
# esxcli storage core path list
Shows info about the storage paths, including the naa device ID, the LUN ID, and the state of each path. You can grep for the word 'dead' to find dead paths.
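Besides grepping for 'dead', it can be handy to get a count of paths per state on a host. A small sketch, assuming the esxcli storage core path list output has one "State: <state>" line per path; path_state_summary is a hypothetical name.

```shell
# Count how many paths are in each state (active, standby, dead, ...).
# Assumption: input is 'esxcli storage core path list' output with one
# "State: <state>" line per path.
path_state_summary() {
    awk -F': ' '/^[[:space:]]*State:/ { count[$2]++ }
                END { for (s in count) print s, count[s] }' | sort
}

# Usage on the host:
#   esxcli storage core path list | path_state_summary
```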

# dcui
Shows the yellow/grey ESXi console menu (DCUI) in a PuTTY session

# esxcli software vib list
# esxcli software vib update -v 'VIB file name'
The first command lists the installed VIBs; the second updates a VIB package. See here for more info.

# pwd
Show working directory

# passwd
Change password for current user (can be used for root as well)

# date
Show date and time.


Friday, July 19, 2013

False CIM warning on the SCSI controller - vSphere 5.

We have had a number of CIM warnings related to the SCSI controller in the Hardware Status tab in vCenter on our ESXi 5 hosts. We have seen them on HP servers with P220i and P410i SCSI controllers.



However, there is nothing wrong with the SCSI controller.

It turns out to be a false alarm generated by the hp-smx-provider, and it can be fixed by applying a driver update (which effectively replaces hp-smx-provider with the hp-smx-limited driver; both are CIM providers). The difference between the two is, to my understanding, that with the limited driver you cannot update the iLO firmware via the ESXi console.

There are two ways of installing the four VIBs in the driver package. Either you uninstall the four existing VIBs and then install the driver package (the zip file), which installs all four VIBs. This requires two reboots.

Or you can unpack the zip file and update only the VIBs that are actually newer than the ones you already have installed. In our case, we only needed to update two of the VIBs. This requires one reboot.

Method 1 (uninstall/install)

List the relevant VIBs

# esxcli software vib list | grep hp

Remove the installed VIBs

# esxcli software vib remove -n hp-smx-provider
# esxcli software vib remove -n char-hpcru
# esxcli software vib remove -n char-hpilo
# esxcli software vib remove -n hp-ams

Reboot

Download the driver package from the HP Support Center. The file is called "hp-ams-esxi5.0-bundle-9.3.5-3.zip".

Upload it to the ESXi host and place it in /tmp/.

# cd /tmp/


# esxcli software vib install -d /tmp/hp-ams-esxi5.0-bundle-9.3.5-3.zip


Reboot

Verify that the new VIBs have been installed

# esxcli software vib list | grep hp

Method 2 (update)

List the relevant VIBs

# esxcli software vib list | grep hp

Unpack the "hp-ams-esxi5.0-bundle-9.3.5-3.zip" file and copy the following VIB files to the /var/log/vmware folder:

  • hp-ams-esx-500.9.3.5-02.434156.vib
  • hp-smx-limited-500.03.02.10.3-434156.vib


If you are unsure whether the other two VIBs are newer than the ones already installed, just copy all four VIBs.

Update the VIBs

# esxcli software vib update -v hp-ams-esx-500.9.3.5-02.434156.vib
# esxcli software vib update -v hp-smx-limited-500.03.02.10.3-434156.vib



Reboot

Verify that the new VIBs have been installed

# esxcli software vib list | grep hp
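Instead of reading through the grep output by eye, the check can be scripted. The sketch below assumes, as described above, that hp-smx-limited takes the place of hp-smx-provider, and that the VIB name is the first column of the esxcli software vib list output; check_hp_vibs is a hypothetical helper.

```shell
# Check 'esxcli software vib list' output: hp-smx-limited should be installed
# and hp-smx-provider should be gone.
# Assumption: the VIB name is the first whitespace-separated column of each line.
check_hp_vibs() {
    awk '
        $1 == "hp-smx-limited"  { limited = 1 }
        $1 == "hp-smx-provider" { provider = 1 }
        END {
            if (limited && !provider) { print "OK: hp-smx-limited installed"; exit 0 }
            print "CHECK: hp-smx-limited=" (limited ? "yes" : "no") ", hp-smx-provider=" (provider ? "yes" : "no")
            exit 1
        }
    '
}

# Usage on the host:
#   esxcli software vib list | check_hp_vibs
```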

When the host comes back online after the reboot, the warning will still be shown. Leave it for about 10-15 minutes and the warning will disappear. You can also try resetting the sensors and refreshing the page.

Thanks to Anders Mikkelsen for finding this fix and writing the guide to method 1 above.

Thursday, July 18, 2013

Updating the firmware and driver on an HBA on ESXi 5.0

When you update the firmware on an HBA, the driver is updated at the same time, as both parts are included in the installation VIB file.

First step is to verify the type of HBA in your ESXi host:

Run the following command:

# esxcfg-scsidevs -a

Or see this KB article for more info

In the example below, the HBA is a QLogic ISP2532 PCI-Express.



However, this name is not always the one used in the documentation (at least not in the case of HP) when you're looking for a specific model.

Another way to get info on the HBA type can be combined with locating the firmware version (see the next step).

Next step is to identify the current firmware level of the HBA:

For a Qlogic HBA, use the following command:

# more /proc/scsi/qla2xxx/0
(The '0' can also be a '1' or a '2')
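If the host has more than one QLogic HBA instance, a small loop prints the version lines for all of them at once. This sketch assumes the firmware and driver versions appear on lines containing the word "version" in the /proc/scsi/qla2xxx entries; qla_versions is a hypothetical name.

```shell
# Print the version lines for every QLogic HBA instance.
# Assumption: each /proc/scsi/qla2xxx/<n> entry has its firmware and driver
# versions on lines containing the word "version".
qla_versions() {
    dir="${1:-/proc/scsi/qla2xxx}"
    for f in "$dir"/*; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        echo "== $f"
        grep -i 'version' "$f"
    done
}

# Usage on the host:
#   qla_versions
```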

For other types of HBAs, see this KB article


In the screen dump above, you can see the firmware and driver versions and also another name for the HBA, QMH2562.

Now that we know the model and firmware version, we can look for the latest recommended version.

Go to this KB article to find the latest version.

For HP, it takes you to the following site: http://vibsdepot.hp.com/hpq/recipes/ where you should choose the PDF called: "HP-VMware-Recipe.pdf" which is the latest one.

In the PDF, you can search for your model, in this case QMH2562:


Find your model and follow the link. It will let you download a zip file. In this example the file is called:

qla2xxx-934.5.6.0-887798.zip

Unpack the zip file and locate the .vib file which will be used for the update.

In this case the .vib file is called:

scsi-qla2xxx-934.5.6.0-1OEM.500.0.0.472560.x86_64.vib

Now, transfer the .vib file to the following folder on the ESXi host:

/var/log/vmware

Run the following command:

# esxcli software vib update -v scsi-qla2xxx-934.5.6.0-1OEM.500.0.0.472560.x86_64.vib


Reboot to finish updating.

And then verify that the update has properly installed by re-running:

# more /proc/scsi/qla2xxx/0

You can also get detailed VIB package info by running:

# esxcli software vib list | grep scsi
# esxcli software vib get -n scsi-qla2xxx