Monday, July 22, 2013

Practical commands for the ESXi console

For troubleshooting on ESX(i), I always end up in the console even on ESXi 5.x. There are a number of practical commands that I can usually remember, but not always. So here they are for reference in random order:

# tail -n 50 'file name'
Shows last 50 lines of a file

# tail -f 'file name'
Outputs continuously what is being written to the file

# more 'file name' | grep -C 10 'search string'
Outputs the line with the word you search for including 10 lines on each side of the entry.
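
grep can also read the file directly, so the pipe through 'more' is optional. Below is a minimal sketch of the context option, run against a throwaway file whose name and contents are made up for illustration:

```shell
# Build a small sample file (hypothetical content, for illustration only)
sample=$(mktemp)
for i in 1 2 3 4 5 6 7 8 9; do
    if [ "$i" -eq 5 ]; then
        echo "entry $i ERROR"
    else
        echo "entry $i"
    fi
done > "$sample"

# Show the matching line plus 2 lines of context on each side
context=$(grep -C 2 'ERROR' "$sample")
echo "$context"

rm -f "$sample"
```

With this sample the output is entries 3 through 7, with the matching line in the middle.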

# less 'file name'
A good alternative to 'more' and 'cat'. Lets you navigate back and forth with the keyboard arrows. Use 'w' for page up and 'z' for page down

# find / -name 'search string'
Search for something. Further described in this post.

# find -iname "*-flat.vmdk" -mtime +7 | xargs ls -alh
Finds files older than 7 days and lists them, including when they were last changed. See this post for more info on xargs.

Using the vi text editor. See this post.

Typing characters with ASCII codes (hold the Alt key while entering the number on the numeric keypad). See more here.
@ - Alt+64
| - Alt+124

# esxcfg-scsidevs -a
# esxcli storage core adapter list
Both commands show info on the SCSI controller type, HBA type, WWNs. Here's more info.

# esxcfg-scsidevs -l
# esxcli storage core device list
Both commands show various info on LUNs including exact size

# esxcfg-mpath -l
# esxcli storage core path list
Shows info about the storage paths. Will show naa device id, LUN id, and state of the paths. You can grep for the word 'dead' for finding dead paths.

# dcui
Will show the yellow/grey ESXi console menu in a Putty session

# esxcli software vib list
# esxcli software vib update -v 'VIB file name'
The first command lists the installed VIBs; the second updates a VIB package. See here for more info

# pwd
Show working directory

# passwd
Change password for current user (can be used for root as well)

# date
Show date and time.


Friday, July 19, 2013

False CIM warning on the SCSI controller - vSphere 5.

We have had a number of CIM warnings in the Hardware Status tab in vCenter on the ESXi 5 hosts related to the SCSI controller. We have seen it on HP servers with a P220i and a P410i SCSI controller.



However, there is nothing wrong with the SCSI controller.

It turns out to be a false alarm generated by the hp-smx-provider, and it can be fixed by applying a driver update (which effectively replaces the hp-smx-provider with the hp-smx-limited driver; both are CIM drivers). The difference between the two is, to my understanding, that with the limited driver you cannot update the iLO firmware via the ESXi console.

There are two ways of installing the four VIBs in the driver package. Either you can uninstall the existing four VIBs and then install the driver package, the zip file, which will install all four VIBs. This requires two reboots.

Or you can unpack the zip file and update only the VIBs that are actually newer than the ones you already have installed. In our case, we only needed to update two of the VIBs. This requires one reboot.

Method 1 (uninstall/install)

List the relevant VIBs

# esxcli software vib list | grep hp

Remove the installed VIBs

# esxcli software vib remove -n hp-smx-provider
# esxcli software vib remove -n char-hpcru
# esxcli software vib remove -n char-hpilo
# esxcli software vib remove -n hp-ams

Reboot

Download the driver package from the HP Support Center. The file is called "hp-ams-esxi5.0-bundle-9.3.5-3.zip".

Upload it to the ESXi host and place it in /tmp/.

# cd /tmp/


# esxcli software vib install -d /tmp/hp-ams-esxi5.0-bundle-9.3.5-3.zip


Reboot

Verify that the new VIBs have been installed

# esxcli software vib list | grep hp

Method 2 (update)

List the relevant VIBs

# esxcli software vib list | grep hp

Unpack the "hp-ams-esxi5.0-bundle-9.3.5-3.zip" file and copy the following VIB files to the /var/log/vmware folder:

  • hp-ams-esx-500.9.3.5-02.434156.vib
  • hp-smx-limited-500.03.02.10.3-434156.vib


If you are unsure whether the other two VIBs are newer than the ones already installed, just copy all four VIBs.

Update the VIBs

# esxcli software vib update -v hp-ams-esx-500.9.3.5-02.434156.vib
# esxcli software vib update -v hp-smx-limited-500.03.02.10.3-434156.vib



Reboot

Verify that the new VIBs have been installed

# esxcli software vib list | grep hp

When the host comes back online after the reboot it will still have the warning. Leave it for about 10-15 minutes and the warning will disappear. Also, you can try to reset the sensors and refresh the page.

Thanks to Anders Mikkelsen for finding this fix and writing the guide to method 1 above.

Thursday, July 18, 2013

Updating the firmware and driver on an HBA on ESXi 5.0

When updating firmware on an HBA, it will update the driver at the same time as both parts are included in the installation VIB file.

First step is to verify the type of HBA in your ESXi host:

Run the following command:

# esxcfg-scsidevs -a

Or see this KB article for more info

In the example below, the HBA is a Qlogic ISP2532 PCI-Express



However, this name will not always be used in the documentation (at least not in the case of HP) when you're looking for a specific model.

Another way to get info on the HBA type can be combined with locating the firmware version (see next step)

Next step is to identify the current firmware level of the HBA:

For a Qlogic HBA, use the following command:

# more /proc/scsi/qla2xxx/0
(The '0' can also be a '1' or a '2')
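
Since the instance number varies, a small loop can print every entry in one go. This is a minimal sketch; the function name show_qla_info is made up for illustration, and it assumes each HBA instance appears as a plain file under /proc/scsi/qla2xxx as in the command above:

```shell
# Print every HBA info file in a directory, each preceded by its path.
# Defaults to the directory used above; another path can be passed in.
show_qla_info() {
    dir=${1:-/proc/scsi/qla2xxx}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue   # skip when nothing matches
        echo "== $f =="
        cat "$f"
    done
}

show_qla_info
```

On a host without a Qlogic HBA the function simply prints nothing.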

For other types of HBAs, see this KB article


In the screenshot above, you can see the firmware and driver version and also another name for the HBA, QMH2562.

Now that we know the model and firmware version, we can look for the latest recommended version.

Go to this KB article to find the latest version.

For HP, it takes you to the following site: http://vibsdepot.hp.com/hpq/recipes/ where you should choose the PDF called: "HP-VMware-Recipe.pdf" which is the latest one.

In the PDF, you can search for your model, in this case QMH2562:


Find your model and follow the link. It will let you download a zip file. In this example the file is called:

qla2xxx-934.5.6.0-887798.zip

Unpack the zip file and locate the .vib file which will be used for the update.

In this case the .vib file is called:

scsi-qla2xxx-934.5.6.0-1OEM.500.0.0.472560.x86_64.vib

Now, transfer the .vib file to the following folder on the ESXi host:

/var/log/vmware

Run the following command:

# esxcli software vib update -v scsi-qla2xxx-934.5.6.0-1OEM.500.0.0.472560.x86_64.vib


Reboot to finish updating.

And then verify that the update has properly installed by re-running:

# more /proc/scsi/qla2xxx/0

You can also get detailed VIB package info by running:

# esxcli software vib list | grep scsi
# esxcli software vib get -n scsi-qla2xxx


Wednesday, July 17, 2013

Locating orphaned vmdk's

When you have had a VMware environment running for several years and have had many admins interacting with this environment, there is a fair chance that there will be a number of orphaned vmdk's on the LUNs which are taking up valuable space.

I found a post on yellow-bricks.com explaining two different ways of locating these vmdk's:

The first option is to simply look for all flat files that haven't been changed during the past seven days. This is easy and it works on ESXi 5.0. However, be careful before you start deleting files: the disks of a powered-off VM, for example, will also show up in such a search (and probably VMs with snapshots as well).

# find -iname "*-flat.vmdk" -mtime +7


Alternatively, you can pipe the output of the find command to ls to see when each file was last modified:

# find -iname "*-flat.vmdk" -mtime +7 | xargs ls -alh

And if you feel very sure that all the files can be deleted, you can delete them all at once:

# find -iname "*-flat.vmdk" -mtime +7 | xargs rm -f
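
VM folders and disk names often contain spaces, which xargs splits into separate arguments. Below is a minimal sketch of a listing step that avoids this by using -exec instead of xargs, assuming the find on your host supports -exec. The function name list_old_flat_vmdks and the demo file names are made up for illustration:

```shell
# List flat vmdk files not modified for more than N days (default 7).
# -exec passes each path as a single argument, so spaces are safe.
list_old_flat_vmdks() {
    find "${1:-.}" -iname "*-flat.vmdk" -mtime +"${2:-7}" -exec ls -alh {} \;
}

# Demo in a scratch directory with made-up file names
demo=$(mktemp -d)
mkdir "$demo/old vm" "$demo/new vm"
touch -t 201301010000 "$demo/old vm/old vm-flat.vmdk"   # old timestamp
touch "$demo/new vm/new vm-flat.vmdk"                    # modified just now
list_old_flat_vmdks "$demo"
rm -rf "$demo"
```

Only the old file is listed. Reviewing this listing before running any rm command gives you a chance to spot powered-off VMs that should be kept.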

The second option, which is a more thorough way of doing it, is to run a PowerShell script (also found in the post) that looks for vmdk's which are not registered to a .vmx file. The script has been slightly modified by Danni Finne and can be found here:

Thursday, July 11, 2013

ESXi 5.0 host disconnects due to CPU crossdup error

If a host is disconnected in vCenter and you cannot reconnect the host and you see the following entries in the /var/log/vmkwarning file:

2013-07-05T06:54:02.084Z cpu7:7757089)WARNING: UserObj: 675: Failed to crossdup fd 13, /vmfs/devices/char/vob/External type CHAR: Busy
2013-07-05T06:54:02.084Z cpu7:7757089)WARNING: UserObj: 675: Failed to crossdup fd 14, /vmfs/devices/char/vob/iScsi type CHAR: Busy
2013-07-05T06:54:02.084Z cpu7:7757089)WARNING: UserObj: 675: Failed to crossdup fd 15, /vmfs/devices/char/vob/Migrate type CHAR: Busy
2013-07-05T06:54:02.084Z cpu7:7757089)WARNING: UserObj: 675: Failed to crossdup fd 16, /vmfs/devices/char/vob/PageReti type CHAR: Busy
2013-07-05T06:54:02.084Z cpu7:7757089)WARNING: UserObj: 675: Failed to crossdup fd 17, /vmfs/devices/char/vob/Visorfs type CHAR: Busy
2013-07-05T06:54:02.084Z cpu7:7757089)WARNING: UserObj: 675: Failed to crossdup fd 18, /vmfs/devices/char/vob/Hardware type CHAR: Busy
2013-07-05T06:54:02.084Z cpu7:7757089)WARNING: UserObj: 675: Failed to crossdup fd 19, /vmfs/devices/char/vob/Vfat type CHAR: Busy
2013-07-05T06:54:02.084Z cpu7:7757089)WARNING: UserObj: 3232: Unimplemented operation on 0x410028907210/SOCKET_UNIX_SERVER
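
A quick way to check whether a host has logged these entries is to count them with grep. This is a minimal sketch; the function name count_crossdup is made up, and the demo runs against a scratch file holding three of the lines quoted above rather than the real log:

```shell
# Count "Failed to crossdup" warnings in a log file.
# Defaults to the log path mentioned above.
count_crossdup() {
    grep -c 'Failed to crossdup' "${1:-/var/log/vmkwarning}"
}

# Demo against a scratch copy of three lines from the log excerpt
log=$(mktemp)
cat > "$log" <<'EOF'
2013-07-05T06:54:02.084Z cpu7:7757089)WARNING: UserObj: 675: Failed to crossdup fd 13, /vmfs/devices/char/vob/External type CHAR: Busy
2013-07-05T06:54:02.084Z cpu7:7757089)WARNING: UserObj: 675: Failed to crossdup fd 14, /vmfs/devices/char/vob/iScsi type CHAR: Busy
2013-07-05T06:54:02.084Z cpu7:7757089)WARNING: UserObj: 3232: Unimplemented operation on 0x410028907210/SOCKET_UNIX_SERVER
EOF
count_crossdup "$log"
rm -f "$log"
```

The demo prints 2, since only two of the three sample lines are crossdup warnings.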


It can be fixed by following this KB:


So, SSH to the host and edit the file (/etc/vmware/vpxa/vpxa.cfg) using vi text editor.

The following line should be changed; the value should be increased from the original 128 to 256:

<ThreadStackSizeKb>256</ThreadStackSizeKb>

vpxa.cfg is read-only, so make it writable before editing:

# chmod 744 vpxa.cfg

And when done, set the file back to read-only:

# chmod 444 vpxa.cfg

Restart the vpxa service and reconnect the host.

# /etc/init.d/vpxa restart
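
The edit can also be scripted instead of done in vi. Below is a minimal sketch, assuming the setting is stored as a ThreadStackSizeKb XML element on a single line. The function name set_thread_stack_kb and the sample file contents are made up for illustration, and the demo works on a scratch file rather than the real vpxa.cfg:

```shell
# Set the ThreadStackSizeKb value in a vpxa.cfg-style file, keeping a backup.
set_thread_stack_kb() {
    cfg=$1
    size=$2
    cp "$cfg" "$cfg.bak" || return 1   # keep a copy of the original
    chmod u+w "$cfg"                   # the file is read-only by default
    sed "s|<ThreadStackSizeKb>[0-9]*</ThreadStackSizeKb>|<ThreadStackSizeKb>$size</ThreadStackSizeKb>|" \
        "$cfg.bak" > "$cfg"
    chmod 444 "$cfg"                   # back to read-only
}

# Demo on a scratch file (hypothetical, heavily simplified contents)
cfg=$(mktemp)
printf '<config>\n  <ThreadStackSizeKb>128</ThreadStackSizeKb>\n</config>\n' > "$cfg"
chmod 444 "$cfg"
set_thread_stack_kb "$cfg" 256
grep 'ThreadStackSizeKb' "$cfg"
rm -f "$cfg" "$cfg.bak"
```

On a host this would be called with /etc/vmware/vpxa/vpxa.cfg as the first argument, followed by the vpxa restart shown above. The backup copy makes it easy to roll back if the agent fails to start.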

Manually recreating a vmdk descriptor file

Yesterday, I had an svMotion that crashed as the vCenter server (vSphere 5.0 u1) was rebooted while the svMotion task was in progress. Afterwards, a copy of the VM named "servername (2)" had been created in vCenter and a number of empty vmdk files had been created at the destination LUNs. There was a file lock on these files, so they could not be deleted, and I wasn't able to vMotion, snapshot or continue the svMotion on the source VM.

I tried moving just one of the vmdks with svMotion which resulted in the VM crashing and after that it wouldn't power back on.

Below is the error:


The error states that it cannot find one of the vmdk's. I looked in the relevant folder and discovered that the vmdk descriptor file was missing, the folder only contained the vmdk-flat file (the data file).

Under Edit Settings for the VM, the disk was still there but its size was set to 0 MB, see below:



I found a KB from VMware on how to recreate this descriptor file. And it worked fine.

Loosely, the steps were the following (the KB explains them in detail):

Identify scsi controller type, lsilogic in this case.

Identify the size of the vmdk-flat file

Create the vmdk as a thin vmdk

Delete the empty vmdk data file

Edit the vmdk descriptor file with the vi text editor. This included the vmdk flat file name, the vHW version, and deleting the line about thin provisioning.

Verify the consistency of the descriptor with vmkfstools -e: