Friday, April 27, 2012

View HBA firmware version from service console

To view the HBA firmware version from the service console (ESX classic), go to /proc/scsi/qla or /proc/scsi/lpfc820, depending on the HBA driver.
Here you will typically find two text files, e.g. '2' and '3'. Run 'cat' or 'more' on the files (see screendump below). See this post for more info.
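
If you only want the firmware lines and not the whole file, something like the following could be used. It is a rough sketch: the function name and the optional base-directory argument are my own, and it assumes that the proc files contain a line mentioning "firmware", which may differ between drivers and versions.

```shell
# Print the firmware-related lines from the HBA proc files.
# Assumption: each file has a line containing the word "firmware".
show_hba_firmware() {
    base=${1:-/proc/scsi}   # hypothetical parameter, defaults to /proc/scsi
    for f in "$base"/qla*/* "$base"/lpfc*/*; do
        [ -f "$f" ] || continue          # skip patterns that matched nothing
        echo "== $f"
        grep -i 'firmware' "$f" || echo "(no firmware line found)"
    done
}
```

Run it as 'show_hba_firmware' with no arguments to read from /proc/scsi.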

For ESXi v5.x, see this link.

Large VM crashes during snapshot commit

Snapshots can be your friend, but they can most certainly also make your life miserable. The other day we had a rather large VM (with 20 GB of memory, 8 vCPUs and 28 TB of storage spread across 22 .vmdk files) that crashed during a snapshot commit. The error stated: "Performing disk cleanup. Cannot power off." The snapshot had been taken while the VM was powered off, and only a few changes had been made to the VM before the snapshot was committed.

After the crash, the VM would not power on. The error stated: "Reason: Cannot allocate memory", and in the error description (see screendump below) there's an indication of a disk lock or disk error. Fortunately, the VM could be started from the service console (ESX 4.1 classic) with 'vmware-cmd'.

After boot, vCenter stated that there were no snapshots on the VM. However, 22 delta files on a single LUN told a different story.

The normal cleanup procedure is to power off the VM and clone it. However, with 28 TB of storage in the VM, this was not an option.

Instead, the following did the trick: Log on to the service console, change directory to the folder where the .vmx file for the VM resides, take a new snapshot, and then remove all snapshots (see this KB article for more info). This removes the new snapshot as well as the 'defective' snapshot.

To see if any snapshots exist (this will probably not be the case):

vmware-cmd vmname.vmx hassnapshot

To take a new snapshot (with no quiesce and no memory, see this KB article for details):

vmware-cmd vmname.vmx createsnapshot snapshot-name description 0 0

As you can see in the screendump below, I first tried to run the command without the two boolean arguments that relate to QuiesceFilesystem and IncludeMemory.

To remove all snapshots:

vmware-cmd vmname.vmx removesnapshots

In the screendump above, the removesnapshots command returns '1', which means that all is well and the snapshots are gone.
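
For reference, the three steps can be strung together roughly like this. It is only a sketch: the function name, the VMWARE_CMD variable, and the default snapshot name and description are made up for illustration, and it assumes vmware-cmd exits with a non-zero status when a command fails.

```shell
# VMWARE_CMD is a hypothetical override, handy for a dry run with 'echo'.
VMWARE_CMD=${VMWARE_CMD:-vmware-cmd}

cleanup_snapshots() {
    vmx=$1                    # path to the VM's .vmx file
    name=${2:-cleanup}        # illustrative snapshot name
    desc=${3:-temporary}      # illustrative description

    # Step 1: check whether the VM reports any snapshots.
    "$VMWARE_CMD" "$vmx" hassnapshot

    # Step 2: new snapshot; '0 0' = no quiesce, no memory.
    # Stop here if it fails (assumes a non-zero exit status on failure).
    "$VMWARE_CMD" "$vmx" createsnapshot "$name" "$desc" 0 0 || return 1

    # Step 3: remove all snapshots, including the one just taken.
    "$VMWARE_CMD" "$vmx" removesnapshots
}
```

Setting VMWARE_CMD=echo before calling 'cleanup_snapshots vmname.vmx' just prints the arguments of each step, which is a cheap way to check the sequence before running it against a 28 TB VM.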

Tuesday, April 17, 2012

Could not power on VM - lock was not free

The other day we experienced an incident on the SAN storage with high latency and even loss of connection to the SAN. This can generate a lot of really unpleasant errors on the ESX hosts. Even after the SAN is brought back to a stable state, we've seen hosts that won't boot, VMs that won't vMotion and VMs that won't power on due to file locks.

If you receive a 'locked file error' (like the screendump below) and your VM won't boot, there are a couple of ways to go about it. This VMware KB article explains it quite well. You can either cold migrate the VM to each of the other hosts in the cluster in turn (to find the ESX host with the lock) and try to boot it from there, or you can try to locate specifically which host has the lock.

If the vCenter log does not tell you specifically which files are locked, you can find this in vmware.log, which is located in the VM folder. If you have just tried to power on the VM, the relevant info should be at the end of the log file.
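
A quick way to pick out the relevant lines from the end of the log could look like this. It is a sketch under the assumption that the messages contain the word "lock"; the function name and the default of 50 lines are arbitrary choices of mine.

```shell
# Show lock-related lines from the end of a vmware.log.
# Assumption: the relevant messages mention "lock" (matched case-insensitively).
show_lock_messages() {
    log=${1:-vmware.log}      # log file, defaults to vmware.log in the current folder
    lines=${2:-50}            # how many lines from the end to inspect
    tail -n "$lines" "$log" | grep -i 'lock' || echo "(no lock-related lines found)"
}
```

Run it from the VM folder as 'show_lock_messages', or pass the path to the log file as the first argument.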

In the example below, it is the swap file that is still locked.

This can be verified by running the touch command on the locked file.
With vmkfstools you can get the MAC address of the host that has the lock:

# vmkfstools -D /vmfs/volumes///

In the screendump below, the MAC address has been highlighted.

The same info can be found in the /var/log/vmkernel log.
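
If you would rather not pick the MAC address out by eye, a small helper like the one below could do it. It is a sketch that assumes the lock line has the usual 'owner <uuid>' form, where the last 12 hex digits of the owner value are the MAC address. The function name is my own.

```shell
# Extract the MAC address from a lock line containing 'owner <uuid>'.
# Assumption: the last 12 hex digits of the owner value are the MAC address.
lock_owner_mac() {
    owner=$(printf '%s\n' "$1" | sed -n 's/.*owner \([0-9A-Fa-f-]*\).*/\1/p')
    [ -n "$owner" ] || return 1          # no owner field found in the line
    # Keep the part after the last '-', then insert ':' between byte pairs.
    printf '%s\n' "${owner##*-}" | sed 's/../&:/g; s/:$//'
}
```

For example, given a line containing 'owner 45feb537-9c52009b-e812-00137266e200' (an illustrative value, not taken from the screendump), the function prints 00:13:72:66:e2:00. An all-zero result would suggest that no host currently holds the lock.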

Once you have the MAC address, you can find a match by, for example, logging in to vCenter or onto the blade enclosure. When you have a match, cold migrate the VM to the relevant ESX host and boot it.

Thursday, April 12, 2012

Boot directly into BIOS - Workstation 8

In Workstation 8, there's a nice little new feature that lets you boot directly into BIOS from the power on button. The option used to sit under edit settings; it has now been moved to the power on button itself.

A small thing perhaps, but quite practical.