Virtual Infrastructure Tips - Azure and VMware

Wednesday, June 6, 2012

P2V error - BlockLevelVolumeCloneMgr and Sysimgbase_DiskLib_Write

The other day we had to do a number of hot P2V's on some Citrix servers running Win2k3. I had succesfully completed a test migration a week before (with VMware Converter Standalone 5 installed locally on the source) on one of the same servers but when we re-initiated the P2V in the planned maintenance window, both servers failed at 90-something percent with an error stating the following:

SingleVolumeCloneTask:DoRun: Volume cloning failed with clone error BlockLevelVolumeCloneMgr::CloneVolume: Detected a write error during the cloning of volume \WindowsBitmapDriverVolumeId=[08-03-AE-BE-00-40-00-00-00-00-00-00]. Error: 37409 (type: 1, code: 2338)

This log entry is found by right clicking the job in Converter and choosing 'export logs'. Locate the file called vmware-converter-worker-X.log (where X is an incremental integer).

The above error message seems to indicate that there is a problem on the source disk. We tried running checkdisk which showed no errors and we defragmented all drives. Same error occurred.

Looking a bit more at the logs, I found the following entries which pointed towards a network error:

[NFC ERROR] NfcSendMessage: send failed: NFC_NETWORK_ERROR
[NFC ERROR] NfcFssrvr_IO: failed to send io message
Sysimgbase_DiskLib_Write failed with 'NBD_ERR_NETWORK_CONNECT' (error code:2338)

By searching a bit on the above entries, I was pointed towards a relevant KB article from VMware. As it turns out, this is not network related at all, it is a known error in the Converter Standalone (both v4 and v5) software. The KB simply states that VMware is aware of this issue... I've done a ton of P2V's but this error I've never seen before...

The good news is that there is a workaround:

The trick is to only transfer one drive at a time. This means that if the source has a C and a D drive you'll be P2V'ing this machine twice creating to seperate VMs - one only containing the C drive including the system partition and another VM (which I just called 'servername_Ddrive') containing only the D drive. When both P2V's are done the second one is removed from inventory. For the first VM, go to Edit Settings and attach the disk from the second VM, 'servername_Ddrive'. After that, you can boot the VM now containing both drives. Be aware that the newly attached disk will deafult to drive letter D. This means that if it had another drive letter before, you'll have to change it manually.

An important point to mention in this process is that when transferring the second VM only containing the D drive, the transfer will fail with an error around 98% stating something like "An error ocurred during reconfiguration...". This is ok - as long as the drive has been succesfully cloned, this is what matters (see below).

An alternative workaround that will most likely work as well is to do a cold clone.

Below is a screen dump of the releant entries in the log file.

Monday, June 4, 2012

8-way VM on ESX 4.1 - Win2k8 R2 Standard edition

Today, I had to configure a VM with 8 vCPU's on an ESX 4.1 cluster. The guest OS was Windows Server 2008 R2 Standard Edition. However, after configuring the VM with 8 vCPU's it still only registered 4 vCPU's in the guest OS.

The reason is that both Win2k3 and Win2k8 R2 standard edition only can be configured with four physical CPUs or sockets. By default vSphere 4.0 and 4.1 presents 1 vCPU as one socket.

To bypass this, there is a feature (experimental in 4.0 but supported in 4.1) allowing you to configure multiple cores per socket. See this KB article. This feature has been included in vSphere 5.0 in the GUI.

See this article for comparison of Windows 2003 editions.

See this link for comparison of Windows 2008 edition (look for 'Editions Guide' at the bottom of the page). Or download PDF here.

Monday, May 21, 2012

Install app on Citrix server - change user

This is mostly a reminder to myself as I seem to forget the syntax.

When installing applications on a Citrix server in production, you need to change to install mode. After installation the mode has to be changed back to 'execute'.

From CMD prompt:

Install mode:

change user /install

Execute mode:

change user /execute

Friday, April 27, 2012

View HBA firmware version from service console

To view HBA firmware version from service console (ESX classic) go to /proc/scsi/qla or lpfc820.
Here you will typically find to text files, e.g. '2' and '3'. Run a 'cat' or 'more' on the files (see screendump below). See this post for more info

For ESXi v5.x, see this link.

Large VM crashes during snapshot commit

Snapshots can be your friend but they can most certainly also make your life miserable. The other day we had a rather large VM (with 20 GB mem, 8 vCPUs and 28 TB storage divided on 22 .vmdk's) that crashed during a snapshot commit. The error stated: "Performing disk cleanup. Cannot power off." The snapshot had been taken while the VM was powered off and only a few changes had been made to the VM before the snapshot was committed.

After the crash, the VM would not power on. The error stated: "Reason: Cannot allocate memory" and in the error description (see screendump below) there's an indication of disk a lock or disk error. Fortunately, the VM could be started from the service console (ESX 4.1 classic) with 'vmware-cmd'.

After boot, vCenter stated that there was no snapshots on the VM. However, 22 delta files on a single LUN was telling otherwise.

A normal procedure to do cleanup is to power off VM and clone it. However, with 28 TB storage in the VM, this was not an option.

Instead, the following did the trick: Log on to the service console, change directory to the folder where the .vmx file for the VM resides, take a new snapshot and then do a remove all snapshots (see this KB article for more info). This removes the new snapshot as well as the 'defect' snapshot.

To see if any snapshots exist (that will probably not be the case):

vmware-cmd vmname.vmx hassnapshot

To take new snapshot (with no quiesce and no memory, see this KB article for details)

vmware-cmd vmname.vmx createsnapshot snapshot-name description 0 0

As you can see in screen dump below at first I tried to run the command without the two boolean arguments that relates to QuiesceFilesystem and IncludeMemory.

To remove all snapshots:

vmware-cmd vmname.vmx removesnapshots

In the screendump above the removesnapshots command returns an error code '1' which means that all is well and snapshots are gone.

Tuesday, April 17, 2012

Could not power on VM - lock was not free

The other day we experienced an incident on the SAN storage with high latency and even loss of connection to the SAN. This can generate a lot of really unpleasant errors on the ESX hosts. Even after the SAN is brought back to a stable state we've seen hosts that won't boot, VM's that won't vMotion and VMs that won't power on due to file locks.

If you receive a 'locked file error' (like screendump below) and your VM won't boot there are a couple of ways to go about it. This VMware KB article explains it quite well. Either you can cold migrate the VM to the other hosts in the cluster (to find the ESX host with the lock) and then try to boot it from there or you can try to locate specifically which host has the lock.

If the vCenter log does not tell you specifically which files are locked, this can be viewed in the vmware.log which is located in the VM folder. If you just tried to power on the VM, then relevant info should be at the end of the log file.

In the example below, it is the swap that is still locked.

This can be verified by running the touch command on the locked file.
With vmkfstools you can get the mac address that has the lock:

# vmkfstools -D /vmfs/volumes///

In the screendump below, the MAC address has been highlighted.

The same info can be found in the /var/log/vmkernel log

Once you have the MAC address you can find a match by, for example, logging in to vCenter or onto the Blade enclosure. When you have a match, cold migrate the VM to the relavant ESX host and boot it.

Thursday, April 12, 2012

Boot directly into BIOS - Workstation 8

In Workstation 8, there's a nice new little feature that let's you boot directly into BIOS from the power on button. Instead of having this option under edit settings it has been moved to the power on button itself.

A small thing perhaps, but quite practical.