Virtual Infrastructure Tips - Azure and VMware: 2012

Thursday, October 4, 2012

Passed the VCP5 exam today!

I finally got around to taking the VCP5 exam today and passed with 472 out of 500 points (94%). That's one more for the collection, VCP3-4-5, not too shabby! I should go out and buy something...

Sunday, September 16, 2012

Most important new features in vSphere 5.1

I was going over the "What's new in vSphere 5.1" sheet and wanted to point out what, from an operational standpoint, are the most important changes.

Improved vMotion, which lets you vMotion a VM even without shared storage (vMotion + svMotion). This is described in this post. For customer transition projects, this can probably come in handy.
vSphere web client: This is now the default interface for managing vSphere. It will probably take a little getting used to for the server admins.
Zero-downtime upgrade for VMware Tools: Not having to reboot the VMs after a Tools upgrade is a big step forward (as an IT service provider, it can be close to impossible to get a maintenance window for all your VMs).
Larger VMs - up to 64 vCPUs (you will have to have sufficient underlying hardware though, so unfortunately it can't be simulated in the home lab :))
Virtual hardware v9. Upgrading will require VM downtime. One can only hope that, in future releases, vHW upgrades can be done in-place.

Improved vMotion in vSphere 5.1 - data moving vMotion

I heard about the new and improved data moving vMotion in the VMworld keynote and wanted to try it out in the home lab. The improvement consists of vSphere being able to perform a simultaneous vMotion+svMotion so you can change both datastore and host at the same time.

I was expecting this feature to be available from the vSphere client by right-clicking the VM and choosing 'Migrate'. However, this is not the case. The option is there, but it is greyed out with a note stating that the VM has to be powered off to perform this action, see screenshot below:

I found an article on yellow-bricks pointing towards the vSphere web client. And for a deep dive, see this post by Frank Denneman.

From the vSphere web client the option is available by right-clicking the VM and choosing 'Migrate', see below.

One apparent limitation is that you cannot migrate between Datacenters, only between clusters within a given Datacenter.

Other than that, the feature works as expected. I did a vMotion plus datastore move from local storage to shared storage. This is the second feature (here's the first one) I've found that is only available in the vSphere web client and not in the vSphere client, which leads one to assume that VMware is serious about moving future administration away from the vSphere client.

Saturday, September 15, 2012

Applying a default host profile in a vSphere 5.1 cluster fails

I was playing around with host profiles in my vSphere 5.1 home lab today. It was easy enough to create a baseline from a given host in a cluster. But, without having changed anything, when I ran a compliance check I received the following error:

"A general system error occurred: Failed to run Execute operation on esxi-hostname.domain.net: IP address '192.168.1.x' is used for multiple virtual NICs"

I was pretty sure that I had only used that IP address for the management interface (the service console, in ESX classic terms) of one host.

To fix it, you need to modify the profile, as it is trying to apply the same IP address to vmk0 (the management interface) on the other host(s) in the cluster.

Go to Network configuration -> Host virtual NIC -> dvSwitch -> IP address settings -> IPv4 address (assuming you are using a dvSwitch for vmk0) and change the option to:

'User specified IPv4 address to be used while applying the configuration', see screenshot below.

Then update the answer file for each host and rerun the compliance check.

Thursday, September 13, 2012

Enabling 64-bit VMs on nested ESXi 5.1

In my home lab, I have a 2-node cluster with two virtual ESXi 5.1 hosts. When I tried to boot a 64-bit VM on these hosts, I received the following error:

"Longmode is unsupported. It is required for 64-bit guest OS support. On Intel systems, longmode requires VT-x to be enabled in the BIOS. On nested virtual ESX hosts, longmode requires the "Virtualized Hardware Virtualization" flag to be enabled on the outer VM."

I seem to remember that in version 5.0 you had to configure a given parameter in the ESXi console. According to this VMware KB, this has changed in ESXi 5.1.

It states the following:

"Virtualized HV is fully supported for virtual hardware version 9 VMs on hosts that support Intel VT-x and EPT or AMD-V and RVI. To enable virtualized HV, use the web client and navigate to the processor settings screen. Check the box next to "Expose hardware-assisted virtualization to the guest operating system." This setting is not available under the traditional C# client."

So, access the web client, locate the VM, right-click -> Edit settings, and check the box as mentioned (on the outer VM, i.e. the VM running the virtual ESXi host, as the error message says). It works like a charm, see screendump below:
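If you prefer editing the configuration file directly, my understanding is that the checkbox corresponds to a vhv.enable = "TRUE" entry in the outer VM's .vmx (taking over from the host-wide vhv.allow setting used in 5.0). That mapping is an assumption on my part, not something the KB quoted above states, so treat this as a sketch. The hypothetical helper below adds or updates the entry; run it only while the outer VM is powered off, and be aware that vCenter may need the VM reloaded before it notices the change.

```shell
#!/bin/sh
# Hypothetical helper: make sure the outer VM's .vmx contains vhv.enable = "TRUE".
# Assumption: this key is what the web client checkbox writes on HW version 9 VMs.
# Run only while the outer VM (the virtual ESXi host) is powered off.
enable_vhv() {
  vmx="$1"
  [ -f "$vmx" ] || { echo "no such file: $vmx" >&2; return 1; }
  if grep -q '^vhv\.enable' "$vmx"; then
    # Overwrite an existing entry instead of adding a duplicate key
    sed -i 's/^vhv\.enable.*/vhv.enable = "TRUE"/' "$vmx"
  else
    echo 'vhv.enable = "TRUE"' >> "$vmx"
  fi
}

# Example (path is illustrative only):
# enable_vhv /vmfs/volumes/datastore1/esxi01/esxi01.vmx
```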

vSphere web client - failed to connect to VMware lookup service

Yesterday, I installed the vCenter Server 5.1 Virtual Appliance in my home lab. It went fairly smoothly; however, I couldn't connect to the vSphere web client. I received the following error:

Failed to connect to VMware Lookup Service - https://localhost:7444/lookupservice/sdk

I found a VMware KB indicating that there could be something wrong with the SSL certificate - because I had changed the FQDN of the appliance after initial setup.

The fix suggested there seemed a little overkill, as the appliance should either just work or at least let you reconfigure it.

The solution was to log into the administration web interface, https://vcenter-server-name:5480, and re-run the configuration wizard with default settings. That fixed the problem and it didn't delete the cluster and folder settings that I had already configured for this given vCenter server. The vSphere web client can be reached at the following address: https://vcenter-server-name:9443/vsphere-client/#

Btw: the default login for the vCenter 5.1 virtual appliance is user root, password vmware.

Thursday, September 6, 2012

Activating and using VMware PSO credits

For the second time, my company has negotiated a rather large ELA (Enterprise License Agreement) with VMware (ultimately via a reseller), which includes buying a bunch of new licenses and renewing SnS for the existing ones. The ELA comes with quite a lot of PSO (Professional Services Organisation) credits. The first time, it took us a while to figure out what to use them for, and this second time, activating and using them still creates confusion.

After entering into the ELA, we received an activation email at an address we had specified (we had simply given it to the VMware sales guys). Once the credits were activated, we received a confirmation email saying so.

From here on, it is possible to buy different products and services with the credits.

To use the PSO credits, log in to:

https://mylearnssl.vmware.com

Use the email address that the license activation mail was sent to. If there's no account associated with this email address, then create one.

Once logged in, you can add multiple users so that they can log in with their own account and book training courses on their own: Home -> Services -> VMware Training -> myPaymentAccounts -> Edit (or go to My account -> myPaymentAccounts). Here you can also see how many points you have available and what you have used your points for.

From the mylearn site, it's fairly easy to browse for training courses and pay with the credits. But the credits can also be used for other things, such as paying for your VMworld ticket, consulting services (PSO), and the VCP exam.

To pay for the VCP exam, you first need to retrieve a voucher on the mylearn portal: Home -> Services -> VMware Training -> VMware Consulting and Training Credits -> Continue. Or go directly to this link. This process generates a voucher code, which costs some credits. When you book the exam at Pearson VUE (requires a separate account), you can use the voucher code to pay for the exam.

Saturday, August 11, 2012

Resetting the root password on ESXi 5 (and ESXi 4)

Yesterday, we had a fairly nasty situation at work where a standalone ESXi 4.1 host had to be rebooted. After the reboot, it did not automatically reconnect to vCenter, so a manual reconnect was done, which prompted for the root password. Unfortunately, we did not have the root password (don't ask). The host was joined to a domain, but it could not be added to vCenter using domain credentials, and SSH to the host with domain credentials did not work either. So, having 19 VMs down and no way to power them back on, I was basically screwed (all VMs were residing on DAS).

According to this VMware KB article, there is no supported way to reset the root password on ESXi v4 or v5 other than to reinstall it (or do a repair). I contacted VMware support and they sent me a guide for doing it in an unsupported way. This method is similar to what other guides on the web suggest. I went through the process but unfortunately it didn't work.

Finally, I mounted the ESXi 4.1 install ISO and did a repair. This resets most of the host configuration, such as the root password, network configuration, NTP settings, domain membership etc. After this, I could set the password and reconnect to vCenter, and then I had to reconfigure the host. Fortunately, the VMs were not completely gone from vCenter but were presented as greyed-out orphaned VMs, so I could still see which LUNs the VMs were residing on. That way, the .vmx files could be located (except for one VM that had been renamed in vCenter without being svMotioned or migrated to another LUN afterwards, so its files still carried the old name...), the orphaned VM could be removed and the VM could be re-added. It was quite a boring process, but at least it worked.

Today, I wanted to recreate the password reset method in my home lab to see if I had actually done it in the correct way. I can confirm that, at least, on a virtual ESXi 5 it works and it is possible to reset the password to blank.

These are the steps (alternatively, you can try this guide):

Download a Linux live bootable ISO. I used KNOPPIX. Mount the ISO and boot the host.

Once booted into KNOPPIX, open a shell.

Run the following set of commands:

# fdisk -l
# mkdir /mnt/disk
# mount /dev/sda5 /mnt/disk

(Mounting the correct device is the tricky part. To me, it was rather confusing which one to choose. For both the Fujitsu server that I dealt with and the virtual ESXi, though, the state.tgz file was located on sda5. VMware suggested using the following command for HP servers: # mount /dev/cciss/c0d0p5 /mnt/disk, where c0d0p5 is controller 0, disk 0, partition 5.)

# cd /mnt/disk
# ls -al
# cp state.tgz state.tgz.bak
# cd /ramdisk
# mkdir temp
# cd temp
# tar zxf /mnt/disk/state.tgz
# ls -al
# tar zxf local.tgz
# cd etc
# nano shadow

Blank out the password hash. For example, change root:$1$ywxtUqvn$9e1iXjGVd45T5IAgRxAuV.:13358:0:99999:7:::
to root::13358:0:99999:7:::

See below screendumps for before and after:

Save the shadow file.

Run the following commands to repackage everything:

# cd ..
# rm -rf local.tgz
# tar zcf local.tgz *
# chmod 755 local.tgz
# rm -rf /mnt/disk/state.tgz
# tar zcf /mnt/disk/state.tgz local.tgz
# ls -al /mnt/disk/
# umount /mnt/disk
# shutdown -r now

When ESXi boots up, it has no root password set (blank).
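Since a mistake here can leave a host unbootable, it may be worth rehearsing the unpack/edit/repack cycle on a throwaway archive first. The sketch below wraps the commands above in two hypothetical functions, swaps the manual nano edit for a sed expression that empties only root's hash field, and keeps a state.tgz.bak next to the original as in the steps above. It assumes the layout described in this post (state.tgz containing local.tgz, which contains etc/shadow); I have not verified it against every ESXi build, so check the result with tar before rebooting.

```shell
#!/bin/sh
# Rehearsal sketch of the root password reset above (hypothetical helpers).
# Assumed layout: state.tgz -> local.tgz -> etc/shadow.

# Empty the hash field of root's shadow entry, leaving the other fields intact
blank_root_hash() {
  sed -i 's/^root:[^:]*:/root::/' "$1"
}

# Unpack state.tgz, blank root's hash and repack it in place (backup kept as .bak)
repack_state() {
  state="$1"
  case "$state" in
    /*) ;;
    *) state="$PWD/$state" ;;          # tar runs from a temp dir, so use an absolute path
  esac
  cp "$state" "$state.bak" || return 1
  work=$(mktemp -d) || return 1
  (
    cd "$work" &&
    tar zxf "$state" &&                # -> local.tgz
    mkdir local && cd local &&
    tar zxf ../local.tgz &&            # -> etc/shadow and the rest of the config
    blank_root_hash etc/shadow &&
    rm -f ../local.tgz &&
    tar zcf ../local.tgz * &&          # repack the edited tree
    cd .. &&
    tar zcf "$state" local.tgz         # replace state.tgz with the new local.tgz
  )
  rc=$?
  rm -rf "$work"
  return $rc
}
```

On the real host, the equivalent call from KNOPPIX would be repack_state /mnt/disk/state.tgz, followed by tar tzf /mnt/disk/state.tgz to confirm that local.tgz is still in there before unmounting.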

Thursday, June 7, 2012

Basic vi text editor commands

As the Nano editor unfortunately has been taken out of the ESXi 5.0 shell, we're left with good old vi.

Here are some basic commands:

Opening a text file:

vi filename

vi opens in Command mode. You can move the cursor around in the file but not edit it. To switch to Insert (edit) mode, press 'i' (you can see that a '-' symbol at the bottom of the console changes to an 'I'). To switch back to Command mode, press 'Esc'.

To save:

:w

To quit without saving changes:

:q!

To save and quit:

:wq

See this link for more info

Wednesday, June 6, 2012

P2V error - BlockLevelVolumeCloneMgr and Sysimgbase_DiskLib_Write

The other day we had to do a number of hot P2Vs on some Citrix servers running Win2k3. I had successfully completed a test migration on one of the same servers a week before (with VMware Converter Standalone 5 installed locally on the source), but when we re-initiated the P2V in the planned maintenance window, both servers failed at 90-something percent with the following error:

SingleVolumeCloneTask:DoRun: Volume cloning failed with clone error BlockLevelVolumeCloneMgr::CloneVolume: Detected a write error during the cloning of volume \WindowsBitmapDriverVolumeId=[08-03-AE-BE-00-40-00-00-00-00-00-00]. Error: 37409 (type: 1, code: 2338)

This log entry is found by right clicking the job in Converter and choosing 'export logs'. Locate the file called vmware-converter-worker-X.log (where X is an incremental integer).

The above error message seems to indicate a problem on the source disk. We tried running chkdsk, which showed no errors, and we defragmented all drives. The same error occurred.

Looking a bit more at the logs, I found the following entries which pointed towards a network error:

[NFC ERROR] NfcSendMessage: send failed: NFC_NETWORK_ERROR
[NFC ERROR] NfcFssrvr_IO: failed to send io message
Sysimgbase_DiskLib_Write failed with 'NBD_ERR_NETWORK_CONNECT' (error code:2338)
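To pull these entries out of an exported log bundle without scrolling through it, a grep along these lines should do. It is a sketch that assumes the bundle has been unzipped into a directory and that the worker logs keep the vmware-converter-worker-X.log naming mentioned above; the search patterns are simply the strings quoted in this post, and find_p2v_errors is a hypothetical name.

```shell
#!/bin/sh
# Sketch: list the Converter error entries discussed above, with line numbers.
# Assumption: the exported log bundle has been unzipped into directory $1.
find_p2v_errors() {
  dir="${1:-.}"
  grep -n -E \
    'BlockLevelVolumeCloneMgr|NFC_NETWORK_ERROR|NfcFssrvr_IO|Sysimgbase_DiskLib_Write' \
    "$dir"/vmware-converter-worker-*.log
}

# Example:
# find_p2v_errors ./converter-logs
```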

Searching a bit on the above entries pointed me towards a relevant KB article from VMware. As it turns out, this is not network related at all; it is a known error in Converter Standalone (both v4 and v5). The KB simply states that VMware is aware of this issue... I've done a ton of P2Vs but had never seen this error before...

The good news is that there is a workaround:

The trick is to transfer only one drive at a time. This means that if the source has a C and a D drive, you'll be P2V'ing the machine twice, creating two separate VMs: one containing only the C drive, including the system partition, and another VM (which I just called 'servername_Ddrive') containing only the D drive. When both P2Vs are done, the second VM is removed from inventory. For the first VM, go to Edit Settings and attach the disk from the second VM, 'servername_Ddrive'. After that, you can boot the VM, which now contains both drives. Be aware that the newly attached disk will default to drive letter D, so if it had another drive letter before, you'll have to change it manually.

An important point to mention in this process is that when transferring the second VM, containing only the D drive, the transfer will fail at around 98% with an error stating something like "An error occurred during reconfiguration...". This is OK: as long as the drive has been successfully cloned, that is what matters (see below).

An alternative workaround that will most likely work as well is to do a cold clone.

Below is a screen dump of the relevant entries in the log file.

Monday, June 4, 2012

8-way VM on ESX 4.1 - Win2k8 R2 Standard edition

Today, I had to configure a VM with 8 vCPU's on an ESX 4.1 cluster. The guest OS was Windows Server 2008 R2 Standard Edition. However, after configuring the VM with 8 vCPU's it still only registered 4 vCPU's in the guest OS.

The reason is that Windows Server 2003 and Windows Server 2008 R2 Standard Edition both support a maximum of four physical CPUs (sockets). By default, vSphere 4.0 and 4.1 present each vCPU as a separate socket, so the guest only uses four of the eight.

To work around this, there is a feature (experimental in 4.0 but supported in 4.1) that lets you configure multiple cores per socket. See this KB article. In vSphere 5.0, this setting is available in the GUI.
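The arithmetic is simply sockets = vCPUs / cores per socket, and the socket count has to stay at or below the edition's limit of four. The hypothetical helper below finds the smallest cores-per-socket value that achieves that and prints the matching .vmx lines. It assumes the 4.1 setting is the cpuid.coresPerSocket configuration parameter and that numvcpus must be an exact multiple of it; double-check both against the KB before applying anything.

```shell
#!/bin/sh
# Hypothetical helper: pick the smallest cores-per-socket value that keeps
# sockets = vcpus / cores within max_sockets, and print the .vmx lines.
# Assumption: cpuid.coresPerSocket is the ESX 4.1 parameter from the KB, and
# numvcpus must be an exact multiple of it.
vmx_cpu_lines() {
  vcpus="$1"
  max_sockets="${2:-4}"                # 4 sockets for the Standard editions above
  cores=1
  while [ $((vcpus / cores)) -gt "$max_sockets" ] || [ $((vcpus % cores)) -ne 0 ]; do
    cores=$((cores + 1))
    if [ "$cores" -gt "$vcpus" ]; then
      return 1                         # no valid layout within the limit
    fi
  done
  echo "numvcpus = \"$vcpus\""
  echo "cpuid.coresPerSocket = \"$cores\""
}

# Example:
# vmx_cpu_lines 8
```

For the VM in this post, vmx_cpu_lines 8 prints numvcpus = "8" and cpuid.coresPerSocket = "2", i.e. four sockets with two cores each, which should keep Standard Edition within its socket limit while still exposing all 8 vCPUs.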

See this article for comparison of Windows 2003 editions.

See this link for a comparison of Windows 2008 editions (look for 'Editions Guide' at the bottom of the page). Or download the PDF here.

Monday, May 21, 2012

Install app on Citrix server - change user

This is mostly a reminder to myself as I seem to forget the syntax.

When installing applications on a Citrix server in production, you need to change to install mode. After installation the mode has to be changed back to 'execute'.

From CMD prompt:

Install mode:

change user /install

Execute mode:

change user /execute

Friday, April 27, 2012

View HBA firmware version from service console

To view the HBA firmware version from the service console (ESX classic), go to /proc/scsi/qla or /proc/scsi/lpfc820.
Here you will typically find two text files, e.g. '2' and '3'. Run 'cat' or 'more' on the files (see screendump below). See this post for more info.
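With several HBAs it gets tedious to open each file, so a loop like the one below prints the first line mentioning firmware from each of them. This is a sketch: the qla*/lpfc* directory patterns, the function name and the assumption that the driver output contains the word 'firmware' (in any case) are mine, and the base directory is a parameter only so it can be tried against a copy.

```shell
#!/bin/sh
# Sketch: print the first 'firmware' line from each QLogic/Emulex proc file.
# Assumptions: driver dirs match qla*/lpfc* and mention 'firmware' somewhere.
show_hba_firmware() {
  base="${1:-/proc/scsi}"
  for f in "$base"/qla*/* "$base"/lpfc*/*; do
    [ -f "$f" ] || continue            # skip unmatched globs and non-files
    printf '%s: ' "$f"
    grep -i -m 1 'firmware' "$f" || echo '(no firmware line found)'
  done
}

# Example, on the service console:
# show_hba_firmware
```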

For ESXi v5.x, see this link.

Large VM crashes during snapshot commit

Snapshots can be your friend, but they can most certainly also make your life miserable. The other day we had a rather large VM (with 20 GB memory, 8 vCPUs and 28 TB of storage spread across 22 .vmdk files) that crashed during a snapshot commit. The error stated: "Performing disk cleanup. Cannot power off." The snapshot had been taken while the VM was powered off, and only a few changes had been made to the VM before the snapshot was committed.

After the crash, the VM would not power on. The error stated: "Reason: Cannot allocate memory", and the error description (see screendump below) indicates a disk lock or disk error. Fortunately, the VM could be started from the service console (ESX 4.1 classic) with 'vmware-cmd'.

After boot, vCenter stated that there were no snapshots on the VM. However, 22 delta files on a single LUN were telling otherwise.

The usual cleanup procedure is to power off the VM and clone it. However, with 28 TB of storage in the VM, this was not an option.

Instead, the following did the trick: Log on to the service console, change directory to the folder where the .vmx file for the VM resides, take a new snapshot and then do a remove all snapshots (see this KB article for more info). This removes the new snapshot as well as the 'defect' snapshot.

To see if any snapshots exist (that will probably not be the case):

vmware-cmd vmname.vmx hassnapshot

To take a new snapshot (with no quiesce and no memory, see this KB article for details):

vmware-cmd vmname.vmx createsnapshot snapshot-name description 0 0

As you can see in the screen dump below, I first tried to run the command without the two boolean arguments that relate to QuiesceFilesystem and IncludeMemory.

To remove all snapshots:

vmware-cmd vmname.vmx removesnapshots

In the screendump above, the removesnapshots command returns '1', which here means success: all is well and the snapshots are gone.
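The three commands can be strung together in a small function. This is only a sketch of the sequence above: consolidate_snapshots and the snapshot name/description are hypothetical, and VMWARE_CMD is a variable purely so the sequence can be dry-run with a stand-in; on the service console it simply calls vmware-cmd. It does not interpret vmware-cmd's output, so still check the value each call prints, as in the screendumps.

```shell
#!/bin/sh
# Sketch of the snapshot cleanup sequence above (hypothetical wrapper).
# VMWARE_CMD defaults to vmware-cmd on the ESX classic service console; it is a
# variable only so the sequence can be dry-run with a stand-in command.
VMWARE_CMD="${VMWARE_CMD:-vmware-cmd}"

consolidate_snapshots() {
  vmx="$1"
  # 1. Check whether any snapshots are registered
  $VMWARE_CMD "$vmx" hassnapshot
  # 2. Take a new snapshot: name, description, quiesce=0, include memory=0
  $VMWARE_CMD "$vmx" createsnapshot cleanup-snapshot "temporary snapshot for cleanup" 0 0
  # 3. Remove all snapshots (per the text above, this also removes the 'defect' one)
  $VMWARE_CMD "$vmx" removesnapshots
}

# Example, run from the VM's folder:
# consolidate_snapshots vmname.vmx
```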

Tuesday, April 17, 2012

Could not power on VM - lock was not free

The other day we experienced an incident on the SAN storage with high latency and even loss of connection to the SAN. This can generate a lot of really unpleasant errors on the ESX hosts. Even after the SAN is brought back to a stable state, we've seen hosts that won't boot, VMs that won't vMotion and VMs that won't power on due to file locks.

If you receive a 'locked file' error (like the screendump below) and your VM won't boot, there are a couple of ways to go about it. This VMware KB article explains it quite well. Either you can cold migrate the VM to the other hosts in the cluster in turn (to find the ESX host holding the lock) and try to boot it from there, or you can try to locate specifically which host holds the lock.

If the vCenter log does not tell you specifically which files are locked, this can be viewed in the vmware.log which is located in the VM folder. If you just tried to power on the VM, then relevant info should be at the end of the log file.

In the example below, it is the swap that is still locked.

This can be verified by running the touch command on the locked file; if the file is locked, touch fails.
With vmkfstools you can get the MAC address of the host that holds the lock:

# vmkfstools -D /vmfs/volumes///

In the screendump below, the MAC address has been highlighted.
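In the vmkfstools -D output, the lock owner is shown as a UUID, and the last field of that UUID is the MAC address of the host holding the lock. The sketch below reads that output (or the matching /var/log/vmkernel lines) on stdin, takes the last owner entry and prints the MAC in the usual colon notation. The function name is hypothetical, and the assumptions that the field is labelled 'owner' and that an all-zero owner means no host holds the lock are based on my reading of the VMware KB, so verify against your own output.

```shell
#!/bin/sh
# Hypothetical helper: print the MAC address of the host that owns a VMFS lock.
# Reads 'vmkfstools -D' output (or vmkernel log lines) on stdin.
# Assumption: the owner UUID's last field is the MAC; all zeros = no owner.
lock_owner_mac() {
  # Take the most recent 'owner <uuid>' entry in the input
  uuid=$(grep -o 'owner [0-9a-f-]*' | tail -n 1 | cut -d' ' -f2)
  [ -n "$uuid" ] || { echo "no owner field found" >&2; return 1; }
  mac=${uuid##*-}                      # last UUID field, 12 hex digits
  if [ "$mac" = "000000000000" ]; then
    echo "no lock owner"
    return 0
  fi
  # Insert a colon after every two hex digits, then drop the trailing one
  echo "$mac" | sed 's/../&:/g; s/:$//'
}

# Example:
# tail -n 200 /var/log/vmkernel | lock_owner_mac
```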

The same info can be found in the /var/log/vmkernel log

Once you have the MAC address, you can find a match by, for example, logging in to vCenter or onto the Blade enclosure. When you have a match, cold migrate the VM to the relevant ESX host and boot it.

Thursday, April 12, 2012

Boot directly into BIOS - Workstation 8

In Workstation 8, there's a nice new little feature that lets you boot directly into the BIOS from the power-on button. Instead of being under Edit Settings, this option has been moved to the power-on button itself.

A small thing perhaps, but quite practical.
