Monday, February 15, 2010

Howto: Installing VMware tools in a Linux VM

Installing VMware Tools in a Linux VM takes a few more steps than on a Windows VM. This is how it is done (tested on VMware Workstation 7 with an Ubuntu Desktop 9.04 VM appliance).
  • install the guest OS (check first that the guest OS is supported)
  • to exit the GUI and simulate having no X server: sudo service gdm stop, then press Alt+F1 to get a console
  • right-click the VM and choose install/update VMware Tools. This connects the CD-ROM drive to the VMware Tools ISO file (if the files are not already available, they will be downloaded), but you still need to mount the CD-ROM manually: sudo mount /dev/scd0 /media/cdrom (if the folder doesn't exist, create it first)
  • copy the tar file to the /tmp folder and untar it: tar -xvf VMware-tools-vXX.tar.gz
  • cd to the untarred folder and run vmware-install.pl: sudo ./vmware-install.pl
  • start the GUI: sudo service gdm start or simply startx
  • verify that VMware Tools is running: sudo ps -auxwww | grep vm (look for /usr/bin/vmtoolsd; you will also find the balloon driver vmmemctl). You can also check whether the VMware Tools startup script has been put into the startup folder /etc/rc0.d/
See also the VMware KB article on installing VMware Tools.
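The verification step can also be scripted. The following is a minimal Python sketch, not part of the original procedure: it scans text in the layout of ps aux (11 columns, the last being the command) for vmtoolsd and vmmemctl. The function name find_tools and the sample process list are made up for illustration.

```python
def find_tools(ps_output, wanted=("vmtoolsd", "vmmemctl")):
    """Report which of the wanted process names appear in ps aux style output."""
    found = {name: False for name in wanted}
    for line in ps_output.splitlines():
        # ps aux prints 11 columns; the 11th (the command) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        if command.startswith("grep"):
            continue  # ignore the grep process itself
        for name in wanted:
            if name in command:
                found[name] = True
    return found


# Hypothetical sample output, shortened to three processes.
SAMPLE = """USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 911 0.0 0.1 45000 3000 ? S 10:00 0:00 /usr/bin/vmtoolsd
root 950 0.0 0.0 0 0 ? S< 10:00 0:00 [vmmemctl]
user 1200 0.0 0.0 3300 800 pts/0 S+ 10:05 0:00 grep vm
"""

print(find_tools(SAMPLE))
```

On a real guest you would feed it the output of ps aux, for example captured with subprocess.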

Thursday, February 11, 2010

Example of an HA error - and a fix

The other day, I got an HA error when trying to add a new host into a cluster. It was weird, as the host was identical to the others - same model, same installation procedure, and everything. In VirtualCenter, the error looked like this:

This piece of information did not help much with troubleshooting.

The only thing that was different about the new host was that it was configured from the service console (COS), as its NICs were DOA. I had used my own guide for this, so I thought I was in good shape ;-).

A more descriptive error was to be found in the VirtualCenter agent log file on the host (/var/log/vmware/vpx/vpxa.log). Grepping for the word "error" gave the following output:

errorcat = "hostipaddrsdiffer",
errortext = "cmd addnode failed for primary node: Host misconfigured. IP address of ... not found on local interface"

Earlier on, I had changed the host's IP address, as the first one assigned was already in use, but I'd forgotten to change the IP address in the /etc/hosts file. After correcting that entry and restarting the network (service network restart), everything worked fine.

As a side note, I can mention that it can be pretty confusing manoeuvring through the various log files. Check this post by Eric Siebert for further explanation of VMware log files on VI3.
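A stale /etc/hosts entry like this one is easy to check for. Here is a minimal Python sketch of such a check, assuming a standard hosts file layout (IP address followed by names, # for comments). The host name esx05, the addresses, and the function names are hypothetical and only serve the example.

```python
def hosts_entries(hosts_text):
    """Map each name in /etc/hosts style text to the first IP listed for it."""
    mapping = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first match
    return mapping


def hosts_mismatch(hosts_text, hostname, actual_ip):
    """Return the stale IP if the hosts text disagrees with actual_ip, else None."""
    listed = hosts_entries(hosts_text).get(hostname)
    if listed is not None and listed != actual_ip:
        return listed
    return None


# Hypothetical hosts file: the host was moved from 10.0.0.15 to 10.0.0.25.
SAMPLE_HOSTS = """127.0.0.1   localhost
10.0.0.15   esx05.example.local esx05   # old address, never updated
"""

print(hosts_mismatch(SAMPLE_HOSTS, "esx05", "10.0.0.25"))
```

A return value other than None means the hosts file still points at an address that the host no longer has, which is the situation the log message describes.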

Wednesday, February 3, 2010

Differences between Windows Server 2008, SP2, and R2

So what are the differences between win2k8, win2k8 SP2, and win2k8 R2? These naming conventions and differences between versions are a constant cause for confusion. So here's the short take:

Win2k8 was first released with SP1. Later on came Win2k8 SP2.
Win2k8 R2 is the new version of the OS that introduces several new features. It has the look and feel of Win7, it is 64-bit (x64) only, and Hyper-V Live Migration (~VMotion) is introduced.

There's no SP2 installed on top of win2k8 R2. R2 is a clean install or you can upgrade from SP2 to R2. In any case, the SP2 will disappear and it will only be called R2.

The reason for pointing this out is that it was a bit different with win2k3. Here, you installed SP2 and then you installed R2 on top of SP2 and the result was win2k3 SP2 R2 - so service pack and R2 at the same time.

I found this comparison somewhere and I quite like it (not quite sure how correct it is, though..)

Windows Vista SP1 ~ Windows Server 2008 SP1

Windows Vista SP2 ~ Windows Server 2008 SP2

Windows 7 ~ Windows Server 2008 R2

Thursday, January 14, 2010

Finally the VCP4 certification Welcome Kit arrived

Today, the official VCP4 certification arrived in the mail. It took VMware six months(!) to send the papers. There's no visible sign on the certification to indicate that it was taken as part of the beta exam - except for the date in the lower left corner (July 16th, 2009), indicating that it was achieved before the official release in August 2009.

As a bonus, a free license for Workstation 7 was included in the package, which is pretty cool.




Thursday, December 24, 2009

P2V pre-migration checklist - and considerations

My previous post was a P2V post-migration checklist. This post is a pre-migration checklist covering all the information that should be gathered and checked before doing any P2V conversions.

I have been involved in a number of larger P2V projects (50+ P2Vs) and, in my experience, proper planning is a key element of a successful project. Typically, you, as the VMware or P2V person, have no real knowledge of the Windows servers to be converted - they're just another server. This means that you rely on other people to collect relevant data on your behalf. Such a setup has an important implication: as you have no knowledge of the server, you cannot release it into production yourself. You should let a Windows guy verify the OS, after which the server can be handed over for application testing. Resources for both tests should be allocated up front by the project manager, and the testers should be standing by in the agreed maintenance window.

As regards the length of maintenance windows, we have had the most success with long time frames during weekends - e.g. 36 hours from Saturday 08.00 a.m. to Sunday 08.00 p.m. Obviously, such a window can be difficult to obtain, but it has two significant advantages: 1) Estimating the actual conversion time can be tricky - it happens that a 30 GB server takes 12 hours to convert for one reason or another. 2) It is less stressful to do P2Vs during weekends, and a long window lets you work at your own pace. Furthermore, conversions of servers with large disks (e.g. 200+ GB) can run overnight.

Now, a few words about the checklist. Over time, it has been gradually extended as we have learned important lessons - some of them the hard way. For example, if a server hasn't been checked for hardware dongles, you may have to roll back - and the same goes for a VLAN that hasn't been properly trunked... A specific list will match a specific scenario so, typically, the list will be modified to some degree for each project. However, a large part of the list will remain the same, so hopefully it can be used for inspiration. We use SharePoint 2007 to organise the lists. These can be dynamically updated, which is practical when several people have to update them at the same time.

  • Servername
  • OS type
  • Server model
  • Has Capacity Planner run for this server?
  • # of CPU sockets
  • # of CPU cores
  • Amount of physical memory installed
  • Physical disk capacity (C-drive, D-drive, etc.)
  • Current CPU usage (preferably from cap. planner)
  • Current memory usage (preferably from cap. planner)
  • Current physical disk usage (C-drive, D-drive, etc.)
  • # of vCPUs that should be assigned
  • Amount of memory to be assigned to VM
  • Sizes of vDisks after resizing (C-drive, D-drive, etc. – remember separate .vmdk files for each logical volume)
  • Total size of vDisks (then you can sum up total disk capacity needed and ask for storage up front)
  • Local administrator credentials (local Windows accounts are recommended)
  • “ipconfig /all” screendump attached to list (this is to ensure you have the right IP and MAC address)?
  • iLO information (address, credentials) (if you have to do cold migration)
  • Has server been defragmented (this can significantly speed up conversion rates)?
  • Has server been checked for hardware dongles?
  • Has VLAN been trunked?
  • Remote access type (RDP, Netop)? (for stopping services up front)
  • Physical server location
  • Applications on server
  • What services to stop on server before conversion
  • OS tester contact info
  • Application tester contact info
  • Server to be converted by (employee)
  • Date for conversion
  • Conversion progress/status (not begun, P2V begun, handed over to OS testing, released to production, etc.)
  • Has physical server been shut down?
  • Notes
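The "total size of vDisks" item is simple arithmetic, but it is worth doing for the whole list before asking for storage. A small Python sketch with made-up server names and sizes (none of these figures come from a real project):

```python
# Planned vDisk sizes in GB after resizing, one entry per logical volume.
# Server names and sizes are hypothetical.
planned_vdisks = {
    "srv-app01": {"C": 30, "D": 100},
    "srv-db01": {"C": 40, "D": 250, "E": 80},
}

# Total per server (the checklist column) and for the whole project.
per_server_gb = {name: sum(disks.values()) for name, disks in planned_vdisks.items()}
total_gb = sum(per_server_gb.values())

for name, size in per_server_gb.items():
    print(name, size, "GB")
print("storage to request:", total_gb, "GB")
```

In practice you would probably add headroom for swap files and snapshots on top of this figure; how much is a judgment call that the checklist does not prescribe.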

Sunday, December 13, 2009

P2V post migration checklist

When doing P2V projects, I usually have a short, written checklist on my desk to make sure I remember everything. The list is as follows:

For hot migration:

  1. Disable relevant services on the source machine
  2. When configuring the P2V, don't set the VM for autoboot upon completion
  3. Adjust hardware on the VM before first boot (remove serial ports etc)
  4. Check that VMware Tools installs correctly
  5. Adjust the HAL if needed
  6. Uninstall HP software
  7. Remove hidden NICs
  8. Set IP - if static IP (Start -> Run -> ncpa.cpl)
  9. Check services.msc to ensure that all automatic services are running (and that you re-enabled the ones that you disabled to begin with)
  10. Shut down the physical server (shutdown /s /t 0 from CMD)
  11. Ping -t the physical server and when it stops responding, then enable the NIC on the VM
  12. Reboot the VM

After this, I typically hand over the VM to the Windows Operations team, which checks the event log and such, and then they hand it over to the application testers before it is released into production.

Sunday, November 22, 2009

VLAN testing in ESX 3.5

In larger organisations, the network department and the VMware group are typically separated into different teams. So as a VMware administrator you need to ask the network department to trunk VLANs to the physical switch ports that your ESX host is connected to. It happens that the network department misses a port or a VLAN, which means that you can end up with a VM losing its network connection after e.g. a VMotion. Unfortunately, the responsibility can land on the VMware administrator for putting a host into production without testing VLAN connectivity. Unfair, but that's life.

But testing VLANs the manual way is rather time consuming, especially if you have multiple hosts with multiple NICs and multiple VLANs. The number of test cases quickly becomes unmanageable. If, for example, you have five hosts, five VLANs and four NICs in each host, that means (5 x 5 x 4) 100 test cases.
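To make the arithmetic concrete, here is a short Python sketch that enumerates the manual test cases for that example. The host names, VLAN IDs and vmnic names are placeholders, not taken from a real environment.

```python
from itertools import product

# Placeholder inventory matching the example: 5 hosts, 4 NICs each, 5 VLANs.
hosts = ["esx%02d" % n for n in range(1, 6)]
vmnics = ["vmnic%d" % n for n in range(4)]
vlans = [10, 20, 30, 40, 50]

# One manual test per (host, vmnic, VLAN) combination.
manual_cases = list(product(hosts, vmnics, vlans))

print(len(manual_cases))
print(manual_cases[0])
```

Each tuple is one round of "attach the vmnic, set the VLAN, configure an IP, ping the gateway", which is why the manual approach does not scale.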

The traditional way of testing is to create a vSwitch with only one vmnic connected. Then connect a VM on that vSwitch with one of the VLANs. Configure an IP address in the address space of the VLAN and ping the gateway. Do this for all the VLANs, and then connect the next vmnic to the vSwitch and start over.

The following method speeds up VLAN testing significantly (in this case from 100 to 16 test cases). It is not totally automated, but I have found it very useful nonetheless.

The basic idea is that you configure a port group to listen on all available VLANs and then you enable VLAN tagging inside the VM and do your testing from there:

1. Create a port group on the vSwitch with VLAN ID 4095. This will allow the VM to connect to all VLANs available to the host.

2. Enable VLAN tagging from inside the VM. This only works with the Intel E1000 driver, which only ships with 64-bit Windows. So if you have a 32-bit Windows server, you first need to modify the .vmx file and then download and install the Intel E1000 driver from within Windows. This link describes how this is done. Note that when modifying the .vmx, add the following line:

Ethernet0.virtualDev = "e1000"

Note that if you use the default Flexible NIC to begin with, there's no existing entry for the NIC type in the .vmx, so just add the new entry.

Under Edit Settings for the VM, attach the NIC to the port group with VLAN ID 4095.

3. Now you can add VLANs in the VM. Go to the Device Manager and then Properties for the E1000 NIC. There's a tab that says VLANs (see screendump below). As you add VLANs, a separate NIC or "Local Area Connection" is created for each VLAN. It is set for DHCP, so if there's a DHCP server on that network it will receive an IP automatically. If not, you will need to configure an IP for that interface manually (e.g. by requesting a temporary IP from the network department). To configure the IP quickly, you can run the following command from CMD or a batch (.cmd) script:

netsh int ip set address "local area connection 1" static 192.168.1.100 255.255.255.0 192.168.1.254 1
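With several VLAN interfaces, it can be handy to generate these netsh lines rather than type them. Below is a minimal Python sketch that builds the same command string as above and checks that the gateway lies in the interface's subnet. The function name netsh_static is made up for illustration, and the connection names depend on what Windows calls the VLAN interfaces on your VM.

```python
import ipaddress


def netsh_static(connection, ip, netmask, gateway, metric=1):
    """Build a 'netsh int ip set address ... static' line for one connection."""
    iface = ipaddress.ip_interface("%s/%s" % (ip, netmask))
    # A gateway outside the subnet is almost certainly a typo.
    if ipaddress.ip_address(gateway) not in iface.network:
        raise ValueError("gateway %s is not in %s" % (gateway, iface.network))
    return 'netsh int ip set address "%s" static %s %s %s %d' % (
        connection, ip, netmask, gateway, metric)


print(netsh_static("local area connection 1",
                   "192.168.1.100", "255.255.255.0", "192.168.1.254"))
```

Writing one line per VLAN interface into a .cmd file gives you a script that configures all test addresses in one go.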

4. Now we will use the Tracert (traceroute) command to test connectivity. The reason that we can't use Ping is the following: if you have multiple VLANs configured and you ping a gateway on a given VLAN - and the VLANs happen to be routable - then you will receive a response via one of the other VLANs even though the one you are testing is not necessarily working.

But when using Tracert, you can be sure that if the gateway is reached in the first hop, then the VLAN works. If the VLAN doesn't work, you will see Tracert taking multiple hops (via one of the other VLANs) before reaching the gateway (or it will fail if there's no connectivity at all). You can create a simple .cmd file with a list of gateways that you execute from the CMD prompt. Example file:

tracert 192.168.1.254
tracert 10.10.1.254
tracert 10.10.2.254

See below for example screendump.
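If you save the Tracert output to a file, the "gateway reached in the first hop" rule can be checked automatically. The following Python sketch assumes the usual English-language Windows tracert layout (a hop number, three timings, then the address); the function names and both sample outputs are invented for the example.

```python
import re

HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")   # hop lines start with a hop number
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def hops(tracert_output):
    """Return a list of (hop number, IP address or None) from tracert output."""
    result = []
    for line in tracert_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header, blank line or "Trace complete."
        addresses = IPV4.findall(match.group(2))
        result.append((int(match.group(1)), addresses[-1] if addresses else None))
    return result


def vlan_works(tracert_output, gateway):
    """True only if the very first hop is the gateway itself."""
    found = hops(tracert_output)
    return bool(found) and found[0] == (1, gateway)


DIRECT = """
Tracing route to 192.168.1.254 over a maximum of 30 hops

  1    <1 ms    <1 ms    <1 ms  192.168.1.254

Trace complete.
"""

ROUTED = """
Tracing route to 192.168.1.254 over a maximum of 30 hops

  1     1 ms    <1 ms    <1 ms  10.10.1.254
  2     1 ms     1 ms     1 ms  192.168.1.254

Trace complete.
"""

print(vlan_works(DIRECT, "192.168.1.254"))
print(vlan_works(ROUTED, "192.168.1.254"))
```

The second sample is the misleading case described above: the gateway answers, but only via another VLAN, so the check reports the VLAN as not working.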

Before running the batch script, make sure that only one physical NIC is connected to the vSwitch. You can do this in one of two ways: 1) create a separate vSwitch and connect only one vmnic at a time, controlling it from VC, or 2) unlink all vmnics but one from the vSwitch, working from the service console (COS) with the following commands:

ssh to the ESX host
esxcfg-vswitch -l (to see current configuration)
esxcfg-vswitch -U vmnic1 vSwitch0 (this unlinks vmnic1 from vSwitch0)
esxcfg-vswitch -L vmnic0 vSwitch0 (this links vmnic0 to vSwitch0)

These commands take effect immediately, so you don't have to restart the network or anything. Then you run through the test on one vmnic at a time. When done with a host, you VMotion the VM to the next host in the cluster and continue the test from there.