Thursday, December 24, 2009

P2V pre-migration checklist - and considerations

My previous post was a P2V post-migration checklist. This post is a pre-migration checklist covering all the information that should be gathered and checked before doing any P2V conversions.

I have been involved in a number of larger P2V projects (50+ P2Vs) and, in my experience, proper planning is a key element of a successful project. Typically, you, as a VMware or P2V person, have no real knowledge of the Windows servers to be converted - they're just another server. This means that you rely on other people to collect relevant data on your behalf. Such a setup has an important implication: as you have no knowledge of the server, you cannot release it into production yourself. You should let a Windows guy verify the OS, after which it can be handed over for application testing. Resources for both tests should be allocated up front by the project manager, and they should be standing by in the agreed maintenance window.

In regards to the length of maintenance windows, we have had the most success with long time frames during weekends - e.g. 36 hours from Saturday 08.00 a.m. to Sunday 08.00 p.m. Obviously, such a window can be difficult to obtain, but it has two significant advantages: 1) Estimating the actual conversion time can be tricky - it happens that a 30 GB server takes 12 hours to convert for one reason or another. 2) It is less stressful to do P2Vs during weekends, and a long window will let you work at your own pace. Furthermore, conversions of servers with large disks (e.g. 200+ GB) can run overnight.

Now, a few words about the checklist. Over time, it has been gradually extended as we have learned important lessons - some of them the hard way. For example, if a server hasn't been checked for hardware dongles, you may need to roll back - or a VLAN may turn out not to have been properly trunked... A specific list matches a specific scenario, so the list will typically be modified to some degree for each project. However, a large part of the list will remain the same, so hopefully it can be used for inspiration. We use SharePoint 2007 to organise the lists. These can be dynamically updated, which is practical when multiple people have to update them at the same time.

  • Servername
  • OS type
  • Server model
  • Has Capacity Planner run for this server?
  • # of CPU sockets
  • # of CPU cores
  • Amount of physical memory installed
  • Physical disk capacity (C-drive, D-drive, etc.)
  • Current CPU usage (preferably from cap. planner)
  • Current memory usage (preferably from cap. planner)
  • Current physical disk usage (C-drive, D-drive, etc.)
  • # vCPU’s that should be assigned
  • Amount of memory to be assigned to VM
  • Sizes of vDisks after resizing (C-drive, D-drive, etc. – remember separate .vmdk’s for each logical volume)
  • Total size of vDisks (then you can sum up total disk capacity needed and ask for storage up front)
  • Local administrator credentials (local windows accounts are recommended)
  • “Ipconfig /all” screendump attached to list (this is to ensure you have the right IP and mac address)?
  • ILO-information (address, credentials) (if you have to do cold migration)
  • Has server been defragmented (this can significantly speed up conversion rates)?
  • Has server been checked for hardware dongles?
  • Has VLAN been trunked?
  • Do server application licenses have any binding to MAC or IP address?
  • Remote access type (RDP, Netop)? (for stopping services up front)
  • Physical server location
  • Applications on server
  • What services to stop on server before conversion
  • OS tester contact info
  • Application tester contact info (for pre- and post migration test)
  • Server to be converted by (employee)
  • Date for conversion
  • Conversion progress/status (not begun, P2V begun, handed over to OS testing, released to production, etc.)
  • Has physical server been shut down?
  • Notes
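
The "total size of vDisks" item is easiest to keep honest if the sum is computed from the list itself. Below is a minimal sketch of that, assuming the planned vDisk sizes have been exported into a simple dictionary; the server names, the sizes and the optional headroom percentage are all made up for illustration and are not part of the checklist.

```python
# Hypothetical planning data: server names and sizes (GB) are made up.
planned_vdisks_gb = {
    "srv-file01": {"C": 30, "D": 200},
    "srv-app02": {"C": 40, "D": 60, "E": 100},
}


def total_storage_gb(servers, overhead_pct=0):
    """Sum all planned vDisk sizes and add an optional percentage of headroom.

    The headroom parameter is an assumption of this sketch, not a rule
    from the checklist.
    """
    raw = sum(size for disks in servers.values() for size in disks.values())
    return raw * (100 + overhead_pct) / 100


print(total_storage_gb(planned_vdisks_gb))
print(total_storage_gb(planned_vdisks_gb, overhead_pct=20))
```

The resulting number is what you would hand to the storage team before the first conversion starts.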

Sunday, December 13, 2009

P2V post migration checklist

When doing P2V projects, I usually have a short, written checklist on my desk to make sure I remember everything. The list is as follows:

For hot migration:

  1. Disable relevant services on the source machine
  2. When configuring the P2V, don't set the VM for autoboot upon completion
  3. Adjust hardware on the VM before first boot (remove serial ports etc)
  4. Check that VMware Tools installs correctly
  5. Adjust the HAL if needed
  6. Uninstall HP software
  7. Remove hidden NIC's
  8. Set IP - if static IP (Start -> Run -> ncpa.cpl)
  9. Check services.msc to ensure that all automatic services are running (and that you re-enabled the ones that you disabled to begin with)
  10. Shutdown the physical server (shutdown /s /t 0 from CMD)
  11. Ping -t the physical server and when it stops responding, then enable the NIC on the VM
  12. Reboot the VM

After this, I typically hand over the VM to the Windows Operations team, who check the event log and such, and then they hand it over to the application testers before it is released into production.

Sunday, November 22, 2009

VLAN testing in ESX 3.5

In larger organisations, the network department and the VMware group are typically separated into different teams. So as a VMware administrator you need to ask the network department to trunk VLANs to the physical switch ports that your ESX host is connected to. It happens that the network department misses a port or a VLAN, which means that you can end up with a VM losing its network connection after e.g. a VMotion. Unfortunately, the responsibility can land on the VMware administrator for putting a host into production without testing VLAN connectivity. Unfair, but that's life.

But testing VLANs the manual way is rather time consuming, especially if you have multiple hosts with multiple NICs and multiple VLANs. The number of test cases quickly becomes unmanageable. If, for example, you have five hosts, five VLANs and four NICs in each host, that means 5 x 5 x 4 = 100 test cases.
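
To make the arithmetic concrete, here is a small sketch that enumerates the manual test matrix for that example. The host names and VLAN IDs are hypothetical; only the counts (five hosts, five VLANs, four NICs) come from the text.

```python
from itertools import product

# Hypothetical names and IDs; only the counts match the example in the text.
hosts = [f"esx{n:02d}" for n in range(1, 6)]   # five hosts
vlans = [10, 20, 30, 40, 50]                   # five VLANs
vmnics = [f"vmnic{n}" for n in range(4)]       # four NICs per host

# One manual test per (host, VLAN, vmnic) combination.
manual_cases = list(product(hosts, vlans, vmnics))
print(len(manual_cases))
print(manual_cases[0])
```

Every extra host, VLAN or NIC multiplies the list, which is why the manual approach does not scale.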

The traditional way of testing is to create a vSwitch with only one vmnic connected. Then connect a VM on that vSwitch with one of the VLANs. Configure an IP address in the address space of the VLAN and ping the gateway. Do this for all the VLANs, and then connect the next vmnic to the vSwitch and start over.

The following method speeds up VLAN testing significantly (in this case from 100 to 16 test cases). It is not totally automated, but I have found it very useful nonetheless.

The basics of it is that you configure a port group to listen on all available VLANs and then you enable VLAN tagging inside the VM and do your testing from there:

1. Create a port group on the vSwitch with VLAN ID 4095. This will allow the VM to connect to all VLANs available to the host.

2. Enable VLAN tagging from inside the VM. This only works with the Intel E1000 driver, which only ships with 64-bit Windows. So if you have a 32-bit Windows server, you first need to modify the .vmx file and then download and install the Intel E1000 driver from within Windows (Update: Even for 64-bit Windows, you need to download and install the E1000 driver manually. The advanced VLAN option is not included in the default driver). This link describes how this is done. Note that when modifying the .vmx, add the following line:

Ethernet0.virtualDev = "e1000"

Note that if you use the default Flexible NIC to begin with, there's no existing entry for the NIC in the .vmx, so just add the new entry.

Under Edit Settings for the VM, attach the NIC to the port group with VLAN ID 4095.

3. Now you can add VLANs in the VM. Go to the Device Manager and then Properties for the E1000 NIC. There's a tab that says VLANs (see screendump below). As you add VLANs, a separate NIC or "Local Area Connection" is created for each VLAN. It is set for DHCP, so if there's a DHCP server on that network it will receive an IP automatically. If not, you will need to configure an IP for that interface manually (e.g. by requesting a temporary IP from the network department). For quickly configuring the IP, you can run the following command from CMD or a batch (.cmd) script:

netsh int ip set address "local area connection 1" static 192.168.1.100 255.255.255.0 192.168.1.254 1

4. Now we will use the Tracert (traceroute) command to test connectivity. The reason that we can't use Ping is the following: If you have multiple VLANs configured and you ping a gateway on a given VLAN - and the VLANs happen to be routable - then you will receive a response via one of the other VLANs even though the one you are testing is not necessarily working.

But when using Tracert, you can be sure that if the gateway is reached in the first hop, then the VLAN works. If the VLAN doesn't work, then you will see Tracert doing multiple hops (via one of the other VLANs) before reaching the gateway (or it will fail if there's no connectivity at all). You can create a simple .cmd file with a list of gateways that you execute from the CMD prompt. Example file:

tracert 192.168.1.254
tracert 10.10.1.254
tracert 10.10.2.254

See below for example screendump.
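
If the output of the batch file is redirected to a text file, the "first hop" rule can also be checked by a script instead of by eye. The following is only a sketch of that idea: the function names are my own, and the two sample outputs are illustrative text in the usual Windows tracert layout, not captured from a real run. It assumes plain IP addresses on the hop lines (e.g. from running tracert with -d).

```python
import re

# A hop line starts with the hop number; the header lines do not.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def first_hop_address(tracert_output):
    """Return the IPv4 address on hop 1, or None if hop 1 is missing or timed out."""
    for line in tracert_output.splitlines():
        match = HOP_LINE.match(line)
        if match and int(match.group(1)) == 1:
            addresses = IPV4.findall(match.group(2))
            return addresses[-1] if addresses else None
    return None


def vlan_ok(tracert_output, gateway):
    """The VLAN counts as working only if the gateway itself answers on hop 1."""
    return first_hop_address(tracert_output) == gateway


# Illustrative sample outputs (not from a real network).
DIRECT = """
Tracing route to 192.168.1.254 over a maximum of 30 hops

  1    <1 ms    <1 ms    <1 ms  192.168.1.254

Trace complete.
"""

ROUTED = """
Tracing route to 10.10.1.254 over a maximum of 30 hops

  1    <1 ms    <1 ms    <1 ms  192.168.1.254
  2     1 ms     1 ms     1 ms  10.10.1.254

Trace complete.
"""

print(vlan_ok(DIRECT, "192.168.1.254"))
print(vlan_ok(ROUTED, "10.10.1.254"))
```

In the second sample the gateway is reached, but only via another VLAN's gateway, so the check reports the VLAN as not working.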

Before running the batch script, you need to have only one physical NIC connected to the vSwitch. You can do this in one of two ways: 1) create a separate vSwitch and connect only one vmnic at a time, controlling it from VC, or 2) unlink all vmnics but one from the service console (COS) with the following commands:

ssh to the ESX host
esxcfg-vswitch -l (to see current configuration)
esxcfg-vswitch -U vmnic1 vSwitch0 (this unlinks vmnic1 from vSwitch0)
esxcfg-vswitch -L vmnic0 vSwitch0 (this links vmnic0 to vSwitch0)

These commands work instantaneously so you don't have to restart the network or anything. Then you run through the test on one vmnic at a time. When done with a host, you VMotion the VM to the next host in the cluster and continue the test from there.
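
With several vmnics per host it is easy to lose track of which uplink is currently linked. The sketch below only generates the esxcfg-vswitch command lines (using the -L and -U options shown above) for leaving a single vmnic linked; it does not run anything, and the function name is my own. I have not checked how esxcfg-vswitch reacts when asked to link a vmnic that is already linked, so treat the output as a list to review before pasting it into the COS.

```python
def isolate_uplink_commands(vswitch, vmnics, keep):
    """Return esxcfg-vswitch commands that leave only `keep` linked to `vswitch`.

    The uplink to keep is linked first, so the vSwitch is not left
    without any uplink while the others are being removed.
    """
    if keep not in vmnics:
        raise ValueError(f"{keep} is not one of {vmnics}")
    commands = [f"esxcfg-vswitch -L {keep} {vswitch}"]
    commands += [
        f"esxcfg-vswitch -U {nic} {vswitch}" for nic in vmnics if nic != keep
    ]
    return commands


# Print one block of commands per test round on a host with two uplinks.
for nic in ["vmnic0", "vmnic1"]:
    print(f"# test round with {nic} only")
    for command in isolate_uplink_commands("vSwitch0", ["vmnic0", "vmnic1"], nic):
        print(command)
```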


Saturday, November 14, 2009

Howto: Using Find command in Service Console

From time to time you need to locate stuff in the service console, and the only command you have is find. 'Locate' unfortunately hasn't been included in the COS. Typically, I forget the syntax and think of another way of locating files - but actually it's pretty simple.

To use the find command, do the following:

#find / -name searchstring
(#find 'path' -name 'searchstring')

So, if you're looking for the ssh_config file somewhere in /etc/, it would look like this:

#find /etc/ -name ssh_config

This searches on the complete file name. You can use wildcards as well (quote the pattern so the shell doesn't expand it before find sees it), e.g.:

#find /etc/ -name 'ssh_co*'
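
For comparison, here is roughly what find -name does, written as a short Python sketch using only the standard library. The function name is my own, and the demo builds a throwaway directory with two made-up files so it does not depend on what is in /etc/ on any particular machine.

```python
import fnmatch
import os
import tempfile


def find_by_name(root, pattern):
    """Yield paths under `root` whose file or directory name matches `pattern`.

    Like find's -name test, only the last path component is matched,
    and the match is case-sensitive.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if fnmatch.fnmatchcase(name, pattern):
                yield os.path.join(dirpath, name)


# Demo in a temporary directory with two hypothetical config files.
with tempfile.TemporaryDirectory() as root:
    ssh_dir = os.path.join(root, "ssh")
    os.mkdir(ssh_dir)
    for name in ("ssh_config", "sshd_config"):
        open(os.path.join(ssh_dir, name), "w").close()
    matches = sorted(
        os.path.basename(path) for path in find_by_name(root, "ssh_co*")
    )

print(matches)
```

Note that the wildcard ssh_co* matches ssh_config but not sshd_config, which is worth remembering when you are actually after the daemon's configuration file.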

Tuesday, November 10, 2009

Prerequisites - Capacity Planner analysis

Before starting a Capacity Planner exercise, there are a number of things that should be in place. The following is typically what I send to customers and ask them to have in place beforehand:

  1. Prerequisite: One or more Windows accounts with administrative privileges on all servers to be surveyed.
     Status:
       Username:
       Password:
       Domain:

  2. Prerequisite: 1 x Windows 2k3 or 2k8 server that we can install the Capacity Planner application on (Windows 2003 SP2 Standard with min. 1 GB memory, 1 CPU and 5 GB free on the D-drive). The server can be virtual. It should be joined to the domain where we collect data, and we should have RDP access to it. The server needs internet access, as performance data will be uploaded to optimize.vmware.com on ports 80 and 443 TCP outbound.
     Status:
       Servername:
       Specifications:
       Domain:
       RDP available:
       Internet access:

  3. Prerequisite: The local Windows firewall should be disabled on the servers to be surveyed, or the following ports must be opened inbound in the local firewalls: TCP/UDP ports 135-139 and 445 (they are used for communication between the Capacity Planner data collector and the Windows hosts).
     Status:
       Firewalls disabled:
       Or ports opened:

  4. Prerequisite: The WMI and Remote Registry services should be running on all servers to be surveyed (typically they are running by default).
     Status:
       WMI is running:
       Remote Registry is running:

  5. Prerequisite: A list of servers to be surveyed.
     Status:
       List in .csv or .xls:

It is recommended that performance data is collected for at least 30 days, and never for less than 14 days.
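
Prerequisite 3 is the one that most often turns out not to be in place, so it can be worth probing a few servers from the collector before the survey starts. The sketch below is one way to do that with Python's standard socket module; the function names are my own and the host you pass in is whatever server you want to probe. It only attempts TCP connections, so it says nothing about the UDP ports in the 135-139 range, and which of those ports a given server actually listens on will vary.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_collector_ports(host, ports=(135, 139, 445)):
    """Probe a few of the TCP ports from prerequisite 3 on one server."""
    return {port: tcp_port_open(host, port) for port in ports}
```

A call such as check_collector_ports("server-to-survey") returns a dictionary of port numbers and True/False values that can be pasted straight into the status column.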

Thursday, November 5, 2009

Gartner strategic tech list 2010

In 2009, virtualization was way up on Gartner's list of strategic technologies. For 2010, virtualization again leaves a big footprint on the list: it has been split up into several sub-concepts which are represented individually on the list.

Configuration notes for HA

A while back, we experienced a number of inconvenient HA failover false positives where several hundred VMs were powered down even though there was nothing wrong with the hosts. The cause of these incidents was apparently a hiccup in the network lasting more than 15 seconds. To avoid such issues, we decided to disable HA until we were absolutely sure that we had a proper HA configuration.

The following is a quick guide to the HA settings that we use. These correspond to current best practice.

For reference, we have used the HA deepdive article from Yellow-bricks and article by Scott Lowe on HA configuration notes.

Das.failuredetectiontime
The default timeout for HA is 15 seconds. Best practice is to increase this to 60 seconds. The value is entered in milliseconds, so 60 seconds is 60000. To do this, add the following entry under VMware HA -> Advanced options:

Option: das.failuredetectiontime
Value: 60000

The input is validated, so if you spell it wrong you will be prompted with an error.

Das.isolationaddress
The default isolation address is the default gateway, which is pinged if there is no contact between the hosts. However, the default gateway can be some arbitrary place in the network, so it can sometimes be useful to add one or more extra isolation addresses. It makes sense to add an IP as close to the host as possible, e.g. a virtual IP on a switch.

Option: das.isolationaddressX (X=1,2,3,...9)
Value: IP address
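
Since the timeout is entered in milliseconds and the isolation addresses are numbered, it can help to generate the option/value pairs rather than type them by hand. This is only a sketch: the function name and the example IP addresses are hypothetical, and the limit of nine addresses simply mirrors the X=1...9 range given above.

```python
def ha_advanced_options(failure_detection_seconds, isolation_addresses=()):
    """Build the option/value pairs to enter under VMware HA -> Advanced options."""
    if len(isolation_addresses) > 9:
        raise ValueError("only das.isolationaddress1 to das.isolationaddress9 are covered here")
    # das.failuredetectiontime is entered in milliseconds.
    options = {"das.failuredetectiontime": str(failure_detection_seconds * 1000)}
    for index, address in enumerate(isolation_addresses, start=1):
        options[f"das.isolationaddress{index}"] = address
    return options


# Hypothetical isolation addresses, e.g. virtual IPs on nearby switches.
for option, value in ha_advanced_options(60, ["10.0.0.2", "10.0.0.3"]).items():
    print(f"Option: {option}")
    print(f"Value: {value}")
```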

Host isolation response
For fibre channel storage, we choose "leave powered on". In an HA failover situation, the active primary node in the cluster will try to boot the VMs from the failed host. However, if the host is not down, there will be a VMFS file lock on the VMs and therefore they can't be restarted. HA will try to restart VMs five times. Worst case scenario is that VMs on a host lose network connection... (in vSphere, default response has been changed to "shut down").
For iSCSI storage and other storage over IP, the best practice isolation response is power off to avoid split brain situations (two hosts having write access to a vmdk at the same time).

Cisco switches and port fast
In a Cisco network environment, make sure that 'spanning-tree portfast trunk' is configured on all physical switch ports connected to the ESX host. This ensures that ports are never in 'listen' or 'learn' state - only in 'forwarding' state. So if e.g. one of the uplinks to the COS goes down, you don't risk an isolation response because the delay to put the other port/uplink into forwarding state is longer than the isolation timeout.

Example on a configured interface on a Catalyst IOS based switch:

interface GigabitEthernet0/1
description #VMWare ESX trunk port#
no ip address
switchport
switchport trunk encapsulation dot1q
switchport trunk allowed vlan
switchport mode trunk
switchport nonegotiate
spanning-tree portfast trunk

HP Blade enclosures - primary and secondary nodes
Because there can be no more than five primary nodes in a cluster, a basic design rule is that there should be no more than four hosts per cluster in a blade enclosure. If five or more hosts (and they all happen to be primary nodes) are located in an enclosure and it fails (which happens...), then no VMs will be started. This matter is explained well in the Yellow-bricks article mentioned above. Furthermore, clusters should be spread over a minimum of two enclosures.
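
Both rules (at most four hosts of a cluster per enclosure, at least two enclosures per cluster) are easy to check mechanically once you have a host-to-enclosure mapping for the cluster. A minimal sketch, with hypothetical host and enclosure names:

```python
from collections import Counter

MAX_HOSTS_PER_ENCLOSURE = 4   # design rule from the text
MIN_ENCLOSURES = 2            # design rule from the text


def overloaded_enclosures(host_to_enclosure):
    """Return enclosures holding more than four hosts of the same cluster."""
    counts = Counter(host_to_enclosure.values())
    return {enc: n for enc, n in counts.items() if n > MAX_HOSTS_PER_ENCLOSURE}


def placement_ok(host_to_enclosure):
    """True if the cluster spans at least two enclosures and none is overloaded."""
    enclosures = set(host_to_enclosure.values())
    return len(enclosures) >= MIN_ENCLOSURES and not overloaded_enclosures(host_to_enclosure)


# Hypothetical cluster: five hosts ended up in enclosure A.
cluster = {
    "esx01": "enc-A", "esx02": "enc-A", "esx03": "enc-A",
    "esx04": "enc-A", "esx05": "enc-A", "esx06": "enc-B",
}
print(overloaded_enclosures(cluster))
print(placement_ok(cluster))
```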

Wednesday, October 14, 2009

Howto: Permission wars in VI3

UPDATE: This setup doesn't entirely work. Templates aren't visible to the users...

This past week, I have been working on an interesting problem. A new internal customer wanted a development environment where they could deploy and delete VMs, take snapshots etc. They wanted to more or less have free hands, with the VMware team providing the virtual infrastructure as a service.

Now, from a virtual infrastructure operations perspective, giving a customer that much freedom is a bit of an administrative nightmare. For example, how do you ensure that a cluster is not overcommitted, and how do you make sure that all servers are properly registered in the CMDB?

To address the most important issue from a technical perspective: The customer should not be able to overcommit the cluster. If they have that possibility, then we can't do maintenance and there won't be full failover. The obvious way to go about it is to enable HA and then to check 'Prevent VMs from being powered on if they violate availability constraints'. However, HA does not have the most sensible way of calculating HA slot sizes, and if you only have two hosts in a cluster, then you risk not being able to deploy a new VM even though there are plenty of resources in the cluster.

A colleague of mine suggested that I create a root resource pool in the cluster and then add permissions only on that resource pool and not on the host, cluster, or datacenter level. In theory, this is a pretty good idea, as you can set a hard limit on the resource pool for memory usage (which in my experience is the typical, visible, limiting factor in the cluster). In this case, I set a limit of 50% of available memory and then made the resource pool non-expandable. The resource pool limit applies to memory that is actually used - not to what is assigned to the VMs, see below.


I created a role similar (I think ;-)) to virtual machine administrator, which can do more or less anything at the virtual machine layer (deploy, delete, change, snapshot, mount ISOs etc.) and added this at the resource pool layer. When I started testing, I discovered a number of issues. I couldn't create a VM, I couldn't delete a VM, and I couldn't browse the datastore from the VM summary page. But these permissions were already given to the role. If the same role was applied at the cluster or datacenter level, then it worked fine. So it makes a difference at which level the permissions are applied.

If I apply the role at the cluster level, then everything works from an access rights perspective, but then the role has too many permissions. The users can then deploy servers directly in the cluster and will not be forced to deploy into the root resource pool. And then control is lost.

The only way I could work around this issue was to create two separate roles with two different permission sets and then apply them at two different levels of the datacenter.

The first role has very few permissions and is applied at the datacenter level (do not propagate) (this could also be at cluster level, but currently I only have one cluster in the datacenter...). The second role is the actual role that I created in the first place. This role was applied at the resource pool level (propagate rights), where a hard limit has been defined for memory.

Below is listed the permission mapping that I have used for both roles.

With this setup, the user is completely locked down, so they can only deploy servers in the defined resource pool and they will not be allowed to overcommit. If they do, the VMs won't be able to power on.

In relation to snapshots and running out of space on the LUN, this problem still persists but will not be addressed in this article.

Role 1 (do not propagate rights) - to be applied at datacenter level

Virtual Machine.Inventory.Create

Virtual Machine.Inventory.Remove (otherwise one can’t delete VM from disk)

Virtual Machine.Configuration.Add New Disk

Datastore.Browse Datastore (to be able to browse datastore from VM summary view)


Role 2 (propagate rights) - to be applied at resource pool level

Datastore.Browse Datastore

Datastore.File Management

Virtual Machine.Inventory.Create

Virtual Machine.Inventory.Remove

Virtual Machine.Inventory.Move

Virtual Machine.Interaction.Power On

Virtual Machine.Interaction.Power Off

Virtual Machine.Interaction.Reset

Virtual Machine.Interaction.Answer Question

Virtual Machine.Interaction.Console Interaction

Virtual Machine.Interaction.Device Connection

Virtual Machine.Interaction.Configure CD Media

Virtual Machine.Interaction.Tools Install

Virtual Machine.Configuration.Rename

Virtual Machine.Configuration.Add Existing Disk

Virtual Machine.Configuration.Add New Disk

Virtual Machine.Configuration.Remove Disk

Virtual Machine.Configuration.Change CPU Count

Virtual Machine.Configuration.Memory

Virtual Machine.Configuration.Add or Remove Device

Virtual Machine.Configuration.Modify Device Settings

Virtual Machine.Configuration.Settings

Virtual Machine.Configuration.Change Resource

Virtual Machine.Configuration.Reset Guest Information

Virtual Machine.Configuration.DiskExtend

Virtual Machine.State.Create Snapshot

Virtual Machine.State.Revert to Snapshot

Virtual Machine.State.Remove Snapshot

Virtual Machine.State.Rename Snapshot

Virtual Machine.Provisioning.Customize

Virtual Machine.Provisioning.Clone

Virtual Machine.Provisioning.Create Template From Virtual Machine

Virtual Machine.Provisioning.Deploy Template

Virtual Machine.Provisioning.Clone Template

Virtual Machine.Provisioning.Mark as Template

Virtual Machine.Provisioning.Mark as Virtual Machine

Virtual Machine.Provisioning.Read Customization Specifications

Virtual Machine.Provisioning.Allow Virtual Machine Download

Virtual Machine.Provisioning.Allow Virtual Machine Files Upload

Resource.Assign Virtual Machine to Resource Pool

Resource.Migrate

Resource.Relocate


Thursday, October 8, 2009

Howto: Check if SAN cables are connected in ESX

When installing an ESX host and you have someone other than yourself taking care of the cabling of the host, it is very handy to be able to check whether this has been done properly. You want to be able to verify that the HBAs have been physically connected to the fabric switches with fibre cables.

Ssh to the ESX host
cd to the /proc/scsi/qla2300 folder (if it's a Qlogic HBA...)
In this folder there are a number of text files named with the numbers 1-x corresponding to the number of HBA ports in your ESX.
Cat the files one at a time:

#cat 1
or
#cat /proc/scsi/qla2300/1

look for the following line in the files:

Host adapter:loop state=READY, flags= 0x8430403

If it says READY, the HBA has been physically connected to the fibre switch. If it says DEAD, then it is not.
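
If you save the output of the cat commands, the loop state can also be pulled out with a few lines of script. The sketch below only parses text in the format of the line shown above; the function name is my own, and it makes no assumption about the rest of the file.

```python
import re

# Matches the "loop state=READY" part of the "Host adapter" line.
LOOP_STATE = re.compile(r"loop state\s*=\s*(\w+)")


def hba_loop_state(proc_text):
    """Return the loop state (e.g. READY or DEAD) found in the text, or None."""
    match = LOOP_STATE.search(proc_text)
    return match.group(1) if match else None


sample = "Host adapter:loop state=READY, flags= 0x8430403"
print(hba_loop_state(sample))
```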

Friday, October 2, 2009

VTSP 4 certified

Today, I passed the VTSP 4 (VMware Technical Sales Professional) certification. Apparently, for your company to keep VMware Enterprise Partner status, a minimum of 2 x VCP's, 2 x VSP's, and 2 x VTSP's are required. We have the first two accreditations well covered but needed the VTSP's - so I had to take one for the team together with a couple of the other guys ;-)

To achieve this certification, you need to pass six online tests which you can take at your own pace. These are available through Partner Central. There are self-study guides with each test. We received a nice and sweet offer from our distributor to get a two-day training session, so we got it handled quick and easy...

Monday, September 21, 2009

Supported image formats for Converter 4 Standalone

VMware Converter 4 Standalone supports the following image formats (see screen dump below). Hyper-V is supported but conversion will have to be done as a physical server (see link)



Friday, September 11, 2009

Problems showing performance stats in VC after DB upgrade

We had an incident the other day about a VirtualCenter v2.5 U4 not showing performance statistics for the VM's. It was possible to see live stats for e.g. CPU and memory usage. But changing the chart options to 'weekly' or 'monthly' resulted in a 'Performance data is currently not available for this entity'.

Recently the backend SQL Express server for this VC had been upgraded to a SQL 2005 Standard edition and this was the reason for the error.

In the SQL server, there are three stats rollup jobs (which are the ones creating the perf stats in VC) which were not automatically created during the upgrade. These had to be added manually following this KB article from VMware. These are:
  • Past Day stats rollup
  • Past Week stats rollup
  • Past Month stats rollup
After adding the jobs and waiting a couple of hours for all of the jobs to have run, everything worked just fine (The VC DB is called VCDB - UMDB is for Update Manager).

Below is a screendump from the SQL Management Studio after the jobs had been added:

Howto: Removing a disk from a VM - how to identify the right disk?

From time to time, we need to remove disks from a VM. If there are only two or three disks attached to the VM, it's typically not a problem figuring out which one to remove, e.g. if the disks have different sizes. But if you have seven or eight disks and they are the same size, then it's a bit more tricky - let's say if you're asked to remove the 'E-drive'. Under 'Edit Settings' for the VM, the disks only have a number which does not necessarily correspond with anything within the VM.

So how do you identify exactly which disk corresponds with a given volume within Windows?


The match can be made by looking at the SCSI target ID for the disk - this can be identified both in Windows and under 'Edit Settings' for the VM (a VM can have four SCSI controllers with up to 15 disks on each controller, so a maximum of 60 disks per VM).

To identify SCSI target ID within the VM:
Go to Computer Management -> Disk Management
Right click a disk and choose Properties


On the General tab you will see the Bus number (SCSI controller) and the Target ID (SCSI target ID), note the number - in this case below the ID is 4.


To identify SCSI target ID from the VI client:
Now go to 'Edit Settings' for the VM and locate the disk with the corresponding target ID (see Virtual Device Node for the disk). Make sure that the controller number and SCSI ID are the same. In this case it is Hard Disk 5 that has SCSI ID 4.

Shut down the VM to remove the disk.
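
The matching step can be written down as a small lookup. The sketch below assumes you have noted each disk's Virtual Device Node from 'Edit Settings' as text in the form "SCSI (bus:target)"; the disk list is hypothetical apart from Hard Disk 5 on SCSI ID 4, which is the example above, and the function names are my own.

```python
import re

NODE = re.compile(r"SCSI\s*\((\d+):(\d+)\)")


def parse_device_node(text):
    """Turn a Virtual Device Node string such as 'SCSI (0:4)' into (bus, target)."""
    match = NODE.search(text)
    if not match:
        raise ValueError(f"not a SCSI device node: {text!r}")
    return int(match.group(1)), int(match.group(2))


def find_disk(vm_disks, bus, target):
    """Return the label of the disk sitting on the given bus and target ID, or None."""
    for label, node in vm_disks.items():
        if parse_device_node(node) == (bus, target):
            return label
    return None


# Hypothetical notes from 'Edit Settings'.
vm_disks = {
    "Hard Disk 1": "SCSI (0:0)",
    "Hard Disk 2": "SCSI (0:1)",
    "Hard Disk 3": "SCSI (0:2)",
    "Hard Disk 4": "SCSI (0:3)",
    "Hard Disk 5": "SCSI (0:4)",
}

# Windows showed Bus 0, Target ID 4 for the volume to be removed.
print(find_disk(vm_disks, bus=0, target=4))
```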

Thursday, September 10, 2009

SVMotion GUI plugin for VI Client in VI3

Lost Creations has made this very popular GUI plugin for doing SVMotion from the VI Client. It has been out there for quite some time, so this post is merely for my own reference (I actually thought I had posted about this before...)

It's absolutely a 'must have' tool for daily operations of the virtual infrastructure.

(Update 2011.01.05: Use this link for download instead)

Go here for installation guide.


SVMotion with .vmdk's on different LUN's

Yesterday, I had to extend a number of disks on a VM. There were about seven .vmdk's spread over three different LUN's which were all out of space. In VI3 there's really no good way to increase a LUN (unless you use extend, but don't), so to increase the disk sizes of the .vmdk's, a larger LUN had to be created onto which the .vmdk's could be moved before extending them.

The storage guys created a 1 TB LUN for the VM. So, I wanted to use SVMotion to move the .vmdk's one by one to the new LUN. If you start out with a disk that is not the primary, or OS, disk you will get an error (I'm using the GUI plugin from Lost Creations), so you can only move the primary disk. However, when you move that primary disk, all of the .vmdk's attached to that VM will be moved with the VM at the same time and will be placed on the target LUN.

So when SVMotioning, all .vmdk's attached to that VM are moved at the same time. Therefore, make sure to have enough space on the target LUN.
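
A quick sanity check before starting is to add up all the .vmdk's of the VM and compare the sum with the free space on the target LUN. A trivial sketch, with made-up sizes (it ignores any extra space the VM's other files may need):

```python
def fits_on_target(vmdk_sizes_gb, target_free_gb):
    """True if all of the VM's disks together fit in the free space on the target LUN."""
    return sum(vmdk_sizes_gb) <= target_free_gb


# Hypothetical VM with seven disks, checked against a 1 TB (1024 GB) LUN.
vmdk_sizes_gb = [40, 100, 100, 150, 150, 200, 200]
print(sum(vmdk_sizes_gb))
print(fits_on_target(vmdk_sizes_gb, target_free_gb=1024))
```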

Saturday, September 5, 2009

I passed the VCP 4 beta exam!

Hurrah! This Monday I received an email notification from VMware stating that I have passed the VCP4 beta exam on vSphere 4 that I took on July, 16th. For that I got a fancy little VCP button to wear at VMworld ;-) The email notification only mentioned that I passed and then a formal score report will be sent next week...

Friday, August 28, 2009

VMworld 2010 Europe - in Copenhagen!

I just saw on ntpro.nl today that VMworld Europe 2010 has been moved from Cannes to Copenhagen (11-14th October). That really warms my heart ;-) One should think, however, that it would be a bit more exotic to have it in Amsterdam instead - also because VMware apparently is moving very fast there - but maybe it's a question of facilities.

Anyways, it'll be a nice and short 20 minutes trip next year ;-)

Tuesday, August 11, 2009

Access rights and permissions in vSphere

In more than one instance, I have experienced a situation where we had issues with managing permissions in both vCenter (vSphere 4) and VirtualCenter (VI3). The issue is that a user loses access rights when a group to which the user belongs is added with fewer permissions.

An example could be that a given user, 'UserA', has administrator rights at the top level (Hosts and Clusters) and then at a lower level (let's say at Datacenter level), a given security group, Group1, of which UserA is a member is given, let's say, 'Virtual Machine User' rights. This will decrease the permissions for UserA on that datacenter to only Virtual Machine User instead of Administrator - he saws off the branch he is sitting on, so to speak.

The consequence can be that, instead of risking suddenly losing access rights when groups are added, security groups are not used at all and only single users are added. This is not a problem when only a few users need access to vCenter or VirtualCenter. However, if many users need access, e.g. 20-40 employees, it gets rather complex to manage.

To be absolutely sure how these permissions work, I have done a bit of testing on both vCenter and VirtualCenter.

Test cases

First of all, permissions seem to work identically in both versions, that is VI3 (VirtualCenter) and vSphere (vCenter). Furthermore, when permissions are changed in vCenter, they are applied more or less instantaneously. So if you change or configure permissions for a user that has the vSphere client (or VI Client) open, the changes will appear to the user right away (this makes it nice and easy for testing purposes, by the way...)

If the administrator role is assigned to UserA at the Hosts and Clusters level, and then he is assigned fewer permissions at a lower level (e.g. at a given Cluster), then the lesser permissions on that lower level will win.

It works the same way the other way around: if UserA has 'Read only' on Hosts and Clusters and Administrator rights at a given Datacenter, then UserA will have full rights on that Datacenter and read only on the rest of the virtual environment.

If UserA has Administrator rights at the Hosts and Clusters level and at the same time a group to which UserA belongs is added with Read only at the same level, the interesting question is which of the two permission levels UserA will be granted: Administrator (as a single user) or Read only (as he belongs to the group)?
The answer is that the highest permissions defined at a given level for a user will win. In this case UserA will have administrator rights at the Hosts and Clusters level.
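
The behaviour observed in these test cases can be summarised as two rules: the closest level with any matching permission decides, and within that level the highest permission wins. The sketch below models just that, as a way of reasoning about the test results; it is not how vCenter is implemented. Ranking roles on a single scale is a simplification (real roles are sets of privileges), and the function name, the rank values and the 'Datacenter1' object are my own.

```python
# Simplified ranking of three built-in roles; real roles are sets of privileges.
ROLE_RANK = {"Read only": 0, "Virtual Machine User": 1, "Administrator": 2}


def effective_role(path, assignments, user, groups):
    """Model of the observed behaviour.

    path:        inventory levels from the top down to the object in question
    assignments: {level: {user_or_group: role}}
    Returns the role the user ends up with on the last level in `path`.
    """
    principals = {user, *groups}
    # Walk from the object upwards: the closest level with a match decides.
    for level in reversed(path):
        roles = [
            role
            for principal, role in assignments.get(level, {}).items()
            if principal in principals
        ]
        if roles:
            # Several matches on the same level: the highest one wins.
            return max(roles, key=ROLE_RANK.__getitem__)
    return None


path = ["Hosts and Clusters", "Datacenter1"]

# Group added further down with fewer rights: the lower level wins.
case_lower_level = {
    "Hosts and Clusters": {"UserA": "Administrator"},
    "Datacenter1": {"Group1": "Virtual Machine User"},
}
print(effective_role(path, case_lower_level, "UserA", ["Group1"]))

# User and group on the same level: the highest permission wins.
case_same_level = {
    "Hosts and Clusters": {"UserA": "Administrator", "Group1": "Read only"},
}
print(effective_role(path, case_same_level, "UserA", ["Group1"]))
```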

Administrators group

Another thing to be aware of is that Windows Administrators on the vCenter server are automatically added as administrators in vCenter. If you do not intend to give all of your Windows admins full access to your VMware environment, then remove the 'Administrators' group from vCenter (instead, you can add the local administrator user as an administrator in vCenter, so you have the possibility to log in with a local account should AD fail..)

Security groups or Distribution lists

Only security groups defined in Active Directory (AD) can be used as groups in vCenter. Distribution lists won't work.

Recommendations for managing users

In regards to the use of groups for managing users in vCenter, I recommend that groups are used at the hosts and clusters level (of course, this can vary greatly depending on your setup). For example, you could have three groups:

  • VMware admins (Administrator)
  • VM admins (Deploy/destroy rights, change VM specs, etc.)
  • Windows admins (console access to the VMs, similar to ILO access on physical servers)

Even if a VMware admin belongs to several of these groups, he/she will retain administrator rights as long as the groups are defined at the same level.

With security groups, the VMware admins don't have to handle user administration in the VMware environment. When a user is added to a given group in AD (this should be handled by your user administration department or system), he automatically gets access to vCenter.

For more info, see Basic System Administration guide pp. 213-230 on VMware.com.

Monday, July 27, 2009

Expert level on Communities - Whoop!

That's right! Today, I reached Expert level (750 points) on the VMware Communities forum. I have been active on the English forums for about four months now, and that is where I have collected most of the points. The next level is Master and it requires 2000 points, so that's probably going to take a while to reach ;-)

Click here for link to overview of the community levels.



Resizing disks in VMware Workstation

If you want to increase the size of a virtual machine's (VM's) virtual disk in VMware Workstation, you can use the command line tool, vmware-vdiskmanager, from a command prompt. The command can be executed from the VMware Workstation folder under Program Files\VMware. The VM should be powered off.

The following command will increase the size of the virtual disk to 30 GB. In this case, the .vmdk file resides on a network share.

C:\Program Files\VMware\VMware Workstation>vmware-vdiskmanager -x 30GB "\\FILESERVER\folder-X\My Virtual Machines\testserver\testserver.vmdk"

This will work both on disks where all space has been allocated and on disks that are allowed to grow.
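If you resize disks often, the command can be wrapped in a few lines of Python. This is just a sketch: the install path is an assumption (adjust it to your machine), and the function names are made up for the example. Only the -x option shown above is used.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed install location; adjust to where Workstation lives on your machine.
VDISKMANAGER = (
    PureWindowsPath(r"C:\Program Files\VMware\VMware Workstation")
    / "vmware-vdiskmanager.exe"
)


def build_expand_command(vmdk_path: str, new_size: str) -> list[str]:
    """Build the argument list for 'vmware-vdiskmanager -x <size> <vmdk>'."""
    return [str(VDISKMANAGER), "-x", new_size, vmdk_path]


def expand_disk(vmdk_path: str, new_size: str) -> None:
    """Run the expand command. Power off the VM before calling this."""
    subprocess.run(build_expand_command(vmdk_path, new_size), check=True)
```

Passing the arguments as a list means the spaces in 'My Virtual Machines' need no extra quoting.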

VLAN trunking / grouping in distributed virtual switch

In vSphere, there's a new networking feature which can be configured on the distributed virtual switch (or DVS). In VI3, it is only possible to add one VLAN to a specific port group in the vSwitch. In the DVS, you can add a range of VLANs to a single port group. The feature is called VLAN trunking and it can be configured when you add a new port group. There you have the option to define a VLAN type, which can be one of the following: None, VLAN, VLAN trunking, and Private VLAN. This can only be done on the DVS, not on a regular vSwitch. See screendumps below (both from a vSphere environment).


Thursday, July 23, 2009

Links to VMware documentation

I always seem to forget where VMware has placed their documentation and in which docs to look for what documentation (maybe it's just me). So here's a few links:

vSphere main documentation:

vSphere Upgrade guide:

VI3 documentation:

Hardware Compatibility List (HCL):

VI3 and vSphere patch download:

Supported guest OS'es


Thursday, July 16, 2009

VCP4 Beta exam

Today, I took the VCP4 Beta exam, and the last chance to take it is tomorrow. I was lucky to get an invitation, as I fulfilled the two prerequisites: 1. to be a VCP and 2. to have participated in the vSphere 4 beta program. However, I had my doubts about taking the exam as it was announced rather suddenly and it gave me less than two weeks to study for it. But I thought, what the heck, let's study hard for two weeks and then see what happens...

The exam was quite tough, and there were more questions than there will be in the final test. Because I'm a non-native English speaker, I got an extra 30 minutes. But still, 270 minutes is not much for 270 questions! That's one question per minute for 4½ hours straight and no breaks... The exam is covered by the NDA so I can't go into details, but I'm glad I reviewed the configuration maximums, and I should have spent more time on resource management (resource pools), iSCSI, paravirtualization, NPIV, and storage.

For study material, I've used:

And now, we just have to wait 6-8 weeks to get the results...

Sunday, June 21, 2009

Going to VMworld 2009!

Oh happy day! Yesterday, we got final confirmation that a colleague and I are going to VMworld in San Francisco in August. Due to the economic crisis and all it has been a bit of a struggle to get the approval, so it was quite the relief.

I was in Las Vegas last year for VMworld 2008 and it was cool so I'm really looking forward to San Francisco this year.

Saturday, June 20, 2009

Howto: Getting the Navisphere Agent for ESX Server

There are several posts in the forum about where to download the EMC Navisphere agent. The Navisphere agent is installed in the Service Console on your ESX Server and helps to manage EMC Clariion storage systems. Click here for more info.

The agent is not publicly available for download. If you have a partner login, then I believe you can download it at http://powerlink.emc.com/ .

The way to get the agent is via your storage department. Either they can get the login for you, or they can contact EMC, who will then send the software. Navisphere is shipped together with the Clariion storage systems on the Navisphere Server Support CD (see this article page 16). But contact EMC if you want to be sure to have the latest version.

In this document on page 7, it is stated that Navisphere v6.22 is compatible with ESX v3.5.

Wednesday, June 17, 2009

ESX 4.0 in Workstation - requires Intel-VT

I have been running ESX 3.5 and ESX 4.0 in VMware Workstation 6.5.1 for a while on my Lenovo T61 from work without any problems. A prerequisite for doing this, at least for ESX 4.0 (and probably also for Hyper-V) as it runs 64-bit, is that the CPU supports hardware virtualisation, which in Intel terminology is called Intel-VT, and that this is enabled in the BIOS. The T61 is about one year old and has Intel-VT, so I thought that it was standard on all newer Intel processors. But oh no, this is not the case. I recently purchased a Dell Studio 17 for private use with an Intel Core 2 Duo T6400 processor and I thought that I was all set. But - no Intel-VT support. Everything else was in order: 4 GB of memory, Windows 7 64-bit and so on. This was a bit disappointing. If you're looking to buy a new laptop, then check that this feature comes with the CPU. I found an article on ZDnet which lists a number of processors and whether they have Intel-VT enabled.

The following has been copied from the ZDnet article. YES means that the CPU type supports Intel-VT:


Saturday, June 13, 2009

vCenter Converter Standalone 4 - ports used

We're doing quite a few P2V conversions at the moment, and that means that we see all kinds of weird errors, conversion failures, and connection issues. P2V is definitely not an exact science.

One thing you should have in order before you start is that the proper network ports are open.

VMware has written a good KB article that explains which ports are used.

If you have a server with Converter Standalone installed on it and you have trouble connecting to the source physical computer, then first make sure that Windows Firewall is disabled. If that doesn't work, then install the Converter application directly on the source computer. Then you will need an outbound connection on port 443 TCP to vCenter (formerly VirtualCenter). It is assumed, of course, that port 443 TCP is open inbound on the vCenter server.

To test if a port is open, open a CMD prompt and run the following command:

telnet 'vCenter ip' 443

(without the ' '). If the prompt window goes black, then the connection is good. Otherwise you will get a 'can't connect' message or something similar.
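If telnet is not installed on the machine, the same check can be done with a few lines of Python. This is only a sketch of the telnet test above using the standard socket module. The function name port_is_open and the 5 second timeout are my own choices for the example.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # Same idea as 'telnet host port': just try to open a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Call it as port_is_open('vCenter ip', 443) from the source computer. A result of True corresponds to the black telnet window, and False to the 'can't connect' message.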

If you P2V directly to an ESX server, then ports 902, 903, and 443 TCP are used.

If you, for some reason, can't get port 443 opened, then a workaround is as follows:

  • Install the Converter directly on the source system
  • If you have an existing test VM in the same IP range, then create a new disk and attach that to the test VM.
  • Make a Windows share on the new disk
  • From the Converter, choose to export to a standalone virtual machine in Workstation format and then choose to place the files on the share just created
  • After the export, move the test VM to a VLAN / IP range that isn't blocked by any firewalls
  • Import the VM from within vCenter

Thursday, June 11, 2009

P2V of domain controller

Summary: Cold clone P2V of domain controllers works just fine.


We had to migrate two root domain controllers the other day at work. I knew that domain controllers in particular can give you trouble when being converted / migrated, so I researched it a bit and found a useful article on yellow-bricks.com which linked to a very good VMware KB article. This KB recommends that, instead of migrating, you deploy a fresh VM, run 'dcpromo' on it, and then shut down the physical server afterwards. I like this approach as it moves the responsibility away from the VMware team and over to the person responsible for the application.

However, we did not have enough time to do the recommended solution, so we went for P2V. We did a cold clone because hot migration is likely to go wrong and it is not supported by Microsoft.


There were FSMO roles on the DC's, so before we began, we had the AD guy move all the roles over to one of the servers. Then we took the other one down and P2V'ed it. We resized the disks to save SAN space which was not a problem. When it came back up, the AD guy tested and then moved FSMO roles over to the migrated DC. And then we migrated the other one. After both had been migrated, the AD guy tested again.


If your area of responsibility does not cover the application layer, which it does not for me in this case, then arrange for the person responsible for the application to test it before it is released into production. It may sound banal, but it is sometimes overlooked when the pace is fast and only basic OS testing is done.


Time synchronization


There are several ways of setting up time synchronization. One important point is that there should be only one source of synchronization for all the DC's. There's a feature in VMware Tools where you can synchronize the VM's clock with the ESX host, but we did not use it. We let Windows take care of the synchronization. If you have a mixed environment of DC's (bare metal and virtual), then you can let a bare metal DC sync to an external source, and then let all the other DC's sync to the bare metal DC.


We had the PDC emulator sync with a dedicated physical NTP server, and then let the second DC sync with the PDC emulator. The ESX servers sync with the physical NTP server - but no synchronization between VM and ESX server. Read this article for further info on time sync.

Update: In a KB article (KB 888794) from Microsoft about considerations when hosting DC's in a virtual environment, there is one important paragraph about forced unit access (FUA) which has resulted in some confusion. The paragraph states:

"If the virtual hosting environment software correctly supports a SCSI emulation mode that supports forced unit access (FUA), unbuffered writes that Active Directory performs in this environment are passed to the host operating system. If forced unit access is not supported, you must disable the write cache on all volumes of the guest operating system that host the Active Directory database, the logs, and the checkpoint file. "

According to VMware, forced unit access (FUA) is supported on VMware. Here's the answer from VMware technical support:

-----Original Message-----
From: VMware Technical Support [mailto:webform@vmware.com]
Sent: 24. februar 2010 11:25
To: (Jakob Fabritius Nørregaard)
Subject: Re: VMware Support Request SR# 1490632591

** Please do not change the subject line of this email if you wish to respond. **

Hello Jakob,

Forced Unit Access is supported by VMware. A large number of customer's have virtualized Domain Controllers which is evident in the community forums.

Thanks & Best Regards

Derek Collins

Technical Support Engineer

VMware Global Support Services

1-877-486-9273

VMware Technical Support Knowledge Base

http://kb.vmware.com/kb

Saturday, June 6, 2009

Howto: 101 Scripting ESX server installation on vSphere 4

I have been wanting to look into scripted ESX installations for a while now but haven't gotten around to it until now. At first glance it looks a bit complicated - there are several widely used deployment tools around (e.g. EDA and UDA), people are posting bunches of deployment scripts etc. I wanted to know the absolute basics - what is the simplest way to script an ESX installation?

First off, I recommend that you download the ESX and vCenter Server Installation Guide and read pp. 43-58 on scripting installations. This documentation helped me to get started more than posts on the web.
On ESX 4, there are two built-in scripts that you can run when you boot the installation CD: 'ESX scripted install to first disk' and 'ESX scripted install to first disk (overwrite VMFS)'. But that's a little boring as these scripts can't be modified.


Instead, we can let the ESX server generate a script for us based on our own installation. I like this way as it simplifies things compared to the very comprehensive scripts out there - and it fits your environment. Better to have a simple script that works than to have a do-it-all script that doesn't. You can always expand it later.

When you install ESX 4 in the default graphical mode, a Kickstart script (ks.cfg) with your specific settings is generated and placed in the /root/ folder of your ESX installation. Make a copy of this file, as this is the one we will be using as our base script.

This script is to be copied to the root of the installation ISO. To do that, you need an ISO modifying tool like MagicISO (you need to pay $29 to make ISO's larger than 300 MB). Open the ESX 4 installation ISO in MagicISO and copy the ks.cfg file into the root of the ISO.

Now, boot your server with the ESX ISO. When the first installation screen shows as below, hit F2 to get 'other options'. Then move down to 'ESX scripted install using USB ks.cfg'. We will not be installing from USB; we will just use the command as a template and modify it to get the ks.cfg script from the CD instead.

Modify the boot options command like this:

Boot options initrd=initrd.img vmkopts=debugLogToSerial:1 mem=512M ks=cdrom:/ks.cfg quiet

That's it. This will do a basic scripted installation of the ESX 4 server...


Mini troubleshooting: I tried to reinstall the ESX server that I had already installed, which means that there was already a VMFS partitioned disk on the server. So the clearpart command needed the --overwritevmfs flag to work. Furthermore, in the partitioning section I had to comment out some lines and instead uncomment the 'part' commands with the --onfirstdisk flags.

I have pasted the basic script below for reference.

----------------sample ks.cfg----------------------
# Don't edit script in notepad or Word. Use Notepad++ or like app

accepteula

keyboard dk

auth --enablemd5 --enableshadow

#I have added the '--overwritevmfs' flag which is
#necessary when reinstalling an existing ESX

clearpart --overwritevmfs --firstdisk

install cdrom

#The encrypted password is taken from the original
#graphical install

rootpw --iscrypted $1$k364YM8i$CyveR0PWuw294uX8HLzcE0

timezone --utc 'Europe/Stockholm'

network --addvmportgroup=true --device=vmnic0 --bootproto=dhcp

part '/boot' --fstype=ext3 --size=1100 --onfirstdisk
part 'none' --fstype=vmkcore --size=110 --onfirstdisk
part 'Storage1' --fstype=vmfs3 --size=8604 --grow --onfirstdisk

virtualdisk 'esxconsole' --size=7604 --onvmfs='Storage1'

part 'swap' --fstype=swap --size=600 --onvirtualdisk='esxconsole'
part '/var/log' --fstype=ext3 --size=2000 --onvirtualdisk='esxconsole'
part '/' --fstype=ext3 --size=5000 --grow --onvirtualdisk='esxconsole'

%post --interpreter=bash

----------------sample ks.cfg EOF----------------------
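The generated ks.cfg does not contain the --overwritevmfs flag, so if you reuse it for reinstalls you have to add it by hand each time. As a convenience, the little Python sketch below adds the flag to the clearpart line if it is missing. The function name is made up for the example, and the script only touches the clearpart line. It also writes Unix line endings, on the assumption that this is why Notepad should be avoided.

```python
def ensure_overwrite_vmfs(ks_text: str) -> str:
    """Return ks.cfg text where every clearpart line has --overwritevmfs."""
    fixed_lines = []
    for line in ks_text.splitlines():  # also strips Windows CRLF endings
        tokens = line.split()
        if tokens and tokens[0] == "clearpart" and "--overwritevmfs" not in tokens:
            line = line.rstrip() + " --overwritevmfs"
        fixed_lines.append(line)
    # Write the file back with plain Unix newlines.
    return "\n".join(fixed_lines) + "\n"
```

Read the copy of /root/ks.cfg into a string, pass it through the function, and save the result before copying it to the ISO. Running it on a file that already has the flag changes nothing.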