Over the past couple of days, one of our VMs crashed several times. When you try to open the VM console, you get a black screen with a yellow MKS error at the top. Strangely enough, the VM still responds to ping. After powering it off and on again it boots, but before long the same thing happens again. In addition, vMotion failed for a number of VMs with the following error:
"The operation is not allowed in the current state"
In the vmkwarning log there are the following entries:
"WARNING: Heap: 2900: Heap_Align(vmfs3, 6160/6160 bytes, 8 align) failed. caller: 0x41800d8e84e9"
"WARNING: HBX: 1889: Failed to initialize VMFS3 distributed locking on volume 50d136be-62d92875-869a-10604bace2cc: Out of memory"
"WARNING: Fil3: 2034: Failed to reserve volume f530 28 1 50d136be 62d92875 6010869a cce2ac4b 0 0 0 0 0 0 0"
"WARNING: Heap: 2525: Heap vmfs3 already at its maximum size. Cannot expand."
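To check whether other hosts show the same symptoms, the log can be scanned for these messages. The following is only a minimal sketch in Python: the function name find_heap_warnings is made up for illustration, the patterns are copied from the entries quoted above, and it assumes each log entry sits on a single line in the actual file.

```python
import re

# Message fragments taken from the vmkwarning entries quoted in this post.
# Assumption: in the real log file each entry is on one line.
HEAP_PATTERNS = [
    re.compile(r"Heap_Align\(vmfs3,.*failed"),
    re.compile(r"Heap vmfs3 already at its maximum size"),
    re.compile(r"Failed to initialize VMFS3 distributed locking.*Out of memory"),
]


def find_heap_warnings(text):
    """Return the log lines that match any of the VMFS heap patterns."""
    hits = []
    for line in text.splitlines():
        if any(pattern.search(line) for pattern in HEAP_PATTERNS):
            hits.append(line.strip())
    return hits
```

Pass it the contents of a copy of the host's vmkwarning log, for example find_heap_warnings(open(path).read()), where path is wherever you saved the log. An empty list means none of the three messages were found.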
I found a KB article and a post from Cormac Hogan that explain the issue.
In ESXi 5.0 U1 the default maximum VMFS heap size (VMFS3.MaxHeapSizeMB) is 80 MB, which means that a host can have at most about 8 TB of open vmdk files in total. Once that limit is reached, VMs can no longer access their disks.
There are two ways to fix this:
- Upgrade to ESXi 5.1 U1 (or ESXi 5.0 Patch 5)
- Increase VMFS3.MaxHeapSizeMB to 256 (default is 80) in Configuration -> Advanced Settings and reboot the host
Upgrading to ESXi 5.1 U1 increases the maximum total size of open vmdk files to 60 TB instead of 8 TB.
Increasing the heap size to 256 raises the maximum to 25 TB.
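To get a feel for how close a host is to the limit, you can add up the sizes of the vmdk files its VMs have open and compare the total with the figures above. This is a rough sketch only: the function name and the disk sizes are hypothetical, and the lookup table contains just the two heap sizes mentioned in this post (80 -> 8 TB, 256 -> 25 TB).

```python
# Approximate limits quoted in this post:
# VMFS3.MaxHeapSizeMB -> maximum total size of open vmdk files (TB).
HEAP_MB_TO_MAX_OPEN_TB = {80: 8, 256: 25}


def open_vmdk_headroom_tb(vmdk_sizes_gb, heap_mb=80):
    """Return the remaining open-vmdk capacity in TB (negative = over the limit)."""
    limit_tb = HEAP_MB_TO_MAX_OPEN_TB[heap_mb]
    used_tb = sum(vmdk_sizes_gb) / 1024  # GB -> TB
    return limit_tb - used_tb


# Hypothetical host with four open vmdk files (sizes in GB, 9 TB in total).
sizes_gb = [2048, 2048, 4096, 1024]
print(open_vmdk_headroom_tb(sizes_gb, heap_mb=80))   # -1.0
print(open_vmdk_headroom_tb(sizes_gb, heap_mb=256))  # 16.0
```

With the default heap size this example host is 1 TB over the limit, which is the situation where the heap warnings start to appear. With the heap at 256 it has 16 TB to spare.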