Supermicro bug when running nested VMs

One of our internal cloud environments, which runs ESXi hosts on SuperMicro (SYS-1027GR) servers, failed with a PSOD referencing a fatal (unrecoverable) MCE on one of the physical CPUs. It happened frequently on almost all the ESXi hosts in the cluster, though not always right after a restart.

Sample from the log:

cpu2:33455)@BlueScreen: Machine Check Exception: Fatal (unrecoverable) MCE on PCPU2 in world 33455:vmnic1-pollW System has encountered a Hardware Error – Please contact the hardware vendor
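
When the same crash shows up on several hosts, it helps to pull the physical CPU and the world out of each PSOD line so they can be compared. Here is a minimal Python sketch, assuming the line format matches the single sample above (other ESXi builds may format it differently); the name `parse_mce_line` is only for illustration:

```python
import re

# Pattern modelled on the one sample line in this post; treat it as an assumption.
MCE_PATTERN = re.compile(
    r"Machine Check Exception: Fatal \(unrecoverable\) MCE "
    r"on PCPU(?P<pcpu>\d+) in world (?P<world_id>\d+):(?P<world_name>\S+)"
)

def parse_mce_line(line):
    """Return (pcpu, world_id, world_name) for a fatal MCE line, else None."""
    match = MCE_PATTERN.search(line)
    if match is None:
        return None
    return (
        int(match.group("pcpu")),
        int(match.group("world_id")),
        match.group("world_name"),
    )

sample = (
    "cpu2:33455)@BlueScreen: Machine Check Exception: Fatal (unrecoverable) "
    "MCE on PCPU2 in world 33455:vmnic1-pollW System has encountered a Hardware Error"
)
print(parse_mce_line(sample))
```

If the world name is the same on every host (here a vmnic polling world), that is a useful detail to include in the support case.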

We raised a support case with SuperMicro and VMware. After a long investigation, a VMware engineer identified it as a known bug on SuperMicro servers when VMs are run in a nested manner. The issue is due to an Intel erratum; the long-term solution is a microcode/BIOS fix, which requires Intel to acknowledge the erratum and release the fix. It seems that nested virtualization combined with PCI passthrough triggers errata in the CPU microcode on the Intel CPUs, which makes the hosts crash. 1603071 is the related bug number mentioned by VMware.

To avoid this PSOD for now, we have temporarily disabled the Hardware Virtualization option on the VMs. We are working with SuperMicro and VMware on a workaround and a long-term solution.
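
To find which VMs still have the option turned on, the .vmx files can be checked. This is a rough Python sketch, assuming the option is stored in the .vmx file as `vhv.enable = "TRUE"` (the key name is an assumption, so verify it on your build). The helper name `nested_hv_enabled` and the VM name in the example are made up:

```python
def nested_hv_enabled(vmx_text):
    """Return True if the VMX text sets vhv.enable to TRUE (assumed key name)."""
    for raw_line in vmx_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip().lower() == "vhv.enable":
            return value.strip().strip('"').upper() == "TRUE"
    return False  # key absent: the option is treated as off

# Hypothetical .vmx content for illustration only.
example_vmx = 'displayName = "nested-esxi-01"\nvhv.enable = "TRUE"\n'
print(nested_hv_enabled(example_vmx))
```

Running this over the .vmx files of a cluster gives a quick list of VMs that still need the option switched off.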
