My flight got delayed after VMworld and I had to spend a few hours at the airport, so to make the time interesting I made this mind map for the dvSwitch. I hope it will be helpful for certification preparation.
In our environment we have around 12 VCSA 6.0 U2 (vCenter Server Appliance) instances in different regions, of various sizes. Most of them run around 20-25 ESXi hosts in two or three clusters with 800 – 1300 VMs, so whenever there is an issue it is always challenging to upload the logs to VMware support: the support bundle grows to 15-20 GB, at some point log generation fails because the log partition fills up, and even after increasing the size of the log partition it is a big challenge to upload the bundle to the VMware FTP site.
VMware support engineers had no clue about the reason for the huge bundle size and no other way to get the logs onto their FTP site, so in most cases we uploaded only a specific date range and log type to VMware for troubleshooting.
I searched a lot of blogs and Slack channels without much help, but eventually came across a plugin called the VMware Support Assistant, which can upload logs directly from vCenter to the VMware portal, so I decided to install it.
One drawback of this plugin is that if the Web Client session times out, the log upload is interrupted and the process is closed, so we needed to increase the Web Client session timeout. This can be done with the help of KB 2040626; we increased the session timeout from the default 30 minutes to 6 hours, which allowed us to successfully upload the logs to the VMware portal.
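The timeout change from KB 2040626 is a one-line edit to webclient.properties (on the VCSA this lives under /etc/vmware/vsphere-client/, and the vsphere-client service needs a restart afterwards, per that KB). A minimal sketch, run here against a scratch copy of the file so it is safe to try:

```shell
# KB 2040626: session.timeout in webclient.properties is in minutes.
# On a real VCSA, point PROPS at /etc/vmware/vsphere-client/webclient.properties
# and restart the vsphere-client service afterwards.
PROPS="$(mktemp)"
printf 'session.timeout = 30\n' > "$PROPS"   # scratch copy with the default value

# Raise the timeout from the default 30 minutes to 6 hours (360 minutes).
sed -i 's/^session.timeout *=.*/session.timeout = 360/' "$PROPS"

grep '^session.timeout' "$PROPS"   # prints: session.timeout = 360
```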
But keeping a Web Client session open for 6 hours is still a security concern, and we also really wanted to find the reason behind the bundle size growth. After a long wait and several conversations with senior VMware engineers, we identified that all the old vpxd core dumps are stored under /storage/core; these are not required, and only live_core.VPXD.* is needed by vCenter. As per their suggestion we deleted all the old files from our lab VC, and the log bundle size reduced significantly.
Note: Please take a proper backup or snapshot before deleting anything. If you delete live_core by mistake, it will crash the vCenter.
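A hedged sketch of that cleanup, run against a scratch directory here; on the real VCSA you would point CORE_DIR at /storage/core. The old-dump filenames below are illustrative only, so always list before deleting, and take the backup or snapshot first:

```shell
# Demo in a scratch directory; on a real VCSA set CORE_DIR=/storage/core.
# The dump filenames are made up -- check your own with 'ls -lh /storage/core'.
CORE_DIR="$(mktemp -d)"
touch "$CORE_DIR/core.vmware-vpxd.12345" \
      "$CORE_DIR/vpxd-worker-zdump.000" \
      "$CORE_DIR/live_core.VPXD.67890"

# 1) List the old dumps first, excluding the live_core.VPXD.* files vCenter needs:
find "$CORE_DIR" -maxdepth 1 -type f ! -name 'live_core.VPXD.*' -print

# 2) Only after reviewing the list, delete them:
find "$CORE_DIR" -maxdepth 1 -type f ! -name 'live_core.VPXD.*' -delete

ls "$CORE_DIR"   # only live_core.VPXD.67890 remains
```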
Last month I was working on a security hardening project, implemented as per the standards recommended by VMware.
We had to show our internal security team that the whole vSphere environment is protected as per the VMware recommendations, so we built a small web portal that calls a PowerCLI script in the background to validate the hosts and return the results.
The script validates the settings below and provides the results:
ESXi NTP configuration and service status
DNS IP check
Allowing only the corresponding IPs through the DNS firewall rule for UDP/TCP.
The result shows the status of each check. If the host is fully compliant it is reported as fully protected; otherwise the output shows which checks are not up to the standard. If only one or two checks fail, only those are flagged for change, and the rest show a value of 1, meaning they are protected as per the recommendation.
In other words, a result of 1 is good; if every condition is 1, the overall result is considered true and the host is fully protected.
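The result logic above can be sketched like this (the check values are hypothetical stand-ins for what the PowerCLI validation returns; 1 means compliant):

```shell
# Sketch of the portal's result logic. Each argument is 1 (compliant) or 0.
# In the real portal these three values come from the PowerCLI checks for
# NTP, DNS and the DNS firewall rule.
check_host() {  # args: ntp dns fw
  if [ "$1" -eq 1 ] && [ "$2" -eq 1 ] && [ "$3" -eq 1 ]; then
    echo "Fully protected"
  else
    [ "$1" -eq 1 ] || echo "Change: NTP config/service"
    [ "$2" -eq 1 ] || echo "Change: DNS IP"
    [ "$3" -eq 1 ] || echo "Change: DNS firewall rule"
  fi
}

check_host 1 1 1   # prints: Fully protected
check_host 1 1 0   # prints: Change: DNS firewall rule
```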
Please download the script from the link below.
Please check my other blogs on DSM 9.5 and 9.6. Here we can see the steps to activate VMs when they are moved with vMotion from one host to another, which helps when we want to enable agentless protection in an old environment with a huge list of VMs.
In Trend Micro DSM it is easy to create an event-based task to activate new VMs created on the ESXi hosts, but for computers moved from one host to another we need some basic logic to exclude the VMs we don't want to activate, and also to assign the policy only to Windows-based OSes for agentless protection.
One difficulty with this event task is that it activates all moved VMs with the one policy it defines; if you have different policies for Windows and Linux, it will overwrite them all with the same default policy on every vMotioned VM, so we have to create an extra condition to exclude VMs and another condition for the operating system.
Go to Administration – Event-Based Tasks – click New.
Select "Computer Moved".
Select the appropriate policy and other actions.
The first condition is to select the ESXi hosts.
The next condition is important: we need to exclude any VM whose name contains "app", but there is no "does not match" option to directly exclude a VM by name. To work around this, use www.regexpal.com to verify a regular expression that provides the match or no-match condition.
For example, use ^(?!.*app).*$ in the regular expression field; in the Test String box, type some content containing the word app and it will show the result as match or no match.
Use it according to your requirement. The third condition is to exclude the Linux VMs, so select the Platform condition and use .*OS.*
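The same no-match behaviour can be reproduced on the command line with grep -P (Perl-style lookahead; the VM names here are made up):

```shell
# ^(?!.*app).*$ matches only names that do NOT contain "app" anywhere.
pattern='^(?!.*app).*$'

echo "dbserver01"  | grep -P "$pattern"                    # prints dbserver01 (kept)
echo "appserver01" | grep -P "$pattern" || echo "no match" # excluded by the lookahead
```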
One of our internal cloud environments, running ESXi hosts on SuperMicro (SYS-1027GR) hardware, failed with a PSOD referencing a Fatal (unrecoverable) MCE on one of the physical CPUs. It happened frequently on almost all ESXi hosts in the cluster, but not always right after a restart.
Sample from the log:
cpu2:33455)@BlueScreen: Machine Check Exception: Fatal (unrecoverable) MCE on PCPU2 in world 33455:vmnic1-pollW System has encountered a Hardware Error – Please contact the hardware vendor
We raised support cases with SuperMicro and VMware, and after a long investigation a VMware engineer identified it as a known bug on SuperMicro servers when running VMs in a nested manner. The issue is due to an Intel erratum (which Intel has to acknowledge and address with a microcode/BIOS fix as the long-term solution). It seems that nested virtualization combined with PCI passthrough triggers an erratum in the CPU microcode on the Intel CPUs that makes the hosts crash. 1603071 is the related bug number mentioned by VMware.
To fix this PSOD temporarily, we disabled hardware virtualization in the VM options. We are working with SuperMicro and VMware on a workaround and a long-term solution.
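For reference, the "Expose hardware assisted virtualization to the guest OS" checkbox in the VM options corresponds (to my understanding, worth verifying against the vSphere docs for your version) to this flag in the VM's .vmx file; the supported way to change it is the UI checkbox, with the VM powered off:

```
vhv.enable = "FALSE"
```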
As per KB 2148819, TLSv1.0 has been disabled on the VC, and after that an issue started when connecting to the VC from the desktop client and also when using PowerCLI.
Error from the desktop client:
Error from PowerCLI:
Using the Web Client there was no issue connecting to vCenter or the ESXi hosts; the issue was only with the desktop client and PowerCLI. After reading the KB again, I noticed the notes already mention the issue and point to another KB, 2149000, which describes it and asks for a few changes to the file below, along with a few MS .NET patches:
C:\Program Files (x86)\VMware\Infrastructure\Virtual Infrastructure Client\Launcher\VpxClient.exe.config
Edit the VpxClient.exe.config file, changing the parameter
<add key = "EnableTLS12" value = "false" />
to
<add key = "EnableTLS12" value = "true" />
Even after making the change we had the same issue, and it was finally resolved by re-installing the desktop client.
But connecting to vCenter using PowerCLI was still not fixed; finally I found another KB, 2137109, which asks for the registry changes below, and that fixed the issue.
You must use PowerCLI 6.0 R1 or later. Earlier versions of PowerCLI work with versions of the .NET Framework that cannot be made to use the TLSv1.1 and TLSv1.2 protocols by editing the registry.
- For 32-bit processes, change the following registry key value to 1.
- For 64-bit processes, in addition to the above registry key, change the following registry key value to 1.
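As a sketch only (the exact keys should be verified against KB 2137109 before applying; my assumption is that they are the standard .NET strong-crypto switches), a .reg file covering both locations would look like this:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319]
"SchUseStrongCrypto"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v4.0.30319]
"SchUseStrongCrypto"=dword:00000001
```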
We have a PRD setup with an external PSC and VC, configured with a primary DNS and a secondary DNS. Due to a hardware issue, our primary DNS server went down and we couldn't connect to the VC.
All other applications in our environment were working fine. We logged in to the PSC and VC on port 5480 (https://VC:5480) and manually changed the primary DNS IP to the working DNS server; within a few seconds the VC started connecting to the PSC and allowing AD authentication.
In our investigation we couldn't find any concrete reason for the failure. We also tested in the lab by changing the primary DNS to an unknown IP and didn't see any issue with connectivity.
Finally we raised a ticket with VMware, and they confirmed the issue is caused by a bug in VCSA Update 2 that they are working to fix in the next Update 3; they also confirmed it has been fixed in VCSA 6.5. But there is still no answer for why my lab environment works fine when changing the primary DNS.
UPDATE 3/16/2017: The VC 6.0 U3 release notes don't show anything related to this bug fix, and when we checked with VMware they confirmed it is still in the testing stage and not included in the latest U3 update.
Also, please find the blog below which lists all the known issues on the VCSA.