Options to check and alert on vCenter certificate expiration

Last week one of our vCenter Servers went down because its machine certificate had expired, and it took some time to find the cause. So I thought it would be helpful to show the options for verifying the certificate and to make sure the expiration alarm is enabled.

Once the certificate has expired, most of the services stop working properly because they can no longer use the certificate they are assigned.

In our case, we were unable to vMotion because the service used during vMotion (vmware-sps) could not connect to vpxd due to "Server certificate chain not verified."

The error can be found in the following log:

/var/log/vmware/vmware-sps/sps.log

com.vmware.vim.vmomi.client.exception.SslException: com.vmware.vim.vmomi.core.exception.CertificateValidationException: Server certificate chain not verified
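To confirm this symptom quickly on the appliance, a minimal sketch that counts the lines in sps.log containing the error string above. The function name sps_cert_errors is hypothetical; the default path is the log location from this post.

```shell
#!/bin/sh
# Hypothetical helper: count lines in sps.log that contain the certificate-chain error.
# Defaults to the vmware-sps log path on the VCSA; pass another file to override.
sps_cert_errors() {
  local log="${1:-/var/log/vmware/vmware-sps/sps.log}"
  # grep -c prints the number of matching lines (0 when none) but exits 1 on no match,
  # so "|| true" keeps the printed count without reporting a failure to the caller.
  grep -c 'Server certificate chain not verified' "$log" || true
}
```

A non-zero count on a vCenter where vMotion is failing points toward the expired certificate rather than a network or host problem.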

The following command shows the machine SSL certificate:

/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SSL_CERT --text | less

For the solution user certificates (machine store):

/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store machine --text | less
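The --text output is long. To get just the number of days left on the machine SSL certificate, a sketch like the following reads a PEM certificate on stdin and lets openssl extract the expiry date. It assumes GNU date (to parse the openssl date format) and that the certificate alias is __MACHINE_CERT, the usual default; cert_days_left is a hypothetical helper name.

```shell
#!/bin/sh
# Print whole days until the PEM certificate on stdin expires.
# 0 or less means it expires within a day or has already expired.
cert_days_left() {
  local end end_epoch now_epoch
  # openssl prints e.g. "notAfter=Jan  1 00:00:00 2030 GMT"; keep the date part
  end=$(openssl x509 -noout -enddate | cut -d= -f2)
  end_epoch=$(date -d "$end" +%s)   # GNU date parses this format
  now_epoch=$(date +%s)
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Example on the appliance (confirm the alias with the "entry list" command above):
#   /usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store MACHINE_SSL_CERT \
#       --alias __MACHINE_CERT | cert_days_left
```

Running this from cron and comparing the result with the alarm threshold gives a second, independent warning in case the vCenter alarm is disabled.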

We can also check the same thing from a web browser by opening the vCenter URL and viewing the certificate details.
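The browser check can also be done from any Linux machine with OpenSSL. A minimal sketch, assuming the vCenter serves TLS on port 443; fetch_cert and expires_within are hypothetical helper names, and the 30-day default simply mirrors vCenter's default warning threshold.

```shell
#!/bin/sh
# Fetch the certificate the vCenter presents on its HTTPS port (PEM on stdout).
fetch_cert() {
  local host="$1" port="${2:-443}"
  # </dev/null ends the TLS session right after the handshake
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509
}

# Succeeds (exit 0) when the PEM on stdin expires within the given number of days,
# or cannot be read. "openssl x509 -checkend N" exits 0 if still valid N seconds from now.
expires_within() {
  local days="$1"
  ! openssl x509 -noout -checkend $(( days * 86400 )) >/dev/null
}

# Only run the check when a hostname is passed on the command line.
if [ -n "${1:-}" ]; then
  if fetch_cert "$1" | expires_within "${2:-30}"; then
    echo "WARNING: certificate on $1 expires within ${2:-30} days, has expired, or could not be read"
  else
    echo "OK: certificate on $1 is valid for more than ${2:-30} days"
  fi
fi
```

Saved as a script (for example, the hypothetical check-vc-cert.sh), it would be run as "sh check-vc-cert.sh vcenter.example.com 30", where the hostname is a placeholder.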

Use /usr/lib/vmware-vmca/bin/certificate-manager to replace the certificates on the vCenter with option 3 (machine SSL certificate only), or, if you use an internal CA, follow the steps here.

To configure alerting for certificate expiration: a 30-day threshold is already configured in vCenter by default, and you can change how soon you are warned with the vpxd.cert.threshold advanced option.

  1. Log in to the vSphere Web Client.
  2. Select the vCenter Server object, then select the Manage tab and the Settings subtab.
  3. Click Advanced Settings, select Edit, and filter for threshold.
  4. Change the setting of vpxd.cert.threshold to the desired value and click OK.
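The same setting can be scripted instead of using the Web Client. This sketch is not part of the original procedure: it assumes the open-source govc CLI (govmomi project) is installed and pointed at the vCenter through the GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD environment variables, and that vpxd.cert.threshold is visible as a vCenter advanced option. The helper names are hypothetical.

```shell
#!/bin/sh
# Assumes govc is on PATH and GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD are exported.

# Show the current certificate-expiry warning threshold.
show_cert_threshold() {
  govc option.ls vpxd.cert.threshold
}

# Set the threshold (in days) after checking the argument is a positive integer.
set_cert_threshold() {
  local days="$1"
  case "$days" in
    ''|*[!0-9]*|0*)   # empty, non-numeric, zero or leading zero
      echo "threshold must be a positive whole number of days, got: '$days'" >&2
      return 2 ;;
  esac
  govc option.set vpxd.cert.threshold "$days"
}
```

For example, "set_cert_threshold 60" would raise the warning window to 60 days; checking the value in the Web Client afterwards is a sensible confirmation.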

Also make sure that under Alarm Settings, the Certificate Status alarm has "Enable this alarm" selected, so that an alarm notification is raised once the certificate comes within the threshold.

Reference:

https://docs.vmware.com/en/VMware-vSphere/6.0/com.vmware.vsphere.security.doc/GUID-D3DB7279-0A25-4AA8-83A0-F34E5676A8B9.html

Posted in Certificate, ESX command, Replacing vCenter 6.0 SSL Certificate, Vcenter Appliance, vCSA 6.0, VCSA6.5, VMware, vSphere 6.0 Template.

Bug in NFS mount while using SRM

We were configuring SRM with Tintri storage, and while configuring the array pairing for storage replication, SRM failed to mount the Tintri datastore.

According to Tintri, we needed to create a service group on the production site, which automatically creates the folder on the destination for replication. Once the storage side was done, we configured the array pair in SRM and it completed without errors, but the array pair column below showed only the service group, with no datastore mapping. As a result, we could not create the protection group, which requires a datastore group mapping.


With the help of Tintri support we tried many options, such as deleting and re-creating the service group, but nothing helped. In the end we found it is a known SRM issue with NFS datastores mounted on ESXi using the FQDN (KB 2009622):

Site Recovery Manager cannot match the Fully Qualified Domain Name (FQDN) or short name of the NFS server that is used to mount the source NFS volumes on the ESX/ESXi hosts with the StoragePort information that is reported in the SRA response to the discoverdevices command.

Finally, we remounted all the datastores using the IP address and the issue was resolved.
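To find the affected datastores before remounting, a sketch that reads the output of esxcli storage nfs list on a host and prints every NFS datastore whose server is a name rather than an IPv4 address. It assumes the default table output (a header line and a dashed separator, volume name in the first column, NFS server in the second) and volume names without spaces; nfs_mounted_by_name is a hypothetical helper.

```shell
#!/bin/sh
# Reads "esxcli storage nfs list" output on stdin and prints "<volume> <server>"
# for each mount whose server is not an IPv4 address (i.e. mounted by FQDN/short name).
# Assumes: 2 header lines, volume name in column 1 (no spaces), server in column 2.
nfs_mounted_by_name() {
  awk 'NR > 2 && NF >= 2 && $2 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $1, $2 }'
}

# On an ESXi host:
#   esxcli storage nfs list | nfs_mounted_by_name
```

The remount itself can be done from the host shell with "esxcli storage nfs remove -v <volume>" followed by "esxcli storage nfs add -H <ip> -s <share> -v <volume>", once no running VMs are using the datastore.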

 

Reference:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2009622

Posted in SRM, VMware

Mind-Map for dvSwitch.

My flight was delayed after VMworld and I had to spend a few hours at the airport, so to make it interesting I made this mind map for the dvSwitch. I hope it will be helpful for certification preparation.

Download

Posted in Vcenter Appliance, vCSA 6.0, VMware

VMware support log size issue and workaround (VCSA)

In our environment we have around 12 VCSA 6.0 U2 (vCenter Server Appliance) instances in different regions with various sizes. Most of them run around 20-25 ESXi hosts in two or three clusters and 800-1300 VMs. Whenever there is an issue, it is always a challenge to upload the logs to VMware support, because the support bundle is larger than 15 to 20 GB. At some point bundle generation fails because the log partition runs out of space, and even after increasing the log partition, uploading the bundle to the VMware FTP site is a big challenge.

The VMware support engineers had no explanation for the huge bundle size and no other way to get the logs onto their FTP site, so in most cases we uploaded only the specific dates and log types needed for troubleshooting.

I searched many blogs and Slack channels without much luck, but learned about a plugin called VMware Support Assistant, which uploads logs directly from the vCenter to the VMware portal, so I decided to install it.

One drawback of this plugin is that if the Web Client session times out, the log upload is interrupted and the process is closed. So we needed to increase the Web Client session timeout, which can be done with KB 2040626. We increased it from the default 30 minutes to 6 hours, which let us upload the logs to the VMware portal successfully.
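On a VCSA 6.x, the change described in KB 2040626 comes down to the session.timeout property (in minutes) in the Web Client's webclient.properties file. The sketch below assumes the appliance path /etc/vmware/vsphere-client/webclient.properties and the vsphere-client service name; confirm both against the KB for your version. set_webclient_timeout is a hypothetical helper.

```shell
#!/bin/sh
# Set session.timeout (minutes) in webclient.properties, adding the key if it is absent.
# The default path is an assumption for VCSA 6.x; pass another file as the 2nd argument.
set_webclient_timeout() {
  local minutes="$1"
  local file="${2:-/etc/vmware/vsphere-client/webclient.properties}"
  cp "$file" "$file.bak"   # keep a copy of the original file
  if grep -q '^[[:space:]]*session\.timeout[[:space:]]*=' "$file"; then
    # replace the existing (uncommented) value in place
    sed -i "s/^[[:space:]]*session\.timeout[[:space:]]*=.*/session.timeout = $minutes/" "$file"
  else
    echo "session.timeout = $minutes" >> "$file"
  fi
}

# Afterwards restart the Web Client service (assumed 6.x service name):
#   service-control --stop vsphere-client && service-control --start vsphere-client
```

For the 6-hour window used here, the call would be "set_webclient_timeout 360".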

However, keeping a Web Client session open for 6 hours is still a security concern, and we also really wanted to find the reason behind the bundle size growth. After a long wait and several conversations with VMware senior engineers, we identified that all the old vpxd core dumps are stored under /storage/core. These are not required; the vCenter only needs the live_core.VPXD.* files. As suggested, we deleted all the old files from our lab vCenter, and the log bundle size dropped significantly.

Note: Please take a proper backup or snapshot before deleting anything. If you delete the live_core files by mistake, the vCenter will crash.
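A cautious way to apply this is to list the candidates first and delete by hand. The sketch below only prints files in /storage/core whose names contain vpxd (case-insensitive) and do not start with live_core; the name pattern is an assumption based on the file names mentioned above, so review the list before removing anything. list_old_vpxd_cores is a hypothetical helper.

```shell
#!/bin/sh
# List old vpxd core dumps in /storage/core (or another directory), skipping
# live_core.* files, which the running vCenter still needs. Nothing is deleted.
list_old_vpxd_cores() {
  local dir="${1:-/storage/core}"
  find "$dir" -maxdepth 1 -type f -iname '*vpxd*' ! -name 'live_core*' -print
}

# Review the output (and take the snapshot mentioned above) before deleting anything:
#   list_old_vpxd_cores
```

Keeping the deletion as a separate manual step makes it harder to remove a live_core file by accident.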

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2040626

http://www.virtuallyghetto.com/2012/09/configuring-vsphere-web-client-session.html

http://www.virtuallyghetto.com/2014/07/how-to-generate-specific-support-log-bundles-for-vcenter-esxi-using-vsphere-api.html

Posted in Vcenter Appliance, vCSA 6.0, VCSA6.5

PowerCLI script to validate basic vSphere hardening

Last month I was working on a security hardening project and implemented the standards recommended by VMware.

We had to demonstrate to our internal security team that the whole vSphere environment is protected according to VMware's recommendations, so we built a small web portal that calls a PowerCLI script in the background to validate the hosts and report the results.

The script validates the following settings and reports the results:

ESXi.config-ntp and service status
ESXi.config-persistent-logs
ESXi.enable-ad-auth
ESXi.set-account-auto-unlock-time
ESXi.set-shell-interactive-timeout
ESXi.set-shell-timeout
DNS IP check
Allowing only the corresponding IPs in the DNS firewall rule for UDP/TCP

The result shows the status of each task. If the host is fully compliant, it is reported as fully protected; otherwise the report shows which tasks are not up to the standard and need to be changed, and the rest show 1, meaning they are configured as recommended.

In other words, a result of 1 is good, and if every condition is 1, the overall result is true and the host is fully protected.
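The pass/fail roll-up described above can be sketched in a few lines. The actual checks live in the PowerCLI script; this shell sketch assumes each check has already produced a line of the form "<check-name> <0|1>" and shows only the aggregation. summarize_hardening is a hypothetical name.

```shell
#!/bin/sh
# Reads "<check-name> <result>" lines on stdin (1 = compliant, anything else = not).
# Prints "fully protected" when every check is 1; otherwise lists the checks to fix.
# Exit status is 0 only when the host is fully protected.
summarize_hardening() {
  awk '
    NF >= 2 { total++; if ($2 != 1) failed[++bad] = $1 }
    END {
      if (total > 0 && bad == 0) { print "fully protected"; exit 0 }
      for (i = 1; i <= bad; i++) print "needs change: " failed[i]
      exit 1
    }'
}
```

For example, feeding it "ESXi.config-ntp 1" and "ESXi.set-shell-timeout 0" reports only ESXi.set-shell-timeout as needing a change.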

Please download the script from the link below.

GitHub

Posted in PowerCLI, Powershell, VMware

Trend Micro Deep Security (Event-Based Tasks to activate VMs)

Please check my other posts on DSM 9.5 and 9.6. Here we look at the steps to activate VMs when they are moved with vMotion from one host to another, which helps when we want to enable agentless protection in an existing environment with a large number of VMs.

In Trend DSM it is easy to create an event-based task that activates new VMs created on the ESXi hosts, but for computers moved from one host to another we need some basic logic to exclude the VMs we do not want to activate, and to assign the policy only to Windows-based operating systems for agentless protection.

One difficulty with this event-based task is that it activates all VMs with the policy it defines. If you have different policies for Windows and Linux, it overwrites them with the same default policy for every vMotioned VM, so we have to add one condition to exclude VMs and another for the operating system.

Go to Administration – Event-Based Tasks and click New.

Select Computer Moved.

Select the appropriate policy and other actions.

The first condition selects the ESX hosts.

The next condition is the important one: we need to exclude any VM whose name contains app, but there is no "does not match" option to exclude a VM by name directly. To work around this, use WWW.REGEXPAL.COM to build and verify a regular expression and see whether a test name matches or not.

For example, enter ^(?!.*app).*$ as the regular expression, type some text containing the word app in the Test String box, and it will show whether the result is a match or no match.

   

Adapt it to your requirements. The third condition excludes the Linux VMs: select the Platform condition and use .*OS.*
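The same expression can also be tried from a shell instead of regexpal.com. This sketch assumes GNU grep built with PCRE support (-P), because the (?!...) negative lookahead is not available in basic or extended POSIX regex; Deep Security's own regex engine may differ, for example in case handling. Note that ^(?!.*app).*$ rejects any name containing app anywhere and is case-sensitive, while ^(?!app).*$ would exclude only names that start with app. The function names are hypothetical.

```shell
#!/bin/sh
# Succeeds when the VM name matches the condition, i.e. the VM is kept (not excluded).
# Requires GNU grep with PCRE (-P) for the negative lookahead.
matches_contains_filter() { printf '%s\n' "$1" | grep -qP '^(?!.*app).*$'; }

# Narrower variant: excludes only names that start with "app".
matches_prefix_filter() { printf '%s\n' "$1" | grep -qP '^(?!app).*$'; }

for vm in web01 app-db01 myapp02 APP03; do
  printf '%-10s contains-filter=%s prefix-filter=%s\n' "$vm" \
    "$(matches_contains_filter "$vm" && echo match || echo no-match)" \
    "$(matches_prefix_filter "$vm" && echo match || echo no-match)"
done
```

With these sample names, web01 and APP03 match both expressions, app-db01 matches neither, and myapp02 is excluded only by the first one.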

 

Posted in Trend Micro Deep Security

Supermicro bug when running VMs in a nested configuration

One of our internal cloud environments, running ESXi hosts on SuperMicro (SYS-1027GR) servers, failed with a PSOD referencing a Fatal (unrecoverable) MCE on one of the physical CPUs. It happened frequently on almost all the ESXi hosts in the cluster, but not always right after a restart.

Sample from the log:

cpu2:33455)@BlueScreen: Machine Check Exception: Fatal (unrecoverable) MCE on PCPU2 in world 33455:vmnic1-pollW System has encountered a Hardware Error – Please contact the hardware vendor

We raised support cases with SuperMicro and VMware, and after a long investigation the VMware engineer identified it as a known bug on SuperMicro servers when any VM runs in a nested manner. The issue is due to an Intel erratum; the long-term solution requires Intel to acknowledge it and release a microcode/BIOS fix. It seems that nested virtualization combined with PCI passthrough triggers the erratum in the Intel CPUs and makes the hosts crash. VMware referenced the related bug number 1603071.

As a temporary fix for this PSOD, we disabled hardware virtualization in the VM options. We are working with SuperMicro and VMware on a workaround and a long-term solution.
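To find which VMs still expose hardware virtualization to the guest (the per-VM setting behind nested virtualization), note that the option is stored in each VM's .vmx file as vhv.enable. This sketch searches .vmx files up to three levels below /vmfs/volumes on an ESXi host; the directory depth is an assumption, and the setting itself should still be changed from the VM options with the VM powered off rather than by editing the file. find_nested_vms is a hypothetical helper.

```shell
#!/bin/sh
# Print .vmx files that expose hardware-assisted virtualization to the guest
# (vhv.enable = "TRUE", matched case-insensitively). Defaults to /vmfs/volumes on ESXi.
# Assumes the layout <root>/<datastore>/<vm folder>/<vm>.vmx (depth 3).
find_nested_vms() {
  local root="${1:-/vmfs/volumes}"
  find "$root" -maxdepth 3 -type f -name '*.vmx' 2>/dev/null \
    | while IFS= read -r vmx; do
        grep -qi '^vhv\.enable *= *"true"' "$vmx" && printf '%s\n' "$vmx"
      done
}
```

Running find_nested_vms on each host in the affected cluster gives the list of VMs to review before re-enabling anything.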

Posted in ESXi issue, VMware