VMware support log bundle size issue and workaround (VCSA)

In our environment we run around 12 VCSA 6.0 U2 (vCenter Server Appliance) instances of various sizes across different regions. Most of them manage around 20-25 ESXi hosts in two or three clusters and 800-1300 VMs. Whenever there is an issue in the environment, uploading the logs to VMware support is a challenge because the support bundle is 15 GB to 20 GB or more. At some point log generation fails because the log partition is full, and even after increasing the space on the log partition it remains a big challenge to upload the bundle to the VMware FTP site.

The VMware support engineers had no clue why the log bundle was so large and had no way to shrink it for upload to their FTP site, so in most cases we uploaded only the logs for a specific date and log type for troubleshooting.

I searched a lot of blogs and Slack channels, which did not help much, but I came to know about the VMware support assistant plugin (vCenter Support Assistant), which can upload the logs directly from vCenter to the VMware portal, so I decided to install it.

One drawback of this plugin is that if the web client session times out, the log upload is interrupted and the process is closed, so we need to increase the web client session timeout. This can be done with the help of KB 2040626. We increased the session timeout from the default 30 minutes to 6 hours, which allowed us to upload the logs to the VMware portal successfully.
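The timeout change can be sketched as a small edit to a properties file. This is only an illustration in Python, assuming that the setting described in the KB is the `session.timeout` key in `webclient.properties` with a value in minutes; confirm the file location, key name and unit against KB 2040626 before changing anything. The sample file content and the `other.setting` key are hypothetical.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return properties-file text with `key` set to `value`.

    An existing assignment (even a commented-out one) is replaced in place;
    if the key is not present, the assignment is appended at the end.
    """
    new_line = f"{key} = {value}"
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.lstrip("# \t")           # ignore a leading comment marker
        name = stripped.split("=", 1)[0].strip()
        if "=" in stripped and name == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


SIX_HOURS_IN_MINUTES = 6 * 60  # assumption: the value is expressed in minutes

# Hypothetical file content, used only to demonstrate the edit.
sample = "# session.timeout = 120\nother.setting = true\n"
updated = set_property(sample, "session.timeout", str(SIX_HOURS_IN_MINUTES))
print(updated)
```

The function only transforms text, so the result can be reviewed before it is written back to the appliance.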

However, keeping a web client session open for 6 hours is a security concern, and we still wanted to find the reason behind the growth of the log bundle. After a long wait and several conversations with senior VMware engineers, we identified that all the old vpxd core dumps are stored under /storage/core. These are not required; only live_core.VPXD.* is required for vCenter. As suggested, we deleted all the old files from our lab VC, and the log bundle size dropped significantly.

Note: Please take a proper backup or snapshot before deleting anything. If you delete the live_core file by mistake, vCenter will crash.

References:
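Before deleting anything, it helps to see which files would be affected and how much space they take. The sketch below is a read-only report written in Python; it deletes nothing. The name matching is an assumption (any file whose name contains both "vpxd" and "core", ignoring case, and does not start with "live_core"), so compare its output with what VMware support tells you to remove.

```python
from pathlib import Path


def old_core_dumps(core_dir: str) -> list[tuple[str, int]]:
    """List (name, size_in_bytes) for files that look like old vpxd core dumps.

    Files whose name starts with "live_core" are never reported.
    """
    candidates = []
    for path in sorted(Path(core_dir).iterdir()):
        name = path.name.lower()
        if not path.is_file() or name.startswith("live_core"):
            continue
        if "core" in name and "vpxd" in name:   # assumed naming pattern
            candidates.append((path.name, path.stat().st_size))
    return candidates


def report(core_dir: str = "/storage/core") -> None:
    """Print the candidate files and their combined size; nothing is deleted."""
    found = old_core_dumps(core_dir)
    total = sum(size for _, size in found)
    for name, size in found:
        print(f"{size / 2**20:10.1f} MiB  {name}")
    print(f"{len(found)} candidate file(s), {total / 2**30:.2f} GiB in total")
```

Calling `report()` on the appliance prints the candidates under /storage/core; the actual cleanup stays a manual, reviewed step.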

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2040626

http://www.virtuallyghetto.com/2012/09/configuring-vsphere-web-client-session.html

http://www.virtuallyghetto.com/2014/07/how-to-generate-specific-support-log-bundles-for-vcenter-esxi-using-vsphere-api.html

Posted in Vcenter Appliance, vCSA 6.0, VCSA6.5

PowerCLI script to validate basic vSphere hardening

Last month I worked on a security hardening project and implemented the settings according to the standard recommended by VMware.

We had to show our internal security team that the whole vSphere environment is protected as per the VMware recommendations, so we built a small web portal that calls a PowerCLI script in the background to validate the hosts and report the result.

The script validates the settings below and reports the results:

ESXi.config-ntp and service status
ESXi.config-persistent-logs
ESXi.enable-ad-auth
ESXi.set-account-auto-unlock-time
ESXi.set-shell-interactive-timeout
ESXi.set-shell-timeout
DNS IP check
Allowing only the corresponding IPs in the DNS firewall rule for UDP/TCP.

The result shows the status of each task. A status of 1 means the setting is protected as per the recommendation. If every check returns 1, the condition is considered true and the host is shown as fully protected. If one or two tasks are not up to the standard, the result shows which tasks need to change, and the rest show as 1.
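The way the per-check results roll up into one verdict can be sketched in a few lines. This is an illustration in Python rather than the PowerCLI script itself; the function name and the sample values are hypothetical.

```python
def summarize(results: dict[str, int]) -> tuple[bool, list[str]]:
    """Return (fully_protected, checks_to_fix) for one host.

    A check counts as compliant only when its status is 1.
    """
    to_fix = [name for name, status in results.items() if status != 1]
    return (not to_fix, to_fix)


# Hypothetical results for one host.
host_results = {
    "ESXi.config-ntp": 1,
    "ESXi.config-persistent-logs": 1,
    "ESXi.set-shell-timeout": 0,
}

fully_protected, to_fix = summarize(host_results)
print(fully_protected, to_fix)  # False ['ESXi.set-shell-timeout']
```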

Please download the script from the link below.

Git-hub

 

Posted in PowerCLI, Powershell, VMware

Trend Micro Deep Security (event-based tasks to activate the VMs)

Please check my other posts on DSM 9.5 and 9.6. Here we look at the steps to activate VMs when they are moved from one host to another using vMotion, which helps when we want to enable agentless protection in an existing environment with a large number of VMs.

In Trend DSM it is easy to create an event-based task that activates new VMs created on the ESXi hosts. For computers moved from one host to another, we need some basic logic to exclude the VMs we don't want to activate and to assign the policy only to Windows-based operating systems for agentless protection.

One difficulty with this event task is that it activates all the VMs with the policy defined in the task. If you have different policies for Windows and Linux, it overwrites them with the same default policy on every vMotioned VM, so we have to create one extra condition to exclude VMs by name and another condition for the operating system.

Go to Administration – Event-Based Tasks – click New

Select Computer Moved

Select the appropriate policy and other actions

The first condition selects the ESX hosts

The next condition is important. We need to exclude any VM whose name starts with app, but there is no option such as "not matches" to exclude a VM by name directly. To work around this, use a negative lookahead in the regular expression and verify it at www.regexpal.com, which shows whether a test string matches or not.

For example, enter ^(?!.*app).*$ as the regular expression and type some text containing the word app in the Test String box; the tool shows the result as match or no match. Note that this expression rejects a name with app anywhere in it; to reject only names that start with app, use ^(?!app).*$ instead.

Adjust the expression according to your requirement. The third condition excludes the Linux VMs: select the Platform condition and use .*OS.*
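The behaviour of the name-exclusion expression can also be checked offline. The sketch below uses Python's `re` module, which supports the same negative lookahead syntax; the VM names are hypothetical, and the regex engine in Deep Security may differ in other details.

```python
import re

# Matches only names that do not contain "app" anywhere.
NOT_CONTAINING_APP = re.compile(r"^(?!.*app).*$")
# Matches only names that do not start with "app".
NOT_STARTING_WITH_APP = re.compile(r"^(?!app).*$")

for name in ["app-web01", "db-app02", "db-sql03"]:  # hypothetical VM names
    print(
        name,
        bool(NOT_CONTAINING_APP.match(name)),
        bool(NOT_STARTING_WITH_APP.match(name)),
    )
# app-web01 False False
# db-app02 False True
# db-sql03 True True
```

A VM is picked up by the task only when the expression matches, so `db-app02` is excluded by the first pattern but not by the second.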

 

Posted in Trend Micro Deep Security

Supermicro bug when running nested VMs

One of our internal cloud environments, which runs ESXi hosts on SuperMicro (SYS-1027GR) servers, failed with a PSOD referencing a Fatal (unrecoverable) MCE on one of the physical CPUs. It happened frequently on almost all the ESXi hosts in the cluster, but not always right after a restart.

Sample from the log:

cpu2:33455)@BlueScreen: Machine Check Exception: Fatal (unrecoverable) MCE on PCPU2 in world 33455:vmnic1-pollW System has encountered a Hardware Error – Please contact the hardware vendor

We raised support cases with SuperMicro and VMware. After a long investigation, the VMware engineer identified it as a known bug on SuperMicro servers when running any VM in a nested manner. The issue is due to an Intel erratum, and the long-term solution is for Intel to acknowledge it and release a microcode/BIOS fix. It seems that nested virtualization combined with PCI passthrough triggers an erratum in the Intel CPU microcode that makes the hosts crash. The related bug number mentioned by VMware is 1603071.

As a temporary fix for the PSOD, we disabled the hardware virtualization option on the VMs. We are working with SuperMicro and VMware on a workaround and a long-term solution.

Posted in ESXi issue, VMware

vSphere Client and PowerCLI fail to connect to vCenter after TLSv1.0 is disabled

As per KB 2148819, we disabled TLSv1.0 on the VC, after which we could no longer connect to the VC from the desktop client or from PowerCLI.

Error from the Desktop client:

Error from PowerCLI:

There is no issue connecting to the vCenter or the ESXi hosts using the web client; the issue affects only the desktop client and PowerCLI. After reading the KB again, we noticed that its notes already mention the issue and point to another KB, 2149000, which describes the issue and asks for a few changes to the file below, along with a few Microsoft .NET patches.

C:\Program Files (x86)\VMware\Infrastructure\Virtual Infrastructure Client\Launcher\VpxClient.exe.config

Edit the VpxClient.exe.config file by changing the parameter

<add key = "EnableTLS12" value = "false" /> to
<add key = "EnableTLS12" value = "true" />

Even after making this change we had the same issue, and it was finally resolved by reinstalling the desktop client.

Connecting to the vCenter using PowerCLI was still not fixed. We finally found another KB, 2137109, which asks for the registry changes below, and these fixed the issue.

You must use PowerCLI 6.0 R1 or later. Earlier versions of PowerCLI work with versions of the .NET Framework that cannot use the TLSv1.1 and TLSv1.2 protocols. With a supported version, enable these protocols by editing the registry:
  • For 32-bit processes, change the following registry key value to 1.

    Key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\[.NET_version]
    Value: SchUseStrongCrypto (DWORD)

  • For 64-bit processes, in addition to the above registry key, change the following registry key value to 1.

    Key: HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework\[.NET_version]
    Value: SchUseStrongCrypto (DWORD)
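The two registry values can be prepared as a .reg file so the change is reviewed before it is imported. The Python sketch below only builds the file text. The version string passed in (`v4.0.30319`) is an assumption used for illustration; substitute the .NET Framework version key that actually exists on your machine.

```python
NET_FRAMEWORK_KEYS = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework",
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework",
)


def strong_crypto_reg(net_version: str) -> str:
    """Build .reg file text that sets SchUseStrongCrypto to 1 under both keys."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for base in NET_FRAMEWORK_KEYS:
        lines.append(f"[{base}\\{net_version}]")
        lines.append('"SchUseStrongCrypto"=dword:00000001')
        lines.append("")
    return "\n".join(lines)


# "v4.0.30319" is an assumed example value for [.NET_version].
print(strong_crypto_reg("v4.0.30319"))
```

Save the output to a file with a .reg extension and import it with administrative rights, then start a new PowerCLI session so the setting is picked up.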

References:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2137109

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2149000

Posted in ESXi issue, vCSA 6.0, VMware

Primary/Secondary DNS IP failover bug in VMware vCenter Server Appliance 6.0 Update 2 (VCSA U2)

We have a PRD setup with an external PSC and VC, both configured with a primary and a secondary DNS server. Due to a hardware issue our primary DNS server went down, and we could not connect to the VC.

All other applications in our environment were working fine. We logged in to the PSC and VC on port 5480 ( https://VC:5480 ) and manually changed the primary DNS IP to the working DNS server. Within a few seconds the VC started connecting to the PSC and allowing AD authentication.

In our investigation we could not find any concrete reason for the failure. We also tested in the lab by changing the primary DNS to an unknown IP and did not find any connectivity issue.

Finally we raised a ticket with VMware. They confirmed that the issue is caused by a bug in VCSA Update 2, that they are working on a fix for the next update (Update 3), and that it is already fixed in VCSA 6.5. There is still no answer as to why my lab environment works fine after changing the primary DNS.

UPDATE 3/16/2017: The VC 6.0 U3 release notes don't show anything related to this bug fix. When we checked with VMware, they confirmed that the fix is still in the testing stage and is not included in the latest U3 update.

Also, please see the blog that lists all the known issues on the VCSA.

 

Posted in vCSA 6.0, VCSA6.5

Useful information and links about Microsoft Remote Procedure Call (RPC)

The diagram below shows the RPC workflow starting with the registration of the server application with the RPC Endpoint Mapper (EPM) in step 1 to the passing of data from the RPC client to the client application in step 7.

[Diagram: RPC workflow]

  1. Server app registers its endpoints with the RPC Endpoint Mapper (EPM)
  2. Client makes an RPC call (on behalf of a user, OS or application initiated operation)
  3. Client side RPC contacts the target computer’s EPM and asks for the endpoint to complete the client call
  4. Server Machine’s EPM responds with an endpoint
  5. Client side RPC contacts the server app
  6. Server app executes the call, returns the result to the client RPC
  7. Client side RPC passes the result back to the client app
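The seven steps can be mimicked with a toy, in-process model. The Python sketch below involves no networking and is not how Windows RPC is implemented; the class names, the interface name and the port number are all hypothetical and serve only to show the order of events.

```python
class EndpointMapper:
    """Toy stand-in for the RPC Endpoint Mapper (EPM)."""

    def __init__(self):
        self._endpoints = {}

    def register(self, interface: str, port: int) -> None:
        self._endpoints[interface] = port          # step 1

    def lookup(self, interface: str) -> int:
        return self._endpoints[interface]          # steps 3 and 4


class Server:
    """Toy server machine: one EPM plus handlers listening on ports."""

    def __init__(self):
        self.epm = EndpointMapper()
        self._handlers = {}

    def expose(self, interface: str, port: int, handler) -> None:
        self._handlers[port] = handler
        self.epm.register(interface, port)

    def call(self, port: int, *args):
        return self._handlers[port](*args)         # step 6


def rpc_call(server: Server, interface: str, *args):
    """Client-side RPC: resolve the endpoint, then call it (steps 2 to 7)."""
    port = server.epm.lookup(interface)            # ask the EPM for the endpoint
    return server.call(port, *args)                # contact the server app, return the result


server = Server()
server.expose("add", 49152, lambda a, b: a + b)    # hypothetical interface and port
print(rpc_call(server, "add", 2, 3))               # 5
```

The point of the model is that the client needs two things to be reachable: the mapper itself and whichever endpoint the mapper hands back.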

How RPC Works

https://technet.microsoft.com/en-us/library/cc738291(v=ws.10).aspx

Troubleshooting “RPC server is unavailable” error, reported in failing AD replication scenario.

https://blogs.technet.microsoft.com/abizerh/2009/06/11/troubleshooting-rpc-server-is-unavailable-error-reported-in-failing-ad-replication-scenario/

Restricting Active Directory RPC traffic to a specific port

https://support.microsoft.com/en-us/kb/224196

How to configure RPC dynamic port allocation to work with firewalls

https://support.microsoft.com/en-in/kb/154596

https://blogs.technet.microsoft.com/askpfeplat/2015/01/11/rpc-endpoint-mapper-returns-dynamic-port-incorrectly-when-active-directory-is-configured-to-use-static-port/

Have you set a static port on the DC for Netlogon or for any other interface? If so, see the following articles:

Long logon time after you set a specific static port for NTDS and NETLOGON in a Windows Server 2008 R2-based domain environment

http://support.microsoft.com/kb/2827870/en-us

AD replication fails with an RPC issue after you set a static port for NTDS in a Windows-based domain environment

http://support.microsoft.com/kb/2912805/en-us

Logon fails after you restrict client RPC to DC traffic in Windows Server 2012 R2 or Windows Server 2008 R2

http://support.microsoft.com/kb/2987849/en-us

The script at https://gallery.technet.microsoft.com/Test-RPC-Testing-RPC-4396fcda helps to test RPC connectivity via TCP. This script tests TCP network connectivity to not just the RPC Endpoint Mapper on port 135, but it also checks TCP network connectivity to each of the registered endpoints returned by querying the EPM.  Many firewall teams have a difficult time with RPC, and they will end up allowing the Endpoint Mapper on port 135, but forget to also allow the ephemeral ports through the firewall.  This script uses localhost by default, but obviously you can specify a remote machine name or IP address to test a server across the network.  The script works by P/Invoking functions exported from rpcrt4.dll to get an enumeration of registered endpoints from the endpoint mapper, so it’s not just a wrapper around portqry.exe.

One such issue: if the ephemeral ports are blocked between the clients and the domain controller, an RPC error appears while joining a client machine to the domain. The client gets joined to the domain and later fails with the error “Changing the Primary Domain DNS name of this computer to “” failed. The name will remain “testlab.com”. The error was: The RPC server is unavailable”.

Use the link below to make sure the required ports are open for communication between the clients and the DC.
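The TCP part of such a test is easy to sketch. The Python example below only checks whether a TCP connection can be opened to each port in a list that you supply; unlike the script above, it does not query the Endpoint Mapper for registered endpoints, so the ports to test have to come from elsewhere. The function names are hypothetical.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host: str, ports: list[int], timeout: float = 2.0) -> dict[int, bool]:
    """Check each port in turn and return a port -> reachable mapping."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `check_ports("dc01.example.com", [135, 49152])` (a hypothetical host name and ephemeral port) would show whether the Endpoint Mapper is reachable while a dynamic port is blocked.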

How to configure a firewall for domains and trusts

https://support.microsoft.com/en-us/kb/179442

 

 

 

 

Posted in Windows