Issue with SSH login to an ESXi 6.7 host using an AD user account

We were not able to log in to the ESXi host over SSH with an AD account, and our attempts to leave the domain or rejoin it failed.

[root@esx:~] /usr/lib/vmware/likewise/bin/domainjoin-cli join prd.com admin
- While adding the host, we got the error below:

Error: LW_ERROR_LDAP_CONSTRAINT_VIOLATION


We deleted the stale ESXi computer account from Active Directory.

After the account was deleted, the ESXi host was able to leave the domain using the command below:

[root@esx:~] /usr/lib/vmware/likewise/bin/domainjoin-cli leave
- We then used the command below to add the ESXi host back to the domain, which succeeded:
 
[root@esx:~] /usr/lib/vmware/likewise/bin/domainjoin-cli join prd.com admin
Joining to AD Domain:   prd.com
With Computer DNS Name: esx.prd.com
 SUCCESS
- After the ESXi host rejoined the domain, the team was able to log in to it using a domain user account.
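The leave/join sequence above can be wrapped in a small ESXi shell function. This is a hypothetical sketch, not a VMware-provided script: rejoin_domain is an illustrative name, it assumes the stale computer account has already been deleted from AD, and it assumes your domainjoin-cli build supports the query subcommand for showing the resulting membership.

```shell
# Hypothetical helper: rejoin an ESXi host to AD AFTER the stale computer
# account has been removed from Active Directory.
# DOMAINJOIN can be overridden (for example, to test with a stub).
DOMAINJOIN="${DOMAINJOIN:-/usr/lib/vmware/likewise/bin/domainjoin-cli}"

rejoin_domain() {
    domain="$1"   # e.g. prd.com
    user="$2"     # AD account allowed to join computers, e.g. admin
    # Leave first; this can fail if the host is not joined, so only warn.
    "$DOMAINJOIN" leave || echo "warning: leave failed, continuing" >&2
    # Join again; domainjoin-cli prompts for the account's password.
    if ! "$DOMAINJOIN" join "$domain" "$user"; then
        echo "join failed; check AD for a stale computer account" >&2
        return 1
    fi
    # Show the resulting domain membership (assumes 'query' is supported).
    "$DOMAINJOIN" query
}

# Usage in the ESXi shell: rejoin_domain prd.com admin
```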

Posted in ESXi issue, VMware

Tip: checking ESXi/vCenter errors using Splunk

Recently we had an “All Paths Down” (APD) issue on one of our hosts, and I wanted to find out how many events were logged and how long the issue lasted on the host. I found the steps below in Splunk, which let us highlight a keyword to list the matching events. We can easily get these details from the ESXi host itself, but I felt the steps below would be useful for other use cases as well.

Make sure the Splunk Add-on for VMware (https://splunkbase.splunk.com/app/3215/) is installed in Splunk. It is free and installs the VMware sourcetype parsers.

1. Click Event Actions > Extract Fields to start the wizard.

2. Select Regular Expression, highlight a value to select it, name the field, then continue to validation and complete the wizard.

When you click the events, Splunk shows all the events matching the word you highlighted.
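As mentioned above, the same count and duration can also be pulled directly on the ESXi host. A minimal shell sketch, assuming the events are in /var/log/vmkernel.log and that each line starts with an ISO-style timestamp; apd_summary is a hypothetical helper, and the keyword is yours to choose because the exact APD message text can vary between builds.

```shell
# Hypothetical helper: count log lines containing a keyword and report the
# first and last timestamps (first whitespace-separated field of each line).
apd_summary() {
    keyword="$1"                        # e.g. APD
    log="${2:-/var/log/vmkernel.log}"   # assumed default log location
    grep -- "$keyword" "$log" | awk '
        NR == 1 { first = $1 }          # timestamp of the first match
        { last = $1; n++ }              # keep the latest timestamp and count
        END { if (n) printf "events=%d first=%s last=%s\n", n, first, last; else print "events=0" }'
}

# Usage on the host: apd_summary APD
```

Rotated logs (for example compressed vmkernel.*.gz files) would need to be decompressed first; the function only reads the single file it is given.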

Useful Links:

https://splunkbase.splunk.com/app/3975/

Posted in logs, vCSA 6.0, VCSA6.5, VCSA6.7, VMware

Bug noticed in the VCSA 14367737 syslog configuration

We are running VCSA build 14367737 and can't upgrade it, because the internal CloudStack platform we run on top of vCenter supports only this build. I configured the VCSA to forward its logs to a syslog server (Splunk) and noticed that forwarding worked for a few hours and then stopped. We had to restart the service manually with systemctl restart rsyslog to resume forwarding logs to the Splunk server.

After trying a few options, we upgraded vCenter to different versions in our test environment and found that the issue is fixed in vCenter Server Appliance 6.7 Update 3g (6.7.0.44000) build 16046470. Even though the release notes don't mention this issue, it looks like the rsyslog version was upgraded in this VCSA release.

As a workaround, we can configure a cron job that restarts the service every two hours.
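A minimal sketch of that workaround as a cron.d entry. It assumes the appliance's cron daemon reads /etc/cron.d (verify this on your build) and that systemctl is on cron's default PATH; the file name rsyslog-restart and the helper install_rsyslog_restart_cron are hypothetical choices for illustration.

```shell
# Workaround sketch: restart rsyslog at minute 0 of every second hour.
# /etc/cron.d entries include a user field (root here).
CRON_FILE="${CRON_FILE:-/etc/cron.d/rsyslog-restart}"   # hypothetical file name
CRON_LINE='0 */2 * * * root systemctl restart rsyslog'

install_rsyslog_restart_cron() {
    # Idempotent: append the entry only if the exact line is not present yet.
    if [ -f "$CRON_FILE" ] && grep -qxF "$CRON_LINE" "$CRON_FILE"; then
        return 0
    fi
    printf '%s\n' "$CRON_LINE" >> "$CRON_FILE"
}

# Usage (as root on the appliance): install_rsyslog_restart_cron
```

Making the function idempotent means it can be rerun after troubleshooting without stacking duplicate restarts.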

Posted in VCSA6.7, VMware

Packet drop issue on HP Gen9/Gen10 servers running ESXi 6.7

We noticed packet drops on all of our HP BL460c Gen9/Gen10 servers across the region running ESXi 6.7.0 build 16316930. The network adapter installed in these servers is the HPE FlexFabric 10Gb 2-port 536FLB Adapter, which is a QLogic adapter.

The HP custom image ships qfle3 driver version 1.1.6.0-1OEM.650.0.0.4598673. We tried updating the driver and the firmware of the HP enclosure (OA) and Virtual Connect to the versions below, but that didn't fix the issue.

OA firmware: 4.96

https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX_8e583ffa28874a53aa272b959b

Upgrade the Virtual Connect firmware on one switch, then on the other.

HP Virtual Connect firmware: 4.85

https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX_f99f0bc5bfc4414aac021f81af#tab3

Solution:

After trying many options, HP recommended installing the driver version below, which fixed the packet drop issue.

https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX_fca9a16a601345919247b0c240#tab-history

[root@esx106:~] esxcli software vib install -v "/cp039955/QLogic-Network-iSCSI-FCoE-v2.0.102-14793946/QLogic-Network-iSCSI-FCoE-v2.0.102-offline_bundle-14793946/vib20/qfle3/QLC_bootbank_qfle3_1.0.87.0-1OEM.670.0.0.8169922.vib"

Installation Result

   Message: Host is not changed. Reboot is pending from previous transaction.

   Reboot Required: true

   VIBs Installed:

   VIBs Removed:

   VIBs Skipped: QLC_bootbank_qfle3_1.0.87.0-1OEM.670.0.0.8169922

As I understand the ESXi patch advisory, the listed qfle3 version is 1.0.50.11-9vmw.670.0.0.8169922, so we needed a driver close to that version. Installing the HP-recommended driver, 1.0.87.0-1OEM.670.0.0.8169922, fixed the issue for us.
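The install output above reports "Reboot is pending from previous transaction" and lists the qfle3 VIB under "VIBs Skipped", which suggests the driver had already been installed by an earlier run and the host was waiting for a reboot. After rebooting, one way to confirm which qfle3 version is installed is to parse the output of esxcli software vib list. vib_version is a hypothetical helper, and it assumes the first two columns of that output are Name and Version.

```shell
# Hypothetical helper: read `esxcli software vib list` output on stdin and
# print the Version column for the VIB whose Name matches exactly.
vib_version() {
    name="$1"
    awk -v n="$name" '$1 == n { print $2 }'
}

# Usage on the host, after the reboot:
#   esxcli software vib list | vib_version qfle3
```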

Posted in ESXi issue, ESXi Patches, HP, VMware

Memories of 2020

We started migrating our Tier-1 infrastructure to AWS. Most of our internal applications have moved to the cloud, and I learned a lot of new AWS services and technologies along the way.

Around September, I was asked to support our internal cloud team, which runs CloudStack with VMware, so after a long gap I was back to VMware and virtualization. The changeover was difficult at first, but now I am very much back on track.

After moving to the internal cloud team, I got the opportunity to take on the Scrum Master role, and I am planning to finish the certification.

Even though the pandemic brought a lot of challenges, I had a good 2020 in my professional life and I am looking forward to 2021.

Posted in Uncategorized

Easy way to uninstall the Trend Micro Deep Security Agent

I was looking for an easy way to uninstall the Trend Micro agent on Windows 10 and found the commands below useful.

Get-Package -Name "Trend Micro Deep Security Agent" | Uninstall-Package

Or

msiexec.exe /x <exact MSI package name>.msi /quiet

Reference:

https://success.trendmicro.com/solution/1055096-performing-silent-uninstallation-of-deep-security-agent-dsa-from-windows-machine

Posted in Trend Micro Deep Security, Trend Micro Deep Security 9.5 ( Deep Security Agent )

AWS Compute related updates

AWS End of Support Migration Program for Windows Server now available as a self-serve solution for customers

Resource Access Manager Support is now available on AWS Outposts

New course for Amazon Elastic Kubernetes Service

Amazon EKS now supports Kubernetes version 1.18

AWS Lambda Extensions: a new way to integrate Lambda with operational tools

AWS Compute Optimizer enhances EC2 instance type recommendations with Amazon EBS metrics

Amazon EBS CSI driver now supports AWS Outposts

Amazon ElastiCache on Outposts is now available

AWS Elastic Beanstalk Adds Support for Running Multi-Container Applications on AL2 based Docker Platform

AWS Batch introduces tag-based access control

Amazon EC2 G4dn Bare Metal Instances with NVIDIA T4 Tensor Core GPUs, now available in 15 additional regions

AWS Launch Wizard now supports SAP HANA backups with AWS Backint Agent

Posted in AWS, EC2

Steps to blacklist problematic DCs in VMware VCSA 6.7 U3

We had a DNS issue on one of the domain controllers (DCs) running Active Directory-integrated DNS, which caused our vCenter to fail to connect to the AD domain. We changed the DNS servers to IPs that were working properly, but AD authentication still failed, and /var/log/messages showed vCenter still pointing to the problematic DC and failing to authenticate.

After some research, I found instructions in the VCSA 6.7 U3b release notes for blacklisting DCs, and added the problematic DC's IP as described below.

Active Directory authentication or joining a domain is slow

Active Directory authentication or joining a domain might be slow when configured with Integrated Windows Authentication (IWA), because of infrastructure issues such as network latency and firewalls in some of the domain controllers.

This issue is resolved in this release. The fix provides the option to blacklist selected domain controllers in case of infrastructure issues.

To set the option, use the following commands:
# /opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\netlogon\Parameters]' BlacklistedDCs DC_IP1,DC_IP2,...
# /opt/likewise/bin/lwsm restart lwreg

To revert to the default settings, use the following commands:
# /opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\netlogon\Parameters]' BlacklistedDCs ""
# /opt/likewise/bin/lwsm restart lwreg

However, we still noticed vCenter connecting to the problematic DC. The file /var/lib/likewise/krb5-affinity.conf also showed the problematic DC's IP, and when we changed it manually, it was automatically reverted to the old problematic DC IP.

After more research, we added the vCenter subnet in Active Directory Sites and Services, associated with the new DCs. After waiting a few minutes, krb5-affinity.conf was updated with the new DC IPs, and the issue was fixed: vCenter now pointed to the correct DCs and ignored the problematic one.

Note: BlacklistedDCs works only from VCSA 6.7 U3b onward.
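The check described above, whether krb5-affinity.conf still references a blacklisted DC, can be scripted. A minimal sketch: affinity_uses_blacklisted is a hypothetical helper that takes the same comma-separated list used for BlacklistedDCs, and since the exact layout of krb5-affinity.conf isn't shown here, it only searches for each IP as a whole word anywhere in the file.

```shell
# Hypothetical helper: report any blacklisted DC IP still present in the
# Likewise Kerberos affinity file. Returns 0 if at least one is found.
AFFINITY_FILE="${AFFINITY_FILE:-/var/lib/likewise/krb5-affinity.conf}"

affinity_uses_blacklisted() {
    found=1
    for ip in $(printf '%s' "$1" | tr ',' ' '); do
        # -w keeps 10.0.0.1 from matching inside 10.0.0.10
        if grep -qwF "$ip" "$AFFINITY_FILE"; then
            echo "still pinned to blacklisted DC: $ip"
            found=0
        fi
    done
    return "$found"
}

# Usage: affinity_uses_blacklisted DC_IP1,DC_IP2
```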

Useful links:

https://kb.vmware.com/s/article/2127213

https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.psc.doc/GUID-8C553435-27CD-4410-ACA9-9A84EA1D7334.html

https://kb.vmware.com/s/article/53698

https://docs.vmware.com/en/VMware-vSphere/6.7/rn/vsphere-vcenter-server-67u3b-release-notes.html

Posted in ESXi issue, Vcenter Appliance, vCSA 6.0, VCSA6.5, VCSA6.7, VMware

cfn-lint: a useful tool for CloudFormation

I recently had the chance to learn about cfn-lint, a very useful tool that validates CloudFormation templates directly in the editor and helps us create templates that are error-free and secure.
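Besides the editor integration, cfn-lint can be run from the command line as cfn-lint followed by a template path. A minimal sketch of linting every template in a directory, assuming cfn-lint is installed (for example with pip install cfn-lint) and on PATH; lint_templates and the CFN_LINT override are hypothetical names for illustration.

```shell
# Hypothetical wrapper: run cfn-lint against every CloudFormation template
# (*.yaml, *.yml, *.json) in a directory.
CFN_LINT="${CFN_LINT:-cfn-lint}"

lint_templates() {
    dir="${1:-.}"
    status=0
    for t in "$dir"/*.yaml "$dir"/*.yml "$dir"/*.json; do
        [ -e "$t" ] || continue   # skip glob patterns that matched nothing
        echo "== $t"
        # cfn-lint exits non-zero when it reports problems; keep going but
        # remember that at least one template failed.
        "$CFN_LINT" "$t" || status=1
    done
    return "$status"
}

# Usage: lint_templates ./templates && echo "all templates passed"
```

Continuing after a failure means one run reports problems in every template instead of stopping at the first.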

https://github.com/aws-cloudformation/cfn-python-lint

Posted in AWS, AWS Server Migration Service, EC2

vCenter VPXD crashes because of high memory usage

We have a vCenter environment with around 500 ESXi hosts across multiple clusters, and for the past several weeks vCenter kept going down because the VPXD service crashed and stayed in a stopped state.

VMware Support found that the vCenter database (VCDB) was growing very large, with high CPU and memory usage on vCenter, and started investigating.

[Analysis]
Most of the memory usage comes from events:

Signature 7f46978acb30 (Vmomi::KeyAnyValue) has 81430551 instances taking 0xd0aaf2b8(3,500,864,184) bytes.
Signature 562dc2a74a90 (Vmomi::Primitive<std::string>) has 55823822 instances taking 0x50096420(1,342,792,736) bytes.
Signature 7f4696eb77d0 (Vim::Event::ManagedEntityEventArgument) has 21574134 instances taking 0x37913be0(932,264,928) bytes.
Signature 7f4696eb7870 (Vim::Event::DatacenterEventArgument) has 13307524 instances taking 0x2205bdc0(570,801,600) bytes.
Signature 7f4696eb7960 (Vim::Event::HostEventArgument) has 13285493 instances taking 0x21b98d98(565,808,536) bytes.
Signature 7f4696eb78c0 (Vim::Event::ComputeResourceEventArgument) has 13280205 instances taking 0x21ddefb8(568,192,952) bytes.
Signature 562dc2a831f0 (Vmomi::DataArray<Vmomi::KeyAnyValue>) has 13086149 instances taking 0x21532ee8(559,099,624) bytes.
Signature 7f4696eb7d20 (Vim::Event::EventEx) has 13084710 instances taking 0xc31d4ee0(3,273,477,856) bytes.
Signature 7f4696eb7aa0 (Vim::Event::AlarmEventArgument) has 12934883 instances taking 0x20af9168(548,376,936) bytes.
Signature 562dc2ae8730 (Vmomi::Primitive<int>) has 12863119 instances taking 0x12707908(309,360,904) bytes.
Signature 7f4696eb4080 (Vim::Event::AlarmActionTriggeredEvent) has 4301050 instances taking 0x2b89b560(730,445,152) bytes.
Signature 7f4696eb4170 (Vim::Event::AlarmSnmpCompletedEvent) has 4301049 instances taking 0x272dbf98(657,309,592) bytes.

journalctl -xe

Jul 14 17:40:06 vCenter1 vpxd[20178]: Event [-22821230] [1-1] [2020-07-14T17:40:06.285759Z] [vim.event.AlarmActionTriggeredEvent] [info] [] [SantaClara] [-22821230] [Alarm 'Host hardware sensor state' on host  triggered an action]

Jul 14 17:40:06 vCenter1 vpxd[20178]: Event [-22821225] [1-1] [2020-07-14T17:40:06.286465Z] [vim.event.EventEx] [info] [] [SantaClara] [-22821225] [Alarm 'Host hardware sensor state' on host triggered by event -42004159 'Sensor -1 type , Description Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent #15 state assert for . Part Name/Number N/A N/A Manufacturer N/A']

Over 6 hours, the number of these events was high:

grep "Sensor -1 type" journalctl_-b--* | wc -l
371899

This contributes to the VCDB growth.
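To see how these "Sensor -1 type" events are spread over time rather than only their total, a small shell sketch can bucket matching journal lines by hour. events_per_hour is a hypothetical helper; it assumes journalctl's default syslog-style timestamps ("Jul 14 17:40:06 host ..."), as in the output above.

```shell
# Hypothetical helper: count lines matching a pattern per hour.
# Reads the given files, or stdin when no files are passed.
events_per_hour() {
    pattern="$1"
    shift
    grep -h -- "$pattern" "$@" |
        awk '{ split($3, t, ":"); c[$1 " " $2 " " t[1] ":00"]++ }
             END { for (k in c) print k, c[k] }' |
        sort   # keys sort alphabetically, which is chronological within a day
}

# Usage: journalctl -b | events_per_hour "Sensor -1 type"
```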

VCDB=# SELECT nspname || '.' || relname AS "relation", pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size" FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND C.relkind <> 'i' AND nspname !~ '^pg_toast' ORDER BY pg_total_relation_size(C.oid) DESC LIMIT 20;

      relation       | total_size
---------------------+------------
 vc.vpx_event_arg_35 | 9661 MB
 vc.vpx_event_arg_34 | 9568 MB
 vc.vpx_event_arg_40 | 9543 MB
 vc.vpx_event_arg_32 | 9427 MB
 vc.vpx_event_arg_37 | 9082 MB
 vc.vpx_event_arg_38 | 8742 MB
 vc.vpx_event_arg_36 | 8721 MB
 vc.vpx_event_arg_39 | 8249 MB
 vc.vpx_event_arg_33 | 8169 MB
 vc.vpx_event_arg_57 | 7957 MB

The number of events in the DB:

VCDB=# SELECT COUNT(EVENT_ID) AS NUMEVENTS, EVENT_TYPE, USERNAME FROM VPXV_EVENT_ALL GROUP BY EVENT_TYPE, USERNAME ORDER BY NUMEVENTS DESC LIMIT 10;

 numevents |                 event_type                 | username
-----------+--------------------------------------------+----------
  58907485 | vim.event.AlarmActionTriggeredEvent        |
  58905170 | com.vmware.vc.StatelessAlarmTriggeredEvent |
  58899874 | vim.event.AlarmSnmpCompletedEvent          |
   5894366 | com.vmware.vc.EventBurstStartedEvent       |
   5892674 | com.vmware.vc.EventBurstCompressedEvent    |
   5892554 | com.vmware.vc.EventBurstEndedEvent         |
   1564588 | vim.event.AlarmStatusChangedEvent          |
    702653 | vim.event.BadUsernameSessionEvent          | root
    694115 | esx.audit.account.locked                   |
    300159 | vim.event.TaskEvent                        |

[Action Plan]
https://kb.vmware.com/s/article/74607

According to VMware, this is a known issue in 6.7 that causes bursts of events, resulting in high I/O on vCenter.
You can update all ESXi hosts to 6.7 P01 or the latest version to fix the issue, or follow the workaround described in the KB.

Posted in ESXi issue, Vcenter Appliance, vCSA 6.0, VCSA6.5, VMware