vCenter VPXD crashes because of high memory.

We have a vCenter environment with around 500 ESXi hosts running in multiple clusters. For the past several weeks, vCenter kept going down because VPXD crashed, leaving the service in a stopped state.

VMware support found that the VCDB had grown very large, with high CPU and memory usage on the vCenter, and started investigating.

[Analysis]
Most of the memory usage comes from events:

Signature 7f46978acb30 (Vmomi::KeyAnyValue) has 81430551 instances taking 0xd0aaf2b8(3,500,864,184) bytes.
Signature 562dc2a74a90 (Vmomi::Primitive<std::string>) has 55823822 instances taking 0x50096420(1,342,792,736) bytes.
Signature 7f4696eb77d0 (Vim::Event::ManagedEntityEventArgument) has 21574134 instances taking 0x37913be0(932,264,928) bytes.
Signature 7f4696eb7870 (Vim::Event::DatacenterEventArgument) has 13307524 instances taking 0x2205bdc0(570,801,600) bytes.
Signature 7f4696eb7960 (Vim::Event::HostEventArgument) has 13285493 instances taking 0x21b98d98(565,808,536) bytes.
Signature 7f4696eb78c0 (Vim::Event::ComputeResourceEventArgument) has 13280205 instances taking 0x21ddefb8(568,192,952) bytes.
Signature 562dc2a831f0 (Vmomi::DataArray<Vmomi::KeyAnyValue>) has 13086149 instances taking 0x21532ee8(559,099,624) bytes.
Signature 7f4696eb7d20 (Vim::Event::EventEx) has 13084710 instances taking 0xc31d4ee0(3,273,477,856) bytes.
Signature 7f4696eb7aa0 (Vim::Event::AlarmEventArgument) has 12934883 instances taking 0x20af9168(548,376,936) bytes.
Signature 562dc2ae8730 (Vmomi::Primitive<int>) has 12863119 instances taking 0x12707908(309,360,904) bytes.
Signature 7f4696eb4080 (Vim::Event::AlarmActionTriggeredEvent) has 4301050 instances taking 0x2b89b560(730,445,152) bytes.
Signature 7f4696eb4170 (Vim::Event::AlarmSnmpCompletedEvent) has 4301049 instances taking 0x272dbf98(657,309,592) bytes.
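A minimal sketch of how the signature lines above could be ranked by size. The three sample lines are copied from the output above; the function names `parse_signatures` and `event_share` are hypothetical helpers for illustration and are not part of any VMware tooling.

```python
import re

# Three sample lines copied from the analysis output above.
SAMPLE = """\
Signature 7f46978acb30 (Vmomi::KeyAnyValue) has 81430551 instances taking 0xd0aaf2b8(3,500,864,184) bytes.
Signature 7f4696eb7d20 (Vim::Event::EventEx) has 13084710 instances taking 0xc31d4ee0(3,273,477,856) bytes.
Signature 562dc2ae8730 (Vmomi::Primitive<int>) has 12863119 instances taking 0x12707908(309,360,904) bytes.
"""

LINE_RE = re.compile(
    r"Signature \w+ \((?P<type>.+)\) has (?P<count>\d+) instances "
    r"taking 0x[0-9a-f]+\((?P<bytes>[\d,]+)\) bytes"
)


def parse_signatures(text):
    """Return (type, instances, bytes) tuples sorted by bytes, largest first."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            size = int(m["bytes"].replace(",", ""))
            rows.append((m["type"], int(m["count"]), size))
    return sorted(rows, key=lambda r: r[2], reverse=True)


def event_share(rows):
    """Fraction of the listed bytes held by Vim::Event::* types."""
    total = sum(r[2] for r in rows)
    event_bytes = sum(r[2] for r in rows if r[0].startswith("Vim::Event::"))
    return event_bytes / total if total else 0.0


if __name__ == "__main__":
    for name, count, size in parse_signatures(SAMPLE):
        print(f"{size / 2**20:10.0f} MiB  {count:>10}  {name}")
```

Feeding the full output into `parse_signatures` gives a quick view of which object types dominate the heap.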

journalctl -xe

Jul 14 17:40:06 vCenter1 vpxd[20178]: Event [-22821230] [1-1] [2020-07-14T17:40:06.285759Z] [vim.event.AlarmActionTriggeredEvent] [info] [] [SantaClara] [-22821230] [Alarm ‘Host hardware sensor state’ on host  triggered an action]

Jul 14 17:40:06 vCenter1 vpxd[20178]: Event [-22821225] [1-1] [2020-07-14T17:40:06.286465Z] [vim.event.EventEx] [info] [] [SantaClara] [-22821225] [Alarm ‘Host hardware sensor state’ on host triggered by event -42004159 ‘Sensor -1 type , Description Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent #15 state assert for . Part Name/Number N/A N/A Manufacturer N/A’]

Over a 6-hour period, the number of events is high:

grep "Sensor -1 type" journalctl_-b--* | wc -l
371899
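To put the count in perspective, here is a small sketch that counts matching lines and converts the total into a rate. The 6-hour window and the 371899 figure come from the text above; the helper names are hypothetical.

```python
import re

# Same search string as the grep command above.
PATTERN = re.compile(r"Sensor -1 type")


def count_matches(lines):
    """Count lines containing the sensor event text (like grep | wc -l)."""
    return sum(1 for line in lines if PATTERN.search(line))


def events_per_second(count, hours):
    """Average event rate over the given window."""
    return count / (hours * 3600)


if __name__ == "__main__":
    # Figures taken from the post: 371899 matches over 6 hours.
    print(f"{events_per_second(371899, 6):.2f} events/second")
```

That works out to roughly 17 sensor events per second, sustained for hours.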

This is contributing to the VCDB growth.

VCDB=# SELECT nspname || '.' || relname AS "relation", pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size" FROM pg_class C LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace) WHERE nspname NOT IN ('pg_catalog', 'information_schema') AND C.relkind <> 'i' AND nspname !~ '^pg_toast' ORDER BY pg_total_relation_size(C.oid) DESC LIMIT 20;

relation       | total_size
---------------------+------------
vc.vpx_event_arg_35 | 9661 MB
vc.vpx_event_arg_34 | 9568 MB
vc.vpx_event_arg_40 | 9543 MB
vc.vpx_event_arg_32 | 9427 MB
vc.vpx_event_arg_37 | 9082 MB
vc.vpx_event_arg_38 | 8742 MB
vc.vpx_event_arg_36 | 8721 MB
vc.vpx_event_arg_39 | 8249 MB
vc.vpx_event_arg_33 | 8169 MB
vc.vpx_event_arg_57 | 7957 MB

The number of events in the DB:

VCDB=# SELECT COUNT(EVENT_ID) AS NUMEVENTS, EVENT_TYPE, USERNAME FROM VPXV_EVENT_ALL GROUP BY EVENT_TYPE, USERNAME ORDER BY NUMEVENTS DESC LIMIT 10;

numevents |                 event_type                 |         username
-----------+--------------------------------------------+--------------------------
58907485 | vim.event.AlarmActionTriggeredEvent        |
58905170 | com.vmware.vc.StatelessAlarmTriggeredEvent |
58899874 | vim.event.AlarmSnmpCompletedEvent          |
5894366 | com.vmware.vc.EventBurstStartedEvent       |
5892674 | com.vmware.vc.EventBurstCompressedEvent    |
5892554 | com.vmware.vc.EventBurstEndedEvent         |
1564588 | vim.event.AlarmStatusChangedEvent          |
702653 | vim.event.BadUsernameSessionEvent          | root
694115 | esx.audit.account.locked                   |
300159 | vim.event.TaskEvent                        |
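A rough sketch for measuring how much of this top-10 list is alarm-related. The rows are copied from the query result above; `parse_rows` and `share` are hypothetical helper names.

```python
# Rows copied from the query output above (numevents | event_type | username).
ROWS = """\
58907485 | vim.event.AlarmActionTriggeredEvent        |
58905170 | com.vmware.vc.StatelessAlarmTriggeredEvent |
58899874 | vim.event.AlarmSnmpCompletedEvent          |
5894366 | com.vmware.vc.EventBurstStartedEvent       |
5892674 | com.vmware.vc.EventBurstCompressedEvent    |
5892554 | com.vmware.vc.EventBurstEndedEvent         |
1564588 | vim.event.AlarmStatusChangedEvent          |
702653 | vim.event.BadUsernameSessionEvent          | root
694115 | esx.audit.account.locked                   |
300159 | vim.event.TaskEvent                        |
"""


def parse_rows(text):
    """Return (event_type, count) pairs from psql-style output rows."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 2 and parts[0].isdigit():
            rows.append((parts[1], int(parts[0])))
    return rows


def share(rows, keyword):
    """Fraction of all listed events whose type contains the keyword."""
    total = sum(n for _, n in rows)
    matched = sum(n for name, n in rows if keyword.lower() in name.lower())
    return matched / total if total else 0.0


if __name__ == "__main__":
    rows = parse_rows(ROWS)
    print(f"alarm-related share of top 10: {share(rows, 'alarm'):.1%}")
```

Within this top-10 list, the alarm event types account for about nine out of every ten rows, which matches the sensor alarm storm seen in the journal.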

[Action Plan]
https://kb.vmware.com/s/article/74607

According to VMware, this is a known issue with 6.7 in which a burst of events causes high I/O on the vCenter.
To fix it, update all ESXi hosts to 6.7 P01 or later, or follow the workaround described in the KB.


Steps to take when migrating an EC2 instance to a different instance type

We migrated one of our EC2 instances from T2 to T3 and found that the server's performance had degraded. On a call with AWS support, they explained that the T3 type is based on Nitro hardware, which requires drivers that support Nitro-based hardware.

We downloaded and updated the network (ENA) and storage (NVMe) drivers, along with the PV drivers, on that instance.

https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/migrating-latest-types.html#upgrade-pv


Some free resources to use during the COVID-19 lockdown

Below is a list of free resources; I hope it helps during this tough period.

Pluralsight 

https://www.pluralsight.com/offer/2020/free-april-month

Google Cloud:

https://cloud.google.com/blog/topics/training-certifications/expanding-at-home-learning

Docker:

https://portal.cloudskills.io/docker-jumpstart

Books: Kindle Free

https://landing.google.com/sre/books/

UDEMY

https://www.udemy.com/introduction-to-serverlessaws/

https://www.udemy.com/amazon-web-services-aws-cloudformati…/

JENKINS

https://www.udemy.com/working-with-jenkins/

https://www.udemy.com/jenkins-beginner-tutorial-step-by-st…/

https://www.udemy.com/jenkins-quick-start/

https://www.udemy.com/jenkins-devops-pipeline-as-code/

https://www.udemy.com/devops-crash-course-cicd-with-jenkin…/

https://www.udemy.com/jenkins-intro/

ANSIBLE

https://www.udemy.com/ansible-essentials-simplicity-in-aut…/

https://www.udemy.com/ansible-quick-start/

https://www.udemy.com/devops-series-server-automation-usin…/

https://www.udemy.com/devops-beginners-guide-to-automation…/

https://www.udemy.com/just-enough-ansible/

GIT

https://www.udemy.com/git-started-with-github/

https://www.udemy.com/learngit/

https://www.udemy.com/git-expert-4-hours/

https://www.udemy.com/git-and-github-crash-course-creating…/

https://www.udemy.com/git-and-github-step-by-step-for-begi…/

https://www.udemy.com/intro-to-git/

https://www.udemy.com/short-and-sweet-get-started-with-git…/

https://www.udemy.com/the-ultimate-git-5-day-challenge/

https://www.udemy.com/git-quick-start/

https://www.udemy.com/gitandgithub/

https://www.udemy.com/git-basics/

https://www.udemy.com/git-bash/

MAVEN

https://www.udemy.com/maven-quick-start/

Python

https://www.udemy.com/course/automate/?couponCode=APR2020FREE

Youtube:

AWS:

https://www.youtube.com/channel/UCSLIvjWJwLRQze9Pn4cectQ

Docker

 

Bash:

https://www.youtube.com/watch?v=e7BufAVwDiM&feature=youtu.be

 

Entertainment 

Kids Story : https://www.scarymommy.com/astronauts-story-time-in-space-kids-books/?fbclid=IwAR1128iYPnLObRifo273L0lNDkW3b1fEbhfancBgsalrfZgtTx-ODcPDtk4

Sling : Get 14 days of access with no credit card required

https://www.sling.com/pm/homepage?AID=14014078&PID=100048248&SID=AndroidCentral&cvosrc=affiliate.cj.100048248&adid=14014078&af=1&utm_medium=affiliate&utm_source=cj&utm_campaign=100048248&utm_content=14014078&cjevent=80c9450777b711ea82d902b60a1c0e10

 

HBO : HBO unlocks 500 hours of must-see films and TV series to stream for free

https://www.thrifter.com/hbo-unlocks-500-hours-must-see-films-and-tv-series-free-streaming?utm_content=bufferebf21&utm_medium=tw_link&utm_source=th_tw&utm_campaign=social

Indian Movies : One month free

https://www.shemaroome.com/cgny


Amazon FSx for Windows File Server

 

Amazon FSx for Windows File Server provides a fully managed native Microsoft Windows file system. Amazon FSx provides NTFS file systems that can be accessed from up to thousands of compute instances using the SMB protocol (SMB 2.0 to 3.1.1).  You can access your Amazon FSx file system from Amazon EC2, VMware Cloud on AWS, Amazon WorkSpaces, and Amazon AppStream 2.0 instances. 

 The service works with Microsoft Active Directory (AD) to integrate your file system with your existing Windows environments. Amazon FSx uses Distributed File System (DFS) Replication to support multi-AZ deployments. To ensure compatibility with your applications, Amazon FSx supports all Windows versions starting from Windows Server 2008 and Windows 7, and also current versions of Linux. 

Amazon FSx can be used for various application workloads such as home directories, web serving and content management, enterprise applications, analytics, and media and entertainment. All Amazon FSx file system data is automatically encrypted at rest and in transit.

1.0 FSx for Windows file system 

Prerequisites – Currently Amazon FSx only works with AWS Managed Microsoft AD (AWS MAD). AWS supports the AD sharing feature of AWS Directory Service, and AWS plans to support AD Connector and self-managed Microsoft Active Directory.

These are the directory type options AWS offers:

a)       AWS Managed Microsoft AD 

b)      Simple AD 

c)       AD Connector 

d)      Cloud Directory

e)      Amazon Cognito Your User Pools 

There is no need to select any of the options above. Self-managed Microsoft Active Directory is what we need for our use cases. The other options are a little complicated and carry some maintenance overhead, as explained below.

a)        Create AWS Managed Active Directory  

We need to create an AD (e.g. xyz.storage.com). During AWS Managed Directory service creation, you will be asked for: Directory Type; Edition; Directory DNS name; Directory NetBIOS name; VPC; Subnet; AZ.

Limitation:

Here we need to maintain the AD in the respective account, so this AD needs a continuous sync to pick up the latest AD objects.

Details explained here for the filesystem creation for this method

b)      Simple AD 

Simple AD is a standalone managed directory powered by a Linux-Samba Active Directory-compatible server. Not recommended, as it requires creating a separate directory and has object limitations.

c)      AD Connector

Need to create a trust relationship between the source account (app account) and the destination account (AD account).

-> Need to create an AWS Managed AD directory ID

Limitation:

1)      This process adds overhead, and every account using it needs this trust relationship.

2)      Cost of the Active Directory service that is created.

d) Cloud Directory: Similar to RDS, Cloud Directory is a high-performance, serverless, hierarchical data store, but it has the same limitations as mentioned above for Simple AD.

e) Amazon Cognito Your User Pools: Redirects to the Cognito service for directory creation. Not useful here.

Self-managed Microsoft Active Directory: this is the recommended option.

1)      Select FSx for Windows File Server

Use cases for both

                  | Amazon FSx for Lustre | Amazon FSx for Windows File Server
------------------+-----------------------+-----------------------------------
Performance       | Compute intensive     | Simple file share
Lifecycle         | Yes                   | No lifecycle
Storage           | Minimum 1.2 TB        | Minimum 32 GiB
AD authentication | No                    | Yes

Price example, Amazon FSx for Lustre:

$0.14 GB-month / 30 / 24 = $0.000194/GB-hour

3600 GB x $0.000194/GB-hour x 72 hours = $50.40

Total FSx for Lustre charge for 72 hours = $50.40

Price example, Amazon FSx for Windows File Server:

Storage: 1,024 GB-months x $0.130/GB-month = $133/mo

Throughput: 8 MBps-months x $2.200/MBps-month = $18/mo

Backup: 500 GB-months x $0.050/GB-month = $25/mo

Total monthly charge: $176 ($0.172/GB-mo)
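The pricing arithmetic above can be reproduced with a short sketch. The unit prices are the ones quoted in this post and may not match current AWS pricing; the function names are hypothetical.

```python
def lustre_cost(gb, hours, price_per_gb_month=0.14):
    """FSx for Lustre cost: the monthly GB price prorated per hour (30-day month)."""
    price_per_gb_hour = price_per_gb_month / 30 / 24
    return gb * price_per_gb_hour * hours


def windows_monthly_cost(storage_gb, throughput_mbps, backup_gb,
                         storage_price=0.130, throughput_price=2.200,
                         backup_price=0.050):
    """FSx for Windows monthly cost: storage + throughput + backup."""
    return (storage_gb * storage_price
            + throughput_mbps * throughput_price
            + backup_gb * backup_price)


if __name__ == "__main__":
    print(f"Lustre, 3600 GB for 72 hours: ${lustre_cost(3600, 72):.2f}")
    monthly = windows_monthly_cost(1024, 8, 500)
    print(f"Windows, per month: ${monthly:.2f} (${monthly / 1024:.3f}/GB-mo)")
```

The unrounded Windows total is $175.72; the $176 figure above comes from rounding each line item to whole dollars before summing.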


2)      Then specify the options below for file system creation

File System Name – This will not be used to access the file share or File System. 

Storage Capacity – Minimum 32 GiB; Maximum 65,536 GiB 

Throughput capacity – The recommended value is 8 MB/s and the maximum is 2048 MB/s.

3)      Network info

Please remember that the file system is AZ-specific but accessible from all AZs of a region. DFS is needed for redundancy, as explained later in this document.

Network & Security – Select your VPC; Availability zone; Subnet; VPC Security Group 

4)      Select the Windows authentication method

Select Self-managed AD (SMAD). If you select AWS MAD instead, create the directory service using the methods mentioned above.

Provide all the requested info: FQDN, service account (use "id": "serviceaccount", "domain_user")

Use Route 53 Resolvers to route the traffic to DNS

5)      Other options

OU: Please don't leave this blank (the default), or else all the hardening rules will be applied to it; we are not sure what impact that might have.

OU=Storage and Backup,OU=Appliance,OU=Computers,OU=Objects,DC=ads,DC=xyz,DC=com

Encryption – Default [AWS Key Management Service (KMS) encryption key that protects your file system data at rest] 

6)      Review the summary and click "Create file system"

7)      The file system will be created

Verify it by using the DNS name

\\amznfsxieasvi5.ads.xyz.com

The share folder is created by default and cannot be deleted.
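The console steps above can also be expressed as a request for the AWS SDK. Below is a minimal sketch that only builds and validates the parameter dictionary; it does not call AWS. The key names follow the boto3 FSx `create_file_system` request as I understand it, so check them against the current boto3 documentation before use. The subnet ID, security group ID, DNS IP, and password are hypothetical placeholders; the size and throughput limits, domain, and OU come from this post.

```python
def build_fsx_request(name, storage_gib, throughput_mbps, subnet_id,
                      security_group_id, domain_name, ou_dn,
                      user_name, password, dns_ips):
    """Build a parameter dict for an FSx for Windows file system (self-managed AD)."""
    # Limits quoted in the steps above.
    if not 32 <= storage_gib <= 65536:
        raise ValueError("storage capacity must be 32 to 65,536 GiB")
    if not 8 <= throughput_mbps <= 2048:
        raise ValueError("throughput capacity must be 8 to 2048 MB/s")
    if not ou_dn:
        raise ValueError("set the OU explicitly; do not leave it blank")
    return {
        "FileSystemType": "WINDOWS",
        "StorageCapacity": storage_gib,
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
        "Tags": [{"Key": "Name", "Value": name}],
        "WindowsConfiguration": {
            "ThroughputCapacity": throughput_mbps,
            "SelfManagedActiveDirectoryConfiguration": {
                "DomainName": domain_name,
                "OrganizationalUnitDistinguishedName": ou_dn,
                "UserName": user_name,
                "Password": password,
                "DnsIps": list(dns_ips),
            },
        },
    }


if __name__ == "__main__":
    request = build_fsx_request(
        name="demo-fsx",                           # hypothetical
        storage_gib=32,
        throughput_mbps=8,
        subnet_id="subnet-0123456789abcdef0",      # hypothetical
        security_group_id="sg-0123456789abcdef0",  # hypothetical
        domain_name="ads.xyz.com",
        ou_dn="OU=Storage and Backup,OU=Appliance,OU=Computers,"
              "OU=Objects,DC=ads,DC=xyz,DC=com",
        user_name="serviceaccount",
        password="example-only",                   # hypothetical
        dns_ips=["10.0.0.10"],                     # hypothetical
    )
    print(request["WindowsConfiguration"]["ThroughputCapacity"])
```

With boto3 installed and credentials configured, a dictionary like this would typically be passed as keyword arguments to the `fsx` client's `create_file_system` call.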

2.0 Automatic Daily Backups 

 Amazon FSx automatically takes backups of your file systems once a day. These daily backups are taken during the daily backup window that was established when you created the file system. At some point during the daily backup window, storage I/O might be suspended briefly while the backup process initializes (typically under a few seconds). When you choose your daily backup window, we recommend that you choose a convenient time of the day outside of the normal operating hours for the applications that will use the file system. Backups are kept for a certain period of time, known as a retention period. By default, backups are retained for 7 days. However, you can change the retention period to anywhere in a range of 0–35 days. 

We can create and restore backups from the FSx Management Console, the AWS CLI, or one of the AWS SDKs.

3.0 Multi-AZ File System Deployments 

For workloads that require multi-AZ redundancy to tolerate temporary AZ unavailability, we can create multiple file systems in separate AZs, keep them in sync, and configure failover between them. Amazon FSx fully supports the use of the Microsoft Distributed File System (DFS) for file system deployments across multiple AZs to get Multi-AZ availability and durability. Using DFS Replication, you can automatically replicate data between two file systems. Using DFS Namespaces, you can configure one file system as your primary and the other as your standby, with automatic failover to the standby in the event that the primary becomes unresponsive. MS DFS supports both async and sync replication.

Amazon FSx provides high availability and failover support across multiple AZs. It can be used for shared storage and also as a mapped drive instead of EBS volumes, since an EBS volume cannot span multiple AZs.

4.0 Benefits and Cons of FSx

Benefits:

·      AWS FSx is fully managed. It relies on SSD storage and performs with high levels of IOPS and throughput, as well as consistent sub-millisecond latencies for a well-designed infra.

·      AWS FSx is secure. All of the file systems are a part of the Virtual Private Cloud (VPC); all data is encrypted both in transit and at rest, and all activities are logged to CloudTrail

Cons:

·      Amazon FSx for Windows File Server supports custom DNS names only for Single-AZ file systems, not for Multi-AZ as of yet

·      Cost is not accurate enough currently

AWS Documentation


Bug in network introspection driver (vnetflt.sys) VMtools 11265 (11.0.1)

Recently we upgraded the VMtools version from 10.305\10.346 to 11265 (11.0.1). A few VMs then went into a hung state, and we noticed the alert below on the Windows VMs.

vmware.log:

2020-02-07T12:50:58.182Z| vcpu-0| I125: Guest: vsep: AUDIT: VFileSocketMgrCloseSocket : Mux is disconnected <------
2020-02-07T12:50:58.297Z| vmx| I125: VigorTransportProcessClientPayload: opID=3997b233-39-9b26 seq=290: Receiving MKS.IssueTicket request.
2020-02-07T12:50:58.297Z| vmx| I125: SOCKET 5 (129) creating new listening socket on port -1
2020-02-07T12:50:58.297Z| vmx| I125: Issuing new webmks ticket a9161e… (120 seconds)
2020-02-07T12:50:58.297Z| vmx| I125: VigorTransport_ServerSendResponse opID=3997b233-39-9b26 seq=290: Completed MKS request.
2020-02-07T12:50:58.666Z| vcpu-0| I125: Guest: vsep: AUDIT: SetupConsumerContext : Setting event Type as 256 from 0
2020-02-07T12:50:58.667Z| vcpu-1| I125: Guest: vsep: AUDIT: SetupConsumerContext : Setting event Type as 256 from 0
2020-02-07T12:50:58.676Z| vcpu-1| I125: Guest: vsep: AUDIT: SetupConsumerContext : Setting event Type as 256 from 0
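To check other VMs for the same symptom, a small sketch like the one below could scan a vmware.log for the "Mux is disconnected" message. The line format is inferred from the excerpt above, and `find_mux_disconnects` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Format inferred from the excerpt: "<timestamp>Z| <thread>| <level>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\| "
    r"(?P<thread>[^|]+)\| (?P<level>\w+): (?P<msg>.*)$"
)


def find_mux_disconnects(lines):
    """Return (timestamp, thread) for each 'Mux is disconnected' log line."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "Mux is disconnected" in m["msg"]:
            ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
            hits.append((ts, m["thread"]))
    return hits


if __name__ == "__main__":
    sample = [
        "2020-02-07T12:50:58.182Z| vcpu-0| I125: Guest: vsep: AUDIT: "
        "VFileSocketMgrCloseSocket : Mux is disconnected",
        "2020-02-07T12:50:58.297Z| vmx| I125: SOCKET 5 (129) creating new "
        "listening socket on port -1",
    ]
    for ts, thread in find_mux_disconnects(sample):
        print(ts.isoformat(), thread)
```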

We raised a VMware ticket; support recommended upgrading the NSX Manager to 6.4.4 and confirmed the following:

There is an internal bug which confirms that this is a known issue with VMware tools version you are using ( 11.0.1 ) and there is no external documentation available confirming this aspect. We have confirmed based on an engineering ticket that we have referred. As per the engineering ticket, this should be made available in the release notes of 11.0.5 and expected to be fixed in 11.1. There is no ETA mentioned about these releases.


Useful links on EC2\FSX with new features

https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-fsx-for-windows-file-server-adds-support-for-high-availability-microsoft-sql-server-deployments/

https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-ec2-auto-scaling-supports-max-instance-lifetime/

https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-ec2-auto-scaling-supports-instance-weighting/

https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-fsx-windows-file-server-supports-managing-file-shares-via-powershell/

https://aws.amazon.com/about-aws/whats-new/2019/12/amazon-ec2-spot-now-provides-instance-launch-notifications-via-amazon-cloudwatch-events/


Memories of 2019

2019 started with a lot of new surprises in the company roadmap. One of the main changes was to move from on-prem to the cloud, which meant reducing the VMware footprint. Initially it was tough to accept the change, but management planned it well: they provided enough training on AWS\Azure, and once the team had sufficient knowledge and confidence, we started migrating workloads to the cloud.

I was put into AWS training and learned a lot, which allowed me to prepare for the certification. After several months of preparation and experience, I completed the AWS Solutions Architect certification. In the last three months I started migrating applications to AWS, which was very challenging: it required understanding the current design of each application and planning how to run it in the cloud. I gained some experience in Chef, and in the last two months I also started working on CI\CD along with some Python automation.

For the past four years I had the opportunity to attend the VMware conference, but this year, since the focus was on the cloud, I didn't get the chance. At the same time, I got to attend my first AWS re:Invent, which was awesome for learning and exploring new AWS services.

It was a good year, and I am looking forward to learning more about cloud services in 2020.


VCSA 6.5 failed to log in using AD credentials

One of our vCenters had an issue logging in with AD credentials. We verified DNS and the other VCs that connect to the same DNS and AD, and found no issues.

When we checked websso.log, we noticed the error below.

2019-11-25T16:08:43.717Z vsphere.local        8d2b3655-340a-46db-b879-5b680911c743 ERROR] [IdentityManager] Failed to authenticate principal [ADUSER@ADDOMAIN] for tenant [vsphere.local]com.vmware.identity.interop.idm.IdmNativeException: Native platform error [code: 851968][null][null]

at com.vmware.identity.interop.idm.LinuxIdmNativeAdapter.AuthenticateByPassword(LinuxIdmNativeAdapter.java:180)
at com.vmware.identity.idm.server.provider.activedirectory.ActiveDirectoryProvider.authenticate(ActiveDirectoryProvider.java:279)
at com.vmware.identity.idm.server.IdentityManager.authenticate(IdentityManager.java:2777)
at com.vmware.identity.idm.server.IdentityManager.authenticate(IdentityManager.java:9145)
at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
at sun.rmi.transport.Transport$2.run(Unknown Source)
at sun.rmi.transport.Transport$2.run(Unknown Source)
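When several vCenters need checking, a sketch like this could pull the failing principal, tenant, and native error code out of websso.log. The regular expression is based only on the single error line shown above, so treat the format as an assumption; `parse_auth_failure` is a hypothetical helper name.

```python
import re

# Pattern derived from the one websso.log error line shown above.
FAIL_RE = re.compile(
    r"Failed to authenticate principal \[(?P<principal>[^\]]+)\] "
    r"for tenant \[(?P<tenant>[^\]]+)\].*?\[code: (?P<code>\d+)\]"
)


def parse_auth_failure(line):
    """Return a dict with principal, tenant and error code, or None if no match."""
    m = FAIL_RE.search(line)
    if not m:
        return None
    return {
        "principal": m["principal"],
        "tenant": m["tenant"],
        "code": int(m["code"]),
    }


if __name__ == "__main__":
    line = (
        "2019-11-25T16:08:43.717Z vsphere.local ERROR] [IdentityManager] "
        "Failed to authenticate principal [ADUSER@ADDOMAIN] for tenant "
        "[vsphere.local]com.vmware.identity.interop.idm.IdmNativeException: "
        "Native platform error [code: 851968][null][null]"
    )
    info = parse_auth_failure(line)
    print(info["principal"], info["tenant"], hex(info["code"]))
```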

We tried rebooting the VC and also removing and re-adding the AD. Even though we were able to search AD objects, authentication kept failing. Finally, the steps below fixed the issue.

  1. Removed the VC from the domain.
  2.  Deleted the computer account from the AD
  3.  Re-added the VC back to the domain.
  4.  Rebooted the VC, tested connection which was working fine.

Error while migrating vSphere VMs to Amazon with AWS Server Migration Service

With the help of the link, we configured the AWS Server Migration Service, and at the final stage of the sync it failed with the error "Instance failed to boot and establish network connectivity".

So we stopped all the non-Microsoft services on the Windows instance and retried the sync, and it completed successfully.
