Reboot issue with MCE error on Dell PowerEdge R6525 running ESXi 7.0 Update 3c

We have new hardware: Dell PowerEdge R6525 servers with AMD EPYC 7713 64-Core processors running ESXi 7.0 Update 3c (Build 19193900), and production VMs were migrated to the 15-host cluster. After a few weeks we started noticing random ESXi reboots, and after further troubleshooting we upgraded all the hardware firmware and the BIOS (2.5.6 up to 2.6.0), but the issue was not fixed.

After monitoring for several weeks, we identified from the DRS rules that the hosts running Linux VMs were affected far more often than the hosts running Windows VMs, so with the help of the vendor we replaced the CPU and also the motherboard on a few hosts, but it didn't help.

All the hosts failed with the error: (Fatal/NonRecoverable) 2 (System Event) 13 Assert + Processor Transition to Non-recoverable

The issue was escalated to Dell's top technical team, and after several months the vendor asked us to upgrade the BIOS to 2.6.6, which finally stopped the reboots.

  1. Error from the ESXi logs, showing a memory error:
    2022-04-03T05:34:42 13 – Processor 1 MEMEFGH VDD PG 0 Assert + Processor Transition to Non-recoverable
  2. After the above error, the server was still running:
    1. 2022-04-03T10:00:00.611Z heartbeat[2308383]: up 5d6h22m15s, 94 VMs; [[2103635 vmx 67108864kB] [2114993 vmx 134084608kB] [2105683 vmx 134090752kB]] []
      The reboot might have happened between this heartbeat and 12 PM UTC.
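To confirm the firmware level and pull these asserts directly from each of the 15 hosts, a quick check like the one below can be run from the ESXi shell (a rough sketch; exact command availability can vary by build):

  • Confirm the BIOS level the firmware push actually left on the host:

# vim-cmd hostsvc/hosthardware | grep -i biosversion

  • Dump the IPMI System Event Log and search for the processor asserts:

# esxcli hardware ipmi sel list | grep -i "non-recoverable"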

Note: We have another environment running the same R6525 hardware with ESXi 6.7 U3 that never faced this issue, and after several rounds of analysis we couldn't find any solid evidence that the issue was caused by the Linux VMs or the applications running on them.

Posted in Dell

NFS 4.1 datastores might become inaccessible after failover or failback operations of storage arrays


When storage array failover or failback operations take place, NFS 4.1 datastores fall into an All-Paths-Down (APD) state. However, after the operations are complete, the datastores might remain in APD state and become inaccessible.


As per VMware, this issue affects hosts older than build 16075168 and is resolved in newer builds. We tested it in our environment and the newer build works fine without any datastore failure.
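For reference, both points can be verified from the ESXi shell after a failover test (a minimal sketch; datastore names are specific to each environment):

  • Check the host build number against 16075168:

# vmware -vl

  • Confirm the NFS 4.1 datastores report as accessible after the failback:

# esxcli storage nfs41 list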

Posted in Storage, Storage\Backup, VMware

VCSA upgrade stuck at 88%


VCSA 7.0 U3b upgrade stuck at 88%

Resolution

The VAMI page was stuck at 88% for more than an hour.

We removed the update_config file and restarted the VAMI, but the update still did not complete.

We then downloaded the FP .iso patch and patched the VCSA via the VAMI successfully.
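For reference, the same FP ISO can also be applied from the appliance shell instead of the VAMI; a minimal sketch, assuming the ISO is attached to the VCSA VM's virtual CD-ROM drive:

  • Stage the patch from the ISO and confirm what was staged:

# software-packages stage --iso --acceptEulas
# software-packages list --staged

  • Install the staged patch (appliance services restart during the install):

# software-packages install --staged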

Posted in VMware

Network issues with VMs running on new Mac Mini ESXi hosts

We have new Mac Mini 2019/2020 models and an older model (2018) running ESXi 6.7 U3, and noticed that on the new Mac Minis the VMs (macOS/Windows/Linux) have issues connecting to the network and downloading files. The only difference between the models is the network card.

We tried enabling jumbo frames on the VMs, and they started working and were able to download files, but we couldn't find the exact cause of the issue, because from the hypervisor itself, and when running native macOS on the same hardware, we don't have any issue.

We are still investigating the issue; the workaround is to enable jumbo frames on the VMs.
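For reference, a rough sketch of the MTU settings involved in the workaround (vSwitch0, vmk0 and ens192 are placeholders for the actual names in our environment), keeping in mind that jumbo frames normally need to match end to end:

  • Raise the MTU on the standard vSwitch and the VMkernel port of the host:

# esxcli network vswitch standard set -v vSwitch0 -m 9000
# esxcli network ip interface set -i vmk0 -m 9000

  • Inside a Linux guest, raise the MTU on the vNIC:

# ip link set dev ens192 mtu 9000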

Posted in MacMini, VMware

Ports required for the AD

Lots of links talk about the ports required for an AD connection. In my environment, the ports below are enabled and clients can be joined to AD with DNS registered; a quick connectivity spot-check is sketched after the list.

TCP_636
TCP_3268
TCP_3269
TCP_88
UDP_88
TCP_53
UDP_53
TCP_445
UDP_445
TCP_25
TCP_135
TCP_5722
UDP_123
TCP_464
UDP_464
UDP_138
TCP_9389
UDP_137
TCP_139
UDP_49152-65535
TCP_49152-65535
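As mentioned above, a quick TCP spot-check from a Linux client before the domain join (dc01.example.com is a placeholder for the DC; the UDP ports and the dynamic 49152-65535 range still need separate verification):

# for p in 25 53 88 135 139 445 464 636 3268 3269 5722 9389; do nc -zv -w 2 dc01.example.com $p; done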


Refer:

https://isc.sans.edu/diary/Cyber+Security+Awareness+Month+-+Day+27+-+Active+Directory+Ports/7468

Posted in AWS, Azure, Cloud

IP customization is failing on RHEL 5 and 6 VMs with SRM 8.3.1

Below is the issue we faced after upgrading SRM to 8.3.1:

IP customization is failing on RHEL 5 and 6 VMs with SRM 8.3.1.

IP customization previously worked on these RHEL versions with SRM 6.5.

IP customization works with RHEL 7 VMs, which can utilize SAML tokens for authentication.

It looks like changes between SRM 6.5 and later versions caused the conflict with LDAP on your RHEL 6 machines. Prior to the changes, SRM performed the script transfer using the VIX protocol, which has little to no authentication. This master access method worked from vCenter: SRM would transfer the script through vCenter, then directly to the ESXi host and eventually the VM, without any authentication or tokens involved.

For security reasons, this is obviously a weakness. This has changed and is now enforced: instead, SAML token authentication is used through an SSO Solution User that is created when SRM registers with the PSC/SSO and vCenter. This new method also meant VMware Tools had to be upgraded so it could be part of that SSO process, hence the vgAuth component of the Tools.

This process now impersonates the root account to execute scripts inside the guest OS, with the operation tied directly to an authentication token from SSO.

Also, as you see above, SRM only contacts SSO for authentication; outside of that, SRM itself now transfers the script to the ESXi host and then to the VM, instead of vCenter doing it. This new process forces authentication and uses the temporary SAML token for activities like this. The exact same process applies if you run custom scripts inside the guest OS in your recovery plans.

We have seen cases where LDAP, and in your case OpenLDAP, conflicts with our ability to impersonate on the guest OS. Unfortunately, like any other third-party application or solution that conflicts with our operation, it needs to be addressed from the offending application itself. In this case, SSSD appears to work, as proven by your tests.
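For anyone hitting the same conflict, a simple way to confirm which LDAP client stack a RHEL 6 guest is actually using (SSSD worked in our case; the assumption is that the OpenLDAP-based nslcd/pam_ldap setup was the conflicting one):

  • Check which daemon is running and how account lookups are wired:

# ps -e | egrep 'sssd|nslcd'
# grep -E '^(passwd|group|shadow)' /etc/nsswitch.conf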

Posted in SRM, VMware

Bug in vCenter with hosts running AMD EPYC Zen 3 (Milan) / EPYC 7713.

Recently we moved to AMD EPYC 7713 64-core CPUs with Dell R6525 servers and noticed ESXi hosts showing 100% CPU in vCenter, fluctuating intermittently. When we checked the performance in esxtop it was actually very low, and in our other environment (Supermicro AS-2114GT-DNR with the same AMD EPYC 7713P) we noticed a similar CPU spike.

As per KB https://kb.vmware.com/s/article/85071, it is a cosmetic issue that can be safely ignored, or there is a workaround mentioned in the same KB.
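We cross-checked the vCenter charts against esxtop on the hosts; for reference, a short batch capture like the one below makes it easy to review the real CPU numbers offline:

# esxtop -b -d 5 -n 12 > /tmp/esxtop-cpu-check.csv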

Posted in VMware

Guest OS hangs on GPU-enabled VMs

We have NVIDIA Tesla V100-PCIE-32GB cards running in SYS-1029GQ-TRT servers, and for the past few months we have been facing an issue where VMs hang with a black screen and the only option is to reboot the VM.

Below are the details of the machine:

Remoting Solution / Method of connecting to VM = HP RGS (HP Remote Graphics Receiver)
Version or Release of Remoting Solution = 7.4.0.13800
Endpoint Client Information = Windows 10 x64 20h2
Number of displays / Display resolution = Single display (2560×1600)
Type (Thin, Fat, Mobile Client) = Thin

NVIDIA analyzed the logs and noticed that the error NVOS status 0x19 is repeated multiple times along with VGPU message 21 and VGPU message 52. This is the known issue 5.11, "11.0 Only: Failure to allocate resources causes VM failures or crashes", which was resolved in vGPU software 11.1. There are also some Xid 43 and Timeout Detection and Recovery (TDR) errors in the logs, but they can be a side effect of the main issue. The recommendation was to install the latest vGPU software, 11.5, on both the ESXi hosts and the guest VMs; the drivers are available at https://ui.licensing.nvidia.com/software. After updating the driver, the issue was fixed.
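For reference, the vGPU manager level on the host can be confirmed before and after the driver update from the ESXi shell:

  • Check the installed NVIDIA host VIB and the runtime driver version:

# esxcli software vib list | grep -i nvidia
# nvidia-smi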

Posted in Dell, HP, NVIDIA GPU, VMware

Root cause for the VPXD Crash issue

In a previous post I explained the vCenter and VPXD crash issue. After a lot of research, VMware support confirmed that the issue is caused by a Likewise failure that is fixed in 6.7 U3m; since VMware patches are cumulative, we can upgrade to the latest patch, which includes all previous fixes.

Here’s the link to the fix documented in the 6.7 U3M release notes: https://docs.vmware.com/en/VMware-vSphere/6.7/rn/vsphere-vcenter-server-67u3m-release-notes.html#:~:text=If%20the%20identity,in%20this%20release.
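To confirm which build a VC is actually running before and after patching, a quick check from the appliance bash shell is to ask the vpxd binary for its build number (it normally lives under /usr/lib/vmware-vpx/ if not on the path):

# vpxd -v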

Posted in VCSA6.7, VMware

VC 6.7 U3 VPXD crash issue.

Every week we had a VC-down issue. When we logged in via SSH we noticed the VPXD service was stopped, and when we tried to start it, it failed. From the vMon log we noticed a permission error on cloudvm-ram-size.log: the ownership is supposed to be "root cis", but it had changed to "root root". Once we changed the ownership back to "root cis", the VPXD service started and the VC came online. All our other VCs have "root cis", but a VMware engineer confirmed that in their lab a few VCs are "root root" and others are "root cis".
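The manual fix each time was simply resetting the ownership and starting the service again; roughly the following (the log path and service name are as commonly found on VCSA 6.7, so verify them on the appliance first):

  • Check and reset the ownership, then start VPXD:

# ls -l /var/log/vmware/cloudvm/cloudvm-ram-size.log
# chown root:cis /var/log/vmware/cloudvm/cloudvm-ram-size.log
# service-control --start vmware-vpxd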

The same symptoms kept happening every week, and we couldn't find what was causing the permission change, even with VMware support involved. At one point we moved on from the permission issue and started looking for other causes, and a senior VMware engineer got involved, but we couldn't find any concrete root cause for the VPXD crash.

After a few weeks we noticed the VC suddenly became very slow; after entering the login credentials it kept searching for something and we couldn't log in. At one point VPXD stopped, and this time we tried stopping and starting all the services without changing the file permission of cloudvm-ram-size.log. All the services, including VPXD, started without any issue, so our focus shifted from the file permission to the VPXD crash itself.

After we uploaded the logs to VMware, they identified some repeated offline storage IOFilterProviders on multiple vCenters, which they asked us to clean up just to eliminate some unnecessary warnings (KB: https://kb.vmware.com/s/article/76633?lang=en_US). From the sessions they also identified a lot of requests hitting the VC from one particular IP, which turned out to be our dashboard reporting server, and we stopped that service to eliminate it as a factor.

The VC ran fine after we stopped the reporting server from hitting it, but after around one week the VC became very slow again. This time we noticed that the other four VCs connected to the same vsphere.local domain also went slow; they took almost 15 minutes to recover and return to normal, while the problematic VC went down with the same symptoms. The ticket was escalated, and the VMware engineering team got involved and started reviewing the logs.

While engineering completed their review of the permission changes, we could most certainly tidy up the SSO environment: the vCenter log bundles indicated the presence of external PSCs, and we should clean up any and all residual references to them, as all five vCenters should be running on an embedded PSC. It turned out that even though we had retired the external PSCs a long time ago, the old entries were still present on all the VCs.

VMware recommended either taking powered-off snapshots or, if we were unable to take that downtime, stopping the vmdird service on all five vCenters, taking the five snapshots, and then starting the vmdird service on all five vCenters again.
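A minimal sketch of that sequence on each vCenter (the directory service is normally listed as vmdird by service-control --status; verify the name on your build first):

  • Stop the directory service before taking the snapshot:

# service-control --stop vmdird

  • Take the snapshot, then start the service again:

# service-control --start vmdird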

So, as per VMware, we cleaned up the old PSC entries, and again the VC ran without any issue for about a week but still ended up with the same problem. This time VMware recommended enabling the option below to capture a VPXD dump for more details.

  • Backup the vmon config:

# cp /etc/vmware/vmware-vmon/config.json /etc/vmware/vmware-vmon/config.json.bak

  • Edit the vmon config:

# vi /etc/vmware/vmware-vmon/config.json

  • Change the following line to true:

"DumpLiveCoreOnApiHealthFail" : false,

  • Restart the vmon service (this will take down the vpxd and vsphere-ui services, so scheduled downtime is recommended):

# service-control --restart vmware-vmon

Again the same story: vCenter went down again, and based on the VPXD crash dump VMware asked us to measure the latency between the vCenter server and AD (https://kb.vmware.com/s/article/79317) the next time the issue occurred, to check whether the slowness was caused by the domain controller. But things got worse: VPXD started crashing roughly every two days, and the investigation was still ongoing without a fix.
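For reference, a rough way to spot-check that latency from the VCSA bash shell while waiting for the next occurrence (dc01.example.com is a placeholder for a domain controller):

  • Measure round-trip time to the domain controller:

# ping -c 20 dc01.example.com

  • Do a rough TCP connect test against the LDAP port (curl is available on the appliance):

# curl -v telnet://dc01.example.com:389 --max-time 5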

Our vCenter was running the old 6.7.0.44000 build because we have a cloud stack front end for our internal cloud that officially supports only up to vCenter Appliance 6.7 Update 3g (6.7.0.44000), so we did not want to upgrade the VC without vendor approval. Since the situation had become very bad, and we noticed a few VPXD-related fixes in the vCenter Appliance 6.7 Update 3m release, we took the decision to upgrade the VC to 3m (6.7.0.47000).

In the same week VMware released a new security-related version, 6.7.0.48000, so we decided to go with the latest version and upgraded all the VCs. After the upgrade, the VC has been running fine for more than three weeks without any issue, which is our highest recent vCenter uptime. We are still not sure why VPXD was crashing or what caused the cloudvm-ram-size.log permission change, but the upgrade finally made the environment stable and got us back on track.

Posted in Vcenter Appliance, vCSA 6.0, VCSA6.5, VCSA6.7, VMware, VPXD