Useful tips to find resource types under Azure Policy

In our organization, there are restrictions on using all the services in Azure, and we have to work with different teams to enable the required resource types in policy.

Recently I was working on adding a non-Azure VM to Azure Arc, and it was failing with an error:

“FATAL   RequestCorrelationId:281e537a-90cfa3a4003 Message: Resource ‘12345’ was disallowed by policy.” While troubleshooting, I found the link below, which makes it very easy to search for the policy that applies to a given resource type:

https://www.azadvertizer.net/azpolicyadvertizer_all.html#%7B%7D

For example, to find the appropriate policy, we can search for “Azure Arc” and it will list all the matching policies by category.

Under guest configuration, we can see the “Microsoft.HybridCompute/machines” resource type.
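The same compliance data is also exposed through the Azure CLI. As a rough sketch (this assumes the Azure CLI and a logged-in session; the filter and output shape may need adjusting for your scope):

```shell
# List non-compliant policy states so you can see which policy definition
# is affecting which resource. Requires `az login` beforehand.
if command -v az >/dev/null 2>&1; then
  az policy state list \
    --filter "complianceState eq 'NonCompliant'" \
    --query "[].{policy:policyDefinitionName, resource:resourceId}" \
    --output table
else
  echo "az CLI not installed; skipping"
fi
```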

Posted in Azure, Cloud

NFSv3 datastore search is much faster than NFSv4.1

Recently we noticed that on a few datastores the search operation was taking more time than expected, and in our testing we identified that NFSv3 performs better than NFSv4.1.

Our first test was to list files from the host shell. We mounted the same storage as both NFSv4.1 and NFSv3 on the same host and ran this command from the host shell against both mounts: time ls -lahR | wc -l. Over NFSv4.1 the command takes 1 minute and 30 seconds to finish; over NFSv3 it takes only 15 seconds.
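The measurement can be wrapped in a small helper so both mounts are timed identically; a sketch (the mount paths in the comment are placeholders):

```shell
# Sketch: time a recursive listing the same way on each mount.
# Point it at the NFSv3 and NFSv4.1 mounts of the same export.
list_count() {  # usage: list_count <path>
  local t0 t1 n
  t0=$(date +%s)
  n=$(ls -lahR "$1" 2>/dev/null | wc -l | tr -d ' ')
  t1=$(date +%s)
  echo "$1: $n lines in $((t1 - t0))s"
}

# e.g. list_count /vmfs/volumes/ds_v3 ; list_count /vmfs/volumes/ds_v41
list_count /tmp
```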

Then we tried PowerShell (PowerCLI), using the following approach to search for files:

# $dsBrowser is the datastore browser view, obtained e.g. via:
$dsBrowser = Get-View (Get-Datastore $dsName).ExtensionData.Browser

$searchSpec = New-Object VMware.Vim.HostDatastoreBrowserSearchSpec

$searchSpec.MatchPattern = "*.vmx"

$taskMoRef = $dsBrowser.SearchDatastoreSubFolders_Task($datastorePath2, $searchSpec)

The data transfer rate is pretty much the same in both cases; the slowness shows up only in the file list/search operations on NFSv4.1.

Based on the results, we involved the vendors to investigate the issue. NetApp conducted a thorough investigation and determined there were no performance issues on the array side, but VMware acknowledged there was a problem with NFSv4.1. Below is the reply from VMware:

“Based on the analysis, the Engineering team has identified that the slow search on NFSv4.1 is caused because NFSv4.1 does not yet support the Directory Name Lookup Cache (DNLC). For NFSv3, most of the LOOKUP calls are served from this cache, which avoids sending a LOOKUP request to the NFS server. VMware engineering is working to add this feature to NFSv4.1; however, we do not have confirmation of the version in which it is expected to be included.”

We applied the latest patches, but the issue still exists; hopefully it will be fixed in a future patch.

Posted in ESX command, ESXi issue, ESXi Patches, logs, VCSA6.7, VMware

Issue connecting to an Azure VM using Azure AD from our laptop

Azure AD login has been configured, and we are able to log in to the Azure VM from another Azure VM using AD credentials, but it fails when we try to connect from our local laptop.

One of the prerequisites is that the local laptop must show AzureAdJoined : YES. In our case, dsregcmd /status showed AzureAdJoined : YES, but we were still getting the error “The logon attempt failed”.

After a few searches, we identified that the issue was caused by a local GPO applied to the laptop.

Specifically, this is called out in the doc for AAD Login to Windows VMs here: https://docs.microsoft.com/en-us/azure/active-directory/devices/howto-vm-sign-in-azure-ad-windows#unauthorized-client

Here’s the doc for that particular setting: https://docs.microsoft.com/en-us/windows/security/threat-protection/security-policy-settings/network-security-allow-pku2u-authentication-requests-to-this-computer-to-use-online-identities
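That GPO setting maps to a registry value on the client. As a hedged sketch (verify the exact path against the linked doc before changing anything), checking the join state and enabling PKU2U online identities from an elevated Windows prompt looks like this:

```shell
dsregcmd /status | findstr AzureAdJoined
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Lsa\pku2u" /v AllowOnlineID /t REG_DWORD /d 1 /f
```

If the setting is enforced by domain GPO rather than local policy, it has to be changed at the GPO level instead.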

Reference:

https://syfuhs.net/how-authentication-works-when-you-use-remote-desktop

Posted in Azure, Cloud

Reboot issue with MCE error on Dell PowerEdge R6525 running ESXi 7.0 Update 3c

We have new hardware, Dell PowerEdge R6525 servers with AMD EPYC 7713 64-Core processors, running ESXi 7.0 Update 3c (build 19193900), and PRD VMs were migrated to the 15-host cluster. After a few weeks, we started noticing ESXi hosts rebooting randomly. During further troubleshooting we upgraded all the hardware firmware and the BIOS (2.5.6 up to 2.6.0), but that didn't fix the issue.

After monitoring for several weeks, we identified through the DRS rules that hosts running Linux VMs were the most affected compared to hosts running Windows VMs, so with the vendor's help we replaced the CPU and the motherboard on a few hosts, but it didn't help.

All the affected hosts failed with the error: (Fatal/NonRecoverable) 2 (System Event) 13 Assert + Processor Transition to Non-recoverable

The issue was escalated to Dell's top technical team, and after several months the vendor asked us to upgrade the BIOS to 2.6.6, which finally stopped the reboots.

  1. Error from the ESXi logs, showing a memory error:
     2022-04-03T05:34:42 13 – Processor 1 MEMEFGH VDD PG 0 Assert + Processor Transition to Non-recoverable
  2. After the above error, the server was still running until around 12 PM UTC:
     2022-04-03T10:00:00.611Z heartbeat[2308383]: up 5d6h22m15s, 94 VMs; [[2103635 vmx 67108864kB] [2114993 vmx 134084608kB] [2105683 vmx 134090752kB]] []
     The reboot likely happened somewhere in this window.

Note: We have another environment running the same R6525 hardware with ESXi 6.7 U3 that didn't face this issue, and after several analyses we couldn't find any solid evidence that the problem was caused by the Linux VMs or the applications running on them.

Posted in Dell

NFS 4.1 datastores might become inaccessible after failover or failback operations of storage arrays

When storage array failover or failback operations take place, NFS 4.1 datastores fall into an All-Paths-Down (APD) state. However, after the operations are complete, the datastores might remain in APD state and become inaccessible.


As per VMware, this issue affects hosts older than build 16075168 and is resolved in newer builds. We tested it in our environment, and the newer build works fine without any datastore failures.
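Since the fix is tied to a build number, a quick sanity check against the host's build can be scripted; the host build below is a placeholder (on ESXi, it comes from the output of `vmware -v`):

```shell
# Builds at or above 16075168 are reported to contain the NFS 4.1 APD fix.
fixed_build=16075168
host_build=16324942   # placeholder; read yours from: vmware -v
if [ "$host_build" -ge "$fixed_build" ]; then
  echo "build $host_build should contain the NFS 4.1 APD fix"
else
  echo "build $host_build is older; plan an upgrade"
fi
```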

Posted in Storage, Storage\Backup, VMware

VCSA upgrade stuck at 88%


A VCSA 7.0 U3b upgrade was stuck at 88%.

Resolution

The VAMI page was stuck at 88% for more than an hour.

We removed the update_config file and restarted the VAMI, but the update still did not complete.

We then downloaded the fp.iso patch and patched the VCSA via the VAMI successfully.

Posted in VMware

Network issues with VMs running on new Mac mini ESXi hosts

We have new Mac mini 2019/2020 models and an older model (2018) running ESXi 6.7 U3, and we noticed that on the new Mac minis, VMs (macOS/Windows/Linux) have issues connecting to the network and downloading files. The only hardware difference is the network card, which is a different model.

We tried enabling jumbo frames on the VMs, and they started working and were able to download files, but we couldn't find the exact cause of the issue, because from the hypervisor itself (or from macOS on the hardware directly) there is no problem.

We are still investigating the issue; the workaround is to enable jumbo frames on the VMs.
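For reference, the workaround on a Linux guest looks roughly like this (the interface name is a placeholder, and the vSwitch plus physical uplink must also allow jumbo frames end to end):

```shell
iface=lo                          # substitute your guest NIC, e.g. eth0/ens192
cat "/sys/class/net/$iface/mtu"   # show the current MTU
# Apply jumbo frames (requires root):
# ip link set dev "$iface" mtu 9000
```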

Posted in MacMini, VMware

Ports required for AD

Lots of links talk about the ports required for an AD connection; in my environment, the ports below are enabled, and we are able to join clients to AD with DNS registered.

TCP_636 (LDAPS)
TCP_3268 (Global Catalog)
TCP_3269 (Global Catalog over SSL)
TCP_88 / UDP_88 (Kerberos)
TCP_53 / UDP_53 (DNS)
TCP_445 / UDP_445 (SMB)
TCP_25 (SMTP)
TCP_135 (RPC endpoint mapper)
TCP_5722 (DFS-R)
UDP_123 (NTP)
TCP_464 / UDP_464 (Kerberos password change)
UDP_137 / UDP_138 / TCP_139 (NetBIOS)
TCP_9389 (AD Web Services)
TCP_49152-65535 / UDP_49152-65535 (RPC dynamic range)
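Before joining a client, the TCP ports above can be probed from the client side; a small sketch using bash's /dev/tcp (the DC hostname is a placeholder):

```shell
# Report whether a TCP port on the DC accepts connections.
check_port() {  # usage: check_port <host> <tcp-port>
  if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "TCP $2 open"
  else
    echo "TCP $2 closed/filtered"
  fi
}

# TCP ports from the list above; dc01.example.com is a placeholder DC.
for p in 25 53 88 135 139 445 464 636 3268 3269 5722 9389; do
  check_port dc01.example.com "$p"
done
```

Note that the UDP ports and the dynamic RPC range can't be checked this way; a tool such as PortQry covers those cases.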


Reference:

https://isc.sans.edu/diary/Cyber+Security+Awareness+Month+-+Day+27+-+Active+Directory+Ports/7468

Posted in AWS, Azure, Cloud

IP customization is failing on RHEL 5 and 6 VMs with SRM 8.3.1

Below is the issue we faced after upgrading SRM to 8.3.1:

IP customization is failing on RHEL 5 and 6 VMs with SRM 8.3.1.

IP customization previously worked on these RHEL versions with SRM 6.5.

IP customization works with RHEL 7 VMs, which can utilize SAML tokens for authentication.

It looks like changes between SRM 6.5 and later versions caused the conflict with LDAP on our RHEL 6 machines. Prior to the changes, SRM performed script transfer using the VIX protocol, which has little to no authentication. This master access method worked from vCenter: SRM would transfer the script through vCenter, then directly to the ESXi host and eventually the VM, without any authentication or tokens involved.

For security reasons, this is obviously a weakness. This has changed and is now enforced: instead, SRM uses SAML token authentication through an SSO solution user that is created when SRM registers with the PSC/SSO and vCenter. This new method also meant VMware Tools had to be upgraded to be a part of that SSO process, hence the vgAuth component of Tools.

This process now impersonates the root account to execute scripts inside the guest OS, tied directly to an authentication token through SSO.

Also, as described above, SRM only contacts SSO to get authentication; beyond that, SRM itself now transfers the script to the ESXi host and then to the VM, instead of vCenter doing it. This new process forces authentication and uses the benefits of the temporary SAML token for activities like this. The exact same process applies if you run custom scripts inside the guest OS in your recovery plans.

We have seen cases where LDAP, and in our case openLDAP, conflicts with SRM's ability to impersonate on the guest OS. Unfortunately, like any other third-party application or solution that conflicts with this operation, it needs to be addressed from the offending application itself. In this case, SSSD appears to work, as proven by our tests.

Posted in SRM, VMware

Bug in vCenter with AMD EPYC Zen 3 (Milan) / EPYC 7713 CPUs

Recently we moved to AMD EPYC 7713 64-core CPUs on Dell R6525 servers and noticed ESXi hosts showing 100% CPU, fluctuating intermittently. When we checked performance in esxtop it was very low, and in our other environment (SuperMicro AS-2114GT-DNR with the same AMD EPYC 7713P) we noticed a similar CPU spike.

As per KB https://kb.vmware.com/s/article/85071, it is a cosmetic issue that can be safely ignored; a workaround is also described in the KB.

Posted in VMware