Azure AZCopy architecture design

Posted in AZCopy, Azure, Cloud

Datastore NFSv4 slowness issue: NetApp and VMware findings

As mentioned in my previous post, listing files on a datastore mounted as NFS 4 is much slower than on NFS 3. We captured packet traces and worked with NetApp; below are the findings from our test, which ran ls -lahR | wc -l.

  • No latency was found in the perf archives.
  • Packet traces of NFSv3 and NFSv4 were taken while listing the directory, 30 minutes apart.
  • There is a huge jump in LOOKUP calls in v4.1. In the v4.1 packet traces, even though the READDIR call returns entries with file name, file handle (FH) and attributes, the client still sends explicit LOOKUP calls (i.e. a COMPOUND with PUTFH, LOOKUP, GETFH, GETATTR) for the directory entries.

    NFS3 SRTs

    Index  Procedure/Opcodes/Commands  Calls  Min SRT   Max SRT   Avg SRT   Sum SRT
    1      GETATTR                     55     0.000059  0.004604  0.000414  0.022770
    3      LOOKUP                      14363  0.000060  0.471455  0.000318  4.565892    <<<
    4      ACCESS                      2874   0.000048  0.012290  0.000353  1.015239
    17     READDIRPLUS                 6275   0.000072  0.922698  0.001608  10.091936   <<<
    18     FSSTAT                      3      0.000065  0.004150  0.001443  0.004329

    NFS 4.1 SRTs

    Index  Procedure/Opcodes/Commands  Calls  Min SRT   Max SRT   Avg SRT   Sum SRT
    1      COMPOUND (proc #)           95810  0.000020  1.020477  0.001411  135.205752
    3      ACCESS                      2887   0.000068  0.004859  0.000321  0.925366
    9      GETATTR                     95788  0.000068  1.020477  0.001411  135.198447
    10     GETFH                       86659  0.000112  1.020477  0.001164  100.898088
    15     LOOKUP                      80923  0.000094  1.020477  0.001144  92.574459   <<<
    16     LOOKUPP                     5754   0.000123  0.845588  0.001448  8.330835
    22     PUTFH                       95806  0.000068  1.020477  0.001411  135.205653
    26     READDIR                     6233   0.000147  0.900400  0.005354  33.373798   <<<
    53     SEQUENCE                    95810  0.000020  1.020477  0.001411  135.205752
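
To put the jump in numbers, here is a minimal Python sketch that uses only the call counts from the two tables above. The dictionary and function names are illustrative, not part of any tool we used. With these counts, NFS 4.1 sends roughly 5.6 times as many LOOKUP calls as NFS 3, and about 13 LOOKUPs per directory read instead of about 2.3.

```python
# Call counts copied from the NFS3 and NFS 4.1 SRT tables above.
nfs3_calls = {"LOOKUP": 14363, "READDIRPLUS": 6275}
nfs41_calls = {"LOOKUP": 80923, "READDIR": 6233}


def lookups_per_readdir(lookups: int, readdirs: int) -> float:
    """Average number of LOOKUP calls issued per directory-read call."""
    return lookups / readdirs


# How many times more LOOKUP calls v4.1 sent than v3 for the same listing.
lookup_ratio = nfs41_calls["LOOKUP"] / nfs3_calls["LOOKUP"]
v3_rate = lookups_per_readdir(nfs3_calls["LOOKUP"], nfs3_calls["READDIRPLUS"])
v41_rate = lookups_per_readdir(nfs41_calls["LOOKUP"], nfs41_calls["READDIR"])

print(f"LOOKUP calls, v4.1 vs v3: {lookup_ratio:.1f}x")
print(f"LOOKUPs per directory read: v3 {v3_rate:.1f}, v4.1 {v41_rate:.1f}")
```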

We ran the test against a single datastore: we mounted it as NFS 3 and captured the data, then unmounted it and mounted it again as NFS 4, so that both results come from the same datastore with the same size and data.

Based on these results, VMware is working on an option to enable the lookup cache, which will most likely be available in a future ESXi patch.

Posted in ESX command, ESXi issue, ESXi Patches, VMware

Useful tip to find the resource types under Azure Policy

In our organization, not every Azure service is allowed by default, and we have to work with different teams to enable the required resource types in policy.

Recently I was adding a non-Azure VM to Azure Arc, and it failed with the following error.

“FATAL   RequestCorrelationId:281e537a-90cfa3a4003 Message: Resource ‘12345’ was disallowed by policy.” While troubleshooting, I found the link below, which makes it very easy to search for the relevant policy.

https://www.azadvertizer.net/azpolicyadvertizer_all.html#%7B%7D

For example, to find the appropriate policy, we can search for Azure Arc and it will list all the categories.

For guest configuration, we can see the “Microsoft.HybridCompute/machines” resource type.

Posted in Azure, Cloud

NFSv3 datastore is much faster than NFSv4

Recently we noticed that on a few datastores the search operation was taking longer than expected, and our testing showed that NFSv3 gives better results than NFSv4.

Our first test was to list files from the host shell. We mounted the same storage as NFSv4 and as NFSv3 on the same host and ran time ls -lahR | wc -l from the host shell against both mounts. On NFSv4 the command takes 1 minute and 30 seconds to finish; on NFSv3 it takes only 15 seconds.
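
The same comparison can be sketched in Python from any machine that mounts both exports. This is only an approximation of the shell test: it walks the tree and stats every entry, so it triggers the same kind of lookup and attribute traffic, but its count will not match the wc -l line count exactly. The function names are made up for this example.

```python
import os
import time
from typing import Tuple


def count_entries(root: str) -> int:
    """Recursively count files and directories under root, stat-ing each one."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lstat forces an attribute fetch per entry, similar to ls -l.
            os.lstat(os.path.join(dirpath, name))
            total += 1
    return total


def timed_listing(root: str) -> Tuple[int, float]:
    """Return (entry count, elapsed seconds) for one recursive listing."""
    start = time.perf_counter()
    count = count_entries(root)
    return count, time.perf_counter() - start
```

Calling timed_listing once per mount point and comparing the elapsed seconds gives the same kind of v3 versus v4 comparison as the shell command.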

Then we tried PowerShell, searching for files in the following way.

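# Assumption: $ds was set first, e.g. $ds = Get-Datastore -Name "<datastore name>" (placeholder)
$dsBrowser = Get-View -Id $ds.ExtensionData.Browser
$datastorePath2 = "[" + $ds.Name + "]"
$searchSpec = New-Object VMware.Vim.HostDatastoreBrowserSearchSpec
$searchSpec.MatchPattern = "*.vmx"
$taskMoRef = $dsBrowser.SearchDatastoreSubFolders_Task($datastorePath2, $searchSpec)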

The data transfer rate is pretty much the same on both; the slowness on NFSv4 is in the file list/search operations.

Based on the results, we involved the vendors to investigate the issue. NetApp conducted a thorough investigation and determined there were no performance issues, but VMware acknowledged there was a problem with NFS v4.1. Below is the reply from VMware.

“Based on the analysis, Engineering team has identified that the issue related to slow search on NFSv4.1 is caused because the NFSv4.1 does not support “Directory Name Lookup Cache” (DNLC) yet. However, for NFSv3, most of the LOOKUP calls are served from cache, which avoids sending a LOOKUP instruction to the NFS server. The VMware engineering is working to add this feature for NFSv4.1 however we do not have a version confirmation where this is expected to be included.”
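
To illustrate why a name lookup cache matters, here is a toy Python model. It is not how ESXi implements DNLC; the classes and the fake server are invented for this example, and a real cache also has to handle invalidation and expiry. The sketch only shows the effect described in the reply: with a cache keyed on (directory handle, name), repeated lookups stop reaching the server.

```python
from typing import Dict, Tuple


class FakeNfsServer:
    """Stand-in for an NFS server: maps (directory handle, name) to a file handle."""

    def __init__(self, entries: Dict[Tuple[str, str], str]) -> None:
        self.entries = entries
        self.lookup_calls = 0  # LOOKUPs that actually reached the "server"

    def lookup(self, dir_handle: str, name: str) -> str:
        self.lookup_calls += 1
        return self.entries[(dir_handle, name)]


class Client:
    """Client that optionally keeps a directory name lookup cache (DNLC)."""

    def __init__(self, server: FakeNfsServer, use_dnlc: bool) -> None:
        self.server = server
        self.use_dnlc = use_dnlc
        self.dnlc: Dict[Tuple[str, str], str] = {}

    def resolve(self, dir_handle: str, name: str) -> str:
        key = (dir_handle, name)
        if self.use_dnlc and key in self.dnlc:
            return self.dnlc[key]  # served from cache, no server round trip
        handle = self.server.lookup(dir_handle, name)
        if self.use_dnlc:
            self.dnlc[key] = handle
        return handle


def lookups_sent(use_dnlc: bool, repeats: int = 3) -> int:
    """Resolve the same 100 names `repeats` times and count server LOOKUPs."""
    entries = {("root", f"vm{i}.vmx"): f"fh{i}" for i in range(100)}
    server = FakeNfsServer(entries)
    client = Client(server, use_dnlc)
    for _ in range(repeats):
        for dir_handle, name in entries:
            client.resolve(dir_handle, name)
    return server.lookup_calls


print("without DNLC:", lookups_sent(False))  # 300 server lookups
print("with DNLC:   ", lookups_sent(True))   # 100 server lookups
```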

We applied the latest patches, but the issue still exists; hopefully it will be fixed in a future patch.

Update: OCT2022

For more details check the blog.

Posted in ESX command, ESXi issue, ESXi Patches, logs, VCSA6.7, vcsa7.0, VCSA8.0, VMware

Issue connecting to an Azure VM using Azure AD from our laptop

Azure AD login has been configured, and we are able to log in to the Azure VM from another Azure VM using the AD credentials, but the login fails when we try to connect from our local laptop.

One of the prerequisites is that the local laptop shows AzureAdJoined : YES. That was the case, but we still had issues, and the error was “The logon attempt failed”.

Running dsregcmd /status showed AzureAdJoined : YES.
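
When the same check has to be repeated on several laptops, the command output can be parsed. The Python sketch below is a minimal example that assumes the output uses "Key : Value" lines like the AzureAdJoined line quoted above. The sample text is shortened and made up, and the function names are illustrative.

```python
def parse_dsregcmd_status(output: str) -> dict:
    """Collect 'Key : Value' pairs from dsregcmd /status text output."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip banner lines and blanks that have no ' : ' separator
            fields[key.strip()] = value.strip()
    return fields


def is_azure_ad_joined(output: str) -> bool:
    """True when the output reports AzureAdJoined : YES."""
    return parse_dsregcmd_status(output).get("AzureAdJoined", "").upper() == "YES"


# Shortened, made-up sample in the general shape of the command's output.
sample = """
+----------------------------------------------------------------------+
| Device State                                                         |
+----------------------------------------------------------------------+

             AzureAdJoined : YES
          EnterpriseJoined : NO
              DomainJoined : NO
"""
print(is_azure_ad_joined(sample))
```

On a real laptop the text could come from subprocess.run(["dsregcmd", "/status"], capture_output=True, text=True).stdout instead of the sample string.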

After a few searches, we identified that the issue was caused by a local GPO applied to the laptop.

Specifically, this is called out in the doc for AAD Login to Windows VMs here: https://docs.microsoft.com/en-us/azure/active-directory/devices/howto-vm-sign-in-azure-ad-windows#unauthorized-client

Here’s the doc for that particular setting: https://docs.microsoft.com/en-us/windows/security/threat-protection/security-policy-settings/network-security-allow-pku2u-authentication-requests-to-this-computer-to-use-online-identities

Reference:

https://syfuhs.net/how-authentication-works-when-you-use-remote-desktop

Posted in Azure, Cloud

Reboot issue MCE error on Dell PowerEdge R6525 running ESXi 7.0 Update 3c

We have new hardware, Dell PowerEdge R6525 servers with AMD EPYC 7713 64-core processors, running ESXi 7.0 Update 3c (build 19193900), and the PRD VMs were migrated to this 15-host cluster. After a few weeks, we noticed ESXi hosts rebooting randomly. As part of troubleshooting we upgraded all the hardware firmware and the BIOS (from 2.5.6 up to 2.6.0), but that did not fix the issue.

After monitoring for several weeks, we identified that the hosts running Linux VMs under a DRS rule were the most affected compared to the hosts running Windows VMs. With the vendor's help we replaced the CPU and also the motherboard on a few hosts, but it didn't help.

All the hosts failed with the error: (Fatal/NonRecoverable) 2 (System Event) 13 Assert + Processor Transition to Non-recoverable

The issue was escalated to the top technical team at Dell, and after several months the vendor asked us to upgrade the BIOS to 2.6.6, which finally stopped the reboots.

  1. Error from the ESX logs, showing a memory error:
    2022-04-03T05:34:42 13 – Processor 1 MEMEFGH VDD PG 0 Assert + Processor Transition to Non-recoverable
  2. After the above error, the server kept running until 12 PM UTC:
    1. 2022-04-03T10:00:00.611Z heartbeat[2308383]: up 5d6h22m15s, 94 VMs; [[2103635 vmx 67108864kB] [2114993 vmx 134084608kB] [2105683 vmx 134090752kB]] []
      The reboot might have happened around this time.
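
The heartbeat line carries the uptime, so the last boot time can be derived from it. The Python sketch below assumes the uptime always appears in the d/h/m/s form seen in the line above; other forms are not handled, and the regular expression and function name are my own. For this line the result is 2022-03-29T03:37:45, which means the host had not rebooted between the 05:34 error and the 10:00 heartbeat.

```python
import re
from datetime import datetime, timedelta

# Matches "<timestamp> heartbeat[<pid>]: up <d>d<h>h<m>m<s>s" (assumed format).
HEARTBEAT = re.compile(
    r"^(?P<ts>\S+) heartbeat\[\d+\]: up (?P<d>\d+)d(?P<h>\d+)h(?P<m>\d+)m(?P<s>\d+)s"
)


def boot_time(line: str) -> datetime:
    """Derive the last boot time from a heartbeat line's timestamp and uptime."""
    match = HEARTBEAT.match(line)
    if match is None:
        raise ValueError("not a heartbeat line in the assumed format")
    logged_at = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
    uptime = timedelta(
        days=int(match["d"]),
        hours=int(match["h"]),
        minutes=int(match["m"]),
        seconds=int(match["s"]),
    )
    return logged_at - uptime


line = (
    "2022-04-03T10:00:00.611Z heartbeat[2308383]: up 5d6h22m15s, 94 VMs; "
    "[[2103635 vmx 67108864kB] [2114993 vmx 134084608kB] [2105683 vmx 134090752kB]] []"
)
print(boot_time(line).isoformat())  # 2022-03-29T03:37:45.611000
```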

Note: We have another environment running the same R6525 hardware with ESXi 6.7 U3 that did not face any issue, and after several analyses we couldn't find any solid evidence that the issue was caused by the Linux VMs or the applications running on them.

Posted in Dell

NFS 4.1 datastores might become inaccessible after failover or failback operations of storage arrays

When storage array failover or failback operations take place, NFS 4.1 datastores fall into an All-Paths-Down (APD) state. However, after the operations are complete, the datastores might remain in APD state and become inaccessible.


According to VMware, this issue occurs on hosts older than build 16075168 and is resolved in newer versions. We tested it in our environment, and the newer version works fine without any datastore failures.
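
With many hosts, a quick way to see which ones still need the update is to compare build numbers. This is a minimal Python sketch; it assumes that build 16075168 itself is the first unaffected build, which is my reading of the statement above, and the host names and inventory are placeholders.

```python
# Build number quoted above; treated here as the first unaffected build (assumption).
FIXED_BUILD = 16075168


def affected_hosts(host_builds: dict) -> list:
    """Return the names of hosts whose ESXi build is older than FIXED_BUILD."""
    return sorted(
        name for name, build in host_builds.items() if int(build) < FIXED_BUILD
    )


# Hypothetical inventory; host names and build numbers are placeholders.
inventory = {"esx01": 15843807, "esx02": 16075168, "esx03": 19193900}
print(affected_hosts(inventory))  # ['esx01']
```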

Posted in Storage, Storage\Backup, VMware

VCSA upgrade stuck at 88%

VCSA 7.0 U3b upgrade stuck at 88%.

Resolution

The VAMI page was stuck at 88% for more than an hour.

We removed the update_config file and restarted VAMI, but the update still did not complete.

We then downloaded the fp.iso patch and patched the VCSA via VAMI successfully.

Posted in VMware

Network issues on VMs running on new Mac mini ESXi hosts

We have new Mac mini (2019\2020) and old-model (2018) hosts running ESXi 6.7 U3, and noticed that VMs (macOS\Windows\Linux) on the new Mac minis have issues connecting to the network and downloading files. The only difference is the network card, which is a different model.

We tried enabling jumbo frames on the VMs, after which they started working and were able to download files. We couldn't find the exact cause, because there is no issue from the hypervisor itself or when the hardware runs macOS.

We are still investigating the issue; the workaround is to enable jumbo frames on the VMs.

Posted in MacMini, VMware

Ports required for the AD

Lots of links talk about the ports required for the AD connection. In my environment the ports below are enabled, and we are able to join clients to AD with DNS registered.

TCP_636
TCP_3268
TCP_3269
TCP_88
UDP_88
TCP_53
UDP_53
TCP_445
UDP_445
TCP_25
TCP_135
TCP_5722
UDP_123
TCP_464
UDP_464
UDP_138
TCP_9389
UDP_137
TCP_139
UDP_49152-65535
TCP_49152-65535
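
To confirm from a client that the firewall rules are in place, the TCP ports from the list can be probed. The Python sketch below is a minimal connectivity check using only the standard library. It covers the fixed TCP ports only: the dynamic 49152-65535 range is left out, and UDP ports cannot be verified with a simple connect. The function names are illustrative.

```python
import socket

# TCP ports taken from the list above (the dynamic 49152-65535 range is omitted).
AD_TCP_PORTS = [25, 53, 88, 135, 139, 445, 464, 636, 3268, 3269, 5722, 9389]


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str, ports=AD_TCP_PORTS) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling check_ports with the name of a domain controller returns a dictionary of port numbers and True/False, so any blocked port stands out.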


Refer:

https://isc.sans.edu/diary/Cyber+Security+Awareness+Month+-+Day+27+-+Active+Directory+Ports/7468

Posted in AWS, Azure, Cloud