GSLB With AKO & AMKO - NSX Advanced LoadBalancer


Global Server LoadBalancing in VMware Tanzu with AMKO

This post will go through how to configure AVI (NSX ALB) with GSLB across vSphere with Tanzu (TKGs) and an upstream k8s cluster in two different physical locations. I have already covered AKO in my previous posts, so this post assumes knowledge of AKO (Avi Kubernetes Operator) and extends upon that with AMKO (Avi Multi-Cluster Kubernetes Operator). The goal is to be able to scale my k8s applications between my "sites" and make them geo-redundant. For more information on AVI, AKO and AMKO head over here

Preparations and diagram of the environment used in this post

This post involves an upstream Ubuntu k8s cluster in my home-lab and a remote vSphere with Tanzu cluster. I have deployed one Avi controller in my home-lab and one Avi controller in the remote site. The k8s cluster in my home-lab is defined as the "primary" k8s cluster, and the same goes for the Avi controller in my home-lab. Some network connectivity needs to be in place between the sites: port 443 (API) between the Avi controllers, and the Avi SEs need to reach the GSLB VS VIPs on the other side for GSLB health checks. In other words, the site A SE dataplane needs connectivity to the VIP that is created for the GSLB service on site B, and vice versa. The primary k8s cluster also needs connectivity to the "secondary" k8s cluster's endpoint IP/FQDN on the k8s API port (6443). AMKO needs this connectivity to listen for "GSLB" enabled services in the remote k8s clusters, which triggers AMKO to automatically put them in your GSLB service. More on that later in the article. When all preparations are done the final diagram should look something like this:

I will not cover what kind of infrastructure connects the sites together, as that is a completely different topic and can vary a lot. But there will most likely be a firewall involved between the sites, and it needs to allow the connectivity mentioned above. In this post the following IP subnets will be used:

  1. SE Dataplane network home-lab: (I only have two SEs, so there will be two addresses from this subnet. I am running all services on the same two SEs, which is not recommended; one should at least have dedicated SEs for the AVI DNS service)
  2. SE Dataplane network remote-site: (Two SEs here also. In the remote site I do have dedicated SEs for the AVI DNS service, but they will not be touched upon in this post, only the SEs responsible for the GSLB services being created)
  3. VIP subnet for services exposed in home-lab k8s cluster: (a dedicated vip subnet for all services exposed from this cluster)
  4. VIP subnet for services exposed in remote-site tkgs cluster: (a dedicated vip subnet for all services exposed from this cluster)

For this network setup to work one needs to have routing in place, either with BGP enabled in AVI or with static routes. Explanation: The SEs have their own dataplane network, and they are also the ones responsible for creating the VIPs you define for your VS. So if you want your VIPs to be reachable, you have to make sure there are routes in your network to the VIPs with the SEs as next hops, either with BGP or static routes. The VIP is what it is, a Virtual IP, meaning it doesn't have its own VLAN and gateway in your infrastructure. It is created and realised by the SEs, which then act as the gateways for your VIPs. A VIP address could be anything.

At the same time the SE dataplane network needs connectivity to the backend servers it is supposed to load balance, so this dataplane network also needs routes to reach those. In this post that means the SE dataplane network needs reachability to the k8s worker nodes where the apps are running in the home-lab site, and in the remote site it needs reachability to the TKGs workers. On a side note, I am not running routable pods; they are NATed through my workers, and I am using Antrea as CNI with NodePortLocal configured. I also prefer to have a separate network for the SE dataplane and separate VIP subnets, as it makes it easier to maintain control, isolation, firewall rules etc.
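To make the static route option a bit more concrete, here is a minimal sketch for a Linux-based router. Every address in it is a placeholder from the documentation ranges, not a subnet from this lab, and the function name is made up. Whether the route should point at one SE or several depends on how your virtual services are scaled out, and on a physical router or firewall the syntax will be different. The script only prints the command so it can be reviewed before being run as root.

```shell
# Hypothetical values: a VIP subnet and the dataplane IPs of two SEs.
VIP_SUBNET="192.0.2.0/24"
SE_NEXT_HOPS="198.51.100.11 198.51.100.12"

# build_vip_route SUBNET HOP...: print an iproute2 command that routes
# SUBNET via every given next hop (ECMP when more than one hop is listed).
build_vip_route() {
  subnet="$1"
  shift
  cmd="ip route add $subnet"
  for hop in "$@"; do
    cmd="$cmd nexthop via $hop"
  done
  echo "$cmd"
}

# Word splitting on SE_NEXT_HOPS is intentional: one argument per SE.
build_vip_route "$VIP_SUBNET" $SE_NEXT_HOPS
```

With BGP enabled in AVI none of this is needed, since the SEs advertise the VIPs themselves.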

The diagram above is very high level, as it does not go into all networking details, firewall rules etc but it gives an overview of the communication needed.
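Before forming the GSLB partnership it can save some time to test the required ports from a machine on each side. The following is a minimal sketch under a few assumptions: the two hostnames are placeholders I made up, and `bash` and the coreutils `timeout` command must be available. It only tests that a TCP connection opens, so it says nothing about the SE to VIP health checks, which have to be tested from the SE dataplane network.

```shell
# check_port HOST PORT: succeeds if a TCP connection opens within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are needed.
check_port() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Hypothetical endpoints: the remote Avi controller (API, 443) and the
# remote k8s API server (6443). Replace them with your own addresses.
for target in avi-controller-remote.example.com:443 tkgs-cluster-1.example.com:6443; do
  host="${target%%:*}"
  port="${target##*:}"
  if check_port "$host" "$port"; then
    echo "OK   $target"
  else
    echo "FAIL $target"
  fi
done
```

A FAIL line usually points at a missing firewall rule or route between the sites.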

Once we have a clear idea of the connectivity requirements, we need to form the GSLB "partnership" between the AVI controllers. I was thinking back and forth about whether I should cover these steps too, but instead I will link to a good friend's blog here, which does this brilliantly. It's all about saving the environment from unnecessary digital ink 😄. The same goes for the AKO deployment, which is also covered here or on the AVI docs page here
It should look like this on both controllers when everything is up and ready for GSLB: It should be reflected on the secondary controller as well, except there will be no option to edit.

Time to deploy AMKO in K8s

AMKO can be deployed in two ways. It can be sufficient to deploy a single instance of AMKO in your primary k8s cluster, or you can take the federation approach and deploy AMKO in all the clusters you want to use GSLB on. You will then end up with one master instance of AMKO and "followers" (federation members) on the others. One of the benefits is that you can promote one of the followers if the primary is lost. I will go with the simple approach and deploy AMKO once, in my primary k8s cluster in my home-lab.

AMKO preparations before deploy with Helm

AMKO will be deployed using Helm, so if Helm is not installed, do that first. To successfully install AMKO there are a couple of things to be done. First, decide which is your primary cluster (where to deploy AMKO). When you have decided that (the easy step), you need to prepare a secret that contains the contexts/clusters/users for all the k8s clusters you want to use GSLB on. An example file can be found here. Create this content in a regular file and name the file gslb-members. The naming of the file is important: if you name it differently, AMKO will fail as it can't find the secret. I have tried to find a variable in the values.yaml for the Helm chart that can override this but have not succeeded, so I went with the default naming. When the file is populated with the k8s clusters you want, create a secret in the primary k8s cluster like this: kubectl create secret generic gslb-config-secret --from-file gslb-members -n avi-system. The namespace here is the namespace where AKO is already deployed.

This should give you a secret like this:

gslb-config-secret                      Opaque                                1      20h
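How the gslb-members file gets built is up to you. One option is to let kubectl flatten the kubeconfigs of all member clusters into a single file. This is a sketch under assumptions: the two input paths and the function name are hypothetical, and `kubectl` must be on your PATH for the guarded part to run.

```shell
# build_gslb_members OUT KUBECONFIG...: merge several kubeconfigs into one file.
build_gslb_members() {
  out="$1"
  shift
  # kubectl merges every file listed in KUBECONFIG (colon separated);
  # --flatten embeds the certificates so the result is self-contained.
  merged=$(IFS=:; echo "$*")
  KUBECONFIG="$merged" kubectl config view --flatten > "$out"
}

if command -v kubectl >/dev/null 2>&1; then
  # Hypothetical paths to the kubeconfig of each cluster.
  build_gslb_members gslb-members "$HOME/.kube/k8slab.yaml" "$HOME/.kube/tkgs-cluster-1.yaml"
  # List the context names found in the merged file.
  kubectl config get-contexts -o name --kubeconfig gslb-members
fi
```

The context names printed at the end are the ones that go into the values.yaml for the Helm chart.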

A note on kubeconfig for vSphere with Tanzu (TKGs)

When logging into a guest cluster in TKGs we usually do this through the supervisor, with either vSphere local users or AD users defined in vSphere, and we get a time-based token. It's not possible to use this approach here, since that token expires. So what I went with was to grab the admin credentials for my TKGs guest cluster and use that context instead. Here is how to do that. This is not a recommended approach; instead one should create and use a service account. Maybe I will get back to this later and update how.

Back to the AMKO deployment...

The secret is ready, now we need to get the values.yaml for the AMKO version we will install. I am using AMKO 1.8.1 (same for AKO). If AKO was installed using Helm, the Helm repo is already added, as AMKO uses the same repo. If not, add the repo:

helm repo add ako

Download the values.yaml:

helm show values ako/amko --version 1.8.1 > values.yaml   (there is a typo in the official doc - it points to just amko)

Now edit the values.yaml:

# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 1

image:
  repository:
  pullPolicy: IfNotPresent

# Configs related to AMKO Federator
federation:
  # image repository
  image:
    repository:
    pullPolicy: IfNotPresent
  # cluster context where AMKO is going to be deployed
  currentCluster: 'k8slab-admin@k8slab' #####use the context name - for your leader/primary cluster
  # Set to true if AMKO on this cluster is the leader
  currentClusterIsLeader: true
  # member clusters to federate the GSLBConfig and GDP objects on, if the
  # current cluster context is part of this list, the federator will ignore it
  memberClusters:
  - 'k8slab-admin@k8slab' #####use the context name
  - 'tkgs-cluster-1-admin@tkgs-cluster-1' #####use the context name
# Configs related to AMKO Service discovery
serviceDiscovery:
  # image repository
  # image:
  #   repository:
  #   pullPolicy: IfNotPresent

# Configs related to Multi-cluster ingress. Note: MultiClusterIngress is a tech preview.
multiClusterIngress:
  enable: false

configs:
  gslbLeaderController: '' ##### MGMT ip leader/primary avi controller
  controllerVersion: 22.1.1
  memberClusters:
  - clusterContext: 'k8slab-admin@k8slab' #####use the context name
  - clusterContext: 'tkgs-cluster-1-admin@tkgs-cluster-1' #####use the context name
  refreshInterval: 1800
  logLevel: INFO
  # Set the below flag to true if a different GSLB Service fqdn is desired than the ingress/route's
  # local fqdns. Note that, this field will use AKO's HostRule objects' to find out the local to global
  # fqdn mapping. To configure a mapping between the local to global fqdn, configure the hostrule
  # object as:
  # [...]
  # spec:
  #  virtualhost:
  #    fqdn:
  #    gslb:
  #      fqdn:
  useCustomGlobalFqdn: true    ####### set this to true if you want to define custom FQDN for GSLB - I use this

gslbLeaderCredentials:
  username: 'admin'  ##### username/password AVI Controller
  password: 'password' ##### username/password AVI Controller

globalDeploymentPolicy:
  # appSelector takes the form of:
  appSelector:
    label:
      app: 'gslb'     #### I am using this selector for services to be used in GSLB
  # Uncomment below and add the required ingress/route/service label
  # appSelector:

  # namespaceSelector takes the form of:
  # namespaceSelector:
  #   label:
  #     ns: gslb   <example label key-value for namespace>
  # Uncomment below and add the required namespace label
  # namespaceSelector:

  # list of all clusters that the GDP object will be applied to, can take any/all values
  # from .configs.memberClusters
  matchClusters:
  - cluster: 'k8slab-admin@k8slab' ####use the context name
  - cluster: 'tkgs-cluster-1-admin@tkgs-cluster-1' ####use the context name

  # list of all clusters and their traffic weights, if unspecified, default weights will be
  # given (optional). Uncomment below to add the required trafficSplit.
  # trafficSplit:
  #   - cluster: "cluster1-admin"
  #     weight: 8
  #   - cluster: "cluster2-admin"
  #     weight: 2

  # Uncomment below to specify a ttl value in seconds. By default, the value is inherited from
  # Avi's DNS VS.
  # ttl: 10

  # Uncomment below to specify custom health monitor refs. By default, HTTP/HTTPS path based health
  # monitors are applied on the GSs.
  # healthMonitorRefs:
  # - hmref1
  # - hmref2

  # Uncomment below to specify a Site Persistence profile ref. By default, Site Persistence is disabled.
  # Also, note that, Site Persistence is only applicable on secure ingresses/routes and ignored
  # for all other cases. Follow to create
  # a Site persistence profile.
  # sitePersistenceRef: gap-1

  # Uncomment below to specify gslb service pool algorithm settings for all gslb services. Applicable
  # values for lbAlgorithm:
  # 1. GSLB_ALGORITHM_CONSISTENT_HASH (needs a hashMask field to be set too)
  # 2. GSLB_ALGORITHM_GEO (needs geoFallback settings to be used for this field)
  # 3. GSLB_ALGORITHM_ROUND_ROBIN (default)
  #
  # poolAlgorithmSettings:
  #   lbAlgorithm:
  #   hashMask:           # required only for lbAlgorithm == GSLB_ALGORITHM_CONSISTENT_HASH
  #   geoFallback:        # fallback settings required only for lbAlgorithm == GSLB_ALGORITHM_GEO
  #     lbAlgorithm:      # can only have either GSLB_ALGORITHM_ROUND_ROBIN or GSLB_ALGORITHM_CONSISTENT_HASH
  #     hashMask:         # required only for fallback lbAlgorithm as GSLB_ALGORITHM_CONSISTENT_HASH

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name:

resources:
  limits:
    cpu: 250m
    memory: 300Mi
  requests:
    cpu: 100m
    memory: 200Mi

service:
  type: ClusterIP
  port: 80

rbac:
  # creates the pod security policy if set to true
  pspEnable: false

persistentVolumeClaim: ''
mountPath: /log
logFile: amko.log

federatorLogFile: amko-federator.log

When done, it's time to install AMKO like this:

helm install  ako/amko  --generate-name --version 1.8.1 -f /path/to/values.yaml  --set configs.gslbLeaderController=<leader_controller_ip> --namespace=avi-system    ####There is a typo in the official docs - it's pointing to amko only

If everything went well you should see a couple of things in your k8s cluster under the namespace avi-system.

k get pods -n avi-system
NAME     READY   STATUS    RESTARTS   AGE
ako-0    1/1     Running   0          25h
amko-0   2/2     Running   0          20h

k get amkocluster amkocluster-federation -n avi-system
NAME                     AGE
amkocluster-federation   20h

k get gc -n avi-system gc-1
NAME   AGE
gc-1   20h

k get gdp -n avi-system
NAME         AGE
global-gdp   20h
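If one of these objects is missing, or a service never shows up in GSLB later on, the AMKO logs and the full objects are good places to start looking. A small sketch of the checks, using the resource names from the output above; the `show_amko_state` function name is made up for this example.

```shell
# show_amko_state [NAMESPACE]: dump recent AMKO logs plus the GSLBConfig and GDP objects.
show_amko_state() {
  ns="${1:-avi-system}"
  # Last log lines from every container in the AMKO pod (it runs two, see READY 2/2).
  kubectl logs amko-0 -n "$ns" --all-containers --tail=20
  # Full objects as YAML, including whatever status AMKO has written to them.
  kubectl get gc gc-1 -n "$ns" -o yaml
  kubectl get gdp global-gdp -n "$ns" -o yaml
}

if command -v kubectl >/dev/null 2>&1; then
  show_amko_state avi-system
fi
```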

AMKO is up and running. Time to create a GSLB service.

Create GSLB service

You probably already have a bunch of ingress services running, and not much needs to be done to make them GSLB "aware". If you noticed, in our values.yaml for the AMKO Helm chart we defined this:

  # appSelector takes the form of:
  appSelector:
    label:
      app: 'gslb'     #### I am using this selector for services to be used in GSLB

So what we need to do in our ingress is to add that label, and then add a new section (a HostRule) where we define our GSLB FQDN.

Here is my sample ingress applied in my primary k8s cluster:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-example
  labels:    #### This is added for GSLB
    app: gslb #### This is added for GSLB - Using the selector I chose in the values.yaml
  namespace: fruit

spec:
  ingressClassName: avi-lb
  rules:
    - host:  #### Specific for this site (Home Lab)
      http:
        paths:
        - path: /apple
          pathType: Prefix
          backend:
            service:
              name: apple-service
              port:
                number: 5678
        - path: /banana
          pathType: Prefix
          backend:
            service:
              name: banana-service
              port:
                number: 5678
---       #### New section to define a host rule
apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  namespace: fruit
  name: gslb-host-rule-fruit
spec:
  virtualhost:
    fqdn: #### Specific for this site (Home Lab)
    enableVirtualHost: true
    gslb:
      fqdn:  #### This is common for both sites
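For an ingress that is already deployed, the label can also be added with kubectl instead of editing the manifest. A sketch using the ingress name and namespace from the example above; the `label_for_gslb` function name is made up. The HostRule with the GSLB FQDN still has to be applied separately.

```shell
# label_for_gslb INGRESS NAMESPACE: add the label that the AMKO appSelector matches on.
label_for_gslb() {
  # app=gslb is the selector chosen in values.yaml.
  kubectl label ingress "$1" app=gslb -n "$2" --overwrite
  # List every ingress in the namespace that the selector would pick up.
  kubectl get ingress -n "$2" -l app=gslb
}

if command -v kubectl >/dev/null 2>&1; then
  label_for_gslb ingress-example fruit
fi
```

It is worth adding the label to the manifest as well, so it is not forgotten the next time the ingress is deployed from scratch.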

As soon as it is applied, and there are no errors in AMKO or AKO, it should be visible in your AVI controller GUI:

If you click on the name it should take you to the next page, where it shows the GSLB pool members and their status. The screenshot below is from when both sites have applied their GSLB services:

Next we need to apply gslb settings on the secondary site also:

This is what I have deployed on the secondary site (note the difference in domain names specific for that site)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-example
  labels: #### This is added for GSLB
    app: gslb #### This is added for GSLB - Using the selector I chose in the values.yaml
  namespace: fruit

spec:
  ingressClassName: avi-lb
  rules:
    - host: #### Specific for this site (Remote Site)
      http:
        paths:
        - path: /apple
          pathType: Prefix
          backend:
            service:
              name: apple-service
              port:
                number: 5678
        - path: /banana
          pathType: Prefix
          backend:
            service:
              name: banana-service
              port:
                number: 5678
---     #### New section to define a host rule
apiVersion: ako.vmware.com/v1alpha1
kind: HostRule
metadata:
  namespace: fruit
  name: gslb-host-rule-fruit
spec:
  virtualhost:
    fqdn:  #### Specific for this site (Remote Site)
    enableVirtualHost: true
    gslb:
      fqdn:   ##### Common for both sites

When this is applied Avi will go ahead and put this into the same GSLB service as above, and the screenshot above will be true.
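Since the context names of both clusters are already known from the AMKO configuration, the two manifests can be applied from one terminal by switching context per command. A sketch only: the two file names are hypothetical, the function name is made up, and it assumes both contexts exist in your local kubeconfig.

```shell
# apply_site CONTEXT MANIFEST: apply a site-specific manifest to one cluster.
apply_site() {
  kubectl --context "$1" apply -f "$2"
}

if command -v kubectl >/dev/null 2>&1; then
  # Hypothetical file names, one manifest per site.
  apply_site 'k8slab-admin@k8slab' ingress-homelab.yaml
  apply_site 'tkgs-cluster-1-admin@tkgs-cluster-1' ingress-remote.yaml
fi
```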

Now I have the same application deployed in both sites, equally available whether I am sitting in my home-lab or at the remote-site. There are a bunch of parameters that can be tuned, which I will not go into now (maybe I will get back to this and update with further possibilities with GSLB). One of them is the load balancing algorithm, such as Geo Location Source. Say I want the application to be accessed from clients as close to the application as possible. And should one of the sites become unavailable, it will still be accessible from the sites that are still online. Very cool indeed.

For the sake of the demo I am about to show, the only thing I change in the default GSLB settings is the TTL. I am setting it to 2 seconds so I can showcase that the application is being load balanced between both sites. The default algorithm is Round-Robin, so it should balance between them regardless of the latency difference (accessing the application from my home network in my home-lab vs. in the remote-site, which is several ms away). Here's where I am setting these settings:

With a TTL of 2 seconds it should switch faster so I can see the balancing between the two sites. Let me try to access the application from my browser using the gslb fqdn:

A refresh of the page and now:

To illustrate this even more I will run a curl command against the gslb fqdn:

Now a ping against the FQDN to show the IP of the site that answers the call:

Notice the change in ip address but also the latency in ms
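The same behaviour can be watched from a terminal by resolving the name in a loop. A sketch, assuming `dig` is installed; the function name is made up and `app.gslb.example.com` is a placeholder for your own GSLB FQDN.

```shell
# watch_gslb FQDN [COUNT] [INTERVAL]: resolve FQDN repeatedly and print each answer.
# An INTERVAL slightly above the TTL lets cached answers expire between queries.
watch_gslb() {
  fqdn="$1"
  count="${2:-5}"
  interval="${3:-3}"
  i=0
  while [ "$i" -lt "$count" ]; do
    # One line per query: time of day followed by the returned address(es).
    echo "$(date +%T) $(dig +short "$fqdn" | tr '\n' ' ')"
    i=$((i + 1))
    sleep "$interval"
  done
}

# Example call (placeholder FQDN), five lookups three seconds apart:
#   watch_gslb app.gslb.example.com 5 3
```

With a TTL of 2 seconds and Round-Robin, the printed address should alternate between the VIPs of the two sites.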

Now I can go ahead and disable one of the sites to simulate failover, and the application is still available on the same FQDN. So many possibilities with GSLB.

That's it then. NSX ALB, AKO with AMKO between two sites, the same application available in two physical locations, redundancy, scale-out, availability. Stay tuned for more updates on advanced settings in the future 😄