Thursday, August 27, 2020

First Experiences with Tanzu Kubernetes Grid on AWS

Introduction

Kubernetes as a container orchestration system has been around for a while now, and many vendors have jumped on it to provide supported builds that are simple to install, update and manage compared to the pure open-source Kubernetes distribution.  VMware currently provides a couple of fully supported distributions under the Tanzu brand banner:

  • TKG - Tanzu Kubernetes Grid (formerly known as the Heptio distribution)
  • TKGI - Tanzu Kubernetes Grid Integrated (formerly known as Pivotal Container Service - PKS)
This blog post provides a simple guide to deploying TKG into an AWS account; subsequent blog posts will dive into various features, components and use cases for such an environment.

Installation

Obviously the details of the installation may change over time; this summary is for TKG 1.1, and the docs provide full step-by-step installation instructions.

In summary, the steps to follow are:

  1. Download and install the tkg and clusterawsadm command-line tools from the VMware Tanzu downloads page.
  2. Set up environment variables and run clusterawsadm, which will bootstrap the AWS account with the necessary groups, profiles, roles and users.  (This only needs to be done once per AWS account; the credentials it expects are sketched after this list.)
    $ clusterawsadm alpha bootstrap create-stack
  3. The configuration for the first management cluster can be done via a UI, which is a good choice the first time through as it guides you through the setup.  The UI is served locally after running tkg init --ui and is then accessed in a browser at localhost:8080/#/ui.
    $ tkg init --ui
    Alternatively, you can set up the ~/.tkg/config.yaml file with the required AWS variables (a template for this file can be created by running $ tkg get management-cluster).  Once that is in place, run the command to create a management cluster.
    $ tkg init --infrastructure aws --name aws-mgmt-cluster --plan dev
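For reference, here is roughly what the AWS configuration looks like for steps 2 and 3.  Treat it as a sketch: the variable names are from memory of the TKG 1.1 docs, and the region, key pair and instance types are placeholders you would swap for your own.

    # credentials/region picked up by clusterawsadm (values are placeholders)
    $ export AWS_ACCESS_KEY_ID=<your access key>
    $ export AWS_SECRET_ACCESS_KEY=<your secret key>
    $ export AWS_REGION=eu-west-2
    $ clusterawsadm alpha bootstrap create-stack

If you go the non-UI route, similar settings end up as flat keys in ~/.tkg/config.yaml, along these lines:

    AWS_REGION: eu-west-2
    AWS_NODE_AZ: eu-west-2a
    AWS_SSH_KEY_NAME: my-aws-keypair
    CONTROL_PLANE_MACHINE_TYPE: t3.large
    NODE_MACHINE_TYPE: t3.large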
When tkg init runs, what happens behind the scenes is that the tkg command line starts a small Kubernetes cluster locally using Docker.  This bootstrap cluster includes the Cluster API custom resources and controllers, which understand the AWS API, create VMs on the AWS infrastructure and ensure the TKG Kubernetes binaries are deployed to them.  The number and size of the VMs are controlled by the plan used and by other flags that specify the size and number of workers/masters.

Even for a small plan this process will take several minutes to complete (around 15 minutes on my laptop).
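If you're curious while tkg init is running, the temporary bootstrap cluster shows up as a kind container in your local Docker.  The container name below is indicative only, as the actual name includes a generated suffix.

    % docker ps
    (look for a container named something like tkg-kind-xxxxx-control-plane running a kind node image)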

% tkg init --infrastructure aws --name aws-mgmt-cluster --plan dev
Logs of the command execution can also be found at: /var/folders/nn/p0h624x937l6dt3bdd00lkmm0000gq/T/tkg-20200827T093026003038156.log

Validating the pre-requisites...

Setting up management cluster...
Validating configuration...
Using infrastructure provider aws:v0.5.4
Generating cluster configuration...
Setting up bootstrapper...
Bootstrapper created. Kubeconfig: /Users/dforbes/.kube-tkg/tmp/config_lieTADrw
Installing providers on bootstrapper...
Fetching providers
Installing cert-manager
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.6" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.5.4" TargetNamespace="capa-system"
Start creating management cluster...
Saving management cluster kuebconfig into /Users/dforbes/.kube/config
Unable to persist management cluster aws-mgmt-cluster info to tkg config
Installing providers on management cluster...
Fetching providers
Installing cert-manager
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.6" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.5.4" TargetNamespace="capa-system"
Waiting for the management cluster to get ready for move...
Moving all Cluster API objects from bootstrap cluster to management cluster...
Performing move...
Discovering Cluster API objects
Moving Cluster API objects Clusters=1
Creating objects in the target cluster
Deleting objects from the source cluster
Context set for management cluster aws-mgmt-cluster as 'aws-mgmt-cluster-admin@aws-mgmt-cluster'.

Management cluster created!


You can now create your first workload cluster by running the following:

  tkg create cluster [name] --kubernetes-version=[version] --plan=[plan]

%

At the end of this process several VMs will have been created (a bastion server, a master and a worker for the basic dev plan), along with a load balancer to allow ingress to the cluster's services, and security groups etc. to restrict direct access to the cluster.
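If you want to see what was created on the AWS side, the instances can be listed with the AWS CLI.  The tag filter below assumes the instances carry the cluster name in their Name tag, which matches what the Cluster API AWS provider appears to do, so verify against your own account before relying on it.

    % aws ec2 describe-instances \
        --filters "Name=tag:Name,Values=aws-mgmt-cluster*" "Name=instance-state-name,Values=running" \
        --query "Reservations[].Instances[].{Name:Tags[?Key=='Name']|[0].Value,Type:InstanceType,State:State.Name}" \
        --output table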

By default, the tkg init command will create a cluster context in your .kube/config file called <cluster name>-admin@<cluster name>.  If other machines/operators are to manage the cluster, the --kubeconfig option of tkg allows the access context to be stored in a separate file instead.  (Yup, I deleted the context from my config file and then found there was no easy way to regenerate it!)
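A minimal sketch of keeping the management cluster context out of your main config file, assuming the --kubeconfig option mentioned above is passed to tkg init (the file path is just an example):

    % tkg init --infrastructure aws --name aws-mgmt-cluster --plan dev --kubeconfig ./aws-mgmt-cluster.kubeconfig
    % kubectl --kubeconfig ./aws-mgmt-cluster.kubeconfig config get-contexts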

Testing

First off, try a few basic commands:

% tkg get management-clusters
 MANAGEMENT-CLUSTER-NAME  CONTEXT-NAME
 aws-mgmt-cluster *       aws-mgmt-cluster-admin@aws-mgmt-cluster

This shows the management cluster we just created.

% tkg get clusters
 NAME  NAMESPACE  STATUS  CONTROLPLANE  WORKERS  KUBERNETES

No clusters have been created yet, so there is not a lot of useful information here.  Let's create our first cluster.

% tkg create cluster test-cluster --plan dev
Logs of the command execution can also be found at: /var/folders/nn/p0h624x937l6dt3bdd00lkmm0000gq/T/tkg-20200827T110852116171506.log
Validating configuration...
Creating workload cluster 'test-cluster'...
Waiting for cluster to be initialized...
Waiting for cluster nodes to be available...

Workload cluster 'test-cluster' created

%

This creates a very small cluster of three VMs: a bastion host connected to a public subnet, and two VMs on a private subnet, one master (control plane) node and one worker.  Typically it will take up to 10 minutes to create the VMs and configure the Kubernetes cluster.
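The dev plan defaults can also be overridden at creation time.  The --plan and --kubernetes-version flags appear in the tkg output above; the machine-count flag names below are from memory rather than verified, and the cluster name is just an example.

    # flag names are from memory; check `tkg create cluster --help` for your version
    % tkg create cluster test-cluster-2 --plan dev --kubernetes-version=v1.18.3+vmware.1 \
        --controlplane-machine-count 3 --worker-machine-count 3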



The next job is to configure kubectl to connect to the cluster.  The tkg command line allows us to get an admin context for the cluster.

% tkg get credentials test-cluster
Credentials of workload cluster 'test-cluster' have been saved
You can now access the cluster by running 'kubectl config use-context test-cluster-admin@test-cluster' 
% kubectl config get-contexts
CURRENT   NAME                                      CLUSTER            AUTHINFO                 NAMESPACE
*         aws-mgmt-cluster-admin@aws-mgmt-cluster   aws-mgmt-cluster   aws-mgmt-cluster-admin
          minikube                                  minikube           minikube
          test-cluster-admin@test-cluster           test-cluster       test-cluster-admin
% kubectl config use-context test-cluster-admin@test-cluster
Switched to context "test-cluster-admin@test-cluster".
% kubectl get ns
NAME              STATUS   AGE
default           Active   10m
kube-node-lease   Active   10m
kube-public       Active   10m
kube-system       Active   10m
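As a quick smoke test you can deploy something with plain kubectl; a trivial nginx example (the deployment name and image are arbitrary choices, not anything TKG-specific):

    % kubectl create deployment hello-nginx --image=nginx
    % kubectl rollout status deployment/hello-nginx
    % kubectl get pods -o wide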

That's a Kubernetes workload cluster up and ready for action.  An obvious next step might be to scale the cluster, and again tkg has a very simple command for this.

% tkg get clusters
 NAME          NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES
 test-cluster  default    running  1/1           1/1      v1.18.3+vmware.1
% tkg scale cluster test-cluster -w 2
Successfully updated worker node machine deployment replica count for cluster test-cluster
workload cluster test-cluster is being scaled
% tkg get clusters
 NAME          NAMESPACE  STATUS    CONTROLPLANE  WORKERS  KUBERNETES
 test-cluster  default    updating  1/1           1/2      v1.18.3+vmware.1
.
. . . <about 5 minutes>
.
% tkg get clusters
 NAME          NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES
 test-cluster  default    running  1/1           2/2      v1.18.3+vmware.1
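The new worker should also be visible from inside the cluster while still using the test-cluster context:

    % kubectl get nodes
    (expect one control-plane node and two workers in the Ready state)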

And finally, once we have finished with the cluster, we want to delete it.

% tkg delete cluster test-cluster
Deleting workload cluster 'test-cluster'. Are you sure?: y
workload cluster test-cluster is being deleted
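The delete is asynchronous, so the cluster hangs around in the listing for a while; once the AWS resources have been torn down it disappears and the listing is empty again:

    % tkg get clusters
     NAME  NAMESPACE  STATUS  CONTROLPLANE  WORKERS  KUBERNETES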

Note - there is also a tkg upgrade cluster command, which makes it nice and simple to upgrade a given cluster whenever you want.
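I haven't exercised the upgrade in this post, but the invocation follows the same pattern as the other commands (the exact prompts and available versions will depend on your tkg release):

    % tkg upgrade cluster test-cluster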

Conclusion

There are a few things to download and start up, and it is certainly a little more complex than using one of the public cloud providers' managed Kubernetes services.  But if you pause for a second and think about the complexity of the operations it carries out in a short period of time, it is a very impressive and simple approach.  It also gives you cluster creation and management with the same experience/binaries across multiple cloud providers or on-premises, not to mention full control over the cluster lifecycle.

All in all I was suitably impressed by the ease of use, and starting from nothing it would take no more than a morning or an afternoon to have a Kubernetes cluster of any size created and ready for use.