Introduction To Kubernetes Cluster Networking with NSX-T

A developer building a cloud native application with Docker containers soon needs to understand how those containers communicate. In a previous post I looked at how Docker containers communicate on a single host. When the developer wants to scale out hosting capacity across multiple hosts, or to increase availability, she might deploy onto a Kubernetes cluster. The move from a single Docker host to multiple hosts managed as a Kubernetes cluster changes the container networking model. There are four distinct networking problems a Kubernetes cluster needs to address:

  • Highly-coupled container-to-container communications
  • Pod-to-Pod communications
  • Pod-to-Service communications
  • External-to-Service communications: this is covered by services

What Is A Kubernetes Pod

A Docker container is great for deploying a single atomic unit of software. This model can become a bit cumbersome when you want to run multiple pieces of software together. You often see this when developers create Docker images that use supervisord as an entrypoint to start and manage multiple processes. Many have found that it is instead more useful to deploy those applications in groups of containers that are partially isolated and partially share an environment. It is possible to configure Docker to control the level of sharing between groups of containers by creating a parent container and managing the lifetimes of those containers; however, this is administratively complex. Kubernetes provides an abstraction called Pods for just this use case.

A Kubernetes Pod implements a ‘pause’ container as the managing parent container; the Pod also contains one or more application containers. The ‘pause’ container serves as the basis of Linux namespace sharing in the Pod, and the other containers are started within that namespace. Sharing a namespace includes sharing the network stack and other resources such as volumes. Sharing a network namespace means the containers within a Pod share an IP address, and all containers within a Pod can reach each other’s ports on localhost.

Kubernetes Pod
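
To make this concrete, here is a minimal sketch of a Pod manifest with two containers sharing the same network namespace; the names and images are illustrative only, not part of this lab.

apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar        # hypothetical Pod name
spec:
  containers:
  - name: web
    image: nginx:1.16
    ports:
    - containerPort: 80
  - name: sidecar
    image: busybox:1.31
    # Shares the Pod network namespace, so the web container
    # is reachable on localhost:80 from here.
    command: ["sh", "-c", "while true; do wget -q -O- http://localhost:80 > /dev/null; sleep 10; done"]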

What Is A Kubernetes Service

Typically a Kubernetes Deployment is used to manage Pods. A Deployment describes a desired state; it can create and destroy Pods dynamically. While each Pod gets its own IP address, the set of Pods running can change. A Kubernetes Service (sometimes called a micro-service) is an abstraction which defines a logical set of Pods. The Kubernetes API provides Service discovery to Pods, and it also offers a method of exposing Services to external systems via a network port or load balancer.
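
As a minimal sketch (the names and labels here are illustrative, not from this lab), a Service manifest defines the logical set of Pods via a label selector:

apiVersion: v1
kind: Service
metadata:
  name: web-service             # hypothetical Service name
spec:
  selector:
    app: web                    # selects all Pods labelled app=web
  ports:
  - protocol: TCP
    port: 80                    # port the ClusterIP listens on
    targetPort: 80              # port on the selected Pods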

What Is Container Networking Interface (CNI)

Container-centric infrastructure needs a network, and this network must be dynamic. Container networking is designed to be pluggable, and the Container Networking Interface (CNI) is the defining specification. Various open source projects and vendors provide CNI compliant plugins which deliver dynamic networking for containers.
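
For illustration, a node selects its CNI plugin through a JSON network configuration file, conventionally placed under /etc/cni/net.d. This sketch uses the reference bridge plugin with host-local IPAM; it is a generic example, not the NSX configuration used later.

{
  "cniVersion": "0.3.1",
  "name": "example-net",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16"
  }
}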

What Is NSX-T Container Plugin (NCP)

The NSX-T Container Plug-in (NCP) provides a CNI plugin and an integration with container orchestrators such as Kubernetes and OpenShift.

NSX-T brings advanced features which can enrich Kubernetes cluster networking, including:

  • Fine-grained traffic control and monitoring
  • Fine-grained security policy (firewall rules)
  • Automated creation of network topology
  • Integration with enterprise networking

The main component of NCP runs in a container and communicates with NSX Manager and with the Kubernetes control plane. NCP monitors changes to containers and other resources and manages networking resources such as logical ports, switches, routers, and security groups for the containers by calling the NSX API.

NSX-T Container Plug-in Release 2.5

I looked at the NCP initially for integration with OpenShift in mid-2018. I am revisiting this now to refresh my understanding and explore some of the new features introduced with the 2.5 release including:

Policy API Object Support

Prior to the 2.5 release, all NSX objects which the NCP interacted with had to be created via the Advanced Networking & Security tab in the UI or the old imperative APIs. The imperative API was harder than it could have been to control programmatically, so with the NSX-T 2.4 release VMware introduced a new intent-based Policy API and a corresponding Simplified UI. The NCP now supports either the imperative or the intent-based API; to use the intent-based API, the parameter policy_nsxapi in the NCP ConfigMap (ncp.ini) needs to be set to True.
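
The setting lives inside the ncp.ini embedded in the ConfigMap; a fragment might look like the sketch below (assuming the option sits in the [nsx_v3] section, with all other values omitted):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nsx-ncp-config
  namespace: nsx-system
data:
  ncp.ini: |
    [nsx_v3]
    # Use the intent-based Policy API instead of the imperative API
    policy_nsxapi = True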

Simplified Installation

Another change I am interested in exploring is the simplified installation. In the past, an admin had to log in to every Kubernetes node and perform multiple steps to bootstrap it: install the NSX CNI plug-in and Open vSwitch, create the OVS bridge and add one vNIC to the bridge. The 2.5 release introduces a second DaemonSet, nsx-ncp-bootstrap, which now handles the deployment and lifecycle management of these components, so we no longer need to log in to every node. This should make it easier to scale out a cluster with additional nodes.

Lab Hardware Configuration

To explore Kubernetes networking and the NCP I am using my homelab. My homelab has a very simple physical network: a single subnet with DHCP enabled and a default route to the internet. Connected to the physical network are three Intel NUCs, each with two 1G NICs: one onboard and an additional 1G USB3 NIC.

Lab vSphere Configuration

The hosts run vSphere 6.7 Update 3 and have the onboard NIC configured on a vSphere Standard Switch hosting the Management and vSAN VMkernels; the USB3 NIC is unused. The hosts are added to a vCenter appliance and formed into a vSAN enabled cluster. The cluster also hosts a Windows 2019 Server VM running Active Directory, which also acts as DNS server and NTP source for the lab.

Lab NSX-T Configuration

An extra-small NSX Manager appliance is deployed. All ESXi hosts are configured as Transport Nodes, using the vusb0 interface, in a Transport Zone named ‘overlayTransport’. A medium sized NSX Edge called ‘nsxEdge’ is deployed which is a member of the ‘overlayTransport’ and ‘vlanTransport’ Transport Zones. An Edge Cluster named ‘edgeCluster’ is created with ‘nsxEdge’ as a member.

Kubernetes IP Address Space

A Kubernetes cluster requires that all Pods on a node can communicate with all Pods on all nodes in the cluster without NAT; these addresses are referred to as ClusterIPs. To support this, a range of IP addresses must be defined to be issued to Pods and Services within a cluster. Even though the range is used for both Pods and Services, it is called the Pod address range. The last /20 of the Pod address range is used for Services. A /20 range has 2^12 = 4096 addresses, so 4096 addresses are used for Services and the rest of the range is used for Pods.
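
As a worked example with a purely hypothetical Pod address range (the lab's actual range is set below):

# Hypothetical Pod address range: 10.4.0.0/16 (65536 addresses)
#   Services (last /20):  10.4.240.0/20  ->  4096 addresses
#   Pods (the remainder): 10.4.0.0 - 10.4.239.255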

The address range I will be using to issue ClusterIP for this lab cluster is

As well as internal communications using ClusterIPs, some parts of your application may need to be exposed as a Service on an externally routable IP address. There are two methods for exposing a Service on an externally routable IP address: NodePort and LoadBalancer. Source NAT is used to translate the private ClusterIP address to a publicly routable address.

The external address range I will be using to issue ExternalIP for this lab cluster is

NCP - NSX Object Identification

There can be many objects deployed within NSX-T, and the NCP needs to understand which of these to interact with. The NSX objects the NCP needs to interact with include:

  • Overlay transport zone
  • Tier-0 logical router
  • Logical switch to connect the node VMs
  • IP Block for internal ClusterIP addressing
  • IP Pool for ExternalIP addressing

The NCP configuration is stored in the ncp.ini file. It is possible to put the UUIDs of NSX objects in ncp.ini, but this is an administrative pain. The mapping of which NSX objects NCP should interact with is better achieved by applying tags to the appropriate NSX objects.

An NSX-T instance can support multiple Kubernetes clusters, so to ensure correct object mapping a cluster name is used. The cluster name is specified in the NCP configuration and the appropriate NSX objects must have a tag of the same name applied.

For this lab environment I am configuring with cluster name ‘pandora’.

Deploy and Configure Tier-0

The NCP deploys an application centric network topology; at the top of that topology is a Tier-0 router which provides the uplink to the physical network.

Use the Policy UI to deploy a Tier-0 Logical Router with High-Availability mode Active-Passive. So that the NCP knows the correct Tier-0 Logical Router, a tag like this needs to be applied:

Tag Scope
pandora ncp/cluster

When applied it should look like:

NSX-T Tier-0 Router

To communicate with the physical network the Tier-0 requires an uplink IP address. Adding the uplink IP address first requires creating a VLAN backed network segment.

VLAN Backed Segment

We can now configure an interface on the Tier-0 with the uplink IP address.

NSX-T Tier-0 Uplink

To enable ExternalIP routing from the physical network, add a static route on the physical router directing the external address range to the Tier-0 uplink.

NSX IP Block for internal ClusterIP

NSX-T has an inbuilt capability for IP management, with which we can allocate blocks of IP addresses and create IP Pools.

The NCP requires an IP Block for issuing internal ClusterIPs. Create the IP Block named ‘k8s-1-internal’ with tags applied like this:

Tag Scope
pandora ncp/no_snat
pandora ncp/cluster

When applied it should look like:

Internal Block

NSX IP Pool for ExternalIP

The NCP requires an IP Pool for issuing ExternalIPs. First create an IP Block named ‘k8s-1-external’ with an appropriate CIDR. This IP Block is not accessed directly by NCP so it does not need any tags.

External Block

Once the IP Block is in place, create an IP Pool named ‘k8s-1-loadbalancer’ with a subnet of 128 addresses issued from the IP Block ‘k8s-1-external’. The external IP Pool can be shared but should at least have a tag applied like this:

Tag Scope
true ncp/external

When applied it should look like:

External Block Tag

Node Logical Switch

Network connectivity to the containers running in Kubernetes is provided by an NSX-T logical switch segment, often referred to as the node logical switch. For this, create a new segment called ‘node-logical-switch’ within the ‘overlayTransport’ Transport Zone, connected to no gateway.

Node Logical Switch

Kubernetes Master VM

Create an Ubuntu 18.04 VM named ‘k8s-master’ with 2x CPU, 4GB RAM and a 50GB vHDD. The VM should have two vNICs: the first is used to communicate with the NSX API and will host the Kubernetes API; the second is connected to the node logical switch and will have the Open vSwitch (OVS) bridge configured to give connectivity to the Pods. In my lab the first, connected to ‘VM Network’, enumerates as ens160 and the second, connected to ‘node-logical-switch’, enumerates as ens192.

cat /etc/netplan/01-netcfg.yaml 

network:
  version: 2
  renderer: networkd
  ethernets:
    ens160:
      dhcp4: true   # assumption: ens160 takes an address from the lab DHCP
    ens192: {}      # left unconfigured; the OVS bridge is created on this vNIC

The VM vNIC connected to ‘node-logical-switch’ creates a segment port object in NSX-T. So that the NCP knows which vNIC to configure, this port must have tags applied like this:

Tag Scope
k8s-master ncp/node_name
pandora ncp/cluster

When applied it should look like:

Segment Port Tags

Install and Configure Kubernetes

Docker and Kubernetes require installation; do this by running the following.

sudo apt update -y
sudo apt upgrade -y
sudo apt install python apt-transport-https -y
sudo apt install docker.io -y   # assumption: Docker installed via the docker.io package
sudo gpasswd -a $USER docker
sudo systemctl start docker
sudo systemctl enable docker
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-add-repository "deb https://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt-get update
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
sudo apt-get install -y kubelet=1.16.4-00 kubeadm=1.16.4-00 kubectl=1.16.4-00

Initialize the Kubernetes cluster by running the following:

sudo kubeadm init

In order to easily run kubectl as a regular user we need to copy the cluster configuration to the user profile; do this by running the following.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

We can now connect to the cluster and check the state of the nodes by running the following on the master node.

kubectl get nodes

Init Get Nodes

The status of each Node will show ‘NotReady’; we can get more detail on why it is in this state by running the following on the master node.

kubectl describe nodes k8s-master | grep Conditions -A9

Init Describe Nodes

With this we can see the reason for the error is ‘NetworkPluginNotReady’: the CNI plugin is not initialized.

Install and Configure NSX Container Plug-in (NCP)

The NSX Container Plug-in (NCP) provides integration between NSX-T and Kubernetes. It is a containerised application which manages communication between NSX Manager and the Kubernetes control plane. The NCP application runs in Kubernetes and is supplied as a .zip download. The configuration of the NCP application is maintained in a Kubernetes manifest file.

The NCP application ships with a container image file. We could deploy this to a container registry, but here we will just upload the image file to the VM and import the image into the local Docker repository. The default container image name specified in the manifest is nsx-ncp, so we apply that as a tag on the image.

sudo docker load -i /home/ubuntu/nsx-ncp-ubuntu-
sudo docker image tag registry.local/ nsx-ncp
sudo docker images | grep ncp

Prior to the 2.5 release there were multiple manifest files, each targeting a specific area of configuration. These are now merged into a single manifest file containing multiple resource specifications; the sections can be identified by the --- separator. The resources are created in the order they appear in the file.

Resource Kind Resource Name Comments
Namespace nsx-system
ServiceAccount ncp-svc-account
ClusterRole ncp-cluster-role
ClusterRole ncp-patch-role
ClusterRoleBinding ncp-cluster-role-binding
ClusterRoleBinding ncp-patch-role-binding
ServiceAccount nsx-node-agent-svc-account
ClusterRole nsx-node-agent-cluster-role
ClusterRoleBinding nsx-node-agent-cluster-role-binding
ConfigMap nsx-ncp-config Must update Kubernetes API and NSX API parameters
Deployment nsx-ncp
ConfigMap nsx-node-agent-config Must update Kubernetes API and NSX API parameters
DaemonSet nsx-ncp-bootstrap
DaemonSet nsx-node-agent
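
The layout of the merged file is sketched below using the names from the table; the real template of course carries the full resource specifications:

apiVersion: v1
kind: Namespace
metadata:
  name: nsx-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ncp-svc-account
  namespace: nsx-system
---
# ... the remaining resources from the table, each separated by ---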

For this lab I will use mostly the default configuration from the supplied manifest file template. The settings I change from the template are the environment specifics, such as the details for connecting to the NSX API (including whether to use the imperative or intent-based API), the cluster name, and the node vNIC on which the OVS bridge gets created.

policy_nsxapi = True

nsx_api_managers =
nsx_api_user = admin
nsx_api_password = VMware1!

insecure = True

cluster = pandora

apiserver_host_ip =
apiserver_host_port = 6443

ovs_uplink_port = ens192

Once the manifest file is updated and the Docker image is in the local registry, we can apply the NCP manifest. Applying the manifest takes a couple of minutes as it creates various Pods; we can watch their creation to view progress.

kubectl apply --filename /home/ubuntu/ncp-ubuntu.yaml
kubectl get pods --output wide --namespace nsx-system --watch

Deploy NCP

If all has gone well we can take a look at the objects created within the namespace.

kubectl get all --namespace nsx-system

Deployed NCP

Now it's running we can check the health of the NCP. The NCP has a CLI where we can run various commands, including health checks.

kubectl exec -it nsx-ncp-6978b9cb69-dj4k2 --namespace nsx-system -- /bin/bash 
get ncp-k8s-api-server status
get ncp-nsx status

NCP Health

NSX Policy UI Object Creation

With NSX integration, when we deploy a Kubernetes namespace the NCP creates a corresponding segment, IP Pool (from the internal range), subnet and Tier-1 router. Open NSX Manager and view one of the object categories which should have objects created, then create two Kubernetes namespaces, one called development and one called production.

kubectl create --filename
kubectl create --filename
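
The referenced files are plain Namespace manifests; as a sketch, the development one would contain no more than:

apiVersion: v1
kind: Namespace
metadata:
  name: development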

If you refresh the view in NSX Manager you will see the new objects appear in the Policy UI.

NSX-T Tier-1

We can then remove these two namespaces, and the NCP removes the corresponding segment, IP Pool (from the internal range), subnet and Tier-1 router.

kubectl delete -f
kubectl delete -f

NCP Bootstrap Scaleout Cluster

We started this lab with a single node cluster so we could look at the nsx-ncp-bootstrap DaemonSet. When Nodes are added to the cluster it should install and configure each Node with NCP.

Create a second and third VM with the same hardware configuration as k8s-master, but name these k8s-worker-1 / k8s-worker-2 and give them appropriate IPs. Ensure the NSX segment ports attached to the VMs are tagged correctly. Ensure the NCP Docker image is uploaded, imported to the local Docker registry and has the tag applied.

To add the additional Nodes to the cluster, the first step is to create a join token on the master.

kubeadm token create --print-join-command


We use the output of this command from the master to add the additional worker nodes to the cluster. The nsx-ncp-bootstrap and nsx-node-agent are DaemonSets, which ensures that all Nodes run a copy of their Pods. When we add the worker nodes to the cluster we can see the nsx-ncp-bootstrap Pods initialize and configure each Node, and then the nsx-node-agent Pods initialize.

kubectl get pods -o wide --namespace nsx-system --watch

NCP Scale Out

IP Assignments

When a Pod is deployed it can be exposed as a Service. To see how IP addresses are assigned, we deploy a simple stateless application example like the guestbook.

kubectl create -f
kubectl config set-context --current --namespace=development
kubectl apply -f
kubectl apply -f
kubectl apply -f
kubectl apply -f
kubectl apply -f
kubectl apply -f

When we configured the lab we created the NSX managed IP Block ‘k8s-1-internal’. When the development namespace was created it was allocated a /24 subnet from this block. If we view the Pods we can see they get IPs in the correct range.

kubectl get pods --output wide

Internal IP Allocation

As well as Pod deployments, the guestbook application installation also creates Service resources. A Service is an abstraction which defines a logical set of Pods. Our application runs three frontend Pods and two Redis slaves; the Services provide virtual IPs for these sets. The kube-proxy is responsible for implementing the virtual IP for Services. We can see the ClusterIPs are issued from the last /20 of the Pod address range. If we look in more detail at the Service resource we can see the three internal IP addresses behind it.

kubectl get services --namespace development
kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
kubectl describe service frontend

Cluster IP
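
For reference, the guestbook frontend Service is an ordinary ClusterIP Service selecting the frontend Pods by label; a sketch based on the upstream guestbook example:

apiVersion: v1
kind: Service
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  ports:
  - port: 80                    # ClusterIP port backed by the three frontend Pods
  selector:
    app: guestbook
    tier: frontend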

To facilitate communication between Pods, the NCP configures the Open vSwitch on each Node. The Open vSwitch user space daemon runs in a container named nsx-ovs within the nsx-node-agent Pods. The Open vSwitch bridge name can be specified in ncp.ini but defaults to br-int. If we connect to this container we can view the Open vSwitch flows.

kubectl exec -it nsx-node-agent-llf2c -c nsx-ovs --namespace nsx-system -- bash
ovs-ofctl dump-flows br-int

OVS Flows

We can then remove the namespace and all objects within.

kubectl delete -f

NSX Loadbalancer

NSX provides a load balancer capability to Kubernetes; we can use this by creating a Service resource with type LoadBalancer. If we create a simple ReplicaSet of five Pods and expose it, we can see it gets issued an IP address from the IP Pool tagged with ncp/external = true. We can also see this in the NSX load balancer configuration.

kubectl create -f
kubectl config set-context --current --namespace=development
kubectl apply -f
kubectl get replicasets
kubectl expose deployment hello-world --type=LoadBalancer --name=hello-world-nsx-lb
kubectl get services hello-world-nsx-lb --watch

NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
hello-world-nsx-lb   LoadBalancer   <pending>     8080:32091/TCP   7s
hello-world-nsx-lb   LoadBalancer   8080:32091/TCP   7s
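
The kubectl expose command above is roughly equivalent to applying a declarative Service manifest like this sketch (the selector label is an assumption about the hello-world Deployment):

apiVersion: v1
kind: Service
metadata:
  name: hello-world-nsx-lb
spec:
  type: LoadBalancer            # NCP allocates an ExternalIP from the ncp/external tagged IP Pool
  selector:
    app: hello-world            # assumption: the Deployment's Pods carry this label
  ports:
  - port: 8080
    targetPort: 8080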

We can then remove the namespace and all objects within.

kubectl delete -f

Kubernetes Pods Micro-Segmentation

One of the great use cases for NSX with vSphere has been the Distributed Firewall, which protects VM workloads at the hypervisor layer. When we extend NSX into the Kubernetes cluster we also extend the Distributed Firewall capability. We can create firewall rules whose sources and destinations are groups, and the groups can be dynamically populated by use of NSX tags. Kubernetes objects can have labels attached; a label attached to a Pod is reflected in NSX as a tag on the segment port of the Pod.
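
For example, a label declared on a Pod like the hypothetical sketch below would surface in NSX as a tag with scope secgroup and value web on that Pod's segment port:

apiVersion: v1
kind: Pod
metadata:
  name: frontend-example        # hypothetical Pod
  labels:
    secgroup: web               # reflected by NCP as tag scope=secgroup, value=web
spec:
  containers:
  - name: web
    image: nginx:1.16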

A simple test might be to deploy a two tier app where frontend can talk to backend but frontend cannot talk to other frontends. We create an NSX group called ‘Web’ with membership criteria: Segment Port, Tag Equals ‘web’, Scope ‘secgroup’. We then create an NSX DFW rule with ‘Web’ as both source and destination and an action of Drop.

With this in place we can ping from one frontend Pod to another frontend and to the backend and see that this works. We then apply the label to the three frontend web Pods so they become members of the NSX group and are affected by the firewall rule. With that in place we can retest the ping from one frontend Pod to another frontend and see that it is now blocked.

kubectl create -f
kubectl config set-context --current --namespace=development
kubectl apply -f
kubectl apply -f
kubectl get pods --output wide --watch

NAME                            READY   STATUS    RESTARTS   AGE   IP         NODE           NOMINATED NODE   READINESS GATES
frontend-6cb7f8bd65-4mctt       1/1     Running   0          13m   k8s-worker-1   <none>           <none>
frontend-6cb7f8bd65-8wkhr       1/1     Running   0          13m   k8s-worker-1   <none>           <none>
frontend-6cb7f8bd65-rtgc9       1/1     Running   0          13m   k8s-worker-1   <none>           <none>
redis-master-7db7f6579f-zlx26   1/1     Running   0          3s   k8s-worker-1   <none>           <none>

kubectl exec -it frontend-6cb7f8bd65-4mctt ping
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=64 time=0.083 ms

kubectl exec -it frontend-6cb7f8bd65-4mctt ping
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=64 time=3.191 ms

kubectl label pod frontend-6cb7f8bd65-4mctt secgroup=web
kubectl label pod frontend-6cb7f8bd65-8wkhr secgroup=web
kubectl label pod frontend-6cb7f8bd65-rtgc9 secgroup=web

kubectl exec -it frontend-6cb7f8bd65-4mctt ping
PING ( 56 data bytes
^C--- ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
command terminated with exit code 1

kubectl exec -it frontend-6cb7f8bd65-4mctt ping
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=64 time=5.672 ms

We can then remove the namespace and all objects within.

kubectl delete -f