Kubernetes HA Setup with HAProxy and Keepalived

Shardul | Jan 2, 2026

Kubernetes (K8s) is an open-source platform designed to automate deploying, scaling, and managing containerized applications. Instead of running apps directly on servers or VMs, Kubernetes lets you define how apps should run. In a production environment, however, you need multiple master (control plane) nodes to increase cluster high availability, so that everything keeps running and stays balanced even if a node fails. This document walks through the steps for setting up a highly available Kubernetes cluster in an air-gapped environment.

Prerequisites

This guide assumes a freshly installed Oracle Linux 9 system on either physical hardware or a virtual machine.

  • A compatible Linux host. The Kubernetes project provides generic instructions for Linux distributions based on Debian and Red Hat, and those distributions without a package manager.
  • 2 GB or more of RAM per machine (any less will leave little room for your apps).
  • 2 CPUs or more for each control plane machine.
  • Full network connectivity between all machines in the cluster (public or private network is fine).
  • Unique hostname, MAC address, and product_uuid for every node (see the quick checks after this list).
  • The required Kubernetes ports are open on your machines (6443 for the API server, 2379-2380 for etcd, 10250 for the kubelet, plus the HAProxy frontend port 6442 and VRRP traffic between control plane nodes used in this guide).
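
As a quick check for the last two items, the commands below (run on each node) confirm that identifiers are unique and that a required port is reachable between nodes; nc is assumed to be available (it is provided by the nmap-ncat package on Oracle Linux):

cat /sys/class/dmi/id/product_uuid   # must differ on every node
ip link show                         # compare MAC addresses across nodes
# On one node, listen temporarily on the API server port:
nc -l 6443
# From another node, verify it is reachable:
nc -vz 192.168.122.23 6443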

Cluster Details

Node            Details
MasterNode1     192.168.122.23
MasterNode2     192.168.122.24
MasterNode3     192.168.122.25
WorkerNode1     192.168.122.26
VIP             192.168.122.28
HTTP_PROXY      http://guest:pass@Devproxy:3129
Docker registry 192.168.122.29

Note: The VIP (192.168.122.28) is initially held by MasterNode1, which runs Keepalived in the MASTER state.
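
Because this is an air-gapped environment, it can help to give every node static host entries so that node names resolve without external DNS. The hostnames below simply mirror the table above and are an assumption about your naming; run this on all nodes:

cat <<EOF >> /etc/hosts
192.168.122.23 MasterNode1
192.168.122.24 MasterNode2
192.168.122.25 MasterNode3
192.168.122.26 WorkerNode1
EOF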

Operating System Requirements

In order to run Kubernetes reliably, a few changes are needed to the base Oracle Linux 9 install. The following prerequisite steps need to be applied to all nodes in your cluster.

  1. Disable SELinux
setenforce 0 && \
sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
  2. Disable swap
swapoff -a && \
sed -e '/swap/s/^/#/g' -i /etc/fstab
  3. Disable firewalld
systemctl disable --now firewalld
  4. Use iptables for Bridged Network Traffic
cat <<EOF > /etc/sysctl.d/iptables-bridge.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system
  5. Enable routing
cat <<EOF > /etc/sysctl.d/ip-forward.conf
net.ipv4.ip_forward = 1
EOF
sysctl --system
  6. Disable AutoDNS update on all nodes

Command to check the current settings:

nmcli connection show ens192 | grep ipv4.ignore-auto-dns

Command to update the current settings:

nmcli connection modify ens192 ipv4.dhcp-send-hostname no
nmcli connection modify ens192 ipv4.ignore-auto-dns yes
nmcli connection down ens192 && nmcli connection up ens192
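
A quick way to confirm the OS-level changes above took effect on each node:

getenforce                                  # should report Permissive (Disabled after a reboot)
swapon --show                               # should print nothing
systemctl is-enabled firewalld              # should report disabled
sysctl net.bridge.bridge-nf-call-iptables   # should be 1 (may error until br_netfilter is loaded in the containerd section)
sysctl net.ipv4.ip_forward                  # should be 1
nmcli -f ipv4.ignore-auto-dns connection show ens192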

Setup Yum Repo for K8s on all nodes for Oracle 9

vi /etc/yum.repos.d/kubernetes.repo

[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.35/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.35/rpm/repodata/repomd.xml.key
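
If the repository is reachable from the nodes (directly or through the proxy configured later in this guide), you can verify it before installing:

yum clean all && yum makecache
yum list kubeadm kubelet kubectl --showduplicates | tail -n 5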

Install HAProxy and Keepalived on all control plane nodes

yum install keepalived haproxy -y

Setup the HAProxy config on all control plane nodes as below, then save the file and restart HAProxy

vi /etc/haproxy/haproxy.cfg

global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /var/lib/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
 
defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
 
listen kubernetes-apiserver
    bind 0.0.0.0:6442
    mode tcp
    option tcp-check
    balance roundrobin
    server master1 192.168.122.23:6443 check fall 2 rise 2
    server master2 192.168.122.24:6443 check fall 2 rise 2
    server master3 192.168.122.25:6443 check fall 2 rise 2
 
listen stats
    bind 0.0.0.0:80
    mode http
    log global
    maxconn 10
    stats enable
    stats uri /haproxy_stats
    stats refresh 10s
    stats auth admin:password
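
Before restarting, you can validate the configuration syntax:

haproxy -c -f /etc/haproxy/haproxy.cfg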

Start and Enable Haproxy Service:

systemctl restart haproxy.service
systemctl enable haproxy.service
systemctl status haproxy.service

Keepalived conf for Master Node1

Note: In this config, make sure to replace the interface name (ens192) with your actual NIC name, and adjust the VIP and subnet accordingly.

vi /etc/keepalived/keepalived.conf

vrrp_script chk_haproxy {
    script "killall -0 haproxy" # Check if HAProxy process is running
    interval 2 # check every 2 seconds
    weight 2
}
 
vrrp_instance VI_1 {
    state MASTER
    interface ens192
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mysecretpassword
    }
    virtual_ipaddress {
192.168.122.28/32 dev ens192 label ens192:1 noprefixroute # Your VIP
    }
    nopreempt
    track_script {
        chk_haproxy
    }
}

Start and Enable keepalived Service:

systemctl enable keepalived.service
systemctl start keepalived.service
systemctl status keepalived.service
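
On MasterNode1 you can confirm that Keepalived has claimed the VIP:

ip addr show ens192 | grep 192.168.122.28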

Keepalived conf for Master Node2

vi /etc/keepalived/keepalived.conf

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}
  
vrrp_instance VI_1 {
    state BACKUP
    interface ens192
    virtual_router_id 51
    priority 100 # Medium priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mysecretpassword
    }
    virtual_ipaddress {
192.168.122.28/32 dev ens192 label ens192:1 noprefixroute
    }
    nopreempt
    track_script {
        chk_haproxy
    }
}

Start and Enable keepalived Service:

systemctl enable keepalived.service
systemctl start keepalived.service
systemctl status keepalived.service

Keepalived conf for Master Node3

vi /etc/keepalived/keepalived.conf

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}
  
vrrp_instance VI_1 {
    state BACKUP
    interface ens192
    virtual_router_id 51
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mysecretpassword
    }
    virtual_ipaddress {
192.168.122.28/32 dev ens192 label ens192:1 noprefixroute
    }
    nopreempt
    track_script {
        chk_haproxy
    }
}

Start and Enable keepalived Service:

systemctl enable keepalived.service
systemctl start keepalived.service
systemctl status keepalived.service
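
With all three nodes configured, a simple failover test (assuming the cluster is otherwise idle) is to stop HAProxy on the node currently holding the VIP and watch the address move to the next node:

# On MasterNode1 (currently holding the VIP):
systemctl stop haproxy
# On MasterNode2, the VIP should appear within a few seconds:
ip addr show ens192 | grep 192.168.122.28
# Restore HAProxy on MasterNode1 afterwards:
systemctl start haproxy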

Install Containerd on All Nodes

Load Kernel Modules

cat <<EOF | tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

modprobe overlay && \
modprobe br_netfilter
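
Confirm that both modules are loaded:

lsmod | grep -E 'overlay|br_netfilter'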

Install Containerd

yum install containerd.io -y

Configure cgroups

CONTAINERD_CONFIG_PATH=/etc/containerd/config.toml && \
rm -f "${CONTAINERD_CONFIG_PATH}" && \
containerd config default > "${CONTAINERD_CONFIG_PATH}" && \
sed -i "s/SystemdCgroup = false/SystemdCgroup = true/g" "${CONTAINERD_CONFIG_PATH}"
  1. Edit Containerd config

If you are using a local private Docker registry, update the containerd config file. Edit the file vi /etc/containerd/config.toml.

  • Look for the [plugins."io.containerd.grpc.v1.cri".registry.configs] section and add the following lines below it to allow pulling images from the insecure private registry:
      [plugins."io.containerd.grpc.v1.cri".registry.configs."192.168.122.29:5000".tls]
      insecure_skip_verify = true
      [plugins."io.containerd.grpc.v1.cri".registry.configs."ghcr.io".tls]
      insecure_skip_verify = true
      [plugins."io.containerd.grpc.v1.cri".registry.configs."quay.io".tls]
      insecure_skip_verify = true
  • Look for the [plugins."io.containerd.grpc.v1.cri".registry.mirrors] section and add the following lines below it:
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."192.168.122.29:5000"]
        endpoint = ["http://192.168.122.29:5000"]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."ghcr.io"]
        endpoint = ["https://ghcr.io/v2/"]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."quay.io"]
        endpoint = ["https://quay.io"]
  2. Edit the containerd service file (vi /usr/lib/systemd/system/containerd.service) and, in the [Service] section before ExecStartPre, add the proxy and NO_PROXY environment variables as shown below; replace the NO_PROXY IPs with your node IPs.
# Copyright The containerd Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
  
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target dbus.service
  
[Service]
#uncomment to enable the experimental sbservice (sandboxed) version of containerd/cri integration
#Environment="ENABLE_CRI_SANDBOXES=sandboxed"
Environment="HTTP_PROXY=http://guest:pass@Devproxy:3129"
Environment="HTTPS_PROXY=http://guest:pass@Devproxy:3129"
Environment="NO_PROXY=localhost,127.0.0.1,192.168.122.24,192.168.122.25,192.168.122.26,192.168.122.23,192.168.122.28,10.96.0.0/12,10.244.0.0/16"
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
  
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
OOMScoreAdjust=-999
 
[Install]
WantedBy=multi-user.target
  3. Restart Containerd on all nodes
systemctl daemon-reload && \
systemctl enable --now containerd && \
systemctl restart containerd
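
To confirm containerd picked up the cgroup and proxy settings:

containerd config dump | grep SystemdCgroup
systemctl show containerd --property=Environment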

Installing Kubernetes on all nodes

yum install -y kubeadm kubectl kubelet

Setup Proxy for All Nodes: add the following to /etc/profile (vi /etc/profile) and then run source /etc/profile

export NO_PROXY=localhost,127.0.0.1,192.168.122.24,192.168.122.25,192.168.122.26,192.168.122.23,192.168.122.28,10.96.0.0/12,10.244.0.0/16
export http_proxy=http://guest:pass@Devproxy:3129
export https_proxy=http://guest:pass@Devproxy:3129
export KUBECONFIG=/etc/kubernetes/admin.conf

On master node:

systemctl enable --now kubelet

Edit the kubelet service file (vi /usr/lib/systemd/system/kubelet.service) and add the proxy and NO_PROXY environment variables before ExecStart:

[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
Wants=network-online.target
After=network-online.target
  
[Service]
Environment="HTTP_PROXY=http://guest:pass@Devproxy:3129"
Environment="HTTPS_PROXY=http://guest:pass@Devproxy:3129"
Environment="NO_PROXY=localhost,127.0.0.1,192.168.122.24,192.168.122.25,192.168.122.26,192.168.122.23,192.168.122.28,10.96.0.0/12,10.244.0.0/16"
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10
 
[Install]
WantedBy=multi-user.target
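
After editing the unit file, reload systemd so the change takes effect (kubelet will keep restarting until kubeadm init runs, which is expected at this stage):

systemctl daemon-reload
systemctl restart kubelet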

Master Node -

  1. Run Kubeadm init -
kubeadm init --control-plane-endpoint "192.168.122.28:6442" --upload-certs --pod-network-cidr=10.244.0.0/16 --cri-socket unix:///var/run/containerd/containerd.sock
  2. KubeConfig: If you want to permanently enable kubectl access for the root account, copy the Kubernetes admin configuration to your home directory as shown below.
mkdir -p $HOME/.kube && \
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config && \
sudo chown $(id -u):$(id -g) $HOME/.kube/config
  3. To remove the control plane taint (optional; this allows workloads to be scheduled on control plane nodes):
kubectl taint nodes --all node-role.kubernetes.io/control-plane:NoSchedule-
  4. List nodes
kubectl get nodes
  5. Pod Network (Flannel)
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f kube-flannel.yml
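
Once the Flannel pods are running (newer manifests deploy them into the kube-flannel namespace, older ones into kube-system), the control plane node should report Ready:

kubectl get pods -n kube-flannel
kubectl get nodes -o wide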

Note: When the Kubernetes control plane is initialized, kubeadm prints two join commands: one for additional control plane nodes and one for worker nodes.

You can now join any number of control-plane nodes running the following command on each as root:

 kubeadm join 192.168.122.28:6442 --token 0whskv.uy1iw1z0xotk2fqb \
    --discovery-token-ca-cert-hash sha256:877f277cb0e31334d18dd22f19b0b1022343321e592fa88e17d7813f4cd71c0a \
    --control-plane --certificate-key 4e12940b10fce119737efa7208c9f6a360d6c85776fe0fb546e517e9621df93d

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.122.28:6442 --token 0whskv.uy1iw1z0xotk2fqb \
    --discovery-token-ca-cert-hash sha256:877f277cb0e31334d18dd22f19b0b1022343321e592fa88e17d7813f4cd71c0a

Control Plane Nodes

 kubeadm join 192.168.122.28:6442 --token 0whskv.uy1iw1z0xotk2fqb \
    --discovery-token-ca-cert-hash sha256:877f277cb0e31334d18dd22f19b0b1022343321e592fa88e17d7813f4cd71c0a \
    --control-plane --certificate-key 4e12940b10fce119737efa7208c9f6a360d6c85776fe0fb546e517e9621df93d

systemctl enable kubelet

Worker nodes

  1. The output of kubeadm init includes the join command to run on the worker nodes; alternatively, you can print it explicitly with the command below.
kubeadm token create --print-join-command
  2. Run on worker nodes (use the join command from your own kubeadm output; the token and hash below are examples):
kubeadm join 192.168.122.28:6442 --token auhgtc.z1afr0aqrtyg2yr7 --discovery-token-ca-cert-hash sha256:a3aa78b20c08d9fbacc6cf9998019ea6c885b06ad6a818a866824ad7a7f6ff56
  3. Enable kubelet service
systemctl enable kubelet
  4. Label the worker node as worker
kubectl label nodes WorkerNode1 node-role.kubernetes.io/worker=worker

Setup Helm and Ingress Controller

curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash

export GODEBUG=x509negativeserial=1
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

helm install nginx-ingress ingress-nginx/ingress-nginx
kubectl get pods -n default -l app.kubernetes.io/name=ingress-nginx
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
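
The controller is exposed through a LoadBalancer Service; until MetalLB is installed below, its EXTERNAL-IP will show as pending:

kubectl get svc -l app.kubernetes.io/name=ingress-nginx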


-------------------------------------------------------------------------------
To deploy the NGINX controller on a specific node, create a values.yaml file as below
values.yaml
------------------------------
controller:
  nodeSelector:
    kubernetes.io/hostname: MasterNode1
------------------------------
And run the commands below
export GODEBUG=x509negativeserial=1
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx -f values.yaml
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission

Setup MetalLB

METALLB_VERSION=0.13.12 && \
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v${METALLB_VERSION}/config/manifests/metallb-native.yaml


cat <<EOF > /tmp/metallb-ipaddrpool.yml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
 name: first-pool
 namespace: metallb-system
spec:
 addresses:
 - 192.168.122.23-192.168.122.26
EOF


kubectl get ValidatingWebhookConfiguration
kubectl delete -A ValidatingWebhookConfiguration metallb-webhook-configuration
kubectl create -f /tmp/metallb-ipaddrpool.yml


cat <<EOF > /tmp/metallb-ipaddrpool-advert.yml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
 name: example
 namespace: metallb-system
spec:
 ipAddressPools:
 - first-pool
EOF

kubectl create -f /tmp/metallb-ipaddrpool-advert.yml
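
Once MetalLB is running and the address pool is created, LoadBalancer Services (such as the ingress controller installed earlier) should pick up an address from the pool:

kubectl get pods -n metallb-system
kubectl get svc -A | grep LoadBalancer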

Setup csi driver

Set up the SMB CSI driver so that you can use a Windows shared drive with pod PVs and PVCs.

helm repo add csi-driver-smb https://raw.githubusercontent.com/kubernetes-csi/csi-driver-smb/master/charts
helm install csi-driver-smb csi-driver-smb/csi-driver-smb --namespace kube-system
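
As a minimal sketch of how the driver can then be consumed, assuming a hypothetical Windows share //192.168.122.30/k8sdata and a credentials Secret named smbcreds (both are assumptions, not part of this setup), a statically provisioned PV and matching PVC could look like this:

apiVersion: v1
kind: Secret
metadata:
  name: smbcreds          # hypothetical credentials for the share
type: Opaque
stringData:
  username: "smbuser"
  password: "smbpassword"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-smb
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - dir_mode=0777
    - file_mode=0777
  csi:
    driver: smb.csi.k8s.io
    volumeHandle: pv-smb                 # must be unique across the cluster
    volumeAttributes:
      source: //192.168.122.30/k8sdata   # hypothetical Windows share
    nodeStageSecretRef:
      name: smbcreds
      namespace: default
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-smb
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: pv-smb
  resources:
    requests:
      storage: 10Gi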

INFO

  1. Label worker nodes as worker (by default the role shows as none).
Use the kubectl get nodes command to see the node name to label.
kubectl label nodes <nodename> node-role.kubernetes.io/worker=worker
  2. To increase the retention time for events managed by Kubernetes (1 hour by default):
vi /etc/kubernetes/manifests/kube-apiserver.yaml
and add the following flag to the command section of the kube-apiserver container:

    - --event-ttl=168h
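
Since kube-apiserver runs as a static pod, the kubelet restarts it automatically when the manifest changes. You can confirm the new flag is active with:

kubectl -n kube-system get pods -l component=kube-apiserver
ps -ef | grep event-ttl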