Kubernetes and Helm: Enterprise Deployment Architecture
Kubernetes promised declarative infrastructure: describe what you want, and the system converges toward it. In practice, the gap between that promise and reality is filled with YAML — mountains of it, duplicated across environments, drifting between what Git says and what the cluster actually runs. Helm and GitOps exist to close that gap, and in my experience, they are the difference between a Kubernetes deployment that scales to dozens of services and one that collapses under its own configuration weight.
What follows is the architecture I have converged on after deploying Kubernetes across multiple projects: Helm for packaging and parameterisation, ArgoCD for GitOps-driven delivery, and a set of patterns for high availability, security, and observability that have survived contact with production.
Kubernetes Architecture Fundamentals
Before reaching for Helm charts and GitOps pipelines, it is worth grounding ourselves in what Kubernetes actually is — because I have met too many engineers who can write a Deployment manifest but cannot explain what etcd does or why the scheduler matters.
Control Plane Components
The control plane manages the cluster state and makes scheduling decisions:
| Component | Responsibility |
|---|---|
| API Server | REST API, authentication, admission control |
| Scheduler | Pod placement decisions based on resources and constraints |
| Controller Manager | Runs controllers (ReplicaSet, Deployment, etc.) |
| etcd | Distributed key-value store for cluster state |
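On a kubeadm-style cluster these components run as static pods in kube-system, so they can be inspected like any other workload (managed offerings such as EKS or GKE hide them from you):

```bash
# Control-plane pods on a kubeadm-provisioned cluster
kubectl get pods -n kube-system -l tier=control-plane -o wide

# Ask the API server directly whether it considers itself healthy
kubectl get --raw='/readyz?verbose'
```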
Worker Node Components
Each worker node runs the workloads:
| Component | Responsibility |
|---|---|
| Kubelet | Ensures containers are running in pods |
| Container Runtime | Runs containers (containerd, CRI-O) |
| Kube-proxy | Network proxy, implements Service abstraction |
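kubectl surfaces most of this per node; a quick way to confirm the kubelet version and container runtime on each worker:

```bash
# Runtime and kubelet version per node
kubectl get nodes -o wide

# Conditions, allocatable resources, and pods scheduled on one node
kubectl describe node <node-name>
```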
Cluster Topology
Production clusters typically span multiple availability zones: control-plane nodes are spread across three zones so that etcd retains quorum if one zone fails, and worker nodes in every zone carry the workloads. The scheduling patterns later in this article (anti-affinity, topology spread) assume this layout.
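Assuming nodes carry the standard well-known labels, the zone layout is easy to verify:

```bash
# Show each node's zone and role at a glance
kubectl get nodes -L topology.kubernetes.io/zone -L node-role.kubernetes.io/control-plane
```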
Helm: Taming the YAML
Helm simplifies Kubernetes deployments by packaging related resources into charts. If you have ever copied a set of YAML files from one environment to another and manually changed the image tag, the replica count, and the ingress hostname — then forgotten one of them and spent an hour debugging — you already understand why Helm exists.
Why Helm?
Without Helm, deploying an application requires managing multiple YAML files:
# Without Helm - managing individual resources
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f configmap.yaml
kubectl apply -f secret.yaml
kubectl apply -f ingress.yaml
# Repeat for each environment with different values...
With Helm:
# With Helm - single command, parameterized
helm install my-app ./my-chart --values production.yaml
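Before installing, it is worth previewing what Helm will actually submit; both commands below are standard Helm and change nothing in the cluster:

```bash
# Render the templates locally and inspect the resulting manifests
helm template my-app ./my-chart --values production.yaml

# Simulate the install, including validation of the rendered resources
helm install my-app ./my-chart --values production.yaml --dry-run --debug
```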
Helm Architecture
Since Helm 3 there is no in-cluster server component (Tiller is gone): the helm client renders chart templates locally, applies the result through the Kubernetes API, and records each release as a Secret in the target namespace — which is what makes rollbacks possible.
Key Concepts:
| Concept | Description |
|---|---|
| Chart | Package containing Kubernetes resource templates |
| Release | Instance of a chart deployed to a cluster |
| Repository | Collection of charts (like npm registry) |
| Values | Configuration parameters for customization |
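Releases are where Helm earns its keep operationally: every upgrade is a numbered revision you can inspect and roll back to:

```bash
helm list --namespace prod      # releases in a namespace
helm history my-app             # revision history for one release
helm rollback my-app 3          # revert to revision 3
helm uninstall my-app           # delete the release and its resources
```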
Chart Structure
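A chart is just a directory with a conventional layout — roughly what `helm create` scaffolds:

```text
my-application/
├── Chart.yaml          # chart metadata and dependencies
├── values.yaml         # default configuration
├── charts/             # vendored dependency charts
└── templates/
    ├── _helpers.tpl    # named template helpers
    ├── deployment.yaml
    ├── service.yaml
    ├── configmap.yaml
    └── ingress.yaml
```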
Chart.yaml
apiVersion: v2
name: my-application
description: A Helm chart for My Application
type: application
version: 1.2.0 # Chart version
appVersion: "2.1.0" # Application version
dependencies:
- name: postgresql
version: "12.x.x"
repository: https://charts.bitnami.com/bitnami
condition: postgresql.enabled
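Declared dependencies are not fetched automatically; pull them into charts/ before packaging:

```bash
# Resolve dependencies into charts/ and write Chart.lock
helm dependency update ./my-application

# Reproduce charts/ exactly from an existing Chart.lock
helm dependency build ./my-application
```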
Values and Templating
values.yaml - Default configuration:
replicaCount: 2
image:
repository: my-registry.example.com/my-app
  tag: "latest" # overridden per deploy via --set image.tag=<commit SHA> in CI
pullPolicy: IfNotPresent
service:
type: ClusterIP
port: 8080
ingress:
enabled: true
className: nginx
hosts:
- host: app.example.com
paths:
- path: /
pathType: Prefix
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetCPUUtilization: 80
postgresql:
enabled: true
auth:
database: myapp
templates/deployment.yaml - Using Go templating:
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "my-application.fullname" . }}
labels:
{{- include "my-application.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "my-application.selectorLabels" . | nindent 6 }}
template:
metadata:
labels:
{{- include "my-application.selectorLabels" . | nindent 8 }}
annotations:
checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
spec:
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: {{ .Values.service.port }}
livenessProbe:
httpGet:
path: /health/live
port: http
initialDelaySeconds: 30
readinessProbe:
httpGet:
path: /health/ready
port: http
resources:
{{- toYaml .Values.resources | nindent 12 }}
envFrom:
- configMapRef:
name: {{ include "my-application.fullname" . }}-config
- secretRef:
name: {{ include "my-application.fullname" . }}-secret
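The fullname and label helpers the template references live in templates/_helpers.tpl. A minimal sketch follows — the `helm create` scaffold generates a more defensive version that also honours a fullnameOverride value:

```yaml
{{/* templates/_helpers.tpl — minimal named templates used by deployment.yaml */}}
{{- define "my-application.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end }}

{{- define "my-application.selectorLabels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{- define "my-application.labels" -}}
helm.sh/chart: {{ printf "%s-%s" .Chart.Name .Chart.Version }}
{{ include "my-application.selectorLabels" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
```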
Environment-Specific Values
values-dev.yaml:
replicaCount: 1
ingress:
hosts:
- host: app-dev.example.com
resources:
requests:
cpu: 50m
memory: 128Mi
limits:
cpu: 200m
memory: 256Mi
autoscaling:
enabled: false
values-prod.yaml:
replicaCount: 3
ingress:
hosts:
- host: app.example.com
tls:
- secretName: app-tls
hosts:
- app.example.com
resources:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 2000m
memory: 2Gi
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 20
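Values files layer: defaults from values.yaml are merged with each `--values` file in order (later files win), and `--set` overrides everything. A production deploy therefore looks like:

```bash
helm upgrade --install my-app ./charts/my-app \
  --namespace prod --create-namespace \
  --values ./charts/my-app/values-prod.yaml \
  --set image.tag=v2.1.0     # highest precedence, typically supplied by CI
```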
CI/CD Integration
GitLab CI/CD Pipeline
.gitlab-ci.yml:
stages:
- build
- test
- package
- deploy
variables:
REGISTRY: registry.example.com
IMAGE_NAME: $REGISTRY/my-app
HELM_REPO: https://charts.example.com
build:
stage: build
image: docker:24
services:
- docker:24-dind
script:
    # Log in first; REGISTRY_USER / REGISTRY_PASSWORD are assumed CI/CD variables
    - echo "$REGISTRY_PASSWORD" | docker login -u "$REGISTRY_USER" --password-stdin $REGISTRY
    - docker build -t $IMAGE_NAME:$CI_COMMIT_SHA .
    - docker push $IMAGE_NAME:$CI_COMMIT_SHA
only:
- main
- develop
test:
stage: test
image: $IMAGE_NAME:$CI_COMMIT_SHA
script:
- npm test
only:
- main
- develop
package-helm:
stage: package
image: alpine/helm:3.14
script:
- helm lint ./charts/my-app
    # Chart versions must be valid SemVer, which a raw commit SHA is not
    - helm package ./charts/my-app --version 1.2.0-$CI_COMMIT_SHORT_SHA --app-version $CI_COMMIT_SHA
    - curl --data-binary "@my-app-1.2.0-$CI_COMMIT_SHORT_SHA.tgz" $HELM_REPO/api/charts
only:
- main
deploy-dev:
stage: deploy
image: alpine/helm:3.14
environment:
name: development
url: https://app-dev.example.com
script:
- helm upgrade --install my-app ./charts/my-app
--namespace dev
--values ./charts/my-app/values-dev.yaml
--set image.tag=$CI_COMMIT_SHA
--wait
only:
- develop
deploy-prod:
stage: deploy
image: alpine/helm:3.14
environment:
name: production
url: https://app.example.com
script:
- helm upgrade --install my-app ./charts/my-app
--namespace prod
--values ./charts/my-app/values-prod.yaml
--set image.tag=$CI_COMMIT_SHA
--wait
only:
- main
when: manual
Helm Repository Setup
Host charts in a repository for team access:
Using Nexus Repository:
# Add Helm repository
helm repo add mycompany https://nexus.example.com/repository/helm-hosted/
helm repo update
# Push chart to repository
curl -u admin:password https://nexus.example.com/repository/helm-hosted/ \
--upload-file my-app-1.0.0.tgz
Using ChartMuseum:
# Add the ChartMuseum chart repository, then deploy it
helm repo add chartmuseum https://chartmuseum.github.io/charts
helm install chartmuseum chartmuseum/chartmuseum \
  --set persistence.enabled=true \
  --set persistence.size=10Gi
# Push chart
curl --data-binary "@my-app-1.0.0.tgz" http://chartmuseum.example.com/api/charts
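Consumers then add the ChartMuseum endpoint like any other repository:

```bash
helm repo add mycharts http://chartmuseum.example.com
helm repo update
helm search repo mycharts/my-app
helm install my-app mycharts/my-app --version 1.0.0
```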
GitOps with ArgoCD
The fundamental problem with helm upgrade in a CI pipeline is that the pipeline knows what it deployed, but nothing tracks what the cluster currently runs. Drift accumulates — a manual kubectl edit here, a hotfix applied directly there — and eventually Git and the cluster disagree. ArgoCD solves this by inverting the flow: Git is the source of truth, and the cluster continuously reconciles toward it.
ArgoCD Architecture
ArgoCD runs inside the cluster as three main components: an API/UI server, a repo server that clones Git and renders manifests (plain YAML, Helm, or Kustomize), and an application controller that continuously compares the rendered desired state against the live cluster state and syncs the difference.
Installing ArgoCD
# Create namespace
kubectl create namespace argocd
# Install ArgoCD
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Get initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
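The argocd CLI works against the same port-forward; `--insecure` is needed because the bundled certificate is self-signed (fine for a first login, not for production):

```bash
# Log in with the initial admin password retrieved above
argocd login localhost:8080 --username admin --password <initial-password> --insecure

# Inspect applications and trigger a sync manually
argocd app list
argocd app get my-application
argocd app sync my-application
```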
Application Definition
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-application
namespace: argocd
spec:
project: default
source:
repoURL: https://gitlab.example.com/team/my-app.git
targetRevision: main
path: charts/my-app
helm:
valueFiles:
- values-prod.yaml
parameters:
- name: image.tag
value: "v2.1.0"
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
ApplicationSet for Multi-Environment
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: my-application-set
namespace: argocd
spec:
generators:
- list:
elements:
- env: dev
namespace: development
values: values-dev.yaml
- env: staging
namespace: staging
values: values-staging.yaml
- env: prod
namespace: production
values: values-prod.yaml
template:
metadata:
name: 'my-app-{{env}}'
spec:
project: default
source:
repoURL: https://gitlab.example.com/team/my-app.git
targetRevision: main
path: charts/my-app
helm:
valueFiles:
- '{{values}}'
destination:
server: https://kubernetes.default.svc
namespace: '{{namespace}}'
syncPolicy:
automated:
prune: true
selfHeal: true
Multi-Environment Deployment
Environment Separation
Separate environments either as namespaces within a shared cluster (cheaper, but only soft isolation) or as dedicated clusters per environment (stronger isolation at higher operational cost). When environments share a cluster, resource quotas keep one from starving another.
Resource Quotas per Environment
apiVersion: v1
kind: ResourceQuota
metadata:
name: dev-quota
namespace: development
spec:
hard:
requests.cpu: "4"
requests.memory: 8Gi
limits.cpu: "8"
limits.memory: 16Gi
pods: "20"
services: "10"
---
apiVersion: v1
kind: ResourceQuota
metadata:
name: prod-quota
namespace: production
spec:
hard:
requests.cpu: "32"
requests.memory: 64Gi
limits.cpu: "64"
limits.memory: 128Gi
pods: "100"
services: "50"
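A quota rejects any pod that does not declare requests and limits, which surprises teams the first time. Pairing each namespace with a LimitRange that injects defaults avoids that failure mode — a sketch for the development namespace:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults
  namespace: development
spec:
  limits:
    - type: Container
      default:            # applied as limits when the container declares none
        cpu: 200m
        memory: 256Mi
      defaultRequest:     # applied as requests when the container declares none
        cpu: 50m
        memory: 128Mi
```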
High Availability Patterns
High availability in Kubernetes is not automatic — it is designed. A three-replica Deployment means nothing if all three pods land on the same node and that node fails. The following patterns ensure that availability survives the failures that actually happen in production.
Pod Anti-Affinity
Spread pods across nodes and zones:
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: my-application
topologyKey: kubernetes.io/hostname
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: my-application
topologyKey: topology.kubernetes.io/zone
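Topology spread constraints (GA since Kubernetes 1.19) achieve the same spreading declaratively, with finer control over how much imbalance is tolerated; a sketch equivalent to the preferred zone rule above:

```yaml
spec:
  topologySpreadConstraints:
    - maxSkew: 1                          # at most one pod difference between zones
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway   # soft preference, like the rule above
      labelSelector:
        matchLabels:
          app: my-application
```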
Pod Disruption Budget
Ensure minimum availability during updates:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: my-application-pdb
spec:
minAvailable: 2 # Or use maxUnavailable: 1
selector:
matchLabels:
app: my-application
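PDBs are enforced through the eviction API, which is what `kubectl drain` uses — so a node drain during maintenance blocks rather than dropping below `minAvailable`:

```bash
kubectl get pdb my-application-pdb

# Drain respects the budget: evictions that would violate it are retried, not forced
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```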
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: my-application-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-application
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
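Resource-based HPA only works if the metrics pipeline is present; on clusters without it, install metrics-server first, then watch the autoscaler's view of the world:

```bash
# Assumes the official metrics-server chart
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm upgrade --install metrics-server metrics-server/metrics-server -n kube-system

# Observe current utilisation and scaling decisions
kubectl get hpa my-application-hpa --watch
kubectl top pods -l app=my-application
```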
Storage and Persistence
Storage Classes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: ebs.csi.aws.com # AWS EBS CSI driver (gp3 iops/throughput need CSI); substitute your provider's
parameters:
type: gp3
iops: "3000"
throughput: "125"
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
Persistent Volume Claims
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: data-pvc
spec:
accessModes:
- ReadWriteOnce
storageClassName: fast-ssd
resources:
requests:
storage: 100Gi
StatefulSet for Databases
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgresql
spec:
serviceName: postgresql
replicas: 3
selector:
matchLabels:
app: postgresql
template:
metadata:
labels:
app: postgresql
spec:
containers:
- name: postgresql
image: postgres:15
ports:
- containerPort: 5432
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: fast-ssd
resources:
requests:
storage: 50Gi
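`serviceName: postgresql` refers to a headless Service that must exist alongside the StatefulSet; it gives each pod a stable DNS identity (postgresql-0.postgresql, postgresql-1.postgresql, ...), which replication setups rely on:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgresql
spec:
  clusterIP: None        # headless: per-pod DNS records, no load balancing
  selector:
    app: postgresql
  ports:
    - name: postgres
      port: 5432
```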
Networking and Ingress
Ingress Controller
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: my-application
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/proxy-body-size: "50m"
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
ingressClassName: nginx
tls:
- hosts:
- app.example.com
secretName: app-tls
rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: my-application
port:
number: 8080
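The cert-manager annotation above references a ClusterIssuer named letsencrypt-prod that you must create yourself. A minimal sketch, assuming cert-manager is already installed and HTTP-01 challenges can be routed through the nginx ingress class (the email is a placeholder):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com              # placeholder: expiry notices go here
    privateKeySecretRef:
      name: letsencrypt-prod-account-key     # ACME account key storage
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx          # cert-manager >= 1.12; older versions use `class`
```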
Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: my-application-policy
spec:
podSelector:
matchLabels:
app: my-application
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
- podSelector:
matchLabels:
app: frontend
ports:
- protocol: TCP
port: 8080
egress:
- to:
- podSelector:
matchLabels:
app: postgresql
ports:
- protocol: TCP
port: 5432
- to:
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
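Pods without a matching policy remain wide open, so pair per-app policies with a namespace-wide default deny that isolates everything not explicitly allowed:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}        # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```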
Monitoring and Observability
Prometheus Stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set grafana.adminPassword=admin
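With the chart's default naming (service names prefixed by the release), Grafana is reachable via a port-forward — confirm the service name with `kubectl get svc -n monitoring` if your release name differs:

```bash
kubectl port-forward svc/prometheus-grafana -n monitoring 3000:80
# Then open http://localhost:3000 (admin / the password set at install time)
```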
ServiceMonitor for Application
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: my-application
labels:
release: prometheus
spec:
selector:
matchLabels:
app: my-application
endpoints:
- port: http
path: /metrics
interval: 30s
Key Metrics Dashboard
| Metric | Description | Alert Threshold |
|---|---|---|
| `container_cpu_usage_seconds_total` | CPU usage | > 80% for 5 min |
| `container_memory_usage_bytes` | Memory usage | > 85% of limit |
| `kube_pod_status_ready` | Pod readiness | < desired replicas |
| `http_requests_total` | Request count | Rate anomaly |
| `http_request_duration_seconds` | Latency | p99 > 1s |
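With kube-prometheus-stack, alert rules ship as PrometheusRule resources picked up via the same `release: prometheus` label as the ServiceMonitor above. A sketch of the p99 latency alert from the table, assuming the app exports a standard `http_request_duration_seconds` histogram:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-application-alerts
  labels:
    release: prometheus
spec:
  groups:
    - name: my-application
      rules:
        - alert: HighP99Latency
          expr: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_seconds_bucket{app="my-application"}[5m])) by (le)
            ) > 1
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "p99 latency above 1s for 5 minutes"
```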
Security Best Practices
Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted
Security Context
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: app
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
Secrets Management
# Using External Secrets Operator
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets \
  --namespace external-secrets --create-namespace
The operator then materialises entries from an external store (Vault, AWS Secrets Manager, and others) into native Kubernetes Secrets:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: my-application-secrets
spec:
refreshInterval: 1h
secretStoreRef:
name: vault-backend
kind: ClusterSecretStore
target:
name: my-application-secret
data:
- secretKey: database-password
remoteRef:
key: secret/my-app
property: db_password
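The `vault-backend` ClusterSecretStore referenced above has to be defined separately. A sketch for HashiCorp Vault with Kubernetes auth — the server URL, mount path, and role are assumptions to adapt to your Vault setup:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: https://vault.example.com      # assumed Vault address
      path: secret                           # KV mount containing secret/my-app
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes              # Vault's Kubernetes auth mount
          role: external-secrets             # assumed Vault role bound to the SA below
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
```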
Final Thoughts
Kubernetes is simultaneously the best and worst thing to happen to infrastructure engineering. Best, because it provides a universal abstraction layer that works the same way on a developer’s laptop and in a multi-region production cluster. Worst, because that abstraction comes with a complexity tax that teams underestimate until they are three months into a migration and drowning in YAML.
Helm, ArgoCD, and the patterns described here do not eliminate that complexity — they manage it. Helm gives you parameterisation so you stop copying files. ArgoCD gives you reconciliation so you stop wondering what the cluster actually looks like. Pod anti-affinity, PDBs, and HPA give you resilience so you stop waking up at 3 AM. And proper observability gives you visibility so you can distinguish a genuine incident from a false alarm.
None of this is magic. It is discipline, encoded in configuration. The architecture scales from small teams to enterprise deployments, but only if you invest the time to understand the primitives before layering the abstractions. Start with Kubernetes fundamentals, earn your way to Helm, and adopt GitOps when — not before — you have the operational maturity to trust the reconciliation loop.
Achraf SOLTANI — July 20, 2024
