Kubernetes E2E Testing with Chainsaw

Eric Bailey

Written on 27 August, 2024
Updated on 27 August, 2024

Chainsaw was developed to be used internally to continuously test Kyverno. Due to its declarative YAML-based design and its highly expressive assertion model, which is based on kyverno-json, it's generally useful for managing complex end-to-end tests.

A common problem domain on Kubernetes is the management of DNS records to point to load balancers, and of certificates to secure them. Arguably, the industry-standard tools are ExternalDNS and cert-manager, respectively. Both rely on being able to manage particular DNS records, such as CNAME records pointing to Ingress load balancers and TXT records to solve DNS-01 challenges, which requires appropriate cloud provider permissions, proper configuration ensuring the tools point to the correct DNS zones, etc. While the setup is fairly easy to reason about, there are many moving parts and points of failure, making it complex to test.

Figure 1: Handsaw. Photo by JD Hancock (CC)

Fortunately, Chainsaw makes it easy to define a Test covering the DNS management scenario.

This post is a literate program using Org Mode's noweb style syntax. The tangled results can be found on GitHub.

apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: dns
spec:
  description: |
    Verify the DNS management provided by cert-manager and ExternalDNS.
  steps:
    - name: create-ingress
      description: |
        Create an Ingress with rules for hosts matching each of the zones and
        names configured in the ClusterIssuer/letsencrypt-staging, including a
        single TLS certificate for all the DNS names.
      try:
        - description: |
            Parse the solvers from the ClusterIssuer/letsencrypt-staging.
          <<Parse the solvers from the ClusterIssuer>>
        - description: |
            Create an nginx Deployment and Service as a backend.
          <<Create an nginx Deployment and Service as a backend>>
        - description: |
            Create the Ingress.
          <<Create an Ingress for all zones and names, including TLS>>
    - name: verify-ingress
      <<Verify the expected Ingress>>
    - name: verify-certificate
      <<Verify the expected Certificate>>

For the first step (that is, a TestStep), assume the existence of a ClusterIssuer/letsencrypt-staging similar to the following, and parse its DNS-01 solvers.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    privateKeySecretRef:
      name: letsencrypt-staging
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    solvers:
      - dns01:
          route53: {}
        selector:
          dnsZones:
            - foo.example.com
      - dns01:
          route53: {}
        selector:
          dnsZones:
            - bar.example.net
      - dns01:
          route53: {}
        selector:
          dnsNames:
            - baz.example.org

N.B. This example also assumes IRSA is configured.

This ClusterIssuer is responsible for solving DNS-01 challenges for the foo.example.com and bar.example.net zones, and for the specific DNS name baz.example.org.

First, use kubectl's JSONPath support to extract just the solvers.

kubectl get clusterissuer/letsencrypt-staging \
    --output jsonpath='{.spec.acme.solvers}'
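
Given the example ClusterIssuer above, stdout would contain a JSON array similar to the following (wrapped here for readability):

[{"dns01":{"route53":{}},"selector":{"dnsZones":["foo.example.com"]}},
 {"dns01":{"route53":{}},"selector":{"dnsZones":["bar.example.net"]}},
 {"dns01":{"route53":{}},"selector":{"dnsNames":["baz.example.org"]}}]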

Parse stdout and bind the result to $solvers. Refer to the Chainsaw documentation on loading an existing resource.

- name: solvers
  value: (json_parse($stdout))

From $solvers extract all the DNS names and zones as two flat arrays, leveraging Chainsaw's JMESPath support.

- name: dns_names
  value: ($solvers[?selector.dnsNames].selector.dnsNames[])
- name: dns_zones
  value: ($solvers[?selector.dnsZones].selector.dnsZones[])
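
Given the example solvers, those two expressions evaluate to flat arrays like these (a worked example, not part of the tangled test):

dns_names:
  - baz.example.org
dns_zones:
  - foo.example.com
  - bar.example.net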

Bring it all together to define the first Operation.

script:
  content: |
    <<Extract the solvers>>
  outputs:
    <<Parse stdout and bind the result to $solvers>>
    <<Extract the names and zones>>
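
For reference, the fully tangled operation, with all noweb references expanded, looks like this:

script:
  content: |
    kubectl get clusterissuer/letsencrypt-staging \
        --output jsonpath='{.spec.acme.solvers}'
  outputs:
    - name: solvers
      value: (json_parse($stdout))
    - name: dns_names
      value: ($solvers[?selector.dnsNames].selector.dnsNames[])
    - name: dns_zones
      value: ($solvers[?selector.dnsZones].selector.dnsZones[])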

Define a simple backend, using a vanilla nginx container, to test the DNS management setup. Include a Deployment, Service, and, most importantly, an Ingress.

apply:
  file: nginx.yaml

Since the particular backend is irrelevant to the test, just use a Deployment with a single nginx:alpine container to keep it simple.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          ports:
            - containerPort: 80
              name: http
              protocol: TCP

Create a Service using the same selectors as the Deployment, which will be referenced by the Ingress rules.

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app.kubernetes.io/name: nginx
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: http
  type: ClusterIP

With the boring Deployment and Service out of the way, define the Ingress, using several key Chainsaw features.

For the Ingress, use bindings, one of the major selling points of Chainsaw, to declare $hosts based on the $dns_names and $dns_zones.

bindings:
  - name: hosts
    value: ([$dns_names, $dns_zones[].join('.', ['burek', @])][])
  - name: secret_name
    value: (join('-', [$namespace, 'tls']))
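
Given the example ClusterIssuer, and assuming a hypothetical test namespace chainsaw-example (Chainsaw runs each test in an ephemeral namespace by default), these bindings evaluate to something like:

hosts:
  - baz.example.org
  - burek.foo.example.com
  - burek.bar.example.net
secret_name: chainsaw-example-tls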

Define rules for each host in $hosts, again using JMESPath.

rules: |-
  ($hosts[].{"host": @, "http": {"paths": [{"backend": {"service": {"name": 'nginx', "port": {"number": `80`}}}, "path": '/', "pathType": 'ImplementationSpecific'}]}})
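
For a single host, that expression produces an Ingress rule like the following (shown purely for illustration):

rules:
  - host: burek.foo.example.com
    http:
      paths:
        - backend:
            service:
              name: nginx
              port:
                number: 80
          path: /
          pathType: ImplementationSpecific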

Request a single certificate covering all the DNS names, in order to verify cert-manager is working as expected.

tls: |-
  ([{"hosts": $hosts, "secretName": $secret_name}])
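
With the same example bindings, the tls section expands to:

tls:
  - hosts:
      - baz.example.org
      - burek.foo.example.com
      - burek.bar.example.net
    secretName: chainsaw-example-tls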

Finally, define the Operation to create the Ingress. N.B. The Ingress assumes an existing IngressClass/alb, such as part of an installation of the AWS Load Balancer Controller.

apply:
  <<Define some bindings for the Ingress>>
  resource:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        alb.ingress.kubernetes.io/scheme: internal
        alb.ingress.kubernetes.io/target-type: ip
        cert-manager.io/cluster-issuer: letsencrypt-staging
      name: ($namespace)
    spec:
      ingressClassName: alb
      <<Define rules for each host in $hosts>>
      <<Declare a single TLS secret for all hosts>>

With those resources created, it's time to verify the Ingress. (Verifying the DNS records themselves is left as an exercise for the reader.) The main idea is to use a JMESPath expression and assert that the result is true, i.e., to ensure the presence of status.loadBalancer.ingress, meaning a load balancer was successfully provisioned (cf. Beyond simple equality in the Chainsaw documentation).

description: |
  Verify the Ingress and dump ExternalDNS logs. Otherwise dump LBC logs.
try:
  - description: |
      Ensure the Ingress has successfully provisioned a load balancer
      within 5m.
    assert:
      resource:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        metadata:
          name: ($namespace)
        (status.loadBalancer.ingress != null): true
      timeout: 5m
catch:
  - description: |
      Dump LBC logs.
    podLogs:
      container: aws-load-balancer-controller
      namespace: kube-system
      selector: app.kubernetes.io/name=aws-load-balancer-controller
      tail: 30
finally:
  - description: |
      Dump ExternalDNS logs.
    podLogs:
      container: external-dns
      namespace: kube-system
      selector: app.kubernetes.io/name=external-dns
      tail: 30
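
For context, once the AWS Load Balancer Controller has provisioned an ALB, the Ingress status looks roughly like this (the hostname is purely illustrative):

status:
  loadBalancer:
    ingress:
      - hostname: internal-k8s-example-0123456789.eu-west-1.elb.amazonaws.com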

Verifying the Certificate is more straightforward, not least because Certificate is a Kubernetes resource. cert-manager won't mark it as ready until Let's Encrypt has successfully issued it. The issuance shouldn't take more than five minutes (it's also possible to configure timeouts globally instead of per operation), so give up and dump the last 30 cert-manager log lines if it's been longer.

description: |
  Verify the expected certificate. Otherwise dump cert-manager logs.
try:
  - description: |
      Ensure the Certificate is ready within 5m.
    assert:
      resource:
        apiVersion: cert-manager.io/v1
        kind: Certificate
        metadata:
          name: (join('-', [$namespace, 'tls']))
        status:
          (conditions[?type == 'Ready']):
            - status: "True"
      timeout: 5m
catch:
  - description: |
      Dump cert-manager logs.
    podLogs:
      container: cert-manager-controller
      namespace: cert-manager
      selector: app.kubernetes.io/name=cert-manager
      tail: 30

Chainsaw's power makes it an indispensable tool. Writing complex tests using YAML (admittedly, said YAML is rather complex and includes extra features such as JMESPath support) is a pleasant experience compared to something like Go, with its substantial and unavoidable boilerplate. As a bonus, it has a very Kubernetes-native feel. This post only scratches the surface of what Chainsaw can do. It supports kind, multi-cluster setups, building documentation of tests, and much more!
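
As a final sketch, assuming chainsaw is installed and the kubeconfig points at a cluster with cert-manager, ExternalDNS, and the AWS Load Balancer Controller already configured, running the tangled test from its directory is a one-liner:

chainsaw test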

Let's head down into that cellar and carve ourselves a witch.