Douglas Hellinger
Creator of this blog.
Sep 10, 2023 10 min read

Exploring OCI Registries: Chapter 2: Install a Public Helm Chart on a Private Kubernetes Cluster


When you wanna install a public Helm chart on a private Kubernetes cluster…

See also How To Pull And Verify Any Public Helm Chart From An OCI Registry

Let’s continue the journey into OCI registries with another common use case:

Install A Public Helm Chart

Aisha picked the Tigera Operator for Calico Helm Chart.

Let’s install it…

helm repo add projectcalico \
https://docs.projectcalico.org/charts/
kubectl create namespace tigera-operator
helm install calico \
projectcalico/tigera-operator \
--version 3.26.1 \
--namespace tigera-operator

A moment later, the pod’s reporting ErrImagePull

kubectl get po -n tigera-operator
NAME                              READY   STATUS         RESTARTS   AGE
tigera-operator-5f4668786-tzjkp   0/1     ErrImagePull   0          7s

Oh Yeah, Forgot. Route To The Internet Is Blocked

Architecture Diagram Showing Private Kubernetes Cluster

The private kubernetes cluster can’t pull from the public internet.

But the private registry can. It operates as a pull-through cache for public images.
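
The k3d- prefix on the proxy names suggests they’re k3d-managed registries. If you want to reproduce the setup, recent k3d releases (v5+) can create a pull-through proxy directly; a rough sketch, assuming the --proxy-remote-url flag is available in your k3d version:

k3d registry create docker-io-mirror \
  --proxy-remote-url https://registry-1.docker.io

k3d registry create quay-io-mirror \
  --proxy-remote-url https://quay.io

k3d prefixes the container names with k3d-, and the registries listen on port 5000 inside the Docker network, which is where refs like k3d-docker-io-mirror:5000 come from.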

Architecture Diagram Showing Kubernetes Cluster Pull From Private OCI Registry Proxy

The cluster must pull images from the private registry.

Better Update All The Image Refs To Pull From The Private Registry

The default values file reveals images from two different public registries: quay.io and docker.io.
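
One way to see them is to dump the chart’s default values (helm inspect values works the same way):

helm show values projectcalico/tigera-operator \
  --version v3.26.1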

Looks like the registry can be overridden for the operator.

tigeraOperator:
  image: tigera/operator
  version: v1.30.4
  registry: quay.io

For calicoctl, we prefix the registry in the image ref.

calicoctl:
  image: docker.io/calico/ctl
  tag: v3.26.1

Let’s update the registry for tigera/operator so that it pulls from our private registry proxy: k3d-quay-io-mirror:5000.

Likewise, let’s update the registry for calicoctl to pull from our other registry proxy, k3d-docker-io-mirror:5000.

tigeraOperator:
  image: tigera/operator
  version: v1.30.4
  registry: k3d-quay-io-mirror:5000

calicoctl:
  image: k3d-docker-io-mirror:5000/calico/ctl
  tag: v3.26.1

There. That should fix it.

Upgrade The Release With The Custom Values

Upgrade the Helm release to a new revision, specifying the custom values file.

helm upgrade calico \
projectcalico/tigera-operator \
--namespace tigera-operator \
--version v3.26.1 \
--install \
--values tigera-operator-values.yaml \
--debug

Now the Operator pod is Running:

kubectl get po -n tigera-operator
NAME                               READY   STATUS    RESTARTS   AGE
tigera-operator-85bc6fb7bf-58m2m   1/1     Running   0          26s

How about the calico pods?

kubectl get po -n calico-system
NAME                                       READY   STATUS                  RESTARTS   AGE
calico-typha-bfc4ccd59-cmgsf               0/1     ImagePullBackOff        0          23s
calico-kube-controllers-67b48b6f9c-5pgp9   0/1     ImagePullBackOff        0          23s
calico-node-9g7mg                          0/1     Init:ImagePullBackOff   0          23s
csi-node-driver-h2tnp                      0/2     ImagePullBackOff        0          23s

Ah. ImagePullBackOff.

There are at least 5* containers here that weren’t visible in the default values file.

* (calico-node-9g7mg is stuck pulling an Init Container. Once it completes, there may be more.)

Examining the events, we can see Containerd failed to send a HEAD request to registry-1.docker.io, preventing the pull of all the container images.

kubectl get events -n calico-system
LAST SEEN   TYPE      REASON              OBJECT                                          MESSAGE
51s         Warning   Failed              pod/csi-node-driver-h5wcv                       Error: ErrImagePull
40s         Normal    BackOff             pod/csi-node-driver-h5wcv                       Back-off pulling image "docker.io/calico/csi:v3.26.1"
40s         Warning   Failed              pod/csi-node-driver-h5wcv                       Error: ImagePullBackOff
40s         Normal    BackOff             pod/csi-node-driver-h5wcv                       Back-off pulling image "docker.io/calico/node-driver-registrar:v3.26.1"
40s         Warning   Failed              pod/csi-node-driver-h5wcv                       Error: ImagePullBackOff
36s         Normal    Pulling             pod/calico-kube-controllers-6488ff9f7b-2psct    Pulling image "docker.io/calico/kube-controllers:v3.26.1"
35s         Normal    Pulling             pod/calico-node-v9ms2                           Pulling image "docker.io/calico/pod2daemon-flexvol:v3.26.1"
32s         Normal    Pulling             pod/calico-typha-6db4d666b5-9ggp7               Pulling image "docker.io/calico/typha:v3.26.1"
31s         Warning   Failed              pod/calico-typha-6db4d666b5-9ggp7               Failed to pull image "docker.io/calico/typha:v3.26.1": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/calico/typha:v3.26.1": failed to resolve reference "docker.io/calico/typha:v3.26.1": failed to do request: Head "https://registry-1.docker.io/v2/calico/typha/manifests/v3.26.1": dial tcp: lookup registry-1.docker.io: Try again
31s         Warning   Failed              pod/calico-typha-6db4d666b5-9ggp7               Error: ErrImagePull
31s         Warning   Failed              pod/calico-kube-controllers-6488ff9f7b-2psct    Failed to pull image "docker.io/calico/kube-controllers:v3.26.1": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/calico/kube-controllers:v3.26.1": failed to resolve reference "docker.io/calico/kube-controllers:v3.26.1": failed to do request: Head "https://registry-1.docker.io/v2/calico/kube-controllers/manifests/v3.26.1": dial tcp: lookup registry-1.docker.io: Try again
31s         Warning   Failed              pod/calico-kube-controllers-6488ff9f7b-2psct    Error: ErrImagePull
31s         Warning   Failed              pod/calico-node-v9ms2                           Failed to pull image "docker.io/calico/pod2daemon-flexvol:v3.26.1": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/calico/pod2daemon-flexvol:v3.26.1": failed to resolve reference "docker.io/calico/pod2daemon-flexvol:v3.26.1": failed to do request: Head "https://registry-1.docker.io/v2/calico/pod2daemon-flexvol/manifests/v3.26.1": dial tcp: lookup registry-1.docker.io: Try again
31s         Warning   Failed              pod/calico-node-v9ms2                           Error: ErrImagePull
27s         Normal    Pulling             pod/csi-node-driver-h5wcv                       Pulling image "docker.io/calico/csi:v3.26.1"
22s         Warning   Failed              pod/csi-node-driver-h5wcv                       Failed to pull image "docker.io/calico/csi:v3.26.1": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/calico/csi:v3.26.1": failed to resolve reference "docker.io/calico/csi:v3.26.1": failed to do request: Head "https://registry-1.docker.io/v2/calico/csi/manifests/v3.26.1": dial tcp: lookup registry-1.docker.io: Try again
22s         Warning   Failed              pod/csi-node-driver-h5wcv                       Error: ErrImagePull
22s         Normal    Pulling             pod/csi-node-driver-h5wcv                       Pulling image "docker.io/calico/node-driver-registrar:v3.26.1"
17s         Warning   Failed              pod/csi-node-driver-h5wcv                       Failed to pull image "docker.io/calico/node-driver-registrar:v3.26.1": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/calico/node-driver-registrar:v3.26.1": failed to resolve reference "docker.io/calico/node-driver-registrar:v3.26.1": failed to do request: Head "https://registry-1.docker.io/v2/calico/node-driver-registrar/manifests/v3.26.1": dial tcp: lookup registry-1.docker.io: Try again
9s          Normal    BackOff             pod/calico-typha-6db4d666b5-9ggp7               Back-off pulling image "docker.io/calico/typha:v3.26.1"
9s          Warning   Failed              pod/calico-typha-6db4d666b5-9ggp7               Error: ImagePullBackOff
4s          Normal    BackOff             pod/calico-node-v9ms2                           Back-off pulling image "docker.io/calico/pod2daemon-flexvol:v3.26.1"
4s          Warning   Failed              pod/calico-node-v9ms2                           Error: ImagePullBackOff
2s          Normal    BackOff             pod/calico-kube-controllers-6488ff9f7b-2psct    Back-off pulling image "docker.io/calico/kube-controllers:v3.26.1"
2s          Warning   Failed              pod/calico-kube-controllers-6488ff9f7b-2psct    Error: ImagePullBackOff

No need to analyse further. We know there’s no route to the public internet from the cluster.

The Tigera Operator creates pods that pull images from Docker Hub. Docker Hub is blocked.

We wanna pull images from our private registry. How can we do that? 🤔

Update All Images Used By Subcharts

Are those images specified in the values of subcharts?

helm inspect chart projectcalico/tigera-operator
apiVersion: v2
appVersion: v3.26.1
description: Installs the Tigera operator for Calico
home: https://projectcalico.docs.tigera.io/about/about-calico
icon: https://projectcalico.docs.tigera.io/images/felix_icon.png
name: tigera-operator
sources:
- https://github.com/projectcalico/calico/tree/master/calico/_includes/charts/tigera-operator
- https://github.com/tigera/operator
- https://github.com/projectcalico/calico
version: v3.26.1

Nope. No subcharts defined here.

Oh Wait, There Are Still Some Images As Part Of A CRD, Update Them Too

Override The Registry In The Installation CRD

Turns out the answer is in the Installation CRD Specification, as hinted by the comment in the default values.

Screenshot Showing Calico Installation CRD Specification

Let’s override the registry value for the default Installation and try it:

installation:
  enabled: true
  kubernetesProvider: ''
  registry: k3d-docker-io-mirror:5000

After a minute or two, the Calico pods are coming up.

kubectl get po -n calico-system 
NAME                                      READY   STATUS                 RESTARTS   AGE
calico-typha-59d8dcf55d-jlz9p             1/1     Running                0          2m43s
calico-kube-controllers-d6b96cb6d-pntdg   1/1     Running                0          2m41s
calico-node-stzgr                         0/1     Running                0          2m41s
csi-node-driver-5cvj8                     1/2     CreateContainerError   0          2m42s

kubectl get po -n calico-apiserver 
NAME                                READY   STATUS    RESTARTS        AGE
calico-apiserver-75fb78b976-xhszq   1/1     Running   10 (123m ago)   2m41s
calico-apiserver-75fb78b976-chsfj   1/1     Running   10 (123m ago)   2m41s
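
To double-check that the override landed, we can read it back from the Installation resource the chart created (it’s named default). Something like this should print our mirror:

kubectl get installation default \
  -o jsonpath='{.spec.registry}'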

Problem Statement: Devs Need To Identify & Override Image Registries

For every public Helm Chart, devs need to update all the image refs to point to our private registry.

That’s not standardised across Helm Charts and Kubernetes Operators.

Each chart may do it differently. If there are Custom Resource Definitions, there are custom ways to specify image registries.

It can take a tedious number of tries at deploying and updating image refs before a public chart works.
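
One partial shortcut is to render the chart and grep the manifests for image refs. That catches everything in the templates, but not the images an operator pulls at runtime, which is exactly the trap we hit above:

helm template calico projectcalico/tigera-operator \
  --version v3.26.1 \
  --namespace tigera-operator | \
  grep 'image:'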

What alternatives do we have? Is there a better way?

Some Possible Solutions

  1. Override the registry in the chart values.
  2. Use a Mutating Admission Webhook to re-write the registry. For example, this Kyverno Replace Image Registry ClusterPolicy (sketched after this list).
  3. Configure the private registry as a Registry Mirror in the container runtime.
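
For option 2, a rough sketch of such a ClusterPolicy, simplified from the upstream example and pointed at our docker.io mirror (it assumes fully-qualified image refs):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: replace-image-registry
spec:
  background: false
  rules:
    - name: replace-image-registry
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        foreach:
          - list: "request.object.spec.containers"
            patchStrategicMerge:
              spec:
                containers:
                  - name: "{{ element.name }}"
                    # swap whatever registry prefix is present for the mirror
                    image: "{{ regex_replace_all_literal('^[^/]+', '{{ element.image }}', 'k3d-docker-io-mirror:5000') }}"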

Layer Diagram Showing OCI, Kubernetes and Helm Layers

When building a platform, we should ask ourselves “Can we solve this at a layer below?”

We’re learning about OCI registries, so let’s explore the OCI Registry and Runtime layer…

Configure The Container Runtime To Use A Registry Mirror

  1. In Containerd, we can Configure An OCI-Compliant Registry Mirror in /etc/containerd/certs.d/docker.io/hosts.toml (an example is sketched after the note below).

If you’re using EKS 1.27, you’re using Containerd. There’s no alternative CRI.
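
A minimal sketch of what /etc/containerd/certs.d/docker.io/hosts.toml might contain, assuming our mirror is reachable at k3d-docker-io-mirror:5000 over plain HTTP and containerd’s registry config_path points at /etc/containerd/certs.d:

# fall back to the upstream registry if the mirror can’t serve the request
server = "https://registry-1.docker.io"

[host."http://k3d-docker-io-mirror:5000"]
  capabilities = ["pull", "resolve"]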

  2. In k3d, we can configure Containerd to use a Registry Mirror by configuring registries.yaml in the cluster spec:

k3d-dev-private-use-registry-mirror.yaml

apiVersion: k3d.io/v1alpha5
kind: Simple
metadata:
  name: dev-private
servers: 1
agents: 0
options:
  k3d:
    disableLoadbalancer: true
registries:
  use:
    - k3d-quay-io-mirror:5000
  config: | # define contents of the `registries.yaml` file (or reference a file); same as `--registry-config /path/to/config.yaml`
    mirrors:
      "quay.io":
        endpoint:
          - http://k3d-quay-io-mirror:5000

k3d cluster create \
  --config k3d-dev-private-use-registry-mirror.yaml \
  --volume $(pwd)/containerd.config.toml.tmpl:/var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
INFO[0000] Using config file k3d-dev-private-use-registry-mirror.yaml (k3d.io/v1alpha5#simple) 
WARN[0000] No node filter specified                     
INFO[0000] Prep: Network                                
INFO[0000] Created network 'k3d-use-reg-mirror'         
INFO[0000] Created image volume k3d-use-reg-mirror-images 
INFO[0000] Starting new tools node...                   
INFO[0000] Starting Node 'k3d-use-reg-mirror-tools'     
INFO[0001] Creating node 'k3d-use-reg-mirror-server-0'  
INFO[0001] Using the k3d-tools node to gather environment information 
INFO[0001] HostIP: using network gateway 172.22.0.1 address 
INFO[0001] Starting cluster 'use-reg-mirror'            
INFO[0001] Starting servers...                          
INFO[0001] Starting Node 'k3d-use-reg-mirror-server-0'  
INFO[0004] All agents already running.                  
INFO[0004] All helpers already running.                 
INFO[0004] Injecting records for hostAliases (incl. host.k3d.internal) and for 2 network members into CoreDNS configmap... 
INFO[0007] Cluster 'use-reg-mirror' created successfully! 
INFO[0007] You can now use it like this:                
kubectl cluster-info

To test it, let’s create a simple pod. We’ll specify the tag, but omit the registry and repository.

kubectl run nginx \
  --image=nginx:stable
pod/nginx created

Describe the pod:

kubectl describe pod nginx
Containers:
  nginx:
    Container ID:   containerd://a6f171d5af552b144cec25504f4fb661015509185d0c064568b23ee55a04cfaf
    Image:          nginx:stable
    Image ID:       docker.io/library/nginx@sha256:a8281ce42034b078dc7d88a5bfe6d25d75956aad9abba75150798b90fa3d1010

We can see the Image ID normalises the given image ref: it’s prefixed with the default registry, docker.io, and the default repository, library.
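
We can see the same normalisation from the runtime’s side. k3s ships crictl on the node, so (assuming the default k3d node image) something like this lists the image under its fully-qualified name:

docker exec k3d-dev-private-server-0 \
  crictl images | grep nginx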

Kubernetes Logs The Image Source As Docker Hub, But What Happened At The Container Runtime Layer?

Analyse the Containerd logs to find the request:

docker cp k3d-dev-private-server-0:/var/lib/rancher/k3s/agent/containerd/containerd.log - | less

Use / to search for the pattern nginx:

{"level":"info","msg":"PullImage \"nginx:stable\"","time":"2023-07-02T04:08:17.771504540Z"}
{"level":"debug","msg":"PullImage using normalized image ref: \"docker.io/library/nginx:stable\"","time":"2023-07-02T04:08:17.771521636Z"}
{"host":"docker-io-mirror:5000","level":"debug","msg":"resolving","time":"2023-07-02T04:08:17.778859386Z"}
{"host":"docker-io-mirror:5000","level":"debug","msg":"do request","request.header.accept":"application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*","request.header.user-agent":"containerd/v1.6.19-k3s1","request.method":"HEAD","time":"2023-07-02T04:08:17.778877611Z","url":"http://docker-io-mirror:5000/v2/library/nginx/manifests/stable?ns=docker.io"}

Containerd pulls from the Registry Mirror!

Notice that this time we didn’t need to specify the registry or the repository explicitly. Containerd did that transparently!

That’s useful!

You don’t wanna re-configure each image to come from your private OCI registry just to try the chart! That’s toil!

If you configure a registry mirror, you don’t have to!

Let’s see if nginx is there in the docker-io-mirror…

docker exec k3d-dev-private-server-0 \
  wget k3d-docker-io-mirror:5000/v2/_catalog -qO - | \
  jq
{
  "repositories": [
    "library/nginx",
    "rancher/klipper-helm",
    "rancher/klipper-lb",
    "rancher/local-path-provisioner",
    "rancher/mirrored-coredns-coredns",
    "rancher/mirrored-library-traefik",
    "rancher/mirrored-metrics-server",
    "rancher/mirrored-pause"
  ]
}

It is! Not only is nginx there, but all of the k3d images are there too!

They’re cached in our private registry mirror.

That should make it faster to spin up more k3d clusters 😁 !

Let’s Try Out This Other Helm Chart

Now the container runtime uses our private registry proxies as Registry Mirrors for Docker Hub and quay.io.

If you’re using Helm charts, cert-manager is likely one of the leaves of your chart dependency tree.

Let’s use cert-manager as an example…

Set Up The Helm Repo

helm repo add cert-manager https://charts.jetstack.io
"cert-manager" has been added to your repositories
helm repo update cert-manager 
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "cert-manager" chart repository
Update Complete. ⎈Happy Helming!⎈

Create The CRDs In The Cluster

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.3/cert-manager.crds.yaml
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created

Install The Chart

helm install cert-manager \
cert-manager/cert-manager \
--version v1.12.3 \
--debug

No need for a custom values file.

A moment later, it’s up and running!

kubectl get po
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-cainjector-69b9bf685d-z8czg   1/1     Running   0          50s
cert-manager-697954868b-2p8bz              1/1     Running   0          50s
cert-manager-startupapicheck-nvcgz         1/1     Running   0          49s
cert-manager-webhook-59f46cd9c6-l585n      1/1     Running   0          50s

Containerd pulled the images from the mirror.
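
If you want to confirm, the same catalog trick from earlier works against the quay.io mirror; the jetstack repositories should show up in its catalog:

docker exec k3d-dev-private-server-0 \
  wget k3d-quay-io-mirror:5000/v2/_catalog -qO - | \
  jq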

Let’s inspect all image refs in the chart’s default values:

helm inspect values cert-manager/cert-manager | grep -i image: -A1
image:
  repository: quay.io/jetstack/cert-manager-controller
--
  image:
    repository: quay.io/jetstack/cert-manager-webhook
--
  image:
    repository: quay.io/jetstack/cert-manager-cainjector
--
  image:
    repository: quay.io/jetstack/cert-manager-acmesolver
--
  image:
    repository: quay.io/jetstack/cert-manager-ctl

By default, cert-manager is pulling images from quay.io.

We didn’t change anything. Helm just installed the Chart and pods started running exactly as you’d expect. Exactly the same as if the cluster could pull from the public Internet.

Open For Experimentation

Charts often have more than one container image. They may have one or more subcharts. They may package CRDs, which themselves reference container images.

The problem of discovering images required by a chart is acknowledged in the Helm Improvement Proposal 0015.

If we configure our private OCI registry as a Registry Mirror, Containerd resolves public images to our private registry. The next time devs wanna experiment with a public Helm chart on our cluster, they don’t need to change any image refs!

That’s useful! An improvement in Usability.

Less friction for Aisha and platform engineers.

For Development and Test workflows, we wanna be open for experimentation so that we can get fast feedback on our ideas.

We can enable that, while still complying with standard registry policy, by configuring Registry Mirrors in the CRI.


Except where otherwise noted, all original content by Douglas Hellinger is licensed under a Creative Commons Attribution 4.0 International License.