Creating a scalable and resilient Varnish cluster using Kubernetes

Recently I have been working on a project that needed a scalable and resilient Varnish cluster. The interesting part of the project was that Kubernetes served as the platform for our application stack. Along the way I learned a lot, including how to use Go in a Kubernetes controller and how Varnish works.

What is Varnish?

Varnish is an HTTP caching layer that caches responses, mostly for anonymous users, before requests hit the application layer. The Varnish cache is typically stored in RAM, which helps to achieve higher performance. If all available memory is used for the cache, the least recently used items are evicted.

The basic Varnish distribution is free and open source. The HTTP cache works as depicted in the image below.

Traffic from a logged-in user, or a request for dynamic content, is not supposed to be cached, so it bypasses the Varnish caching layer and goes straight to the application service.

However, if the content is supposed to be cached, Varnish checks whether the corresponding item exists in the cache and returns it. Otherwise the request is forwarded to the app service, and the response is cached and returned to the user.
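The decision flow described above can be sketched in a few lines of Go. This is a toy model and not how Varnish is implemented: the `Request` and `Cache` types, the `LoggedIn` flag, and the `Handle` method are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// Request is a simplified, hypothetical view of an incoming HTTP request.
type Request struct {
	URL      string
	LoggedIn bool // e.g. a session cookie is present
}

// Cache is a toy in-memory cache keyed by URL.
type Cache struct {
	items map[string]string
}

// Handle returns the response body and where it came from:
// "pass" (cache bypassed), "hit", or "miss".
func (c *Cache) Handle(r Request, backend func(url string) string) (string, string) {
	if r.LoggedIn {
		// Personalised traffic goes straight to the application.
		return backend(r.URL), "pass"
	}
	if body, ok := c.items[r.URL]; ok {
		return body, "hit"
	}
	// Not cached yet: fetch from the application and remember the result.
	body := backend(r.URL)
	c.items[r.URL] = body
	return body, "miss"
}

func main() {
	cache := &Cache{items: map[string]string{}}
	app := func(url string) string { return "rendered " + url }

	requests := []Request{
		{URL: "/news", LoggedIn: false},
		{URL: "/news", LoggedIn: false},
		{URL: "/news", LoggedIn: true},
	}
	for _, r := range requests {
		_, source := cache.Handle(r, app)
		fmt.Println(r.URL, source)
	}
}
```

The first anonymous request is a miss, the second one is served from the cache, and the logged-in request bypasses the cache entirely.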

This architecture raises several questions:

  1. Can we eliminate the Kubernetes service and let Varnish talk to app pods (backends) directly?
  2. Do we scale Varnish horizontally or vertically?
  3. How do we scale Varnish pods (frontends)?
  4. How do we shard cache if we have multiple Varnish pods?
  5. How do we flush cache?

Keep reading and you will find answers to these questions.

Kube-httpcache Controller

While trying to answer the questions above, I found an exciting open source project, kube-httpcache, which is a Varnish controller for Kubernetes. Since it is open source, I was able to extend it significantly so that it handled all the features I needed and answered all the questions I had.

At that time, kube-httpcache already made it possible, out of the box, to eliminate the Kubernetes service in front of the application, so Varnish could talk to the backend pods directly, as shown in the image below.

In this case, Varnish is aware of all the running backends and routes traffic to them according to the algorithm set in the VCL, the Varnish configuration file.

Every time a new backend pod is added, the Varnish controller becomes aware of it and updates the Varnish configuration on the fly.

For this purpose, the Go templating language is used to process a VCL template file. With a round-robin director, for instance, each request goes to the next backend pod in turn.
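As a rough sketch of the idea, the Go program below renders a small VCL template with `text/template`. The `Backend` struct, its `Name`, `Host`, and `Port` fields, and the `.Backends` template variable are assumptions made for this illustration, so check the kube-httpcache documentation for the variables the controller actually exposes. The VCL itself only uses the `round_robin` director from the standard `directors` VMOD.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Backend is a hypothetical description of one application pod.
type Backend struct {
	Name string
	Host string
	Port int
}

// roundRobinVCL is an illustrative VCL template; a controller would
// re-render it whenever the set of backend pods changes.
const roundRobinVCL = `vcl 4.0;

import directors;
{{ range .Backends }}
backend be-{{ .Name }} {
    .host = "{{ .Host }}";
    .port = "{{ .Port }}";
}
{{ end }}
sub vcl_init {
    new lb = directors.round_robin();
    {{- range .Backends }}
    lb.add_backend(be-{{ .Name }});
    {{- end }}
}

sub vcl_recv {
    set req.backend_hint = lb.backend();
}
`

// renderVCL fills the template with the current list of backends.
func renderVCL(backends []Backend) (string, error) {
	tmpl, err := template.New("vcl").Parse(roundRobinVCL)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	err = tmpl.Execute(&out, map[string]interface{}{"Backends": backends})
	return out.String(), err
}

func main() {
	vcl, err := renderVCL([]Backend{
		{Name: "app-0", Host: "10.0.0.11", Port: 8080},
		{Name: "app-1", Host: "10.0.0.12", Port: 8080},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(vcl)
}
```

Running it prints one `backend` block per pod and a `vcl_init` that registers each of them with the director. Rendering the template again with a longer list is all it takes to pick up a new pod.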

Varnish scaling

You can scale Varnish vertically if your Kubernetes cluster supports this feature. However, this approach is discouraged because it leaves a single point of failure: if your Varnish pod goes down, there won’t be any other pod to handle traffic right away. For this reason, horizontal scaling is preferred. It is also easier to manage.

Horizontal scaling does not need to be automatic; it can be manual instead. There are a couple of problems with fully automatic scaling, as you will see later.

Horizontal scaling

I have been working on a feature that allows the kube-httpcache controller to monitor not only backend pods but also frontend pods, i.e. to be aware of its own Varnish instances running as part of the cluster.

Since we have multiple Varnish instances, we can also shard cache across them. This article describes how to build a self-routing Varnish Cluster.

In this example, a user requests a resource that is supposed to be cached by Varnish Frontend 1. However, the traffic happens to land on Varnish Frontend 2. The latter frontend (2) determines, with the help of a hashing algorithm, that this resource is supposed to be cached by the former Varnish instance (1).

If the resource is in the cache of Varnish Frontend 1, the cached result is returned to the user. Otherwise, Varnish sends a request to one of the app backends, caches the response, and returns it.

The configuration for that looks like the following.
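Here is a hedged sketch of such a configuration, again written as a Go template that a controller could render. The `Endpoint` struct and the `.Frontends` and `.Backends` variables are assumptions for illustration. The VCL uses the `hash` director to pick the frontend that owns a URL and compares it with `server.identity`, which only works if each Varnish instance is started with an identity (`varnishd -i`) equal to its backend name in the VCL.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Endpoint is a hypothetical description of one pod (Varnish or application).
type Endpoint struct {
	Name string
	Host string
	Port int
}

// clusterVCL is an illustrative self-routing template, not the project's own file.
const clusterVCL = `vcl 4.0;

import directors;
{{ range .Frontends }}
backend {{ .Name }} { .host = "{{ .Host }}"; .port = "{{ .Port }}"; }
{{- end }}
{{ range .Backends }}
backend be-{{ .Name }} { .host = "{{ .Host }}"; .port = "{{ .Port }}"; }
{{- end }}

sub vcl_init {
    new cluster = directors.hash();
    {{- range .Frontends }}
    cluster.add_backend({{ .Name }}, 1);
    {{- end }}

    new lb = directors.round_robin();
    {{- range .Backends }}
    lb.add_backend(be-{{ .Name }});
    {{- end }}
}

sub vcl_recv {
    # Ask the hash director which frontend owns this URL.
    set req.backend_hint = cluster.backend(req.url);
    set req.http.x-shard = req.backend_hint;
    if (req.http.x-shard != server.identity) {
        # Another frontend owns it: hand the request over without caching here.
        return (pass);
    }
    # This frontend is the owner: on a miss, fetch from an application pod.
    set req.backend_hint = lb.backend();
}
`

// renderClusterVCL fills the template with the current frontends and backends.
func renderClusterVCL(frontends, backends []Endpoint) (string, error) {
	tmpl, err := template.New("cluster").Parse(clusterVCL)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	err = tmpl.Execute(&out, map[string]interface{}{
		"Frontends": frontends,
		"Backends":  backends,
	})
	return out.String(), err
}

func main() {
	vcl, err := renderClusterVCL(
		[]Endpoint{{Name: "varnish-0", Host: "10.0.1.10", Port: 80}, {Name: "varnish-1", Host: "10.0.1.11", Port: 80}},
		[]Endpoint{{Name: "app-0", Host: "10.0.0.11", Port: 8080}},
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(vcl)
}
```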

The only downside of this approach is that when we change the number of Varnish pods, existing hashes no longer map to the same Varnish nodes. This is why autoscaling affects performance significantly. Fortunately, there is a solution for that.
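To get a feel for the size of the problem, the sketch below counts how many URLs change owner when a cluster grows from three frontends to four. It uses a plain "hash modulo number of nodes" rule as a stand-in. That rule is an assumption chosen for simplicity and is not Varnish's exact hash director algorithm, but it shows the same kind of reshuffling. The `owner` and `movedFraction` helpers and the `/page/N` URLs are made up for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// owner maps a cache key to a node index with a naive "hash mod N" rule.
func owner(key string, nodes int) int {
	sum := sha256.Sum256([]byte(key))
	return int(binary.BigEndian.Uint64(sum[:8]) % uint64(nodes))
}

// movedFraction reports which share of the sample keys changes owner
// when the cluster size goes from oldNodes to newNodes.
func movedFraction(oldNodes, newNodes, keys int) float64 {
	moved := 0
	for i := 0; i < keys; i++ {
		key := fmt.Sprintf("/page/%d", i)
		if owner(key, oldNodes) != owner(key, newNodes) {
			moved++
		}
	}
	return float64(moved) / float64(keys)
}

func main() {
	fraction := movedFraction(3, 4, 10000)
	fmt.Printf("3 -> 4 frontends: %.0f%% of keys change owner\n", 100*fraction)
}
```

With this naive rule the large majority of keys end up on a different frontend, so most of the cache is effectively cold right after a scaling event.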

Consistent hashing

From the Varnish documentation, I found that there is a shard director, which behaves similarly to the hash director except that it uses a consistent hashing algorithm. The benefit of this algorithm is that when a new Varnish frontend is added, most of the existing hashes still map to the same Varnish frontends, while only a few are reassigned to a different frontend.

Consistent hashing is based on mapping each resource to a point on a ring. The shard director maps each available Varnish frontend to many pseudo-randomly distributed points on the same ring. To find the Varnish frontend responsible for a cached resource, the shard director finds the location of that resource key on the ring, then walks around the ring until it reaches the first point that belongs to a Varnish frontend.
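The ring can be sketched in Go as follows. This is a minimal illustration of consistent hashing in general, not the shard director's actual code: the `Ring` type, the choice of SHA-256, and the 100 points per node are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// hashKey places a string on the ring as a 64-bit position.
func hashKey(s string) uint64 {
	sum := sha256.Sum256([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

// point is one of the many ring positions owned by a node.
type point struct {
	pos  uint64
	node string
}

// Ring is a sorted list of points; it must contain at least one node.
type Ring struct{ points []point }

// NewRing gives every node `replicas` pseudo-random points on the ring.
func NewRing(nodes []string, replicas int) *Ring {
	r := &Ring{}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			r.points = append(r.points, point{hashKey(fmt.Sprintf("%s#%d", n, i)), n})
		}
	}
	sort.Slice(r.points, func(a, b int) bool { return r.points[a].pos < r.points[b].pos })
	return r
}

// Owner walks clockwise from the key's position to the first node point.
func (r *Ring) Owner(key string) string {
	h := hashKey(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i].pos >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.points[i].node
}

// remapped counts keys whose owner changed, and how many of those
// went to a node other than newNode.
func remapped(before, after *Ring, newNode string, keys int) (moved, toOthers int) {
	for i := 0; i < keys; i++ {
		key := fmt.Sprintf("/page/%d", i)
		b, a := before.Owner(key), after.Owner(key)
		if a != b {
			moved++
			if a != newNode {
				toOthers++
			}
		}
	}
	return moved, toOthers
}

func main() {
	nodes := []string{"varnish-0", "varnish-1", "varnish-2"}
	before := NewRing(nodes, 100)
	after := NewRing(append(nodes, "varnish-3"), 100)
	moved, toOthers := remapped(before, after, "varnish-3", 10000)
	fmt.Printf("moved %d of 10000 keys, %d of them to an old node\n", moved, toOthers)
}
```

Only the keys that now fall just before one of the new frontend's points change owner, and they all move to the new frontend. Everything else stays cached where it was.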

In this case, the configuration looks like the following.
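The fragment below is a sketch of the parts that change when the shard director is used, rendered from a Go template. Backend declarations and the rest of `vcl_recv` are omitted, and the `Frontend` struct and `.Frontends` variable are assumptions for illustration. In the VCL, `directors.shard()`, `add_backend`, `reconfigure`, and `backend(by=URL)` come from the standard `directors` VMOD; verify the exact arguments against the documentation for your Varnish version.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Frontend is a hypothetical description of one Varnish pod.
type Frontend struct {
	Name string
}

// shardVCL is an illustrative fragment, not a complete VCL file.
const shardVCL = `sub vcl_init {
    new cluster = directors.shard();
    {{- range .Frontends }}
    cluster.add_backend({{ .Name }});
    {{- end }}
    # Changes to the shard director only take effect after reconfigure().
    cluster.reconfigure();
}

sub vcl_recv {
    # Pick the owning frontend by consistent hashing of the URL.
    set req.backend_hint = cluster.backend(by=URL);
}
`

// renderShardVCL fills the fragment with the current Varnish frontends.
func renderShardVCL(frontends []Frontend) (string, error) {
	tmpl, err := template.New("shard").Parse(shardVCL)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	err = tmpl.Execute(&out, map[string]interface{}{"Frontends": frontends})
	return out.String(), err
}

func main() {
	vcl, err := renderShardVCL([]Frontend{{Name: "varnish-0"}, {Name: "varnish-1"}})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(vcl)
}
```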

Flushing cache

Both the hash director and the shard director can be used to flush a single resource, but what if we want to flush multiple resources that share the same tag?

In this case, we need to pass a flush signal to all of the Varnish frontends. For this purpose, there is a Varnish signaller component built into kube-httpcache. Once you send a request to the signaller, it broadcasts it to all of the Varnish frontends.
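The broadcast idea can be sketched in Go as a simple fan-out. This is not the signaller's actual implementation: the `broadcast` and `demo` functions, the `BAN` method, and the `X-Cache-Tags` header are assumptions for illustration, and the frontends' VCL would need to handle such a request for it to have any effect. The demo uses local test servers in place of real Varnish pods.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
	"time"
)

// broadcast sends the same request to every frontend concurrently and
// returns one error slot per frontend (nil on success).
func broadcast(method, path, tag string, frontends []string) []error {
	client := &http.Client{Timeout: 5 * time.Second}
	errs := make([]error, len(frontends))
	var wg sync.WaitGroup
	for i, base := range frontends {
		wg.Add(1)
		go func(i int, base string) {
			defer wg.Done()
			req, err := http.NewRequest(method, base+path, nil)
			if err != nil {
				errs[i] = err
				return
			}
			req.Header.Set("X-Cache-Tags", tag) // hypothetical header name
			resp, err := client.Do(req)
			if err != nil {
				errs[i] = err
				return
			}
			resp.Body.Close()
			if resp.StatusCode >= 300 {
				errs[i] = fmt.Errorf("%s: unexpected status %s", base, resp.Status)
			}
		}(i, base)
	}
	wg.Wait()
	return errs
}

// demo starts n fake frontends, broadcasts one flush request and
// reports how many of them received it.
func demo(n int) int {
	var received int32
	var frontends []string
	for i := 0; i < n; i++ {
		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if r.Method == "BAN" && r.Header.Get("X-Cache-Tags") == "product-42" {
				atomic.AddInt32(&received, 1)
			}
		}))
		defer srv.Close()
		frontends = append(frontends, srv.URL)
	}
	broadcast("BAN", "/", "product-42", frontends)
	return int(atomic.LoadInt32(&received))
}

func main() {
	fmt.Printf("%d of 3 frontends received the flush request\n", demo(3))
}
```

Collecting one error per frontend lets the caller see which instance missed the flush, which matters because a single stale frontend keeps serving the old content for its share of the keys.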


As we saw, creating a scalable and resilient Varnish cluster requires knowledge of several different aspects of Varnish. Fortunately, kube-httpcache handles most of the work. Feel free to try this project and let me know what you think.

