Disaster Recovery for Kubernetes

Many businesses are modernizing their applications as part of their digital transformation strategies, and the adoption of Kubernetes is a big part of this trend. As businesses migrate toward containers and microservice-based applications, Kubernetes is quickly becoming the de facto standard for container management and automation.

What is Kubernetes?

Kubernetes provides a platform for describing, running and managing applications that consist of multiple containers. It can scale applications and protect containers from failure by automatically restarting them if they fail. Kubernetes is supported by almost all cloud platforms, including AWS, Azure and Google Cloud, and it can also run on-premises.

A survey by Portworx found that 89% of enterprise-level businesses expect Kubernetes to play a larger role in the management of their infrastructure over the next 2-3 years.

Backing up Kubernetes

Although Kubernetes itself provides application/container level failover, that doesn’t mean you don’t need to worry about data protection, backups and disaster recovery (DR) for Kubernetes applications. While Kubernetes can protect the containers that are running from failure, for DR you still need to ensure that the platform that’s running Kubernetes is also protected.

Kubernetes nodes typically run on virtual machines (VMs) hosted on some type of hardware platform, either in the cloud or on-premises. That platform needs to be protected, even if it’s in the cloud. For example, in December 2021, Amazon experienced multiple outages in its US-EAST region that affected a large number of businesses and high-profile services. A regional outage like that would certainly take down your Kubernetes app if it were running only in that region.

So how do you protect Kubernetes apps against cloud outages and other events that could disrupt the underlying platform? Kubernetes, despite all of its benefits, has some shortcomings. The main one is complexity. Kubernetes workloads can consist of dozens or even hundreds of containers. Plus, the Kubernetes platform has a number of moving parts that need to be protected. The most important Kubernetes components are:

  • etcd: Kubernetes cluster database
  • State file: Cluster state
  • Cluster configuration file: Cluster configuration
  • Certificates: Cluster authentication
  • Persistent storage: Data for stateful apps like databases
  • Images: Container images used by app components
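
One way to keep track of the list above is a small inventory check that reports which components lack a sufficiently recent backup. The Python sketch below is only illustrative: the component keys, the find_stale_backups function and the 24-hour threshold are assumptions rather than part of any Kubernetes tooling, and the timestamps would come from whatever backup system you actually use.

```python
from datetime import datetime, timedelta, timezone

# Components from the list above; the short keys are hypothetical labels.
REQUIRED_COMPONENTS = [
    "etcd",
    "state_file",
    "cluster_config",
    "certificates",
    "persistent_storage",
    "images",
]


def find_stale_backups(last_backup, now, max_age=timedelta(hours=24)):
    """Return components with no backup, or with one older than max_age."""
    stale = []
    for name in REQUIRED_COMPONENTS:
        taken_at = last_backup.get(name)
        if taken_at is None or now - taken_at > max_age:
            stale.append(name)
    return stale


if __name__ == "__main__":
    now = datetime(2022, 3, 1, 12, 0, tzinfo=timezone.utc)
    # Made-up timestamps for illustration only.
    last_backup = {
        "etcd": now - timedelta(hours=2),
        "certificates": now - timedelta(days=3),
        "persistent_storage": now - timedelta(hours=6),
    }
    for name in find_stale_backups(last_backup, now):
        print(f"needs a fresh backup: {name}")
```

In this made-up example, the state file, cluster configuration file and images are flagged because they have no backup at all, and the certificates are flagged because their backup is older than the threshold.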

Recovery Time Objective and Recovery Point Objective

The first step toward Kubernetes DR is to make sure these vital components are all backed up and can be restored. Next, you need to consider the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) for your Kubernetes applications. The RTO is the maximum acceptable downtime, and the RPO is the maximum acceptable amount of data loss, measured in time. The protective measures you take depend on the nature of the application. Mission-critical apps require lower RTOs, in some cases even zero downtime, than apps that aren’t as important.
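
To make the two objectives concrete: with periodic backups, the worst-case data loss is roughly the interval between backups, and the downtime is roughly the time to detect the failure plus the time to restore. The following Python sketch shows that arithmetic under simplifying assumptions (it ignores how long a backup takes to complete, for instance). The RecoveryPlan class, its field names and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryPlan:
    rpo: timedelta              # maximum tolerable data loss
    rto: timedelta              # maximum tolerable downtime
    backup_interval: timedelta  # time between successive backups
    detection_time: timedelta   # estimated time to notice the outage
    restore_time: timedelta     # estimated (ideally measured) time to restore

    def worst_case_data_loss(self):
        # A failure just before the next backup loses one full interval.
        return self.backup_interval

    def expected_downtime(self):
        return self.detection_time + self.restore_time

    def meets_rpo(self):
        return self.worst_case_data_loss() <= self.rpo

    def meets_rto(self):
        return self.expected_downtime() <= self.rto


if __name__ == "__main__":
    # Illustrative numbers only.
    plan = RecoveryPlan(
        rpo=timedelta(hours=1),
        rto=timedelta(hours=4),
        backup_interval=timedelta(hours=6),
        detection_time=timedelta(minutes=15),
        restore_time=timedelta(hours=2),
    )
    print("meets RPO:", plan.meets_rpo())
    print("meets RTO:", plan.meets_rto())
```

With these sample numbers the plan satisfies the 4-hour RTO (2 hours 15 minutes of expected downtime) but misses the 1-hour RPO, because backups taken every 6 hours can lose up to 6 hours of data. Closing that gap means backing up more often or moving to replication.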

For mission-critical Kubernetes apps in the cloud, you need to ensure that you have multi-region replication enabled. This provides protection even if an entire cloud region has an outage. All of the major cloud providers offer this capability, although it does come at an additional cost. For Kubernetes clusters running locally, DR protection for mission-critical apps is very much like protecting your virtualized infrastructure: you use replication to keep a hot DR site up to date, either in a separate location or in the cloud.
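
The decision described above can be written down as a simple rule of thumb. The Python sketch below maps an application's RTO and hosting environment to a DR approach. The suggest_protection function and the one-hour cut-off for "mission critical" are assumptions made for illustration, and the right threshold depends on your own business requirements.

```python
from datetime import timedelta

# Hypothetical cut-off: apps that must be back within this window are
# treated as mission critical. Pick a value that fits your own business.
MISSION_CRITICAL_RTO = timedelta(hours=1)


def suggest_protection(rto, environment):
    """Map an app's RTO and hosting environment to a DR approach.

    environment is "cloud" or "on-premises".
    """
    if environment not in ("cloud", "on-premises"):
        raise ValueError(f"unknown environment: {environment!r}")
    if rto <= MISSION_CRITICAL_RTO:
        if environment == "cloud":
            return "multi-region replication"
        return "replication to a hot DR site"
    # Less critical apps can rely on restoring from backups.
    return "backup and restore"


if __name__ == "__main__":
    print(suggest_protection(timedelta(minutes=5), "cloud"))
    print(suggest_protection(timedelta(minutes=30), "on-premises"))
    print(suggest_protection(timedelta(hours=24), "cloud"))
```

Keeping the rule this explicit makes the cost trade-off visible: only the apps below the cut-off pay for replication, while everything else relies on backups.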

In addition, for complete DR protection you also need to be sure that you can restore your Active Directory (AD) or other authentication infrastructure, as well as Domain Name System (DNS) functionality, which may or may not be provided as part of your Kubernetes clusters.
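
Because these supporting services usually have to be back before the cluster and its apps are usable, it can help to write the restore sequence down explicitly. The Python sketch below derives one from a dependency map using the standard library's graphlib module (Python 3.9 or later). The dependency map itself is an assumption for illustration, and your own environment may differ.

```python
from graphlib import TopologicalSorter

# Each key depends on the items in its set. This dependency map is an
# assumption for illustration; draw up your own from your environment.
DEPENDS_ON = {
    "dns": set(),
    "authentication": {"dns"},
    "kubernetes_cluster": {"dns", "authentication"},
    "persistent_storage": {"kubernetes_cluster"},
    "applications": {"kubernetes_cluster", "persistent_storage"},
}


def restore_order(depends_on):
    """Return one valid order in which every dependency is restored first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, component in enumerate(restore_order(DEPENDS_ON), start=1):
        print(f"{step}. restore {component}")
```

With this particular map the order is DNS, authentication, the Kubernetes cluster, persistent storage and then the applications. If the map ever contains a circular dependency, TopologicalSorter raises a CycleError, which is a useful signal that the plan needs another look.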

Testing your Kubernetes disaster recovery plan

As you can see, while Kubernetes does offer its own failover/failback protection for running containers, that doesn’t mean these systems don’t need a DR strategy. And of course, to really know that your Kubernetes DR plan is working, you need to test it periodically. If you’re not sure how to go about creating a DR plan for Kubernetes, there are also a number of third-party tools designed to provide Kubernetes DR.