Discovery: CloudNativePG for Kubernetes

May 20, 2024 · Part of my work at Carousell

Abstract: A research and discovery project to evaluate the feasibility of running production PostgreSQL databases on Kubernetes using the CloudNativePG operator.

Tech: #PostgreSQL #Kubernetes #CloudNativePG #SRE #HighAvailability

The Challenge

While our applications were moving to Kubernetes, our databases remained on traditional virtual machines. This created operational overhead and a disconnect between application and database management. The goal was to explore the possibility of running our PostgreSQL clusters directly on Kubernetes to unify our infrastructure and leverage cloud-native benefits.

The Evaluation

I led a discovery project to evaluate CloudNativePG, an open-source Kubernetes operator for PostgreSQL. The primary benefits we aimed to achieve were:

Improved High Availability: Leveraging Kubernetes' self-healing capabilities for database failover.
Enhanced SRE Experience: Managing databases using familiar Kubernetes-native tools (kubectl) and declarative manifests.
Integrated Metrics: Gaining better observability with metrics that integrate seamlessly with our Prometheus monitoring stack.
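
To make the "declarative manifests" point concrete, here is a minimal sketch of what a CloudNativePG cluster definition could look like, built in Python and printed as JSON (which kubectl accepts alongside YAML). It is an illustration, not the configuration used in this evaluation: the cluster name pg-staging, the instance count, and the storage size are hypothetical, and enablePodMonitor assumes the Prometheus Operator's PodMonitor CRD is installed in the cluster.

```python
import json


def build_cluster_manifest(name: str, instances: int = 3, storage_size: str = "10Gi") -> dict:
    """Return a minimal CloudNativePG Cluster manifest as a plain dict.

    All values are illustrative assumptions, not the settings from the evaluation.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # One primary plus (instances - 1) replicas; the operator handles failover.
            "instances": instances,
            # Size of the persistent volume claim requested for each instance.
            "storage": {"size": storage_size},
            # Asks the operator to create a PodMonitor so Prometheus can scrape metrics
            # (assumes the Prometheus Operator CRDs are present).
            "monitoring": {"enablePodMonitor": True},
        },
    }


if __name__ == "__main__":
    # Hypothetical cluster name; print JSON so it can be piped to kubectl.
    print(json.dumps(build_cluster_manifest("pg-staging"), indent=2))
```

Assuming the script is saved as cluster_manifest.py (a hypothetical filename), its output can be applied with `python cluster_manifest.py | kubectl apply -f -`, after which `kubectl get clusters.postgresql.cnpg.io` lists the resulting cluster like any other Kubernetes resource.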

I deployed a test cluster in a staging environment and ran a series of evaluations. Deployment and Kubernetes-native management looked promising, but my testing of the backup-and-restore workflows and self-healing behavior showed that more rigorous testing would be needed before we could confidently move to production.

The Outcome

Based on my findings, we decided to pause a full production rollout for 2024. The project is an example of technical due diligence: a rigorous evaluation kept us from adopting a technology prematurely and kept production stability and data integrity as the top priorities.