Discovery: CloudNativePG for Kubernetes
Part of my work at Carousell
Abstract: A research and discovery project to evaluate the feasibility of running production PostgreSQL databases on Kubernetes using the CloudNativePG operator.
Tech: #PostgreSQL #Kubernetes #CloudNativePG #SRE #HighAvailability
The Challenge
While our applications were moving to Kubernetes, our databases remained on traditional virtual machines. This created operational overhead and a disconnect between application and database management. The goal was to explore the possibility of running our PostgreSQL clusters directly on Kubernetes to unify our infrastructure and leverage cloud-native benefits.
The Evaluation
I led a discovery project to evaluate CloudNativePG, a leading Kubernetes operator for PostgreSQL. The primary benefits we aimed to achieve were:
- Improved High Availability: Leveraging Kubernetes' self-healing capabilities for database failover.
- Enhanced SRE Experience: Managing databases using familiar Kubernetes-native tools (kubectl) and declarative manifests.
- Integrated Metrics: Gaining better observability with metrics that integrate seamlessly with our Prometheus monitoring stack.
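To illustrate what "declarative manifests" means here, the following is a minimal sketch of a CloudNativePG Cluster resource, built as a Python dictionary and printed as JSON (which kubectl accepts alongside YAML). The cluster name, instance count, and storage size are placeholder assumptions for illustration, not the values used in this evaluation, and the helper function is hypothetical.

```python
import json


def build_cluster_manifest(name, instances=3, storage_size="1Gi"):
    """Return a minimal CloudNativePG Cluster manifest as a plain dict.

    All argument values are illustrative placeholders; a real cluster
    would also need backup, resource, and monitoring settings.
    """
    if instances < 1:
        raise ValueError("a cluster needs at least one instance")
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # One primary plus (instances - 1) replicas managed by the operator.
            "instances": instances,
            # Size of the persistent volume claimed for each instance.
            "storage": {"size": storage_size},
        },
    }


if __name__ == "__main__":
    # "pg-staging-demo" is a made-up name used only for this example.
    print(json.dumps(build_cluster_manifest("pg-staging-demo"), indent=2))
```

Saving the output to a file and running kubectl apply -f on it would ask the operator to create the cluster. The appeal for an SRE team is that the database definition lives in version control next to the application manifests.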
I deployed a test cluster in a staging environment and performed a series of evaluations. The initial results were promising for deployment and Kubernetes-native management. However, my testing of the backup-and-restore workflows and self-healing capabilities showed that further, more rigorous testing would be required before we could confidently proceed to a production environment.
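One way to make a self-healing test more rigorous is to probe the database at a fixed interval while deleting the primary pod, then measure how long connections failed. The sketch below is a hypothetical illustration of that measurement step only; it is not the tooling used in this evaluation, and the Probe type and function name are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    at: float  # seconds since the test started
    ok: bool   # True if a test query against the database succeeded


def longest_outage_seconds(probes):
    """Return the longest gap between the first failed probe of an outage
    and the next successful probe. An outage that never recovers within
    the recorded probes is not counted."""
    longest = 0.0
    outage_start = None
    for probe in sorted(probes, key=lambda p: p.at):
        if not probe.ok:
            if outage_start is None:
                outage_start = probe.at  # outage begins at first failure
        elif outage_start is not None:
            longest = max(longest, probe.at - outage_start)
            outage_start = None  # service recovered
    return longest


if __name__ == "__main__":
    # Made-up sample data: probes fail from t=2 until recovery at t=5.
    sample = [
        Probe(0, True), Probe(1, True), Probe(2, False), Probe(3, False),
        Probe(4, False), Probe(5, True), Probe(6, True),
    ]
    print(longest_outage_seconds(sample))
```

Repeating such a measurement across many induced failures would give a distribution of failover times rather than a single observation, which is the kind of evidence a production decision needs.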
The Outcome
Based on my findings, we made the informed decision to pause a full production rollout for 2024. This project is a key example of technical due diligence, where a rigorous evaluation process prevents the premature adoption of a technology, ensuring that production stability and data integrity remain the top priorities.