
Database Replication to BigQuery

Part of my work at Carousell

Abstract: Architected a data replication pipeline using GCP Datastream to provide engineers and PMs with secure, read-only access to production data in BigQuery.

Tech: #GCP #BigQuery #Datastream #Data Engineering #Database #Security

The Problem

Engineers and Product Managers frequently needed access to production data to perform analyses and build dashboards. The existing method was to grant them direct, read-only access to the production database replicas. This approach posed two significant problems:

  • Security Risk: Exposing the ports of production databases, even read-only replicas, created a potential attack vector for our infrastructure.
  • Performance Impact: A high volume of analytical queries running against the operational replicas could degrade their performance.

The Solution

I was tasked with designing and implementing a more secure and scalable solution. I architected a near real-time change data capture (CDC) pipeline using GCP Datastream.

The project involved these key steps:

  1. Configuring Datastream: I set up Datastream streams to capture changes from our production databases.
  2. Streaming to BigQuery: Datastream was configured to replicate this data in near real-time into a dedicated Google BigQuery dataset.
  3. Access Control: I established a new access model where engineers and PMs were granted granular, read-only IAM permissions to the BigQuery dataset.
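Steps 1 and 2 can be sketched as generating one request body per database for the Datastream v1 `projects.locations.streams.create` method. This is a minimal sketch, not the pipeline's actual configuration: it assumes the sources are MySQL (the engine is not stated above) and that the source and BigQuery connection profiles already exist. The project, location, profile IDs, database names, and dataset ID are hypothetical placeholders, and the exact dataset ID format expected by `singleTargetDataset` should be checked against the Datastream reference.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamSpec:
    """Inputs for one source database (all values are hypothetical placeholders)."""

    project: str
    location: str
    database: str           # source MySQL database to replicate (MySQL is an assumption)
    source_profile_id: str  # pre-created Datastream connection profile for the source
    bq_profile_id: str      # pre-created Datastream connection profile for BigQuery
    dataset_id: str         # target dataset; verify the expected ID format in the docs
    freshness_seconds: int = 900  # lower = fresher data in BigQuery, but higher cost


def profile_name(project: str, location: str, profile_id: str) -> str:
    """Full resource name of a Datastream connection profile."""
    return f"projects/{project}/locations/{location}/connectionProfiles/{profile_id}"


def build_stream_body(spec: StreamSpec) -> dict:
    """Build a JSON-serializable Stream body for streams.create."""
    return {
        "displayName": f"{spec.database}-to-bigquery",
        "sourceConfig": {
            "sourceConnectionProfile": profile_name(
                spec.project, spec.location, spec.source_profile_id
            ),
            # Capture changes only from the database this stream is responsible for.
            "mysqlSourceConfig": {
                "includeObjects": {"mysqlDatabases": [{"database": spec.database}]}
            },
        },
        "destinationConfig": {
            "destinationConnectionProfile": profile_name(
                spec.project, spec.location, spec.bq_profile_id
            ),
            "bigqueryDestinationConfig": {
                # Land replicated tables in one dedicated dataset.
                "singleTargetDataset": {"datasetId": spec.dataset_id},
                "dataFreshness": f"{spec.freshness_seconds}s",
            },
        },
        # Backfill existing rows, then keep applying change events.
        "backfillAll": {},
    }


if __name__ == "__main__":
    # Hypothetical database names.
    for db in ["listings", "chats"]:
        spec = StreamSpec(
            project="my-project",
            location="asia-southeast1",
            database=db,
            source_profile_id=f"{db}-mysql",
            bq_profile_id="bigquery-dest",
            dataset_id="my-project:prod_replica",
        )
        print(json.dumps(build_stream_body(spec), indent=2))
```

Generating bodies from a small spec, rather than configuring each stream by hand, is one way to keep many streams consistent; the bodies could then be submitted with any authenticated HTTP client or Google API client library.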

This new architecture decoupled our analytical workloads from our operational databases. Because engineers and PMs no longer needed direct database access, it eliminated that security risk, and it gave teams a scalable, secure data warehouse in BigQuery for their analyses. To date, I have configured this pipeline for 37 databases.
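The read-only access model from step 3 can be sketched at the level of a dataset's `access` array in the BigQuery REST API. This is a hypothetical sketch: it assumes access is granted to Google groups (the group emails below are placeholders) using the basic dataset role `READER`, which corresponds to `roles/bigquery.dataViewer`. Because `datasets.patch` replaces the whole `access` array when that field is sent, the helper merges new grants into the current list instead of overwriting it.

```python
from __future__ import annotations


def add_readers(access: list[dict], groups: list[str]) -> list[dict]:
    """Return a copy of a dataset `access` list with a READER grant per group.

    `access` is the current array (e.g. from datasets.get). Existing entries
    are kept and duplicate grants are skipped, so the result can be sent back
    as the full `access` field of a datasets.patch request.
    """
    merged = [dict(entry) for entry in access]  # copy; never mutate the input
    seen = {(e.get("role"), e.get("groupByEmail")) for e in merged}
    for group in groups:
        key = ("READER", group)
        if key not in seen:
            merged.append({"role": "READER", "groupByEmail": group})
            seen.add(key)
    return merged


if __name__ == "__main__":
    # Hypothetical current access list and group emails.
    current = [{"role": "OWNER", "userByEmail": "data-platform@example.com"}]
    new_access = add_readers(current, ["engineers@example.com", "pms@example.com"])
    print({"access": new_access})  # candidate body for datasets.patch
```

Read access to a dataset alone does not let someone run queries; users also need permission to create query jobs in some project, for example through `roles/bigquery.jobUser`.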
