Cloudera to Snowflake/AWS migration using CI/CD

Technologies: AWS, CI/CD, Impala, Jenkins, Python, Snowflake, SQL

Industry: Adtech


In a rapidly evolving digital landscape, a major ad-tech client faced a critical challenge: migrating an extensive inventory of systems, databases, and reports within a tight three-month timeframe. The project also required updating processes so that they integrated with the client's existing development CI/CD pipeline. This article covers the context, the challenges, and the solution Enroute delivered, which met the deadline and significantly improved efficiency.

Context & Challenges

The client's core business centered on extracting, analyzing, and presenting billions of online advertisement records every hour. Running this workload on Cloudera had proved inefficient and costly, so the client decided to move to a data architecture built on Snowflake and AWS. The transition had to be coordinated closely with the existing CI/CD pipeline so that code could still be deployed quickly and reliably.

The inventory analysis of data sources, applications, and reports revealed legacy technologies and inefficient processes. To reduce risk, the initial strategy was to migrate everything "as-is" so that Cloudera could be deprecated as early as possible; the team would then focus on improving and re-architecting pipelines for efficiency.

The Goal

Enroute set an ambitious goal: migrate all data sources, reports, code, and dependencies to the new architecture before March 2020, allowing Cloudera to be deprecated. To achieve this, a multidisciplinary team of Enroute data, DevOps, and QA engineers adopted Scrum and delivered tangible results in two-week sprints.

How We Did It

  1. Jenkins for PaC (Pipeline as Code): The team defined its pipelines as code in Jenkins and integrated them with Jira for ticket tracking. Slack notifications kept users informed throughout the pipeline lifecycle, for example when a pull request or deployment was waiting for approval.
  2. Infrastructure as Code (IaC) Templates: Enroute used IaC templates to provision test environments, so that every environment was standardized and reproducible.
  3. Automated Testing: The team developed automated test scripts and maintained them in a regression suite. Running the suite across environments gave broad test coverage and shortened the testing phase.
  4. Re-Architecture when Necessary: Because Impala and Snowflake differ in SQL dialect and engine behavior, some pipelines had to be re-architected rather than ported as-is in order to work correctly and perform well on Snowflake.
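
The Jenkins, Jira, and Slack integration in item 1 can be sketched as a small helper that a pipeline stage might call (for example from a `sh` step). This is a minimal sketch under stated assumptions, not the pipeline Enroute built: the article does not describe the wiring, so the Jira-key-in-branch-name convention, the message format, and the pipeline and branch names are hypothetical. It uses only the standard library and Slack's incoming-webhook JSON payload with a `text` field.

```python
import json
import re
import urllib.request

# Hypothetical convention: branch names and commit messages carry Jira keys
# such as "ADS-123" (uppercase project key, hyphen, issue number).
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def jira_keys(text: str) -> list[str]:
    """Return the unique Jira issue keys found in text, in first-seen order."""
    keys: list[str] = []
    for key in JIRA_KEY.findall(text):
        if key not in keys:
            keys.append(key)
    return keys


def stage_message(pipeline: str, stage: str, status: str, branch: str) -> dict:
    """Build a Slack incoming-webhook payload for one pipeline stage event."""
    keys = jira_keys(branch)
    tickets = ", ".join(keys) if keys else "no Jira ticket"
    return {"text": f"[{pipeline}] {stage}: {status} ({tickets})"}


def notify(webhook_url: str, payload: dict) -> None:
    """POST the payload as JSON to a Slack incoming-webhook URL."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


if __name__ == "__main__":
    # Print instead of posting; in CI, pass the result to notify() with a real webhook URL.
    print(stage_message("etl-deploy", "Deployment approval", "WAITING",
                        "feature/ADS-42-port-hourly-report"))
```

Keeping payload construction separate from the HTTP call lets the message format be unit-tested without network access.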
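
The article does not name the IaC tool behind item 2, so the following is only a stand-in that illustrates the idea of a parameterized, reproducible test environment. It renders Snowflake DDL from a template for a given environment name. The naming scheme (`<ENV>_WH`, `<ENV>_ANALYTICS`), the allowed warehouse sizes, and the use of a zero-copy `CLONE` to seed test data are all assumptions. Executing the statements (for example through a Snowflake connector) is left out.

```python
import re
from string import Template

# Hypothetical naming scheme: one warehouse and one database per test environment.
# CLONE assumes test data is seeded via a Snowflake zero-copy clone of a source DB.
CREATE_ENV = Template(
    "CREATE WAREHOUSE IF NOT EXISTS ${env}_WH "
    "WITH WAREHOUSE_SIZE = '${size}' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;\n"
    "CREATE DATABASE IF NOT EXISTS ${env}_ANALYTICS CLONE ${source_db};"
)
DROP_ENV = Template(
    "DROP DATABASE IF EXISTS ${env}_ANALYTICS;\n"
    "DROP WAREHOUSE IF EXISTS ${env}_WH;"
)

IDENTIFIER = re.compile(r"[A-Z][A-Z0-9_]*")
SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE"}


def _check(name: str) -> str:
    """Reject anything that is not a plain uppercase identifier (guards against injection)."""
    if not IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def _statements(sql: str) -> list[str]:
    """Split rendered SQL into individual statements without trailing semicolons."""
    return [s.strip() for s in sql.split(";") if s.strip()]


def create_env(env: str, source_db: str, size: str = "XSMALL") -> list[str]:
    """Render the statements that provision one test environment."""
    if size not in SIZES:
        raise ValueError(f"unsupported warehouse size: {size!r}")
    return _statements(CREATE_ENV.substitute(
        env=_check(env), source_db=_check(source_db), size=size))


def drop_env(env: str) -> list[str]:
    """Render the statements that tear the same environment down again."""
    return _statements(DROP_ENV.substitute(env=_check(env)))


if __name__ == "__main__":
    for stmt in create_env("QA_PR42", "PROD_ANALYTICS"):
        print(stmt + ";")
```

Pairing every `create_env` with a `drop_env` keeps short-lived test environments from accumulating cost between pipeline runs.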
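
A common check in an as-is migration regression suite like the one in item 3 is to run the same query on the old and new platforms and compare the results. The function below is a minimal sketch of that comparison. It is not taken from the client's suite. The normalization rules (rounding numerics to six places, stripping whitespace) are assumptions, and in practice `source` and `target` would be rows fetched from Impala and Snowflake through DB-API cursors.

```python
from collections import Counter
from decimal import Decimal
from typing import Iterable, Sequence


def normalize(row: Sequence, places: int = 6) -> tuple:
    """Smooth over representation differences between engines before comparing.

    Numeric values (float or Decimal) are rounded; strings are stripped of
    surrounding whitespace. The tolerance and rules here are assumptions.
    """
    out = []
    for value in row:
        if isinstance(value, (float, Decimal)):
            out.append(round(float(value), places))
        elif isinstance(value, str):
            out.append(value.strip())
        else:
            out.append(value)
    return tuple(out)


def compare_results(source: Iterable[Sequence], target: Iterable[Sequence]) -> dict:
    """Order-insensitive comparison of two result sets, duplicates included."""
    src = Counter(normalize(row) for row in source)
    tgt = Counter(normalize(row) for row in target)
    missing = src - tgt  # rows (with multiplicity) absent from the target
    extra = tgt - src    # rows present only in the target
    return {
        "source_rows": sum(src.values()),
        "target_rows": sum(tgt.values()),
        "missing_in_target": dict(missing),
        "extra_in_target": dict(extra),
        "passed": not missing and not extra,
    }
```

Comparing multisets instead of sorted lists avoids depending on either engine's row order and still catches duplicated or dropped rows.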
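
Some of the Impala/Snowflake differences behind item 4 can be handled mechanically before any real re-architecture begins. The sketch below shows two illustrative rules. The article does not list the actual differences the team encountered, so both rules are assumptions to be checked against each vendor's SQL reference. The first drops Impala statistics and metadata statements (`COMPUTE [INCREMENTAL] STATS`, `INVALIDATE METADATA`, `REFRESH`), which have no direct Snowflake counterpart. The second rewrites single-argument `NDV(expr)` to Snowflake's `APPROX_COUNT_DISTINCT(expr)`.

```python
import re

# Impala statements that manage table statistics or metadata caches. Snowflake
# manages these itself for native tables, so this sketch drops them (assumed rule).
IMPALA_ONLY = re.compile(
    r"^\s*(COMPUTE\s+(INCREMENTAL\s+)?STATS|INVALIDATE\s+METADATA|REFRESH)\b",
    re.IGNORECASE,
)
# Single-argument NDV(expr) -> APPROX_COUNT_DISTINCT(expr). Calls with extra
# arguments or nested parentheses are left untouched for manual review.
NDV_CALL = re.compile(r"\bNDV\s*\(([^(),]+)\)", re.IGNORECASE)


def port_script(script: str) -> tuple[list[str], list[str]]:
    """Return (statements ported for Snowflake, Impala-only statements dropped).

    Splitting on ';' is naive: it breaks on semicolons inside string literals.
    """
    ported: list[str] = []
    dropped: list[str] = []
    for stmt in (s.strip() for s in script.split(";")):
        if not stmt:
            continue
        if IMPALA_ONLY.match(stmt):
            dropped.append(stmt)
        else:
            ported.append(NDV_CALL.sub(r"APPROX_COUNT_DISTINCT(\1)", stmt))
    return ported, dropped
```

Returning the dropped statements rather than discarding them silently gives reviewers a list to confirm that nothing with real side effects was lost. Queries that need deeper changes still require hand re-architecture.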

Enroute's execution of the data migration met the client's strict deadline and made the client's data operations markedly more efficient. The combination of Scrum, Jenkins pipelines as code, IaC templates, and automated testing demonstrates Enroute's expertise in managing complex technology transitions. This case study reflects Enroute's commitment to delivering results that exceed expectations in the fast-moving field of ad-tech data management.
