Onboarding Phase: Converting a Transaction Data Mart to an ADF infrastructure

admin

Posted On:
May 23, 2022

Introduction

The client is a software company that develops screen capture solutions for customers who want to create and share images and videos for better training, tutorials, lessons, and everyday communication.  

They reached out to Blue Orange for an Onboarding Phase engagement to document discoveries and scope for converting their SSIS in the Sales Transaction Data Mart to an ADF.  Based on the client’s data infrastructure, Blue Orange would develop a streamlined data solution plan with managed tool-sets for the client’s data sources and long-term business needs.  This way their internal data team would be able to automate deployments and operate faster version control to expedite workflows and save time with troubleshooting.

The Challenge

The client expressed to the Blue Orange team that their data management studio was overly complex and involved deprecated technology so they wanted to migrate to a cloud-based solution, like Azure Data Factory.  After Blue Orange experts evaluated their current data environment, our team was able to identify the following issues:

  • The client was using Azure Data Catalog for Data Governance but the tool was deprecated and had been replaced by a more featured and robust solution in Azure Purview.
    • The main drawbacks of using the Azure Data Catalog were:
      • The information was not uploaded automatically whenever a data source changed.
      • There was no tag autocompletion so tagging was prone to human errors.
  • R scripts were being executed on local machines that were less powerful than cloud ready services. 
  • There was no fully automated version control system in place to help manage all the different developer’s code.

Based on these findings, Blue Orange recommended a six-layer model for building a modern Data Warehouse for them to automate their data processes. The six layers of the client’s data platform would include:

The Solution

To ensure these data platform layers are met, the Blue Orange team suggested the following:

  • Separate environments for development, testing and production, this way features or changes can be tested and evaluated before affecting production. 
  • Separating environments would also simplify automating deployments and allow for easier scaling of systems, since deployment and recreation of environments can be automated completely.
  • Code version control to allow for faster security so that changes would be forced to be reviewed and new developers could join the team with less anxiety over possible errors at the beginning.

Blue Orange  recommended  to build a complete re-engineering of all SSIS jobs into ADF to fully use all the powerful capabilities natively available in Data Factory. This implementation would simplify the process flow with fewer steps and jobs and faster processing times. It would also make it easier to add up changes in the future and to extend and include more sources into the process.

Since the client was already working with DataLake Gen2 and PowerBI for data transformations, Blue Orange proposed to go all-in on the Data Lake concept and keep all the data and information within the lake until consumed. This would include removing the SQL server and ingesting the raw data directly into the DLG2 storage, and then using PowerBI to do all the transformations needed.

The client also utilized R scripts with custom packages that they wanted to be moved to the Azure cloud. Blue Orange’s proposed solution consisted of creating an Azure Data Factory pipeline that would execute a Databricks notebook against the Databricks jobs cluster. This would centralize all data pipelines to create one source of truth for their data team.

Conclusion

Through the Onboarding Phase engagement, Blue Orange experts were able to develop a streamlined data solution plan with managed tool-sets for the client’s data sources and long-term business needs.  This plan included key data infrastructure for a data factory pipeline that gets triggered on a schedule and a Databricks R notebook that would be in charge of running all data transformations to automate their data processes. Therefore, the client’s internal team would be able to operate faster version control to expedite workflows and save time with troubleshooting their data sources and code.  

To learn more about Blue Orange’s Onboarding Phase engagement and our Data Transformation capabilities, please contact our team.  If you are interested in reading more about our custom client solutions, you can view our full Case Studies listing here.

See where smart data management can take your business