The client is a large non-bank mortgage servicer based in the US. The company has a sophisticated data architecture involving transactional message data and they required support normalizing this data to enable BI reporting. Blue Orange was brought in to define the data model and architecture and begin the initial implementation of the data warehouse.
Using our Sprint 0 approach (a 2-3 week fixed rate engagement to determine the data architecture and project estimation followed by a T&M implementation), the Blue Orange Team met with client stakeholders to define architectural and reporting requirements. At the conclusion of the Sprint 0, we proposed two potential data architectures with development and resourcing timeline estimations. Blue Orange was engaged to further develop these recommendations and provide consulting expertise on the ideal solution to meet the client’s business needs.
The client was seeking to normalize their transactional message store JSON data to improve reporting functionality. The company required support and guidance for defining the data model and architecture as well as implementing a data warehouse to enable BI reporting on this data.
In order to fulfill these requirements, the Blue Orange Team provided the following two-phase project plan to the client to fully address their data architecture and BI needs:
- Sprint 0: This 2 week phase would be used to develop the solutions architecture, schema, and project roadmap. The Sprint 0 Team would consist of 1 Solutions Architect, 1 Data Engineer, and 1 Technical Project Manager. Upon client approval, Blue Orange would move directly into development of the proposed solution.
- Implementation: The Implementation Team would develop the proposed solution and ensure normalized message store data is accessible to the client’s team for report development. This implementation would not include any report development for the selected BI tool. The Provisional Team would consist of 2 Data Engineers, a part-time Solutions Architect, and a part-time Technical Project Manager. The actual team size would be determined based on the Sprint 0 discovery findings and approval from the client stakeholders.
Upon approval of the two-phase project plan, the Blue Orange Team was able to design a solution methodology for the client’s team that would include a status evaluation, alternative data architectures, and technical requirements that would meet database needs. There were two main technical requirements to be met:
- Ingest a Data Stream of operational data on a OLTP database set, into a OLAP Data Warehouse to be used as the main datasource for BI analytical use cases.
- Process JSON data records to be denormalized into the Data Warehouse.
Along with these requirements, there were two main operational databases and data sources that needed to be considered as well. These were categorized into the following:
- Servicing Cluster
- Servicing message store.
- Lending message store.
- RedIQ message store.
- White Label message store.
- Polaris Cluster
- Common message store.
- Financial Processing message store.
The data sources, databases, and technical requirements were all factored into the Current Status Evaluation of the client’s data environment. Their message system architecture featured a centralized gateway that performed the http message processing and queue management of requests coming from applications or services/components. All the messages were stored in one of the Postgress databases available in one of the two clusters, Servicing or Polaris.
All Databases were designed as a narrow and tall structure, in which no data is deleted or updated, only appended. There is also an Aggregator component that processed a specific set of messages and stores to be consolidated into a Read Only Database, that could be then queried by an application or a dashboard.
Based on the Current Status Evaluation, the Blue Orange Team was able to offer a proposed solution to enhance the client’s data architecture. The recommendations were composed of two alternatives to choose from that included Alternative 1: AWS and Alternative 2: PostgresSQL Read Replica. Each of these was accompanied with a full description and use model to demonstrate their applicability:
Alternative 1: Redshift
Within the AWS environment, Redshift was the direct choice for the Data Warehouse platform. It supported the two main technical requirements and complemented the AWS ETL (Extract Transform Load) tools that would support the data stream processing requirement. Once the raw data was in the Staging area of the DW, the data should be denormalized in analytical tables.
For the ETL process, the recommendation was to do an EL to a Data Lake using one of the AWS Tools, either AWS Glue or Kinesis, depending on the required cadence of the data. For a more streamed approach, Kinesis would be the tool of choice and for a more paced approach, AWS Glue would be deployed. For both cases, Airflow could be used as the orchestration coordinator. Once the raw data arrived at the DW staging area, the Transformation of it, especially the JSON records, could be processed using DBT.
Alternative 2: PostgresSQL Read Replica
Instead of using a full-blown Dataware product like Redshift, a Postgress read replica was offered as an alternative. For the client’s volume of data process, it would provide the required processing power, support the business use case, and decouple the OLTP databases. The ETL process was also extremely simplified for implementation as a Read only replication. Finally, the use of Postgress as the OLTP database was beneficial due to the widespread experience and knowledge of the client’s team with this application.
The client employed a sophisticated data architecture that required support and guidance for normalizing the data to enable BI reporting. Blue Orange was able to define the data model and architecture and begin the initial implementation of the data warehouse by detailing three alternative Data Warehouses that would support their technical requirements.
Our two-phase project management plan ensured that the client team was provided with holistic recommendations for estimation, implementation, and resourcing for whichever data architecture they chose. This highly curated process is applicable across industries and sectors and if your business requires these consulting services you can contact our team today to discuss your project. If you are interested in reading further about our additional client success stories, you can view our full Case Studies listing here.