Migrated a locally developed Markov model, used in student loan analysis, from the RStudio IDE to the AWS cloud, increasing access, adaptability, and collaboration.
The client, Varde Partners LP, is an investment management firm and a leading global alternative investment adviser. Founded in 1993, the firm has invested more than $68 billion since inception and manages over $14 billion on behalf of a global investor base.
Their goal is to seek opportunities in less competitive markets, which makes large amounts of financial data and information critical to their success. Only high-performing, accurate financial models can help them understand the drivers of performance and give senior management a holistic financial view of the firm’s performance.
In one particular project to analyze financial Markov models for student loans, the client relied on in-house experts to develop the models, but the models were not easily accessible, adaptable, or open to collaboration for other team members. The experts worked in a local development environment, for which the RStudio IDE was chosen. The data used in model training was retrieved manually from Power BI. The codebase consisted of custom-made scripts, which could not easily be reused or adapted to other similar problems.
Blue Orange Digital identified a set of challenges arising from this approach to model development.
All in all, the limitations of the local development environment did not serve the long-term goals of the client. Because the client intended to develop multiple models for different applications, it needed a way to improve access to model development and to make the results available to a variety of stakeholders. Employees, executives, and even external financial advisors required access to the models and their results.
Blue Orange Digital planned and carried out a migration that ported a locally developed financial model to the AWS cloud. This way, anyone in the company could be granted access to the model, join in its development, and reach its results more easily.
Our team of engineers broke down the problem into the following steps:
1. Refactored and optimized a locally developed student loan financial model
The existing codebase had originally been written in R, which was a less than ideal choice for the task at hand. Our engineers reviewed, refactored, and optimized the existing codebase. They identified hot spots, removed anti-patterns, and brought the code up to date with current industry best practices.
The effects of our work could be seen immediately: after refactoring, model inference ran 12 times faster than before.
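For readers unfamiliar with this class of model, the following is a minimal sketch of the kind of calculation a discrete-time Markov model for loans performs: a portfolio's distribution across loan states is pushed forward one period at a time through a transition matrix. The state names, the probabilities, and the function names `step` and `project` are illustrative assumptions, not the client's model or data.

```python
# Hypothetical loan states and monthly transition probabilities.
# Each row gives the probabilities of moving from one state to every state
# and must sum to 1. These numbers are made up for illustration only.
STATES = ["current", "delinquent", "default", "paid_off"]
TRANSITIONS = [
    [0.90, 0.07, 0.00, 0.03],  # from current
    [0.30, 0.50, 0.20, 0.00],  # from delinquent
    [0.00, 0.00, 1.00, 0.00],  # default is absorbing
    [0.00, 0.00, 0.00, 1.00],  # paid_off is absorbing
]


def step(distribution, transitions):
    """Advance a state distribution by one period (row vector times matrix)."""
    n = len(transitions)
    return [
        sum(distribution[i] * transitions[i][j] for i in range(n))
        for j in range(n)
    ]


def project(distribution, transitions, periods):
    """Apply `step` repeatedly to project the distribution `periods` ahead."""
    for _ in range(periods):
        distribution = step(distribution, transitions)
    return distribution


# Start with every loan in the "current" state and project 12 months ahead.
start = [1.0, 0.0, 0.0, 0.0]
after_year = project(start, TRANSITIONS, 12)
for name, share in zip(STATES, after_year):
    print(f"{name:>10}: {share:.3f}")
```

In a real model the repeated matrix products are a typical place to look for performance hot spots, which is why this kind of code tends to benefit from review and vectorization.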
2. Migrated the local codebase to a cloud-based notebook environment
Our team deployed the optimized model to cloud-based notebook environments managed on AWS. For this, they relied on Amazon SageMaker, which they used to deploy, manage, and customize notebook instances according to the client’s needs.
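As a rough illustration of what provisioning a notebook instance can look like, here is a minimal sketch using the boto3 SageMaker client's `create_notebook_instance` call. The instance name, role ARN, instance type, volume size, region, and the `Owner` tag are all placeholder assumptions; the actual configuration used in the project is not described here.

```python
def build_notebook_request(name, role_arn, instance_type="ml.t3.medium",
                           volume_gb=20, owner=None):
    """Assemble the keyword arguments for SageMaker's create_notebook_instance.

    All default values are illustrative, not the project's real settings.
    """
    request = {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        "VolumeSizeInGB": volume_gb,
    }
    if owner:
        # Hypothetical tagging convention to record who owns the instance.
        request["Tags"] = [{"Key": "Owner", "Value": owner}]
    return request


def create_notebook_instance(request, region_name="us-east-1"):
    """Send the request to AWS. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request can be built without it

    client = boto3.client("sagemaker", region_name=region_name)
    return client.create_notebook_instance(**request)


# Build (but do not send) a request with placeholder values.
request = build_notebook_request(
    name="loan-model-dev",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    owner="analyst1",
)
print(request)
```

Keeping the request construction separate from the AWS call makes the configuration easy to review and test before any resources are created.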
The use of notebook instances was crucial in enabling the client and their team to easily collaborate on future model development efforts.
3. Created a guided model development flow
The Blue Orange team architected and implemented a model development flow based on a predefined notebook template, designed with the ability to manage multiple notebook instances in mind. By leveraging ipywidgets and the flexibility of Jupyter notebooks, they implemented an interface that allows users to create custom notebooks. This resulted in a “guided model development flow” that also enabled non-technical employees to duplicate, modify, and interact with their own models, in their own notebooks. The non-technical staff could “copy and paste” a model and adapt it as necessary to run a comparative analysis without circling back to their developers.
On top of that, our team set up a git-based version control system, which made model versioning possible. This not only made it easier to collaborate on notebooks, but also made it possible to keep track of changes over time.
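One way to picture a template-driven flow like this is a small function that fills a predefined set of cells with user-supplied values and emits a new notebook in the standard Jupyter JSON format. The sketch below uses only the Python standard library; the template cells, parameter names, and `render_notebook` function are hypothetical, and in an interactive interface the parameter values would come from ipywidgets controls rather than a hard-coded dictionary.

```python
import json
from string import Template

# Hypothetical template: (cell type, source with $placeholders).
TEMPLATE_CELLS = [
    ("markdown", "# $model_name\nCopy of the base model for scenario: $scenario"),
    ("code", "HORIZON_MONTHS = $horizon_months\nSCENARIO = \"$scenario\""),
    ("code", "# Model code from the shared template would follow here."),
]


def render_notebook(params):
    """Return a notebook (nbformat 4 structure) with placeholders filled in."""
    cells = []
    for cell_type, source in TEMPLATE_CELLS:
        cell = {
            "cell_type": cell_type,
            "metadata": {},
            "source": Template(source).substitute(params),
        }
        if cell_type == "code":
            # Code cells carry these two extra fields in the notebook format.
            cell["execution_count"] = None
            cell["outputs"] = []
        cells.append(cell)
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Values a user might pick in the guided interface (illustrative only).
notebook = render_notebook(
    {"model_name": "Loan model copy", "scenario": "high_default", "horizon_months": 24}
)
notebook_json = json.dumps(notebook, indent=1)
print(notebook_json[:200])
```

Writing `notebook_json` to a file with an `.ipynb` extension would give the user an independent copy of the model to modify.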
4. Provided technical support
Our engineers also addressed the accessibility and security of the implemented solution. Given that access to model development was required on a global level, they made sure to set up user permissions according to the client’s needs. This way, employees from different countries could manage their own notebook instances, running in isolation from all other instances.
Our team also provided the client with documentation for the new cloud-based development environment. This included industry best practices, guidelines on model development, and an introduction to SageMaker-specific tooling. This enabled the client’s development team to self-manage and make the most of their new cloud-based model development flow.
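A common way to express "each user manages only their own notebook instance" on AWS is an IAM policy conditioned on a resource tag. The sketch below builds such a policy document as a Python dictionary. It assumes each notebook instance carries an `Owner` tag whose value is the IAM user name; that tagging convention, the chosen actions, and the function name are assumptions for illustration and do not describe the permissions actually configured for the client. The `${aws:username}` policy variable applies to IAM users, so a federated setup would need a different condition.

```python
import json


def owner_scoped_policy(tag_key="Owner"):
    """Build an IAM policy allowing users to operate only notebooks they own.

    Ownership is assumed to be recorded in a resource tag named `tag_key`.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sagemaker:StartNotebookInstance",
                    "sagemaker:StopNotebookInstance",
                    "sagemaker:CreatePresignedNotebookInstanceUrl",
                ],
                "Resource": "arn:aws:sagemaker:*:*:notebook-instance/*",
                "Condition": {
                    "StringEquals": {
                        # IAM substitutes the calling user's name at evaluation time.
                        f"sagemaker:ResourceTag/{tag_key}": "${aws:username}"
                    }
                },
            }
        ],
    }


policy = owner_scoped_policy()
print(json.dumps(policy, indent=2))
```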
The solution provided by Blue Orange Digital laid the foundation for further development of financial models. With the cloud-based notebook environment in place, model development can now engage the whole company.
Notebooks enable user-friendliness
Reduced model inference times
Improved model accessibility and data security
While our client’s challenges are by no means unique in the financial sector, it took the vast expertise of our team to turn those challenges into opportunities. They carefully took into account existing infrastructure constraints, the availability of modern tooling, and the needs of the in-house experts. The solution they provided has made a significant impact on the model building process and has laid the foundation for future machine learning and analytics workflows.