Auto-Scraper for Trillion-Dollar Asset Management Firm

Josh Miramant

Posted On: May 25, 2022

The Problem

A trillion-dollar asset manager wanted to automate data collection and processing for an internal Client Services tool. Every day during Q4, its Operations Analyst team spent multiple hours manually checking more than 350 asset manager websites (covering 8,000 tickers) for newly posted capital gains. The main challenge of this Robotic Process Automation (RPA) project was building a data scrape orchestration framework, plus a logic-based validation framework to automate data accuracy checks.

Case Study

RPA Solution: Blue Orange implemented a custom data extraction framework using Selenium and Scrapy to collect and ingest data from over 300 distinct sources.
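
The case study does not show the spiders themselves. As a minimal sketch of the extraction step only, the code below pulls a capital gains table out of a static HTML page using Python's standard `html.parser` module. This is a swapped-in technique for self-containment: the project used Selenium and Scrapy, which add JavaScript rendering, crawling, and request scheduling that this sketch omits. The table layout, column names, and tickers are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class DistributionTableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None   # cells of the row being read
        self._cell: Optional[List[str]] = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_distributions(html: str) -> List[dict]:
    """Treat the first table row as the header and map each later row onto it."""
    parser = DistributionTableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment; real asset manager sites vary widely.
SAMPLE_PAGE = """
<table>
  <tr><th>Ticker</th><th>Record Date</th><th>Amount</th></tr>
  <tr><td>ABCDX</td><td>2021-12-15</td><td>0.42</td></tr>
  <tr><td>EFGHX</td><td>2021-12-16</td><td>1.05</td></tr>
</table>
"""

records = parse_distributions(SAMPLE_PAGE)
```

In a Scrapy spider, equivalent logic would normally live in the spider's `parse` callback, using the response's CSS or XPath selectors instead of a hand-written parser.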

Data Pipeline: Blue Orange managed, orchestrated, and ran the nightly workload using Prefect. All pipeline errors, retries, and timeouts were handled in Prefect, giving the pipeline high fault tolerance while keeping DevOps overhead low.
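
To illustrate the retry behavior described above without a Prefect dependency, here is a minimal standard-library sketch of a retry decorator; `with_retries` and `scrape_source` are hypothetical names, and the simulated failures are invented for the example. In Prefect 2.x, comparable behavior is configured declaratively through the `retries`, `retry_delay_seconds`, and `timeout_seconds` arguments of the `@task` decorator rather than written by hand.

```python
import functools
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(retries: int = 3, retry_delay_seconds: float = 0.0):
    """Re-run the wrapped function up to `retries` extra times if it raises."""
    if retries < 0:
        raise ValueError("retries must be non-negative")

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == retries:
                        raise  # out of attempts: surface the last error
                    time.sleep(retry_delay_seconds)
            raise AssertionError("unreachable")

        return wrapper

    return decorator


# Hypothetical flaky scrape: fails twice with a transient error, then succeeds.
calls = {"n": 0}


@with_retries(retries=3, retry_delay_seconds=0.0)
def scrape_source(url: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network failure")
    return f"scraped {url}"


result = scrape_source("https://example.com/capital-gains")
```

Pushing retries into the orchestration layer keeps each spider simple: a site that times out once at night is retried automatically instead of paging a developer.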

Rule-Based Validation Framework: The Blue Orange developer team worked with the Operations Analyst team daily to ensure that the web scraping spiders extracted the correct data. The framework included more than 55 unique validation rules applied to the scraped data.
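
The actual rules are not published. A minimal sketch of how a rule-based validation layer can be structured, assuming each scraped record is a dict with hypothetical `ticker`, `record_date`, and `amount` fields: each rule is a named predicate, and every record is checked against every rule so analysts can see exactly which rule a record failed. The four rules below are illustrative placeholders, not the project's rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


def _is_iso_date(value) -> bool:
    """True if value is a string in YYYY-MM-DD form."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


# Hypothetical example rules; a production rule set would be much larger.
RULES: Tuple[Rule, ...] = (
    Rule("ticker_present", lambda r: bool(str(r.get("ticker") or "").strip())),
    Rule("ticker_uppercase", lambda r: str(r.get("ticker") or "").isupper()),
    Rule("amount_non_negative",
         lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    Rule("record_date_iso", lambda r: _is_iso_date(r.get("record_date"))),
)


def validate(records: Sequence[dict], rules: Sequence[Rule] = RULES) -> List[Tuple[int, str]]:
    """Return (record index, rule name) for every failed check."""
    failures = []
    for i, record in enumerate(records):
        for rule in rules:
            if not rule.check(record):
                failures.append((i, rule.name))
    return failures


GOOD = {"ticker": "ABCDX", "record_date": "2021-12-15", "amount": 0.42}
BAD = {"ticker": "abcdx", "record_date": "12/15/2021", "amount": -1}
failures = validate([GOOD, BAD])
```

Keeping rules as data rather than branching code makes it straightforward for analysts and developers to add a rule after reviewing a bad scrape.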

Result: Blue Orange delivered a web scraping tool that provides daily data updates with computed deltas (changes since the previous run) and notifications. The tool is expected to eliminate over 250 hours of manual work per person, per year.
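
The case study does not describe how deltas were computed. One plausible sketch, assuming each nightly run yields a mapping from ticker to posted capital gains amount (a hypothetical structure): compare tonight's snapshot with the previous one, classify tickers as added, changed, or removed, and build a notification only when something changed. `compute_delta` and `format_notification` are hypothetical names, and message delivery (email, chat, and so on) is left out.

```python
from typing import Dict, List, Optional, Tuple


def compute_delta(previous: Dict[str, float], current: Dict[str, float]) -> dict:
    """Compare two nightly snapshots keyed by ticker."""
    added = {t: v for t, v in current.items() if t not in previous}
    changed: Dict[str, Tuple[float, float]] = {
        t: (previous[t], v) for t, v in current.items() if t in previous and previous[t] != v
    }
    removed: List[str] = sorted(t for t in previous if t not in current)
    return {"added": added, "changed": changed, "removed": removed}


def format_notification(delta: dict) -> Optional[str]:
    """Return a plain-text summary, or None when nothing changed since the last run."""
    lines = [f"NEW {t}: {v}" for t, v in sorted(delta["added"].items())]
    lines += [f"CHANGED {t}: {old} -> {new}" for t, (old, new) in sorted(delta["changed"].items())]
    lines += [f"REMOVED {t}" for t in delta["removed"]]
    return "\n".join(lines) if lines else None


# Hypothetical snapshots from two consecutive nightly runs.
yesterday = {"AAAAX": 0.10, "BBBBX": 0.25}
tonight = {"AAAAX": 0.10, "BBBBX": 0.30, "CCCCX": 1.05}
message = format_notification(compute_delta(yesterday, tonight))
```

Returning `None` for an empty delta lets the caller skip sending a notification on quiet nights.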



For more on AI and technology trends, see the data-driven solutions from Josh Miramant, CEO of Blue Orange Digital, for Supply Chain, Healthcare Document Automation, and more.



Josh Miramant, CEO

Josh Miramant is the CEO and founder of Blue Orange Digital, a data science and machine learning agency with offices in New York City and Washington DC.

Miramant is a popular speaker, futurist, and strategic business and technology advisor to enterprise companies and startups. As a thought leader, he has been featured in IBM ThinkLeaders, Dell Technologies, Global Banking & Finance Review, and the IoT Council of Europe, among others. He can be reached at contact@blueorange.digital.

Blue Orange Digital is recognized by Clutch and Yahoo Finance as a “Top AI Development and Consultant Agency” for innovations in predictive analytics, automation, and optimization with machine learning in NYC.

They help organizations optimize and automate their businesses, implement data-driven analytic techniques, and understand the implications of new technologies such as artificial intelligence, big data, and the Internet of Things.

Visit blueorange.digital for more information and case studies.

