A private equity firm needed to predict next-quarter pharmaceutical revenue at the doctor and pharmacy levels. The problem is that the data is highly seasonal and irregular: some doctors do not provide services on a regular schedule, and the same happens at the pharmacy level. The challenge is to create a model robust enough to overcome these gaps in the information.
Vertical: Forecast revenue
Model: Ensemble learning
Model: Time Series Analysis with FB Prophet
The ability to forecast demand across consumers is a fundamental aspect of the pharmaceuticals economy. But finding the right balance between supply and demand can be a real challenge, given the sheer amount of data involved and the multitude of data sources that need to be interpreted together (e.g. seasonal medications and historical consumption data).
The biggest challenge was the incomplete data from doctors, pharmacies and other entities. The private equity firm wanted to predict the revenue for the next period of time, whether weekly, monthly, quarterly, or yearly.
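As a minimal sketch of what incomplete data means in practice, the snippet below rolls irregular revenue records up into monthly and quarterly totals and marks months with no reports as missing rather than zero. The record layout, the function names, and the sample figures are all assumptions made for illustration, not the project's actual pipeline or data.

```python
from collections import defaultdict
from datetime import date


def month_key(d):
    """Map a date to its (year, month) bucket."""
    return (d.year, d.month)


def quarter_key(d):
    """Map a date to its (year, quarter) bucket."""
    return (d.year, (d.month - 1) // 3 + 1)


def aggregate(records, key_fn):
    """Sum revenue per period; records is an iterable of (date, revenue)."""
    totals = defaultdict(float)
    for day, revenue in records:
        totals[key_fn(day)] += revenue
    return dict(totals)


def fill_gaps(monthly, start, end):
    """List every month from start to end inclusive; unreported months get None."""
    out = []
    year, month = start
    while (year, month) <= end:
        out.append(((year, month), monthly.get((year, month))))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out


# Hypothetical records for one doctor: nothing was reported in February.
records = [
    (date(2019, 1, 7), 1200.0),
    (date(2019, 1, 21), 800.0),
    (date(2019, 3, 4), 950.0),
    (date(2019, 4, 15), 1100.0),
]

monthly = aggregate(records, month_key)
quarterly = aggregate(records, quarter_key)
series = fill_gaps(monthly, (2019, 1), (2019, 4))
```

Keeping the gap explicit (None instead of 0) lets a downstream model or imputation step tell "no revenue" apart from "no report", which is one plausible way to handle the irregular reporting described above.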
Machine learning algorithms can systematically process large amounts of consumer data. Modern cloud storage solutions make it easier to handle heterogeneous data sources, and models typically become more accurate over time as they are fed more relevant data.
Accurate demand predictions improve all aspects of demand management. This leads to an optimized pharmaceutical flow between providers and consumers and to lower costs on both sides.
Phase 1 of this project began by collecting data in as many forms as possible to build a complete picture. To improve our understanding of the demand for the products, we automated the ingestion of medical records, doctors' handwritten notes, calendar information, patient doses, length of prescriptions, number of refills, etc. Next, we did the same thing at the pharmacy level, collecting information on limits, regulations, supply storage life cycles, storage capacity, etc.
We were looking to answer questions like:
The next challenge was how to manage the supply and distribution. Shipment orders needed to be automated and needed to fluctuate properly to meet the demand in different seasons. To tackle this problem, we again started with data collection. We collected delivery notes, quantities, locations, length of time until fulfillment, correlating weather conditions, and any other information we could get our hands on. The beauty of machine learning and cloud storage is that you don't have to be as picky about what information you input. Even if you are not sure how valuable or impactful that information is now, the results could be surprising. You might learn that you have a lot more control and influence over the market than you originally thought.
After the data is collected, it needs to be aggregated and made available for training in cloud storage locations that are optimized for quick access and real-time inference. The pharmaceuticals sector is particularly rich in data, but its convoluted nature also makes it hard to interpret. Preparing the data for machine learning is a task in and of itself, involving cleaning, labeling, organizing, etc. Luckily, it can be automated. The work is worth the effort, since data unification and storage optimization make the data easily accessible for business intelligence and visualization tools.
The next step was piecing together the layers of information using machine learning to find the trends in the data. Trends that weren't visible to the naked eye suddenly showed up as glaring gaps in the supply chain. Clear action steps could then be taken to change a given result. Any variable could be changed in the system to simulate different outcomes before experimenting in the real world. Now they could get answers to questions such as, "Would more trucks or more storage capacity be more beneficial to meeting customer demands?"
A valuable side effect of machine learning solutions is the new accessibility they offer. If the right data visualization and reporting tools are employed, it is now possible to give multiple stakeholders, departments, and decision-makers beyond the data science team a clear picture of the company's data that doesn't need a manual to manipulate and interpret. Your data engineers (or ours) can create systems that use artificial intelligence to generate reports automatically. These systems look at all the available data and display correlations and stats you weren't privy to before. Many BI tools have smart sensing capabilities, meaning they determine what kinds of data they are looking at and suggest specific pre-made displays to best visualize that data.
When business intelligence is enabled and available, organizations can be more confident that they are making informed decisions and have an accurate overview of their operations. When trying to balance drug performance, competitive pricing schemes, side-effect risks, and supply chain efficiency, advanced analytics can help find the right answer.
The approach was to try all six options that AWS Forecast provided in 2019: NPTS, Prophet, ARIMA, ETS, DeepAR+, and AutoML. In parallel, we experimented with pure ML methods using Ensemble Learning. Finally, we integrated Prophet and LSTM.
Overfitting when using high-dimensional representations is an extremely common problem. For example, if you just take Word2Vec embeddings and input them into a fully connected neural net, the model will massively overfit the training set and have abysmal validation set performance. Our choice of an LSTM helps address the more fundamental overfitting issues one would experience with a naive model architecture.
To address overfitting while prototyping, we carefully monitored the difference in our training loss versus our validation loss.
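The monitoring described above can be sketched in a few lines. This is an illustrative example under stated assumptions: the function names, the patience rule, and the loss values are hypothetical and are not taken from the project.

```python
def generalization_gap(train_losses, val_losses):
    """Per-epoch difference between validation loss and training loss."""
    return [v - t for t, v in zip(train_losses, val_losses)]


def best_epoch_before_overfit(val_losses, patience=2):
    """Return the index of the best validation epoch once validation loss
    has failed to improve for `patience` consecutive epochs, else None."""
    best = float("inf")
    best_epoch = None
    stale = 0
    for epoch, val in enumerate(val_losses):
        if val < best:
            best, best_epoch, stale = val, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch
    return None


# Hypothetical loss curves: training keeps improving, validation turns upward.
train = [0.90, 0.60, 0.40, 0.25, 0.15]
val = [0.95, 0.70, 0.65, 0.72, 0.80]

gaps = generalization_gap(train, val)
stop_at = best_epoch_before_overfit(val, patience=2)
```

A gap that keeps widening while training loss falls is the usual signal that the model is memorizing the training set, and the epoch returned here is a reasonable checkpoint to keep.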
We took four main approaches to ameliorate overfitting during the prototyping phase. We tuned the training dropout of our Mirrored LSTM. We reduced the number of trainable parameters in the model. We applied regularization to the fully connected neural net layer. And we introduced further data distortion and sparsity to just the training dataset.
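The last of these measures, distorting only the training data, could look roughly like the sketch below. The source does not say how the distortion was applied, so the multiplicative Gaussian noise, the random zero-masking, and every name and parameter value here are assumptions chosen for illustration.

```python
import random


def distort(series, noise_scale=0.05, drop_prob=0.1, seed=0):
    """Return a distorted copy of a training series.

    Each value is either masked to 0.0 with probability `drop_prob`
    (sparsity) or scaled by Gaussian noise with standard deviation
    `noise_scale` (distortion). The input list is left unchanged.
    """
    rng = random.Random(seed)
    out = []
    for x in series:
        if rng.random() < drop_prob:
            out.append(0.0)
        else:
            out.append(x * (1.0 + rng.gauss(0.0, noise_scale)))
    return out


# Hypothetical weekly revenue values, split into training and validation parts.
train_series = [100.0, 120.0, 90.0, 110.0, 130.0, 95.0]
val_series = [105.0, 115.0]

# Only the training split is distorted; validation data stays as observed.
noisy_train = distort(train_series, noise_scale=0.05, drop_prob=0.1, seed=42)
```

Leaving the validation split untouched matters: the validation loss then still measures performance on realistic data, so the comparison with training loss stays meaningful.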
At the pharmacy level, Ensemble Learning produced the better approximation. Conversely, at the doctor level, the Prophet library returned the best model. In all cases, the new models outperformed the previous results for weekly, monthly and quarterly predictions. In all scenarios, the forecast successfully accounted for national holidays in different regions of the world.
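Choosing a different model per level, as described above, amounts to comparing candidates on held-out data for each level separately. The sketch below assumes mean absolute percentage error (MAPE) as the metric, which the source does not specify. The model names and all numbers are placeholders, not project results.

```python
def mape(actual, forecast):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def pick_best(actual, candidates):
    """Score each candidate forecast against the holdout and return
    the name of the lowest-error model together with all scores."""
    scores = {name: mape(actual, fc) for name, fc in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Placeholder holdout revenue for one level (e.g. one pharmacy).
actual = [100.0, 120.0, 90.0, 110.0]

# Placeholder forecasts from two hypothetical candidate models.
candidates = {
    "model_a": [98.0, 118.0, 95.0, 108.0],
    "model_b": [110.0, 130.0, 80.0, 125.0],
}

best, scores = pick_best(actual, candidates)
```

Running the same comparison once per level (pharmacy, doctor) and per horizon (weekly, monthly, quarterly) is one simple way to end up with different winning models for different slices of the data.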
Improving your supply chain is one way to improve your bottom line with data insights. You can also use dynamic pricing models to boost profits. The good news is that you don't have to buy hardware to achieve these ends. You just need high-end data skills and the knowledge to ask the right questions. Contact Blue Orange Digital today to tap into the opportunities in your supply chain.