
I walked into my first PE portfolio company engagement three months after close. The deal team had flagged "data infrastructure" as a value creation lever in the investment thesis. What I found was a Redshift cluster running queries from 2019, a Tableau Server that three people had login credentials for, and an S3 bucket structure that could only be described as archaeological.
No one was to blame. The company had grown fast, the data team was two people who had been doing their best with what they had, and the platform decisions made sense at the time they were made. But the thesis said "data-driven decision making in 90 days," and the clock started the day I showed up.
I have now done this three times. Each time, the specifics were different but the pattern was the same. Here is what the 90 days actually look like when you are the person doing the work.
Weeks 1-2: Figure out what you actually have
The instinct is to start building. Resist it.
I spent my first two weeks on every engagement doing four things, and nothing else.
Catalog everything that exists. Not what the architecture diagram says exists. What actually runs. I pull every scheduled job, every crontab entry, every Airflow DAG, every dbt model, every stored procedure. I check what ran in the last 30 days versus what ran in the last 90. The delta between those two lists tells you what is alive and what is zombie infrastructure.
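The 30-versus-90-day delta is trivial to compute once you have the run history exported. A minimal sketch, assuming you have already pulled each job's most recent run date from the scheduler metadata or warehouse query history (job names here are hypothetical):

```python
from datetime import date

def classify_jobs(run_history, today):
    """Split jobs into live (ran in last 30 days), dormant (30-90 days),
    and zombie (nothing in 90+ days).

    run_history: dict of job_name -> most recent run date.
    """
    live, dormant, zombie = [], [], []
    for job, last_run in run_history.items():
        age = (today - last_run).days
        if age <= 30:
            live.append(job)
        elif age <= 90:
            dormant.append(job)
        else:
            zombie.append(job)
    return live, dormant, zombie

# Hypothetical jobs pulled from Airflow metadata and cron logs
history = {
    "daily_revenue_load": date(2024, 5, 28),
    "legacy_export_v2": date(2024, 4, 15),
    "mktg_sync": date(2023, 11, 15),
}
live, dormant, zombie = classify_jobs(history, today=date(2024, 6, 1))
```

The dormant bucket is the interesting one: those are the jobs you investigate before touching, because "runs monthly" and "dead" look identical in a 30-day window.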
Map the data consumers. Who uses what data, and how? I sit with every team that touches analytics, reporting, or models. Finance has a spreadsheet they download from Redshift every Monday. Marketing runs a Looker dashboard that breaks every time someone changes a column name. The ML team has a Jupyter notebook that reads from a staging table that no one remembers creating.
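The consumer map pays for itself the first time someone asks "can we drop this table?" A sketch of the structure I keep, with hypothetical teams, artifacts, and table names; the point is that impact analysis becomes a one-liner:

```python
# Consumer map: which team consumes which tables, and through what artifact
# (all entries hypothetical)
consumers = [
    {"team": "Finance", "artifact": "monday_export.xlsx", "tables": {"fct_revenue"}},
    {"team": "Marketing", "artifact": "looker/funnel_dash", "tables": {"dim_campaign", "fct_touches"}},
    {"team": "ML", "artifact": "churn_notebook.ipynb", "tables": {"stg_events"}},
]

def blast_radius(table, consumers):
    """Return every (team, artifact) pair that breaks if `table` changes or disappears."""
    return [(c["team"], c["artifact"]) for c in consumers if table in c["tables"]]

impacted = blast_radius("fct_revenue", consumers)
```

A spreadsheet works just as well as code here; what matters is that the mapping exists and is queryable before you start migrating anything.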
This map is the single most important artifact you will create. Everything else flows from understanding who depends on what.
Audit the cost structure. Pull the cloud bills for the last six months. I have found $8,000/month Snowflake warehouses running auto-resume with no suspend policy. I have found Databricks clusters that were "always on" because someone set the auto-termination to 720 minutes during a demo and forgot about it. Quick wins hide in the billing console.
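The arithmetic behind those findings is simple enough to sanity-check on a napkin. A sketch, using a placeholder credit price since the real number is contract-specific:

```python
def idle_cost_per_month(credits_per_hour, idle_hours_per_day,
                        dollars_per_credit=3.0, days=30):
    """Estimate the monthly cost of a warehouse left resumed while idle.

    dollars_per_credit is contract-specific; $3.00 is a placeholder, not a quote.
    """
    return credits_per_hour * idle_hours_per_day * dollars_per_credit * days

# A Medium-sized warehouse (4 credits/hour) sitting resumed ~16 idle hours a day
waste = idle_cost_per_month(credits_per_hour=4, idle_hours_per_day=16)
```

Numbers like this are why the billing console is worth two days of your first two weeks: a single forgotten suspend policy can cost more per month than the tooling you are about to propose.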
Document the tribal knowledge. The two-person data team knows things that are not written down anywhere. Which tables are trustworthy. Which pipelines break on month-end. Why that one column is called "revenue_final_v3_FIXED." Get this into a shared document before anyone leaves or gets reorganized.
By the end of week 2, you should have a one-page summary that answers: what data exists, who uses it, what it costs, and what breaks regularly. If you cannot write that page, you are not done with assessment.
Weeks 3-6: Win fast, win visibly
The operating partners are watching. They need to see that the data investment is producing results before the first board meeting. You need quick wins that are real, not cosmetic.
Here is what I prioritize in this window.
Kill the cost waste first. This is the fastest path to a number you can put in a slide. Right-size the compute. Set auto-suspend policies. Archive cold data to cheaper storage tiers. On my last engagement, I cut the monthly Snowflake bill from $14,000 to $6,200 in the first three weeks. That is not optimization wizardry. That is turning off things that should not have been running.
Fix the one report everyone complains about. Every company has one. The revenue dashboard that is always wrong. The pipeline report that takes 45 minutes to load. The forecast model that no one trusts. Pick the one that causes the most pain for the most senior people, and fix it properly. Not a patch. A rebuild with tested transformations and a clear data lineage.
This is not just good engineering. It is strategic. When the CFO's report works correctly for the first time in six months, you have a champion. You need champions for weeks 7-12.
Stand up a modern transformation layer. If they do not have dbt or something equivalent, this is the time. I set up dbt Core (or dbt Cloud if the team prefers managed infrastructure), connect it to version control, and migrate the three most critical transformation pipelines. Not all of them. Three. The ones that feed the reports you just fixed.
Why three? Because three is enough to prove the pattern works, build team confidence, and create templates for the rest. Trying to migrate everything at once is how these efforts stall in week 5.
Establish a single source of truth for metrics. Pick five metrics that the leadership team cares about. Revenue, churn, CAC, pipeline value, whatever matters for this specific business. Define them once, in code, with tests. Make them accessible in one place. This sounds basic because it is. Most portfolio companies do not have it.
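For concreteness, here is a minimal sketch of what "defined once, in code, with tests" looks like in dbt. The model and column names are hypothetical; the shape is what matters:

```sql
-- models/fct_key_metrics.sql (hypothetical model):
-- one row per month, one column per agreed-upon metric
select
    date_trunc('month', order_date) as month,
    sum(amount)                     as revenue,
    count(distinct customer_id)     as active_customers
from {{ ref('stg_orders') }}
group by 1
```

```yaml
# models/fct_key_metrics.yml: these tests run on every build
version: 2
models:
  - name: fct_key_metrics
    columns:
      - name: month
        tests: [not_null, unique]
      - name: revenue
        tests: [not_null]
```

Once this exists, "what was revenue last month" has exactly one answer, and any dashboard that disagrees is provably reading from the wrong place.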
Weeks 7-12: Build the foundation
This is where the real work starts. The quick wins bought you credibility and budget. Now you build the thing that lasts.
Pick your platform and commit. Databricks or Snowflake. Maybe both if the use cases genuinely warrant it. But make the decision based on the actual workload, not the sales pitch. I have seen companies waste four weeks evaluating platforms when the answer was obvious from the data profile. If your workload is 80% SQL analytics and reporting, Snowflake is probably the right call. If you have significant ML workloads and streaming data, Databricks is worth the setup cost. If you inherited one and it is working, do not switch just because it is not what you would have chosen.
Implement data governance from the start. This is the part I got wrong on my first engagement. I treated governance as a phase-two problem. It is not. Set up Unity Catalog or Snowflake's access controls in week 7, not week 20. Define who can see what. Tag PII columns. Create a service account strategy. The PE firm will ask about data governance at the first portfolio review, and "we are planning to address that next quarter" is not the answer they want.
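Whether you use Unity Catalog tags or Snowflake masking policies, the underlying logic is the same. A sketch of that logic reduced to plain Python, with hypothetical table, column, and role names:

```python
# Minimal access model: columns tagged PII are masked for roles without clearance
# (all table, column, and role names are hypothetical)
PII_COLUMNS = {("customers", "email"), ("customers", "ssn")}
PII_ROLES = {"data_admin", "compliance"}

def visible_value(role, table, column, value):
    """Return the raw value only if the role may see PII; otherwise mask it."""
    if (table, column) in PII_COLUMNS and role not in PII_ROLES:
        return "***MASKED***"
    return value

masked = visible_value("analyst", "customers", "email", "a@example.com")
```

The platform-native versions enforce this at query time so it cannot be bypassed; the sketch just shows why tagging PII columns in week 7 is the prerequisite for everything else.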
Build the orchestration layer. Replace the cron jobs and manual triggers with proper orchestration. Airflow, Dagster, or Prefect, depending on the team's Python comfort level and the complexity of the dependency graph. I lean toward Dagster for newer setups because the asset-based model maps well to how portfolio companies think about their data, but I have seen all three work well when set up properly.
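Whichever tool you pick, the core job is the same: run each asset only after everything upstream of it has run. The Python standard library can sketch the idea (pipeline names are hypothetical):

```python
from graphlib import TopologicalSorter

# Dependency graph: each asset maps to the upstream assets it needs
# (hypothetical pipeline names)
deps = {
    "raw_orders": set(),
    "stg_orders": {"raw_orders"},
    "fct_revenue": {"stg_orders"},
    "revenue_dashboard": {"fct_revenue"},
}

run_order = list(TopologicalSorter(deps).static_order())
```

Airflow, Dagster, and Prefect all add scheduling, retries, and observability on top of exactly this structure, which is why getting the dependency graph written down honestly matters more than the tool choice.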
Create the monitoring and alerting baseline. Data quality checks on every critical pipeline. Freshness alerts. Schema change detection. Cost anomaly alerts on the cloud billing. I use dbt tests for data quality and simple Slack alerts for freshness. Nothing fancy. The goal is that when something breaks at 2 AM, someone knows about it before the CFO opens their dashboard at 8 AM.
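A freshness check really is this small. A sketch, assuming you can read each table's last successful load timestamp from your warehouse or orchestrator metadata (table names and SLAs here are hypothetical):

```python
from datetime import datetime, timedelta

def stale_tables(last_loaded, sla_hours, now):
    """Return tables whose last load is older than their freshness SLA.

    last_loaded: table -> timestamp of last successful load
    sla_hours:   table -> allowed staleness in hours
    """
    return [
        t for t, ts in last_loaded.items()
        if now - ts > timedelta(hours=sla_hours[t])
    ]

now = datetime(2024, 6, 1, 8, 0)
loaded = {
    "fct_revenue": datetime(2024, 5, 31, 23, 30),
    "stg_events": datetime(2024, 5, 30, 2, 0),
}
slas = {"fct_revenue": 12, "stg_events": 24}
alerts = stale_tables(loaded, slas, now)
# In production this list feeds a Slack webhook, not a print statement.
```

Run it on a schedule alongside your dbt tests and you have the 2 AM safety net: nothing fancy, just a list of stale tables landing in a channel someone actually reads.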
Document everything you built. Not as an afterthought. As you go. A README in every dbt project. A runbook for every pipeline. An architecture diagram that reflects reality, not aspiration. The person who takes over this stack after you, or the person you hire to run it full-time, should be able to understand every decision you made and why.
What I wish someone had told me
The 90-day timeline is real, but the work does not end at day 90. What you are building in those three months is the foundation. The team will spend the next two quarters building on top of it.
Do not try to be perfect. Try to be deliberate. Every decision you make in those 90 days will be load-bearing for the next 18 months.
The biggest risk is not picking the wrong tool. It is spending so long evaluating tools that you run out of time to build. Pick, commit, iterate. The PE operating partner does not care whether you chose Databricks or Snowflake. They care whether the portfolio company can answer basic business questions with data that everyone trusts.
And if you are walking into one of these engagements for the first time: the architecture diagram they showed you in diligence is fiction. Plan accordingly.
Has anyone else done one of these 90-day resets at a PE portfolio company? I am curious what your weeks 1-2 looked like compared to mine, especially if you inherited a stack that was further along than "Redshift cluster and prayers."