7 Tips for a Strong Data Infrastructure
Everything you want to do when it comes to analytics – from the advanced stuff, like data science and machine learning, to the basics – hinges on a solid data infrastructure.
In this blog, we provide 7 tips from our experience that will help ensure your data infrastructure supports all of your current and future analytics needs. This is not an exhaustive or sequential list, but rather a set of ideas that we have seen help our clients.
1. Start From the Beginning: Define Your Data and Analytics Strategy
Before you tackle any kind of BI project, consider questions like: Do you have a data and analytics strategy? What is your company’s overall corporate strategy? What is the business reason behind the need for analytics? You need to define what technology, processes, and people to put in place so you can meet your analytics goals.
Our approach to helping our clients define their data and analytics strategy consists of 4 main steps:
- Understand Your Vision – what is the long-term analytics vision, and how does it fit into your overall business strategy?
- Capture Your Current State – this includes interviewing stakeholders, evaluating data sources, and reviewing technologies
- Develop an Analytics Plan – this is a detailed roadmap that maps out where you want to go and how to fill in the gaps.
- Deliver Results – at Analytics8, we deliver in a phased approach with short 6-8 cycles so that our clients can provide feedback throughout the process and see results along the way.
How to Get Started:
If you don’t have a well-defined strategy, start making one. A couple of approachable things that anyone could start now include:
- Talk to the business and gather requirements: Instead of asking what they need, ask them to “show you”, then document findings.
- Start making a list of source systems: Interview the business to understand which source systems exist and which departments use them.
Learn more about our data strategy services.
2. Prioritize Your Projects
This is a given, but without prioritization, your projects may take turns you never intended. Well-communicated priorities help align projects and programs with your organization’s strategies.
Why Prioritize?
- Increases the success rates of strategic projects
- Increases the alignment and focus of management around strategic goals
- Clears doubts for the operational teams when faced with decisions
- Builds an execution mindset and culture
How to Get Started:
Use the Prioritization Matrix
Align each of your analytics activities with your overall corporate goals, then determine the technical feasibility of each.
- Talk to the business to gather requirements and identify KPIs
- Work with users to assign business value and technical feasibility for each use case
- Plot the use cases on a chart and determine which projects you should start on first
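The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: the 1-to-5 scoring scale, the threshold of 3, the quadrant labels, and the use cases themselves are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    business_value: int         # assumed scale: 1 (low) to 5 (high)
    technical_feasibility: int  # assumed scale: 1 (hard) to 5 (easy)


def quadrant(use_case: UseCase, threshold: int = 3) -> str:
    """Place a use case in one of the four quadrants of the matrix."""
    high_value = use_case.business_value >= threshold
    feasible = use_case.technical_feasibility >= threshold
    if high_value and feasible:
        return "start first"
    if high_value:
        return "plan for later"
    if feasible:
        return "quick win, low impact"
    return "deprioritize"


def prioritize(use_cases: list[UseCase]) -> list[UseCase]:
    """Order by combined score, highest first; ties go to business value."""
    return sorted(
        use_cases,
        key=lambda u: (u.business_value + u.technical_feasibility, u.business_value),
        reverse=True,
    )


# Hypothetical use cases and scores, for illustration only.
use_cases = [
    UseCase("Sales pipeline dashboard", business_value=5, technical_feasibility=4),
    UseCase("Customer churn model", business_value=5, technical_feasibility=2),
    UseCase("Office supply report", business_value=1, technical_feasibility=5),
    UseCase("Legacy system audit", business_value=2, technical_feasibility=1),
]

for use_case in prioritize(use_cases):
    print(f"{use_case.name}: {quadrant(use_case)}")
```

In practice the scores would come from the conversations with business users and the technical team, and the weighting between value and feasibility is a choice each organization should make for itself.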
3. Evaluate Environments
Where within your technology stack do your environments need to be set up? Consider how you move data through the stack. The whole system will run more smoothly if this is set up well. Some things you should start documenting when evaluating your environments include:
- Security setup considerations
- Data load/storage strategy
- Architecture diagram
- Change management strategy
How to Get Started:
Ensure your environment is set up thoughtfully.
- Look for redundancies: Make sure your system is efficient
- Evaluate your environment: Consider what’s best for your organization (on premises versus the cloud, etc.)
- Do you need multiple environments? Do you have Dev, QA, and Prod environments, or is that overkill?
- Refresh data: If you have dev source systems, you need to ensure the data is refreshed so you have good data to work with
Read more about our data architecture services.
4. Build a Flexible Data Model
A data model creates the structure the data lives in, and a thoughtfully created model enables flexibility and ease of use. It also defines how things are labeled and organized, which determines how your data can and will be used and, ultimately, what story that information will tell. Finally, a data model helps define the problem, enabling you to consider different approaches and choose the best one.
Relational Data Models and Why You Need One
Tools like Qlik, Tableau, and Power BI can help you get better access to your data so you can make better decisions, but if you don’t build a relational data model, the solution isn’t sustainable.
Why you need a data warehouse:
- Removes the need to access data sources separately and cuts down on data prep
- Automatically integrates disparate data sources along with common attributes
- A [good] data warehouse is designed to be understood by a human, not a computer program
- Reduces the time to analyze data, gives you confidence in your data, yields higher-quality insights, and provides better data security
- Allows for data governance and prevents “wild west” data analysis
How to Get Started:
Use the Bus Matrix. The Bus Matrix lists all of the core business processes that you’re trying to model, along with the common dimensions by which you will slice the data. It provides a top-down strategic perspective to ensure data in the data warehouse environment can be integrated across the enterprise, while agile bottom-up delivery occurs by focusing on a single business process at a time.
Bus Matrix Example:
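As a rough illustration, a bus matrix can be sketched as a small grid of business processes against the dimensions each one uses. The processes and dimensions below are hypothetical, and the helper names `render` and `shared_dimensions` are ours, chosen only for this example.

```python
# Hypothetical business processes (rows) and the dimensions each one uses.
bus_matrix = {
    "Sales orders": {"Date", "Customer", "Product", "Sales rep"},
    "Inventory": {"Date", "Product", "Warehouse"},
    "Customer support": {"Date", "Customer", "Product"},
}

# Column order: every dimension that appears in at least one process.
dimensions = sorted(set().union(*bus_matrix.values()))


def render(matrix: dict[str, set[str]], columns: list[str]) -> str:
    """Render a plain-text grid with an X where a process uses a dimension."""
    width = max(len(name) for name in matrix)
    lines = [" " * width + " | " + " | ".join(columns)]
    for process, used in matrix.items():
        cells = [("X" if col in used else "").center(len(col)) for col in columns]
        lines.append(process.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


def shared_dimensions(matrix: dict[str, set[str]]) -> set[str]:
    """Dimensions used by more than one process; define these once, consistently."""
    return {
        dim
        for dim in set().union(*matrix.values())
        if sum(dim in used for used in matrix.values()) > 1
    }


print(render(bus_matrix, dimensions))
print("Shared dimensions:", sorted(shared_dimensions(bus_matrix)))
```

The shared dimensions are the ones worth agreeing on first, since they are what lets data from one business process be integrated with another as each process is delivered.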
5. Document Data Lineage
This one is boring, but necessary. Without the knowledge of how your data goes from origination to its destination, you could end up rebuilding things later. When you document your data lineage, you’ll be able to:
- Get knowledge about what data is available, its quality, and correctness
- Capture knowledge that otherwise lives only in the ETL developer’s head
- Have more transparency about what’s going on with your data
- Give business users more detail about what they’re using in their reports
- Understand the impact of changes made on a source system
How to Get Started:
Build an ETL mapping document. This is a visual of your existing data flow and lineage, including sources and data dependencies, such as revenue. Doing this step during development will save you so much time later on – trust us on this one!
ETL Mapping Document Example:
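A mapping document can start as something as simple as one record per field, noting where it comes from, how it is transformed, and where it lands. The sketch below is a minimal illustration under that assumption; the system, table, and field names are hypothetical, and the two lookup helpers are ours.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    source_system: str
    source_field: str
    transformation: str
    target_table: str
    target_field: str


# Hypothetical systems, tables, and fields, for illustration only.
mappings = [
    Mapping("CRM", "account.name", "trim whitespace", "dim_customer", "customer_name"),
    Mapping("ERP", "invoice.amount", "convert to USD", "fact_revenue", "revenue_usd"),
    Mapping("ERP", "invoice.date", "cast to date", "fact_revenue", "invoice_date"),
]


def impacted_targets(
    rows: list[Mapping], source_system: str, source_field: Optional[str] = None
) -> list[str]:
    """Target fields affected if a source system (or one of its fields) changes."""
    return sorted(
        f"{m.target_table}.{m.target_field}"
        for m in rows
        if m.source_system == source_system
        and (source_field is None or m.source_field == source_field)
    )


def sources_of(rows: list[Mapping], target_table: str) -> list[str]:
    """Source fields that feed a given target table, for tracing a report back."""
    return sorted(
        f"{m.source_system}:{m.source_field}"
        for m in rows
        if m.target_table == target_table
    )


print(impacted_targets(mappings, "ERP"))
print(sources_of(mappings, "fact_revenue"))
```

Even this small structure answers two of the questions listed above: what is affected when a source system changes, and where the numbers in a report actually come from.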
6. Step Back and Assess Performance
You’ll want to consider performance needs for both front-end user experience and backend infrastructure. Taking time to do this during the development process will help ensure optimal performance.
Here are some questions you can ask when assessing performance.
User Experience:
- How long does it take to run reports?
- What factors are affecting performance?
- Are those services really too expensive?
Backend Performance:
- How often does the data need to be refreshed?
- Are you using incremental loads?
- Are you uploading data that nobody uses?
- How is ETL performance?
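For readers unfamiliar with incremental loads, here is a minimal sketch of one common approach: keep a "watermark" of the last modification time you loaded and copy only rows changed since then. The row layout (`id`, `modified_at`) and the in-memory target are assumptions made for the example; a real pipeline would read from and write to actual source and warehouse tables.

```python
from datetime import datetime


def incremental_load(source_rows: list[dict], target: dict, last_loaded_at: datetime) -> datetime:
    """Copy only rows modified after the watermark and return the new watermark."""
    new_rows = [row for row in source_rows if row["modified_at"] > last_loaded_at]
    for row in new_rows:
        target[row["id"]] = row  # insert or overwrite by primary key
    if not new_rows:
        return last_loaded_at  # nothing changed, so the watermark stays put
    return max(row["modified_at"] for row in new_rows)


# Hypothetical source rows, for illustration only.
source_rows = [
    {"id": 1, "amount": 100, "modified_at": datetime(2024, 1, 1)},
    {"id": 2, "amount": 250, "modified_at": datetime(2024, 1, 3)},
    {"id": 3, "amount": 75, "modified_at": datetime(2024, 1, 5)},
]

target: dict = {}
watermark = datetime(2024, 1, 2)

# First run picks up only the rows changed after January 2.
watermark = incremental_load(source_rows, target, watermark)
print(sorted(target), watermark)

# Second run finds nothing new and leaves the target untouched.
watermark = incremental_load(source_rows, target, watermark)
print(sorted(target), watermark)
```

Compared with reloading everything on each refresh, this moves less data per run, which is usually the point of asking the question in the first place.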
How to Get Started:
Again, start documenting the current state, both the front-end user experience and the backend infrastructure performance. Capture performance benchmarks, assess factors impacting performance, establish SLAs, and identify areas for improvement.
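Capturing benchmarks and comparing them to SLAs does not need special tooling to get started. The sketch below is one simple way to do it, using only the Python standard library; the report names, timings, and SLA targets are hypothetical, and `benchmark` and `sla_breaches` are helper names we made up for the example.

```python
import time
from statistics import median


def benchmark(run_report, repeats: int = 5) -> float:
    """Return the median wall-clock duration, in seconds, of calling run_report."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_report()
        durations.append(time.perf_counter() - start)
    return median(durations)


def sla_breaches(measured: dict[str, float], sla: dict[str, float]) -> dict[str, float]:
    """For each report over its SLA, return how many seconds it exceeds the target."""
    return {
        name: round(measured[name] - limit, 3)
        for name, limit in sla.items()
        if name in measured and measured[name] > limit
    }


# Hypothetical report names, measured timings, and SLA targets, in seconds.
measured = {"Daily sales": 4.2, "Inventory aging": 31.0, "Executive summary": 9.5}
sla = {"Daily sales": 5.0, "Inventory aging": 20.0, "Executive summary": 10.0}

print(sla_breaches(measured, sla))

# benchmark() times any callable; this one is a stand-in for a real report query.
print(benchmark(lambda: sum(range(100_000)), repeats=3))
```

Recording the same measurements on a regular schedule gives you a baseline, so you can tell whether a change to the data model or the ETL process made things better or worse.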
7. Implement a Data Governance Program
With a properly implemented data governance program, you can gain consistency, get faster time to delivery, lower your maintenance needs, get higher-quality data, increase user adoption, and a whole lot more. It’s a critical piece of your data and analytics solution, but one that is often overlooked.
How to Get Started:
We’ve identified 8 steps to implement a Data Governance Program. Read more about these 8 steps.
How to Implement a Data Governance Program:
A key point we’d like to highlight: a grass-roots data governance movement will not work. For your data governance program to be successful, you’ll need buy-in from the top, and it needs to be championed across the organization. If your team isn’t motivated to follow the processes laid out, your plan won’t deliver its potential benefits.
Starting with the first step, figure out who will be leading the way. You want a leader who looks at data as an asset.