Introducing the Hypothesis-driven Approach to Data Analysis

Let me introduce you to a nifty little approach I’ve been using to keep analytics teams focused on the main goal and out of rabbit holes. I’ve borrowed the idea from the world of management consulting and tweaked it a little to work for analytics folks like us. I’ve seen this approach work wonders in aligning what the business is asking for with what the analytics team is cranking out. I hope this method works as well for you as it has for me. It’s changed the way my work is seen internally and has led to some of the best and highest-impact analysis my team and I have done.

Hypothesis-driven analysis isn’t new or even fancy. It’s been used for decades in the consulting industry. Analytics folks like us don’t tend to pay it much attention because it can seem too vague and general. But I think it’s worth a second, deeper look. I believe that taking its framework and applying it to analysis helps keep analysts on track and ensures we don’t focus on the model rather than the output. So, if you’ve ever found yourself presenting work you are proud of only to have a business stakeholder say, ‘That’s great, but how do I use it?’ or ‘Thanks for that, but how does it address the business problem I asked about?’, this approach could be for you.

 

One of the main reasons I love the hypothesis-driven approach is how collaborative and open it can be. As you’ll see, there is an opportunity to get feedback at every step. This means stakeholders can see exactly which business problem you are working on, and therefore the value you are trying to bring to the business. They can also help generate ideas about what you should look at next and have a say in which tasks should be done in what order. They know what you are doing now and what you are planning to do next. Finally, when they turn up with an unplanned request, you can take them to your hypothesis list and ask whether they think the new request is more or less likely to address the main issue than what is already on the list.

 

Ok, that’s enough gushing about how I love this approach – let’s get on with it then.

 

Start with the Business Problem/Question

The business problem is going to be the centre and focus of everything we do from here on. It is the thing we are trying to solve, and everything else we do and plan comes back to this business statement. I’ve listed three made-up examples here:

 

1. Why isn’t the USA growing as fast as other parts of the world?

 

2. Should we expand into Africa?

 

3. Is digital marketing right for our product?

 

My experience has been that most of the time the question that arrives at the door of the analytics team is vague or seems to have missing context. In our made-up examples, you can assume there is some missing context that might be helpful. We can ask questions like:

 

1. Why do you think the USA isn’t growing as fast? Is there a number you’ve seen that would help me understand what you are thinking about?

 

2. Why has Africa been picked? Are there any other countries on the list for expansion?

 

3. Have we tried digital marketing already? What were the results so far? Are you pleased or disappointed with the results so far?

 

Spending a bit of time to understand the question is always worthwhile. Making sure you understand the business’s challenges is one of the most underrated skills of an analytics team. Too often we analytics folks want to jump into writing code and using a new technique we’ve learnt. I’m not going to spend much time on this in this article, but one quick, useful thing you can do is tell another person in your team what you are working on and try to explain the business question you’ve been given. If you can clearly communicate the business question to another analyst and they don’t have any questions that leave you unsure, you might be ready to move on to hypothesis generation.

 

Hypothesis Generation

A hypothesis is an idea or concept that you can test to find out if it’s true or not. That’s a simple definition. They can get a lot more complex, but for our purpose this will do. You don’t have to believe the hypothesis – it just needs to be something you can investigate and find out if it’s true or not. I’m not going into accepting or rejecting a null hypothesis here – it doesn’t matter for our purpose. There are a lot of ways to write a good hypothesis – and I’m a big fan of doing it well – but in this case, having a perfect hypothesis is not the most important thing. The main thing you need is a statement you can prove or disprove.

 

We are going to take our business problem and come up with as many hypotheses as we can that could be reasons for, or explain, the business problem. Here are some made-up examples:

 

Figure: business problem and example hypotheses

 

At this stage, it doesn’t matter how good or bad the hypotheses are; the goal is just to get them written down. You’ll note that the five hypotheses I’ve listed aren’t great – and that’s ok. They might be the best you have right now. We aren’t looking for perfection here. We just want to get some ideas about why the issue might be happening. The next step is to rank the hypotheses by how likely you think each one is to be the reason for the business problem. Again, the hypotheses you have might not be great – but they are a start, and that’s what matters most right now. A better outcome might be that the list starts a conversation about what else could be the reason for the business problem, and that means you’ll get some new hypotheses to add to the list and rank.

 

How you phrase the hypothesis will matter at some point. I’ll write about how to turn a business question into an analytics task soon. What you’ll find is that your first attempt at a hypothesis might be hard to prove. This is because we just got the ideas out of our heads quickly. Now we need to check whether they are actually something we can answer. For example, my hypothesis could be ‘Bigfoot causes 10% of all damage to our hire cars’. That could work, but it’s a bit vague and I’m not sure how I’d go about proving it – would I have to prove Bigfoot is real to answer this hypothesis? Let’s rephrase it to be ‘10% of all car insurance claims are for damage caused by Bigfoot’. Ok, I can work with that – I’m not trying to prove Bigfoot is real or that Bigfoot actually did it. I’m now trying to prove that, of all the reasons listed for car damage, 10% are listed as being from Bigfoot.
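To show why the rephrased version is easier to test, here is a minimal Python sketch. Everything in it is an assumption for illustration: the claim records are invented, the "cause" field name is hypothetical, and the two-percentage-point tolerance is an arbitrary choice rather than a recommended threshold.

```python
from collections import Counter

# Invented claim records for illustration only; a real claims table
# would come from your own systems and may use different field names.
causes = ["Collision", "Collision", "Hail", "Theft", "Bigfoot",
          "Collision", "Vandalism", "Hail", "Collision", "Theft"]
claims = [{"claim_id": i + 1, "cause": cause} for i, cause in enumerate(causes)]


def share_of_claims(claims, cause):
    """Return the fraction of claims whose listed cause matches `cause`."""
    if not claims:
        raise ValueError("no claims to analyse")
    counts = Counter(c["cause"].strip().lower() for c in claims)
    return counts[cause.strip().lower()] / len(claims)


def hypothesis_holds(observed_share, expected_share, tolerance=0.02):
    """Check whether the observed share is within `tolerance` of the claim.

    The tolerance is an assumption; agree on it with stakeholders.
    """
    return abs(observed_share - expected_share) <= tolerance


observed = share_of_claims(claims, "Bigfoot")
print(f"Share of claims listing Bigfoot: {observed:.0%}")
print("Hypothesis supported:", hypothesis_holds(observed, 0.10))
```

Note that the code only counts what is written on the claims. It says nothing about whether Bigfoot exists, which is exactly the point of the rephrasing.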

 

When we have our hypothesis, we can start to come up with ideas about how to answer it – how to prove/disprove the hypothesis. What we come up with at this point is the actual analytics task we will work on. Now is the time to brainstorm ideas about how to get data that would prove/disprove the hypothesis.

 

Some of your hypotheses will not be things you can answer today – maybe you don’t have the data, or it’s not actually something analytics can answer. Maybe a hypothesis will need to be handed to a different team, like customer research, for user testing. That’s ok – we are trying to answer a business problem, and there is nothing wrong with an analytics team saying they can’t answer a question and having multiple parts of the business engaged in getting the answers.

 

Figure: business problem, hypotheses, who and what

 

At this stage you are basically done. You’ve got your business problem, you’ve got a bunch of hypotheses ranked by which ones you think are likely having the biggest impact on the business problem, and you’ve got the analytics tasks to prove or disprove each hypothesis. Now it’s time to get started on answering hypothesis #1 to see if it really is having the impact you think it is.
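If you prefer to keep this list somewhere more structured than a slide, it can be sketched in a few lines of Python. This is only one possible layout, not a prescribed tool: the class names, the 1 to 5 likelihood score, the status values and the example hypotheses are all invented for illustration. Only the business problem is taken from the examples earlier in the article.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    statement: str        # something you can prove or disprove
    likelihood: int       # subjective score, 1 (unlikely) to 5 (very likely)
    owner: str            # who will answer it (may be another team)
    task: str             # the analytics task that tests the statement
    status: str = "open"  # "open", "proved" or "disproved"


@dataclass
class HypothesisList:
    business_problem: str
    hypotheses: list = field(default_factory=list)

    def add(self, hypothesis):
        self.hypotheses.append(hypothesis)

    def ranked(self):
        # Highest likelihood first; ties keep the order they were added in.
        return sorted(self.hypotheses, key=lambda h: h.likelihood, reverse=True)

    def next_up(self):
        # The most likely hypothesis that has not been answered yet.
        for h in self.ranked():
            if h.status == "open":
                return h
        return None


# Hypothetical entries, made up for this sketch.
backlog = HypothesisList("Why isn't the USA growing as fast as other parts of the world?")
backlog.add(Hypothesis("USA prices are higher than competitors' prices", 4,
                       "Analytics", "Compare price lists by region"))
backlog.add(Hypothesis("USA customers find checkout confusing", 2,
                       "Customer research", "Run user testing sessions"))
backlog.add(Hypothesis("USA marketing spend per customer is lower", 3,
                       "Analytics", "Compare spend per customer by region"))

print("Work on next:", backlog.next_up().statement)
```

New hypotheses go in through `add`, and because `ranked` sorts every time it is called, changing a score or adding an entry re-ranks the list without any extra work.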

 

As you work your way through each hypothesis, you’ll likely come up with more hypotheses – great! Write them down and add them to your list. Then regularly go back and re-rank the hypotheses. In my experience, you’ll find that you’ve come up with new hypotheses that are better than your original list – fantastic! Finish off the hypothesis you are working on and go grab the next highest on the list. The iterative nature of this process fits perfectly with the way analytics people tend to think and do their work. We are always looking at results and thinking, ‘Hmm, I wonder if the result I see is because of xyz’, and off we go to look at xyz. The challenge for analytics people is that we end up down a rabbit hole chasing a lead that isn’t significant enough to make a difference to the business. This hypothesis-driven approach should help us stay focused on the questions most likely to have an impact.

 

This top-down approach has the advantage of keeping our analytics work very focused on the main outcome. By keeping the business question at the centre and then coming up with a hypothesis about why it could be happening, we don’t leave a lot of room to drift away from the main issue. This means when we get to generating analytics tasks, they are all focused on proving/disproving the hypothesis. You don’t write down any analytics tasks that don’t address the hypothesis. So, if you’ve kept everything tight and focused directly on the question above it, you’ll have a list of very specific questions that will address the main business issue pretty quickly.

 

The same approach can also be worked from the bottom up, but we’ll talk about that in another article as it’s a different way of thinking.

Hayden is a Principal Data Analyst/Scientist specialising in marketing and commercial analysis. He has 10 years of experience in retail, travel, tourism and education sectors. He lives in Taupo, New Zealand with his wife, three kids and two cats.
