Preparing for a Data Science Interview

Artificial Intelligence (AI) continues to grow rapidly in momentum and popularity, and with it the demand for data scientists. Preparing for a data science interview can be an intense process given the diverse skills required, from statistical knowledge and computer programming to problem-solving and communication. To keep up with the competition and secure a dream data science role, it is vital to put sufficient preparation into the interview process.

This article provides a guide to preparing sufficiently for the interview process, including explanations and examples of behavioural questions, technical questions and computer programming tests.

Revise Core Data Science Concepts

Studying the core data science concepts ensures that the interviewee has a solid foundation to prepare for the diverse challenges presented during an interview. By understanding these core concepts, you can answer questions more confidently, articulate your responses more clearly and demonstrate your ability to apply theoretical knowledge to real-life business problems. A few core data science concepts are provided below:

 

Statistics & Probability – Brush up on the fundamentals of probability distributions, hypothesis testing, p-values, confidence intervals and statistical significance.
Machine Learning – Gain an understanding of supervised and unsupervised learning algorithms. Know when to use each type and be prepared to explain statistical modelling approaches such as linear regression, logistic regression, decision trees, random forests, k-means clustering and neural networks.
Python/R Proficiency – Python is the dominant language in data science. Practise writing functions, handling exceptions, and using libraries like NumPy, Pandas and Scikit-learn. If R is the preferred language of the interviewee, or is the language required for the role, brush up on fundamental data manipulation and visualisation libraries such as data.table and ggplot2, or machine learning libraries such as caret and xgboost.
SQL Proficiency – Data scientists often need to extract and manipulate data using SQL. Practise writing queries involving joins, grouping, filtering and nested subqueries.
Data Wrangling – Be comfortable with data cleaning, transformation and manipulation. This often involves using tools like Pandas in Python or data.table in R.
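
As a quick refresher on the statistics item above, the following is a minimal sketch of a confidence interval and a two-sided hypothesis test using only Python's standard library. The sample values and function names are made up for illustration, and the normal approximation is an assumption; with samples this small, a t-test would usually be the more appropriate choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `sample`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(sample)
    margin = z * stdev(sample) / sqrt(len(sample))
    return centre - margin, centre + margin


def two_sample_z_test(sample_a, sample_b):
    """Two-sided z-test for a difference in means (normal approximation)."""
    standard_error = sqrt(
        stdev(sample_a) ** 2 / len(sample_a) + stdev(sample_b) ** 2 / len(sample_b)
    )
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up order values for two hypothetical page designs.
control = [20.1, 22.4, 19.8, 21.0, 20.5, 23.1, 19.9, 21.7, 20.8, 22.0]
variant = [22.9, 24.1, 23.5, 22.2, 25.0, 23.8, 24.4, 22.7, 23.9, 24.6]

low, high = mean_confidence_interval(control)
z_score, p_value = two_sample_z_test(control, variant)
print(f"95% CI for control mean: ({low:.2f}, {high:.2f})")
print(f"z = {z_score:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis of equal means at the 5% level.")
```

Being able to explain each line, for example why the p-value is doubled for a two-sided test, is typically more valuable in an interview than memorising the formulae.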

 

Prepare Responses to Behavioural Questions
Most interviews begin with the question ‘Tell me about yourself’. Prepare a concise summary of your professional background, ensuring that the key skills and experience relevant to the role are highlighted. As this is usually the starting point of an interview, a well-prepared response is a great opportunity to overcome any natural nervousness and make a good first impression.

 

A large component of interviews in any industry is a variety of behavioural questions. Preparing a detailed response for as many questions as possible is invaluable. Common examples of behavioural questions are:

 

Describe a time when you handled tight deadlines or multiple projects: Provide examples that demonstrate your time management and prioritisation skills.
Describe a time when you had to communicate complex information to a non-technical audience: Explain how you made data-driven insights accessible to stakeholders.
Tell me about a time you failed and what you learned from it: Be honest and reflective, focusing on growth and what you would do differently in the future.

 

STAR is a method for structuring responses to behaviour-based interview questions in a Situation, Task, Action and Result format. It provides a clear narrative, making it easier for interviewers to follow the story.

 

An example of a prepared behavioural question response is provided below. Notice that a thorough overview of the approach is described, including the analytical tools and techniques, in addition to how they were applied to solve a business problem. Utilising the STAR structure, with a particular focus on the data science aspects of the example, allows the interviewee to showcase their knowledge and expertise.

 

Describe a challenging data science problem that you’ve worked on.

 

Situation: I worked at a bank that aimed to become the market-leading mortgage provider.

 

Task: To enhance efficiency and digitalise the pricing process of products at the bank, I was responsible for building a mortgage price optimisation model.

 

Action: SAS, the bank’s preferred programming language, was used for data extraction and model building. The problem was split into two components: customer acquisition and customer retention. For acquisition, a linear regression model was built to predict the optimal price of products, i.e. how much customers are willing to pay and how sensitive they are to price changes. For retention, a logistic regression model was built to predict whether a customer would be retained once their product matured, with a binary yes/no outcome. For customers predicted to be retained at product maturity, the model also predicted which mortgage product they would select. To do so, the logistic regression output fed into a number of additional logistic regression models, each representing the likelihood of taking out a particular product, again with a binary yes/no outcome. The Akaike information criterion (AIC) was chosen as the model performance metric. An AIC score is only meaningful in comparison with the AIC scores of other models fitted to the same dataset, with a lower score indicating a better trade-off between fit and complexity. Selecting AIC enabled the model to run on an automated basis by automatically selecting the candidate that produced the minimum AIC. Using Visual Basic for Applications (VBA), Microsoft Excel’s built-in programming language, a dashboard was created to display the model’s results. VBA was selected so that the model could be run by stakeholders without a programming background; the dashboard automatically updates the elasticity curve based on the interest rates set by the user. If a product is elastic, then small changes in price have a large impact on demand. For a more elastic product the elasticity curve will be closer to horizontal, whereas the curve for a less elastic product will tilt more vertically.

 

Result: Prior to implementation of the model, mortgage product managers spent a significant amount of time each month manually comparing potential product prices with the prices of prior products or the current products of competitors. With the introduction of the model, pricing could be completed with the click of a button, freeing up a significant amount of time for mortgage product managers and refocusing their attention on the organisation’s goal of becoming the market-leading mortgage provider.
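
When rehearsing an answer like this, it helps to be ready to demonstrate the technique behind it. The selection by minimum AIC described in the Action step can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the SAS implementation from the example: the data is made up, the function names are hypothetical, and AIC is computed as n·ln(RSS/n) + 2k, the least-squares form under Gaussian errors with constant terms dropped.

```python
from math import log


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def gaussian_aic(observed, predicted, n_params):
    """AIC of a least-squares fit with Gaussian errors, up to a constant."""
    n = len(observed)
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return n * log(rss / n) + 2 * n_params


# Made-up data: interest rate offered (%) and applications received.
rates = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
applications = [98, 91, 79, 72, 58, 51]

mean_applications = sum(applications) / len(applications)
intercept, slope = fit_line(rates, applications)

# Each candidate maps to (predictions, number of fitted coefficients).
candidates = {
    "intercept_only": ([mean_applications] * len(rates), 1),
    "linear": ([intercept + slope * r for r in rates], 2),
}
aic_scores = {
    name: gaussian_aic(applications, predictions, n_params)
    for name, (predictions, n_params) in candidates.items()
}

# Automated selection: keep the candidate with the minimum AIC.
best_model = min(aic_scores, key=aic_scores.get)
print(aic_scores)
print("Selected model:", best_model)
```

Because the scores are compared only with each other on the same dataset, dropping the constant terms does not change which candidate is selected.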

[Figures: retention model structure; elasticity curve]
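
To make the notion of elasticity in the example concrete, here is a small sketch using the midpoint (arc) formula, which divides the percentage change in demand by the percentage change in price. The formula choice, the function name and the figures are assumptions for illustration; the example answer does not state how elasticity was calculated.

```python
def arc_elasticity(price_1, quantity_1, price_2, quantity_2):
    """Midpoint (arc) price elasticity of demand between two observations."""
    pct_quantity = (quantity_2 - quantity_1) / ((quantity_1 + quantity_2) / 2)
    pct_price = (price_2 - price_1) / ((price_1 + price_2) / 2)
    return pct_quantity / pct_price


# Made-up scenario: the interest rate rises from 3.0% to 3.3%.
elastic_product = arc_elasticity(3.0, 1000, 3.3, 700)    # demand drops sharply
inelastic_product = arc_elasticity(3.0, 1000, 3.3, 980)  # demand barely moves

for name, value in [("elastic", elastic_product), ("inelastic", inelastic_product)]:
    label = "elastic" if abs(value) > 1 else "inelastic"
    print(f"{name} example: elasticity = {value:.2f} ({label})")
```

An absolute value above 1 means demand changes proportionally more than price, which matches the description of an elastic product in the example answer.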

 

Technical Exam Preparation
The usage of programming languages varies across industries and organisations; however, Python and R are the dominant languages, and SQL is almost guaranteed to be used alongside them. Research the programming language of choice at the organisation holding the interview and prepare accordingly.


An example of a technical examination is provided below, loosely based on real SQL tests encountered during past interview processes. The test covers a range of SQL skills including joins, aggregations and conditional logic.

 

Datasets

Table 1 products

Table 2 customers

Table 3 orders

Table 4 orders

Questions
1. Total Spend Per Customer:
Write a SQL query to calculate the total amount spent by each customer. The result should include the customer’s first_name, last_name and the total_spent.
2. Top-Selling Products:
Write a SQL query to find the top 3 products based on the total quantity sold. Include the product_name, category, and the total_quantity_sold.
3. Customer Order Summary:
Write a SQL query to get a summary of each customer’s orders, including the customer_id, total_orders and the average_order_value.
4. Monthly Revenue:
Write a SQL query to calculate the total revenue generated for each month. The result should include month (in the format YYYY-MM) and total_revenue.
5. Products with No Sales:
Write a SQL query to identify products that have never been sold. Include the product_id and product_name.
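
One way to practise these questions is to run candidate queries against a small in-memory database. The sketch below uses Python's built-in sqlite3 module. The table contents are not reproduced in this article, so the schema, the rows and the order_items table name are all assumptions made for illustration; adapt the column names to the tables provided in a real test.

```python
import sqlite3

# Hypothetical schema and rows; the article's actual tables are not shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT,
                       category TEXT, price REAL);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT,
                        last_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, quantity INTEGER);

INSERT INTO products VALUES (1, 'Laptop', 'Electronics', 1000.0),
    (2, 'Mouse', 'Electronics', 20.0), (3, 'Desk', 'Furniture', 300.0),
    (4, 'Lamp', 'Furniture', 40.0);
INSERT INTO customers VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
INSERT INTO orders VALUES (1, 1, '2024-01-15'), (2, 1, '2024-02-03'),
    (3, 2, '2024-02-20');
INSERT INTO order_items VALUES (1, 1, 1), (1, 2, 2), (2, 3, 1), (3, 2, 5);
""")

queries = {
    "1_total_spend": """
        SELECT c.first_name, c.last_name, SUM(oi.quantity * p.price) AS total_spent
        FROM customers c
        JOIN orders o ON o.customer_id = c.customer_id
        JOIN order_items oi ON oi.order_id = o.order_id
        JOIN products p ON p.product_id = oi.product_id
        GROUP BY c.customer_id, c.first_name, c.last_name
        ORDER BY total_spent DESC""",
    "2_top_products": """
        SELECT p.product_name, p.category, SUM(oi.quantity) AS total_quantity_sold
        FROM products p
        JOIN order_items oi ON oi.product_id = p.product_id
        GROUP BY p.product_id, p.product_name, p.category
        ORDER BY total_quantity_sold DESC, p.product_name
        LIMIT 3""",
    "3_order_summary": """
        WITH order_values AS (
            SELECT o.order_id, o.customer_id,
                   SUM(oi.quantity * p.price) AS order_value
            FROM orders o
            JOIN order_items oi ON oi.order_id = o.order_id
            JOIN products p ON p.product_id = oi.product_id
            GROUP BY o.order_id, o.customer_id
        )
        SELECT customer_id, COUNT(*) AS total_orders,
               AVG(order_value) AS average_order_value
        FROM order_values
        GROUP BY customer_id
        ORDER BY customer_id""",
    "4_monthly_revenue": """
        SELECT strftime('%Y-%m', o.order_date) AS month,
               SUM(oi.quantity * p.price) AS total_revenue
        FROM orders o
        JOIN order_items oi ON oi.order_id = o.order_id
        JOIN products p ON p.product_id = oi.product_id
        GROUP BY month
        ORDER BY month""",
    "5_no_sales": """
        SELECT p.product_id, p.product_name
        FROM products p
        LEFT JOIN order_items oi ON oi.product_id = p.product_id
        WHERE oi.product_id IS NULL""",
}

# Run every query and print its rows.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

Note that LIMIT and strftime are SQLite syntax; other database systems use different constructs for limiting rows and formatting dates, so check which dialect the interview will use.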

 

Conclusion

• Brush up on core data science concepts. Data science is applicable to a range of industries and companies, and the questions asked in interviews may vary dramatically. However, an understanding of the key concepts is transferable irrespective of the industry or company.
• Prepare a fully documented response to as many behavioural questions as possible. Ensure that no stone is left unturned with regard to the data science tools and techniques that were used.
• Complete several mock technical exams that are tailored to interview-style questions. Ensure that SQL knowledge is covered, in addition to the key data science languages such as Python and R.

Josh Pearce
Data Scientist with professional experience in multiple industries across the world.
https://www.linkedin.com/in/joshkylepearce/
For a detailed course on preparing for technical examinations during the interview process, please refer to my Udemy Instructor profile.
https://www.udemy.com/user/josh-pearce-8/

 
