Time series forecasting is pivotal for retail, guiding strategic decisions and operational efficiency by analysing historical data to predict trends and inventory demand. This article surveys several forecasting approaches, assessing their strengths and weaknesses, with a specific focus on the STL decomposition method for its robustness in accurately capturing seasonal and trend components.
The Crucial Role of Time Series Forecasting in Retail
Time series forecasting has been a game-changer in the retail industry, essential for strategic planning and daily operations. By analysing historical data collected over time, it predicts future trends, demand, sales and inventory needs. Its applications are vast and diverse, offering critical insights that can propel retail businesses to success.
For example, at a macro level, forecasting helps retailers understand seasonal trends and predict sales volumes during peak shopping periods such as Black Friday or Christmas. This enables them to optimize stock levels, reducing both overstock and stockouts, and ensure they can meet customer demand without excessive inventory holding costs.
At a micro level, time series forecasting assists in managing daily operations. Retailers can predict daily customer footfall, adjust staffing levels accordingly, and even optimize energy consumption based on expected store activity. Furthermore, it’s instrumental in pricing strategies – predicting when prices should be dropped for clearance or raised during high demand periods.
In this article, we first survey some of the best-known techniques in time series forecasting. We then take a closer look at STL (Seasonal and Trend decomposition using Loess), one of the most effective approaches in my experience.
A Review on the Most Well-known Time Series Forecasting Techniques
Various time series forecasting techniques have been developed and refined to suit the diverse needs of the retail sector. Each technique offers unique advantages and is chosen based on the specific requirements and data characteristics of the forecasting task at hand. In the following, we review some of the most widely used techniques; for other methods and further detail, the surveys [1] and [2] are a good starting point.
ARIMA (Autoregressive Integrated Moving Average): ARIMA models are widely used for their flexibility in modelling data that shows trends and seasonal patterns. They are particularly useful when data exhibits non-stationary characteristics, requiring differencing to make it stationary. This approach is extensively used in financial planning since it can handle non-stationary data.
Holt-Winters Method: The Holt-Winters method is an extension of exponential smoothing designed specifically to address data with trends and seasonal variations. This method applies three types of smoothing (level, trend and seasonal) to decompose the time series and forecast future values. The level equation adjusts the series to the average value, the trend equation estimates the change in the series over time, and the seasonal equation captures repeating short-term cycles within the data.
Holt-Winters can be implemented in either an additive or multiplicative form, depending on the nature of the seasonal effect. It’s particularly effective for short-term forecasts in retail, telecommunications and inventory management where seasonal patterns are pronounced.
Neural Networks: These models can capture complex nonlinear relationships that other models might miss. In retail, neural networks are beneficial for large datasets with intricate patterns that simpler linear models cannot adequately model.
Prophet: Developed by Facebook, Prophet is designed for forecasting at scale, handling the seasonal data typical of retail with ease. It excels at daily predictions involving strong multiple seasonalities and holiday effects, which are common in retail sales data.
LSTM (Long Short-Term Memory): A type of recurrent neural network, LSTMs are particularly adept at making predictions on time series data where long-term dependencies are crucial. They are ideal for forecasting sales in fashion retail, where past trends can significantly influence future sales.
Hybrid Methods: Hybrid methods in time series forecasting combine two or more different forecasting techniques to leverage the strengths of each. By integrating diverse approaches, hybrid models can often achieve higher accuracy than single-model approaches, especially in complex scenarios involving non-linear patterns and multiple seasonal cycles.
Common hybrid models include combining ARIMA with machine learning techniques like neural networks (ARIMA-NN), which allows the model to capture both linear relationships and complex non-linear patterns. Another example is the integration of exponential smoothing models with artificial neural networks to enhance predictive performance while accounting for both trend and seasonality. Hybrid models are particularly useful in highly volatile environments, such as financial markets and energy demand forecasting, where the dynamics of the data are too complex for a single method to model effectively.
Standard Machine Learning Techniques: This approach involves breaking down a time series into several components – typically trend, seasonality and residuals. By isolating these elements, one can apply machine learning methods more effectively, tailoring models to the unique characteristics of each component.
In my practical experience, applying time series decomposition before using machine learning techniques allows for more precise adjustments to models based on identified patterns. For example, once the trend and seasonal component are isolated, a model can be specifically trained to predict the remaining variations, which are paramount in retail forecasting. This method not only improves the accuracy of the predictions but also enhances the interpretability of the models, providing clearer insights into what drives changes in sales and customer behaviour.
Standard machine learning models are favoured for their practicality, offering data scientists ample flexibility to engineer features tailored to specific problems. Given the widespread familiarity with these models among professionals, I will next introduce one of the most efficient non-parametric methods for time series decomposition.
STL: Seasonal-Trend Decomposition Procedure Based on LOESS
In this section, we explore STL, a non-parametric technique for decomposing time series data into trend, seasonality and residuals. To fully understand STL, we first examine its core component, the LOESS method, which forms the foundation of this approach.
LOESS, which stands for Locally Estimated Scatterplot Smoothing, is one of the most effective methods I have used for extracting trends in time series data, especially when dealing with non-linear trends in retail sales. LOESS [3], originally introduced as a way of fitting a smooth curve through a scatter plot so that patterns in the data are easier to see, is a non-parametric technique that provides a robust, flexible method for smoothing and forecasting time series data. In summary, LOESS works as follows:
1. Choosing the Span: This defines the fraction of the total data points used to fit each local model. A larger span means more smoothing, as the fit uses more data points around each target point.
2. Weighting Function: Typically, LOESS uses a weighting function to give different weights to data points based on their distance from the target point being estimated. The weights decrease with distance, making the estimation less sensitive to outliers. A common choice is the tricube weight function, defined as
w(x) = (1 − |x|³)³ for |x| < 1, and w(x) = 0 otherwise.
Here x is the scaled distance from the target point.
3. Fitting the Model: For each point in the dataset, a low-degree polynomial (commonly quadratic or linear) is fitted to the subset of data points selected by the span and weighted by the function. This local fitting is typically done using least squares.
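A hedged sketch of these steps using statsmodels' lowess implementation (the data and the span frac=0.3 are illustrative assumptions; the tricube helper simply restates the weight function above):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def tricube(u):
    """Tricube weights: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = np.abs(u)
    return np.where(u < 1, (1 - u ** 3) ** 3, 0.0)

# Noisy non-linear signal.
rng = np.random.default_rng(7)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.3, x.size)

# frac is the span: the fraction of points used in each local fit.
# Internally, lowess weights neighbours by distance (tricube-style)
# and fits a local model by weighted least squares.
smoothed = lowess(y, x, frac=0.3, return_sorted=False)
```

Increasing `frac` pulls in more neighbours per local fit and therefore produces a smoother, less responsive curve, exactly as described in step 1.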
Figure 1 illustrates the operation of the LOESS method. In the figure, the red data points represent the sample selected for fitting a local model, the green curve shows the weights assigned to these points, and the straight line is the model fitted to them, in this case linear. Figure 2 illustrates the performance of LOESS in practice on real-world data; as can be seen, LOESS is quite powerful at extracting the trend.
Figure 1. Illustration of how LOESS method works.
Figure 2. Trend extracted from real-world time series data.
Next, we discuss STL, which uses LOESS at its core.
Building on the principles of LOESS, the STL (Seasonal and Trend decomposition using Loess) method offers a powerful way to decompose a time series Y_t into three additive components: Y_t = T_t + S_t + R_t, where T_t is the trend, S_t the seasonal component and R_t the residual.
In STL, the process of decomposing a time series into its components includes an iterative structure with two types of loops: an inner loop and an outer loop. These loops work together to refine the seasonal and trend components while managing the influence of outliers effectively.
Inner Loop: Decomposition of Trend and Seasonality
The inner loop is the core of the STL method, focusing on separating the trend and seasonal components from the time series data. Here’s how it operates:
1. Trend Estimation:
a) Start with an initial estimate of the trend using Loess on the original time series data.
b) This trend is then subtracted from the original data to detrend it, isolating potential seasonal effects.
2. Seasonal Estimation:
a) With the trend component removed, apply Loess to each cycle-subseries (e.g., all values from the same month, for monthly data) across the entire series to capture the repeating seasonal pattern.
b) This seasonal estimation is then removed from the original data to focus on refining the trend.
3. Updating Components:
a) After removing the estimated seasonal component, the trend is re-estimated using Loess on the remaining data.
b) This step is repeated several times, constantly updating the trend and seasonal estimates to closely fit the data.
The inner loop focuses on accuracy and fidelity in capturing the inherent patterns of the data, constantly refining the separation of trend and seasonality without yet addressing outliers explicitly.
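The inner-loop steps above can be illustrated with a deliberately simplified sketch built on statsmodels' lowess (this is a toy version under my own assumptions: real STL also low-pass filters the cycle-subseries and applies robustness weights from the outer loop; the frac values and the period are illustrative):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def toy_stl(y, period, n_iter=2, frac_trend=0.5, frac_seasonal=0.7):
    """Toy sketch of the STL inner loop (no outer/robustness loop)."""
    t = np.arange(len(y), dtype=float)
    trend = np.zeros_like(y)            # initial trend estimate
    for _ in range(n_iter):
        # Step 1: detrend the series.
        detrended = y - trend
        # Step 2: smooth each cycle-subseries (e.g. all Januaries) with Loess.
        seasonal = np.empty_like(y)
        for k in range(period):
            idx = np.arange(k, len(y), period)
            seasonal[idx] = lowess(detrended[idx], t[idx],
                                   frac=frac_seasonal, return_sorted=False)
        seasonal -= seasonal.mean()     # crude centring; real STL low-pass filters
        # Step 3: re-estimate the trend from the deseasonalised series.
        trend = lowess(y - seasonal, t, frac=frac_trend, return_sorted=False)
    return trend, seasonal, y - trend - seasonal

# Synthetic monthly data with trend and yearly seasonality.
rng = np.random.default_rng(5)
t = np.arange(120)
y = 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.2, 120)
trend, seasonal, resid = toy_stl(y, period=12)
```

Note how the separate `frac_trend` and `frac_seasonal` parameters let the trend and seasonal components be smoothed independently, one of STL's practical advantages.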
Outer Loop: Robustness Against Outliers
The outer loop of the STL method adds a layer of robustness to the decomposition process by adjusting weights to mitigate the effect of outliers. This is crucial for ensuring that unusual data points do not skew the overall estimates of the trend and seasonal components.
1. Weight Assignment:
a) Initially, all data points are given equal weights.
b) As the inner loop processes and updates the trend and seasonal estimates, residuals (differences between the data and the fitted values) are calculated.
2. Identifying Outliers:
a) These residuals are analysed to identify outliers – points where the residual is significantly larger or smaller than the typical range.
3. Adjusting Weights:
a) Weights are adjusted based on the size of the residuals; points with larger residuals (likely outliers) are given lower weights.
b) This weighting means that in subsequent iterations of the inner loop, the trend and seasonal estimates are less influenced by these outliers.
4. Iterative Refinement:
a) With the new weights, the inner loop is run again, refining the trend and seasonal components with reduced influence from outliers.
b) The process repeats, gradually improving the robustness of the decomposition.
The combination of these loops allows for a detailed and nuanced decomposition of time series data. The inner loop focuses on accurately capturing the underlying patterns and permits different smoothing parameters for the seasonal and trend components, while the outer loop ensures that this accuracy is not compromised by outliers. This makes STL a powerful tool for time series analysis in environments where data can be noisy or contain anomalies, such as retail sales or industrial data monitoring.
In Figure 3, we apply the STL decomposition method to the real-world data series, denoted as 𝑦. Initially, as discussed in the previous section, the LOESS technique effectively extracts the underlying trend, demonstrating its robustness. Furthermore, the STL method accurately captures the seasonality, which notably varies over time, highlighting STL’s adaptability. The residual component, representing the unexplained variation, could be further analysed using additional machine learning algorithms and comprehensive feature engineering to enhance predictive accuracy.
Figure 3. STL decomposition.
Fortunately, thanks to the statsmodels library, STL is very easy to apply. Figure 4 shows a code snippet that demonstrates how STL can be used in practice.
Figure 4. How STL can be applied in practice.
Conclusion
In this article, we kicked off by emphasizing the critical role of time series forecasting in various retail functions like pricing, inventory management, staffing and provisioning. We then explored popular forecasting techniques, from statistical models to neural networks and traditional machine learning algorithms. Our spotlight was on the STL (Seasonal and Trend decomposition using Loess) method, which breaks down time series into trend, seasonal and residual components. This method shines in its ability to capture underlying patterns due to its flexibility. By utilizing STL and other advanced techniques, retailers can gain deeper insights, optimize operations, and ultimately make smarter decisions that boost profitability. The ever-evolving field of time series forecasting continues to present exciting opportunities for enhanced accuracy and efficiency in the retail industry.
References
[1] C. Deb, F. Zhang, J. Yang, S. E. Lee, and K. W. Shah, “A review on time series forecasting techniques for building energy consumption,” Renewable and Sustainable Energy Reviews, vol. 74, pp. 902–924, Jul. 2017
[2] G. Mahalakshmi, S. Sridevi and S. Rajaram, “A survey on forecasting of time series data,” 2016 International Conference on Computing Technologies and Intelligent Data Engineering (ICCTIDE’16), Kovilpatti, India, 2016, pp. 1-8
[3] W. S. Cleveland, “Robust Locally Weighted Regression and Smoothing Scatterplots,” Journal of the American Statistical Association, vol. 74, no. 368, pp. 829–836, Dec. 1979
Armin Bazrafkan is a senior data scientist at Sportsbet, working on the development of token allocation models – a pioneering initiative in the global sports betting industry. Armin has extensive experience across finance, sensor networks and the utility sectors.
LinkedIn: https://www.linkedin.com/in/armin-bazrafkan/