Introduction:

We recently built a platform that empowers our stakeholder, the Sauti East Africa Organization, to make timely decisions on where to direct relief funds when monitored market prices in East African markets are likely to cross forecasted price alert boundaries. The project was sponsored by Lambda School through its Labs projects, which invite partner organizations and companies to have students work in teams on real-world big data problems. One of the tasks for the data scientists was to select and forecast daily market prices for thousands of products across over 100 markets, based on four years of market price data. I focused my efforts on time series forecasting and report some of my methodologies and findings here.

After data wrangling, all our preprocessed data was stored in a cloud database (AWS RDS). I selected five time series from the database, representing five distinct trends, and started the process of model selection. The candidate neural network models, including the multilayer perceptron (MLP), long short-term memory (LSTM), and convolutional neural network (CNN), were compared with classic autoregressive integrated moving average (ARIMA) and exponential smoothing models. Based on the preliminary results, the statistical methods outperformed the tested neural network models. This finding agrees with previous studies comparing classical statistical methods and machine learning methods on univariate time series forecasting [1, 2]. Based on the literature and my own findings, I selected Holt-Winters exponential smoothing as the final forecast model.

Conveniently, the statsmodels library includes a Holt-Winters model with optimization options. When optimization is enabled, some of the parameters are automatically tuned during fitting: the smoothing level, smoothing slope, smoothing seasonal, and damping slope. To find the best combination of the remaining parameters, we can implement a search method combined with a forward sliding-window validation on our time series dataset. In our project, I used grid search, which suits the small, discrete domains of the searched parameters (trend, dampening, seasonality, seasonal period, Box-Cox transform, and removal of bias at fitting). The RMSPE (root mean squared percentage error) is used to evaluate model performance, and the smallest RMSPE indicates the best combination of model parameters.

Methods description:

  1. Data preprocessing. Time series sequences are preprocessed (nulls removed, zeros removed, typos corrected, duplicates removed, missing entries imputed) and stored in our analytical database on AWS RDS. The test sequences are extracted from the analytical database given the unique combination of product_name, market_id, and source_id. The Python classes and functions for data preprocessing can be found in data_process_v2.py.
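The cleaning steps above can be sketched with pandas; the `preprocess` function, the column names, and the seven-day interpolation limit are hypothetical stand-ins, not the actual code in data_process_v2.py:

```python
import pandas as pd

def preprocess(df):
    """Hypothetical sketch of the cleaning steps: drop nulls and zeros,
    deduplicate, pin a daily timeframe, and impute small gaps."""
    df = df.dropna(subset=["price"])           # nulls removed
    df = df[df["price"] > 0]                   # zeros removed
    df = df.drop_duplicates(subset=["date"])   # duplicates removed
    df = df.set_index("date").asfreq("D")      # day-by-day timeframe
    df["price"] = df["price"].interpolate(limit=7)  # impute small gaps
    return df
```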

  2. Grid search with sliding-window forecasting. First we remove outliers and pin a day-by-day timeframe on the time series, then interpolate to fill small data gaps. Next we split the data into initial train, initial test, and validation periods. A forecast window, with a default length of 30 days, slides past the end of the initial train set at a default pace of 14 days per slide. For each window, the 144 combinations of model parameters are fed into the model for forecasting and scoring. Finally, the best score over all valid windows and its corresponding model parameters are recorded and saved to our analytical database. In this process, each time series (with a unique source-market-product combination) gets its own optimized model configuration. The Python classes and functions created for this step can be found in classic_forecast_v2.py.

  3. Exceptions. A poor combination of model parameters can cause failures during model fitting. Sometimes model fitting does not converge for a particular slide, in which case that window is discarded during the grid search. For time series with very poor data quality (e.g., sparse data with years-long gaps), model fitting fails at all windows and the grid search returns none.

Configurations:

As a first approach, test_65_sequence.py selects 65 sequences from the four-year East Africa market price dataset provided by our stakeholder. The default configurations for our forecast model are as follows:

  • Data split (in days):
    train (start) = 692, test (start) = 1038, val = 30, window length = 30, sliding distance = 14

  • Holt-Winters Exponential Smoothing model parameters:
    A total of 144 configurations (144 = 3 x 3 x 2 x 2 x 2 x 2).

    The searched parameters t, d, s, p, b, r are the trend type, dampening type, seasonality type, seasonal period, Box-Cox transform, and removal of bias at fitting, respectively. Their search domains are:
    t_params = ['add', 'mul', None]
    d_params = [True, False]
    s_params = ['add', 'mul', None]
    p_params = [12, 365]
    b_params = [True, False]
    r_params = [True, False]

    The 'add' (additive) method is preferred when the seasonal variations are roughly constant through the series, while the 'mul' (multiplicative) method is preferred when the seasonal variations change in proportion to the level of the series.
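The full grid and its size can be reproduced with itertools.product:

```python
import itertools

t_params = ["add", "mul", None]  # trend type
d_params = [True, False]         # dampening
s_params = ["add", "mul", None]  # seasonality type
p_params = [12, 365]             # seasonal period
b_params = [True, False]         # Box-Cox transform
r_params = [True, False]         # remove bias at fitting

configs = list(itertools.product(t_params, d_params, s_params,
                                 p_params, b_params, r_params))
print(len(configs))  # 3 * 2 * 3 * 2 * 2 * 2 = 144
```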

Model output:

  • Example of one time series: sale type = retail, market = 'Dar Es Salaam', product = 'Morogoro Rice':

    Best configuration = ['add', True, 'add', 12, False, False]

    Root mean squared percentage error (RMSPE): 5.40% for the 30-day forecast on the test dataset; 1.17% for the 30-day forecast on the validation dataset.

  • All the time series metadata, the best parameter configuration, the model forecasts for the validation period, and the corresponding RMSPE are saved to the database tables 'hw_params_wholesale' and 'hw_params_retail' for future reference.

Methodology Pros and Cons:

Pros:

  • Model parameters can be customized. The user has full control over window size (number of days to predict), sliding pace, train-test-validation split, and the grid-search domain.
  • The model is highly tolerant and suitable for all time series, even flat data. It will always return the best model configuration when the data pass the QC check.
  • Model forecasts are considerably more accurate than those from the Facebook Prophet method and the several deep learning methods we tested.
  • The method is also adaptable. Users can add their own model evaluation metrics, substitute random search for grid search, etc.

Cons:

  • The user needs to be familiar with Python and database basics.
  • Time consuming. The grid search method demands substantial computational power. To give an idea of its computational intensity, we tested one time sequence, Dar Es Salaam Morogoro Rice, which has a total length of 1760 days after interpolation. On a stand-alone machine (2.6 GHz processor, 8 GB RAM), it took 108.73 min to finish grid searching the 144 configurations; on a virtual machine with 64 GB RAM and 16 vCPUs (an AWS EC2 m5ad.4xlarge instance), it took 17.38 min to complete the same task. For all 65 sequences, it took 11.38 hours on the stand-alone machine and 3.06 hours on the AWS EC2 virtual machine.

Improvement suggestions:

  1. Since we have built a data quality index (DQI) for every time series in our AWS database, future developers can use the DQI metric to define the prediction resolution for each time series, instead of using a universal day-by-day timeframe. This customized forecast resolution could reduce the uncertainty introduced by interpolation, cut runtime, and improve overall accuracy.

  2. For time series with a large data gap, interpolation can produce a flat line, and the search method will favor dampening this flat trend. Since we apply a sliding window and select the best parameters based on the smallest error over all sliding steps, this data-gap effect should be largely reduced. At this point, the data-gap effect is still not fully understood and needs further investigation.

  3. Random search and Bayesian optimization could reduce computation time; they are best suited for optimization over continuous domains. These search methods are worth exploring, keeping in mind the discrete nature of the Holt-Winters parameter domains (except for the seasonal period p, which can be treated as a continuous integer range).
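As a sketch of the random-search idea, a subset of the discrete grid could be sampled and scored instead of all 144 combinations; the sample size of 30 below is arbitrary:

```python
import itertools
import random

t_params = ["add", "mul", None]
d_params = [True, False]
s_params = ["add", "mul", None]
p_params = [12, 365]
b_params = [True, False]
r_params = [True, False]

all_configs = list(itertools.product(t_params, d_params, s_params,
                                     p_params, b_params, r_params))
random.seed(42)                           # reproducible sampling
sampled = random.sample(all_configs, 30)  # score 30 of the 144 configs
```

Each sampled configuration would then be scored with the same sliding-window RMSPE procedure, trading a small risk of missing the global best for a roughly 5x reduction in fitting work.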

References:

  1. Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2018). Statistical and Machine Learning forecasting methods: Concerns and ways forward. PLOS ONE, 13(3).

  2. Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2018). The M4 Competition: Results, findings, conclusion and way forward. International Journal of Forecasting, 34(4), 802-808.