AutoGluon-TimeSeries: Creating Powerful Ensemble Forecasts - Complete Tutorial
Amazon's framework for time-series forecasting has it all.
Welcome to the 2nd edition!
The 1st edition briefly discussed AutoGluon-TimeSeries (AG-TS), an extension of Amazon’s popular AutoGluon framework.
In this article, we delve deeper into AG-TS. We build an end-to-end project that showcases the full capabilities of AG-TS in time series forecasting.
Project Outline
Introduction
Load Data
Preprocess Data
Model Building
Fast Training mode
Medium quality mode
Best Quality mode
Cross Validation
Best Score (Best Quality + Cross Validation + Tuned Hyperparams)
Extra covariates / Exogenous variables
Hyperparameter tuning
Closing Remarks
Let’s dive in!
Introduction
AutoGluon-TimeSeries (AG-TS) is a robust framework for time-series forecasting. AG-TS stands out because it offers:
Cutting-Edge Model Selection: AG-TS provides state-of-the-art models (statistical, ML, DL).
Wide Integration: Although developed by Amazon researchers, AG-TS also integrates popular models from other libraries (such as the Auto-* statistical models from Nixtla’s StatsForecast library).
Auto Training Presets: AG-TS provides 4 pre-defined ‘quality configurations’ for training multiple models: Fast, Medium, High, and Best qualities. For example, Fast quality contains simple statistical models, like Theta.
Ensemble Boost: AG-TS enhances accuracy by training a final ensemble model using all specified models – whether these models are specified by an automatic preset, or manually by the user.
Note: Ensembling in forecasting significantly enhances accuracy, as documented here.
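To build some intuition for why combining models can help, here is a minimal sketch in plain Python. It is not AG-TS's actual ensembling procedure: the actual values, the two "model" forecasts, and the equal weights are all made-up numbers chosen for illustration.

```python
def mse(actual, forecast):
    """Mean squared error between two equal-length sequences."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def weighted_ensemble(forecasts, weights):
    """Combine several forecasts with a weighted average at each time step."""
    horizon = len(forecasts[0])
    return [
        sum(w * f[t] for w, f in zip(weights, forecasts))
        for t in range(horizon)
    ]


# Hypothetical values, for illustration only
actual = [100.0, 110.0, 120.0, 130.0]
model_a = [104.0, 114.0, 124.0, 134.0]  # consistently overshoots by 4
model_b = [98.0, 108.0, 118.0, 128.0]   # consistently undershoots by 2

ensemble = weighted_ensemble([model_a, model_b], weights=[0.5, 0.5])

mse_a = mse(actual, model_a)
mse_b = mse(actual, model_b)
mse_ens = mse(actual, ensemble)
print(mse_a, mse_b, mse_ens)
```

Because the two toy models err in opposite directions, their errors partly cancel in the average, so the ensemble scores a lower MSE than either model alone. Real models rarely cancel this neatly, and the gain depends on how different their errors are.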
Load Data
You can find the notebook with the full code here: Get Project #1
We'll use the Tourism dataset from Kaggle's Tourism forecasting competition, which can be directly loaded from GluonTS:
!pip install gluonts
!pip install autogluon
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import subprocess
from gluonts.dataset.repository import get_dataset, dataset_names
from gluonts.dataset.util import to_pandas
from gluonts.evaluation.metrics import mse
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor
dataset = get_dataset("tourism_monthly")
Regarding the tourism-monthly dataset:
The dataset contains 366 time series.
It comes pre-split into train and test datasets.
The test dataset contains the full, original time series.
In the train dataset, the last prediction_length time steps are removed from the end of each time series.
Our goal is to predict the next 2 years = 24 months, so we set prediction_length = 24.
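The train/test relationship described above can be sketched in a few lines of plain Python. This is an illustration only: the toy series and the helper name make_train_split are hypothetical, and GluonTS already performs this split for the tourism-monthly dataset.

```python
prediction_length = 24  # 2 years of monthly data


def make_train_split(series, prediction_length):
    """Return the series without its last `prediction_length` values.

    The full series plays the role of the test set; the truncated
    series plays the role of the train set.
    """
    if len(series) <= prediction_length:
        raise ValueError("series is too short to hold out prediction_length values")
    return series[:-prediction_length]


# Hypothetical toy series with 60 monthly observations
toy_full = [float(v) for v in range(60)]

toy_train = make_train_split(toy_full, prediction_length)
toy_held_out = toy_full[-prediction_length:]  # the values to be forecast

print(len(toy_full), len(toy_train), len(toy_held_out))
```

The held-out tail is exactly the part of the test series that the train series does not contain, so the two pieces together reproduce the original series.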
Regarding evaluation:
Testing evaluation will be conducted on the last prediction_length values of each test series. The final test score will be the average across all those time series.
Similarly, validation will be performed on the last prediction_length values of each train series (followed by averaging). We'll explore more sophisticated validation techniques later.
The evaluation metric we'll use is MSE, though we'll explore MASE later.
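As a rough sketch of this evaluation scheme, the snippet below scores a forecaster on the last few values of each series and then averages the per-series MSE. It is written in plain Python under stated assumptions: the toy dataset, the naive "repeat the last value" forecaster, and the function names are hypothetical and are not part of AG-TS, which handles scoring internally.

```python
def mse(actual, forecast):
    """Mean squared error between two equal-length sequences."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def naive_forecast(history, horizon):
    """Hypothetical baseline: repeat the last observed value `horizon` times."""
    return [history[-1]] * horizon


def average_holdout_mse(all_series, horizon, forecaster):
    """Score each series on its last `horizon` values, then average the scores."""
    scores = []
    for series in all_series:
        history, target = series[:-horizon], series[-horizon:]
        scores.append(mse(target, forecaster(history, horizon)))
    return sum(scores) / len(scores)


# Hypothetical toy dataset with two short series
toy_dataset = [
    [1.0, 2.0, 3.0, 4.0, 5.0],       # trending series
    [10.0, 10.0, 10.0, 10.0, 10.0],  # constant series
]

score = average_holdout_mse(toy_dataset, horizon=2, forecaster=naive_forecast)
print(score)
```

Swapping the held-out window from the tail of the test series to the tail of the train series turns the same routine into the validation score described above.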
Next, let’s plot the first time series in the tourism-monthly dataset:
# Take the first series from each split and convert it to a pandas Series
train_entry = next(iter(dataset.train))
test_entry = next(iter(dataset.test))
test_series = to_pandas(test_entry)
train_series = to_pandas(train_entry)
fig, ax = plt.subplots(2, 1, sharex=True, sharey=True, figsize=(10, 7))
train_series.plot(ax=ax[0])
ax[0].grid(which="both")
ax[0].legend(["train series"], loc="upper left")
test_series.plot(ax=ax[1])
ax[1].axvline(train_series.index[-1], color="r")
ax[1].grid(which="both")
ax[1].legend(["test series", "end of train series"], loc="upper left")
plt.show()
Note: Most time-series libraries expect the data to be split into train and test datasets, as 2 separate dataframes. In AG-TS, data wrangling is easier if you treat the original time series as the test set, and the original time series minus its last prediction_length values as the train set.