Retail Forecasting with IBM’s Tiny Time Mixers (TTM): Step-by-Step Tutorial
Putting the popular foundation model to the test!
Foundation models are already reshaping time-series forecasting.
Take the VN1[1] retail forecasting competition for example:
Nixtla’s TimeGPT ranked 2nd using zero-shot forecasting[2] — no training, ensembling, or postprocessing (e.g., no manual forecasting of zero-sales products).
Similarly, a fine-tuned MOIRAI-base achieved 1st place in the same competition[3]!
This shows the strong potential of foundation models when used correctly. But not all of them are mature enough for every use case: retail forecasting often needs exogenous variables, which some zero-shot models don't yet support.
In this article, we’ll walk through a hands-on tutorial using IBM’s Tiny-Time-Mixers on a retail forecasting task. We’ll use a Kaggle dataset with deeper hierarchies than VN1.
Let’s dive in!
✅ Find the Tiny-Time-Mixers notebook for this article in the AI Projects folder (Project 17)
Enter Tiny-Time-Mixers
I’ve covered TTM extensively on my blog, including a hands-on tutorial on electricity demand forecasting.
To recap, here are the key advantages of TTM:
Multi-level modeling: TTM first trains on univariate sequences, then integrates cross-channel mixing during finetuning to learn multivariate dependencies.
Dynamic Patching: TTM adjusts patch lengths across layers, letting each time series use its optimal resolution for better generalization.
Frequency-Aware Encoding: TTM embeds time-series frequency (e.g., monthly, minutely) to improve prediction accuracy across different temporal resolutions.
Open-Source: Apache License!
Moreover, TTM is actively developed. Two months ago, version 2.1 was released — better suited for daily and weekly seasonalities. That’s the variant we’ll use here.
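To give a feel for how little code is involved, here is a minimal sketch of loading a pretrained TTM checkpoint with IBM's open-source tsfm_public toolkit. The repository id, context length, and prediction length are illustrative assumptions, not the exact settings of this project's notebook; pick the revision that matches your data's frequency and horizon.

```python
# Minimal sketch: load a pretrained TTM checkpoint via tsfm_public.
# The repo id and lengths below are illustrative assumptions.
from tsfm_public.toolkit.get_model import get_model

CONTEXT_LENGTH = 512      # number of past time steps fed to the model
PREDICTION_LENGTH = 96    # forecast horizon in time steps

model = get_model(
    "ibm-granite/granite-timeseries-ttm-r2",  # TTM collection on the Hugging Face Hub
    context_length=CONTEXT_LENGTH,
    prediction_length=PREDICTION_LENGTH,
)

print(sum(p.numel() for p in model.parameters()), "parameters")
```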
Prepare the Dataset
In this project, we'll use a dataset from the Kaggle Tabular Competition [4].
This dataset tracks daily sales of 4 books across 2 stores in 6 different countries, spanning from 2017 to 2021. Our goal is to predict sales for each book, in each store, across every country.
This is a perfect dataset for TTM, because:
It includes additional static features and hierarchies of countries, stores, and products.
It includes the COVID-19 pandemic, which introduced a major regime change in sales patterns.
This lets us evaluate how well the foundation model handles sudden, real-world changes.
Let’s start with some visualizations of the dataset:
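As a first look, here is a minimal sketch that loads the data and plots aggregate daily sales. It assumes the competition's train.csv with columns date, country, store, product, and num_sold; adjust the names if your copy of the files differs.

```python
# Minimal sketch: load the Kaggle data and plot total daily sales.
# Assumes train.csv with columns: date, country, store, product, num_sold.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("train.csv", parse_dates=["date"])

# Each (country, store, product) combination is one time series in the hierarchy.
series_ids = ["country", "store", "product"]
print(df.groupby(series_ids).ngroups, "individual time series")

# Aggregate across all series to make the COVID-era regime change visible.
total_daily = df.groupby("date")["num_sold"].sum()
total_daily.plot(figsize=(12, 4), title="Total daily book sales (all series)")
plt.tight_layout()
plt.show()
```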