iTransformer: Using Transformers for Time-Series Forecasting the Right Way
An innovative approach to forecasting with Attention-Based Models
Using Transformers outside NLP is both intriguing and, at times, controversial.
For instance, in a milestone paper [1], Google DeepMind researchers examined the efficiency of Vision Transformers versus ConvNets. Even in NLP, we now have non-attention models that challenge Transformers on some tasks (e.g., Mamba and xLSTM).
We've previously discussed forecasting Transformers here and here.
However, one model employs a simple trick to boost forecasting Transformer performance by 30–60%: the Inverted Transformer, or iTransformer.
This technique is now widely used, with variations, across many attention-based and foundation time-series models, but iTransformer was the first to demonstrate its advantages.
In this article, we’ll dive into iTransformer—exploring how this model works and why a straightforward transformation allows it to outperform other Transformer-based models (and beyond).
Let’s get started.
✅ Find the hands-on project for iTransformer in the AI Projects folder (Project 8), along with other cool projects!
Enter iTransformer
Inverted Transformer (iTransformer) is a Transformer-based forecasting model that applies attention along the inverted dimension: over the feature dimension rather than the time dimension. This simple change significantly improves performance.
If this seems complex, don’t worry — I’ll illustrate it with an example later.
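In the meantime, here is a minimal shape-level sketch in PyTorch (my own illustration, not the paper's code) of what the inversion does to the input tensor:

```python
# Sketch only: contrasts how a vanilla forecasting Transformer and
# iTransformer tokenize the same input. Shapes are illustrative.
import torch
import torch.nn as nn

batch, lookback, n_variates, d_model = 32, 96, 7, 128
x = torch.randn(batch, lookback, n_variates)  # (B, T, N)

# Vanilla Transformer: each time step is a token -> attention over T steps.
temporal_embed = nn.Linear(n_variates, d_model)
temporal_tokens = temporal_embed(x)                # (B, T, d_model): T tokens

# iTransformer: each variate is a token. Transpose so the whole lookback
# window of one series becomes a single token embedding -> attention over N.
variate_embed = nn.Linear(lookback, d_model)
variate_tokens = variate_embed(x.transpose(1, 2))  # (B, N, d_model): N tokens

print(temporal_tokens.shape)  # torch.Size([32, 96, 128])
print(variate_tokens.shape)   # torch.Size([32, 7, 128])
```

Each variate token now summarizes an entire series over the lookback window, so attention scores directly express correlations between series rather than between time steps.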
Key features of iTransformer:
Open-Source: iTransformer is open-source and integrated into popular libraries like Nixtla’s NeuralForecast (see the usage sketch after this list).
Multiple Channels: It handles multiple time-series channels, modeling them jointly.
Multivariate Modeling: Attention over feature dimensions captures multivariate correlations across time series.
Speed: Despite being Transformer-based, it is often faster than Transformers that attend over time steps, because attention cost scales with the number of variates rather than the lookback length.
Template Flexibility: Though a standalone model, iTransformer’s inverted technique can be applied to any attention-based model.
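Since the model ships in NeuralForecast, getting a first forecast takes only a few lines. The sketch below is an assumption-laden example, not official documentation: it uses the library's bundled AirPassengersPanel demo data, and the parameter names (h, input_size, n_series, max_steps) reflect the API at the time of writing. Consult the NeuralForecast docs for your installed version.

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import iTransformer
from neuralforecast.utils import AirPassengersPanel  # two demo monthly series

# Keep only the columns NeuralForecast requires: series id, timestamp, target.
df = AirPassengersPanel[["unique_id", "ds", "y"]]

model = iTransformer(
    h=12,           # forecast horizon (months)
    input_size=24,  # lookback window fed to the model
    n_series=2,     # variates modeled jointly; attention runs across them
    max_steps=100,  # short training run, just for the demo
)

nf = NeuralForecast(models=[model], freq="M")  # use "ME" on pandas >= 2.2
nf.fit(df=df)
forecasts = nf.predict()
print(forecasts.head())
```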
Next, let’s look at how this inversion works.
Note: In keeping with the paper's terminology, the terms features, variates, and channels are used interchangeably.
What Problem Does iTransformer Solve?
While "inverted dimensions" may sound complex, iTransformer’s approach is straightforward.