AI Horizon Forecast

iTransformer: Using Transformers for Time-Series Forecasting the Right Way

An innovative approach to forecasting with Attention-Based Models

Nikos Kafritsas
Nov 05, 2024

Using Transformers outside NLP is both intriguing and, at times, controversial.

For instance, in a milestone paper [1], Google DeepMind researchers examined the efficiency of Vision Transformers versus ConvNets. Even in NLP, we now have non-attention models that challenge Transformers on some tasks (e.g., Mamba and xLSTM).

We've previously discussed forecasting Transformers here and here.

However, one model employs a simple trick to boost forecasting Transformer performance by 30-60%: the Inverted Transformer, or iTransformer.

This technique is now widely used, with variations, across many attention-based and foundation time-series models, but iTransformer was the first to demonstrate its advantages.

In this article, we’ll dive into iTransformer—exploring how this model works and why a straightforward transformation allows it to outperform other Transformer-based models (and beyond).

Let’s get started.

✅ Find the hands-on project for iTransformer in the AI Projects folder (Project 8), along with other cool projects!

Enter iTransformer

The Inverted Transformer (iTransformer) is a Transformer-based forecasting model that applies attention along inverted dimensions: over the feature dimension rather than the time dimension. This simple inversion significantly enhances performance.

If this seems complex, don't worry: a quick sketch follows, and we'll walk through a full example later.
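
To make the inversion concrete, here is a minimal PyTorch sketch (an illustrative simplification, not the official implementation): instead of embedding each time step as a token, each variate's entire lookback window becomes a single token, and attention runs across variates.

```python
import torch
import torch.nn as nn

class InvertedAttentionBlock(nn.Module):
    """Sketch of inverted attention: tokens are variates, not time steps."""
    def __init__(self, seq_len: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # Each variate's whole lookback window is embedded as one token
        self.embed = nn.Linear(seq_len, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, variates) -- the usual forecasting layout
        x = x.transpose(1, 2)                       # (batch, variates, time)
        tokens = self.embed(x)                      # (batch, variates, d_model)
        out, _ = self.attn(tokens, tokens, tokens)  # attention across variates
        return out

x = torch.randn(8, 96, 7)  # 8 samples, 96 time steps, 7 variates
print(InvertedAttentionBlock(seq_len=96)(x).shape)  # torch.Size([8, 7, 64])
```

The transpose is the whole trick: the attention module itself is unchanged; what changes is which axis supplies the tokens.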

Key features of iTransformer:

  • Open-Source: iTransformer is open-source and integrated into popular libraries like Nixtla’s NeuralForecast (see the usage sketch after this list).

  • Multiple Channels: It handles multiple time-series channels, modeling them jointly.

  • Multivariate Modeling: Attention over feature dimensions captures multivariate correlations across time series.

  • Speed: Despite being Transformer-based, iTransformer is fast: its attention cost scales with the number of variates rather than the lookback length.

  • Template Flexibility: Though a standalone model, iTransformer’s inverted technique can be applied to any attention-based model.
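
Since iTransformer ships in NeuralForecast, here is a quick usage sketch. It assumes the NeuralForecast API at the time of writing (e.g., the n_series argument required by its multivariate models); exact argument names may differ across library versions.

```python
from neuralforecast import NeuralForecast
from neuralforecast.models import iTransformer
from neuralforecast.utils import AirPassengersPanel  # toy dataset with 2 series

# Keep only the required columns: series id, timestamp, target
df = AirPassengersPanel[['unique_id', 'ds', 'y']]

model = iTransformer(
    h=12,           # forecast horizon
    input_size=24,  # lookback window
    n_series=2,     # number of series modeled jointly
    max_steps=100,  # short training run, just for the demo
)

nf = NeuralForecast(models=[model], freq='M')  # monthly data
nf.fit(df=df)
forecasts = nf.predict()
print(forecasts.head())
```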

Next, let’s look at how this inversion works.

Note: In keeping with the paper's terminology, the terms features, variates, and channels are used interchangeably throughout this article.


What Problem Does iTransformer Solve?

While "inverted dimensions" may sound complex, iTransformer’s approach is straightforward.
