ChronosX: Extending Time-Series Foundation Models to Support Exogenous Variables
A plug-and-play mechanism for Chronos that extends to any time-series foundation model
Foundation models excel in univariate time-series benchmarks.
Nixtla’s mega-study showed this, as we discussed here. However, external information is often key to temporal decisions in real-world datasets.
Extending foundation models to use covariates is challenging. How do you build a pretrained model that adapts to new correlations in unseen data? It seems impossible at first.
Some models use workarounds (more on that later) — but they don’t cover every case. This paper introduces a new plug-and-play method by the Chronos and Amazon team:
ChronosX adds an adapter on top of Chronos, enabling the model to use past observed and future known covariates.
This adapter-based method works with any univariate foundation model!
In this article, we’ll break down how it works — and dive into some interesting benchmarks.
Let’s get started!
Preliminaries
First, a quick look at how current foundation time-series models work.
The only model that natively supports covariates is MOIRAI. It uses a mechanism called any-variate attention to capture interdependencies between features:
Some models include covariates only during fine-tuning. One example is Tiny-Time-Mixers (TTM), which uses channel mixing and an exogenous infusion module to handle past observed, future known, and static variables:
An exception is TabPFN-TS — a tabular regression foundation model repurposed for time-series forecasting. It’s trained on synthetic multivariate data and can use exogenous covariates during inference (except for past observed inputs).
Other foundation models like TimesFM and Chronos are pretrained as univariate models. A common workaround is to train a separate regression model for covariates. This works well in most cases — but still doesn’t support past observed inputs.
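To make this workaround concrete, here is a minimal sketch of the usual two-stage recipe (the `univariate_model.predict` interface is a stand-in, not any specific library's API): a regressor first explains what it can with the known covariates, the univariate foundation model forecasts the residual, and the covariate effect is added back over the horizon.

```python
from sklearn.linear_model import LinearRegression

def forecast_with_covariates(univariate_model, y_history, X_history, X_future, horizon):
    """Two-stage workaround for covariate support.

    univariate_model : any zero-shot forecaster exposing .predict(series, horizon)
                       (hypothetical interface, stands in for a Chronos/TimesFM wrapper)
    y_history        : (T,)           past target values
    X_history        : (T, k)         past values of the future-known covariates
    X_future         : (horizon, k)   their known future values
    """
    # 1. Explain as much of the target as possible with the covariates alone.
    reg = LinearRegression().fit(X_history, y_history)

    # 2. Let the univariate foundation model forecast what the covariates
    #    cannot explain (the residual series).
    residuals = y_history - reg.predict(X_history)
    residual_forecast = univariate_model.predict(residuals, horizon)

    # 3. Add back the covariate effect over the forecast horizon.
    return residual_forecast + reg.predict(X_future)
```

Note how this recipe only works for future-known covariates: past-observed-only inputs have no future values to plug into the regressor, which is exactly the gap the article points out.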
Enter ChronosX
The Chronos research team presents an innovative solution to these challenges: the ChronosX model.
Interoperability: It was built for Chronos but can be applied to any univariate pretrained model — including patching-based ones like TimesFM.
Simplicity: The module consists solely of linear layers.
Versatility: Supports both past-observed and future-known covariates.
Ease-of-use: You can finetune the full model & the adapters, or just the adapter modules.
The authors tested various architectures and identified the optimal setup. Specifically, they use 2 blocks (Figure 1):
Input Injection Block: Updates pretrained token embeddings of the target variable using past covariates.
Output Injection Block: Adjusts the output distribution (logits) using future covariates.
Let’s explore each block.
Input Injection Block
The Input Injection Block (IIB) is illustrated below:
Each equation and operation is detailed below:
The module is simple and efficient—it merges past covariate information with target embeddings using basic projections.
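The exact equations are in the paper; as a rough illustration, here is a minimal PyTorch sketch of the core idea, namely projecting the past covariates and adding them to the pretrained token embeddings as a residual update. The class and layer names, shapes, and single-projection form are my simplifications; the real block contains a few more linear layers.

```python
import torch
import torch.nn as nn

class InputInjectionBlock(nn.Module):
    """Sketch of an IIB-style adapter (simplified)."""

    def __init__(self, d_model: int, n_covariates: int):
        super().__init__()
        # Linear projection of past covariates into the embedding space
        # (stands in for the W_cov^IIB matrix discussed in the ablations).
        self.W_cov = nn.Linear(n_covariates, d_model)

    def forward(self, token_emb: torch.Tensor, past_cov: torch.Tensor) -> torch.Tensor:
        # token_emb: (batch, context_len, d_model) embeddings from the pretrained backbone
        # past_cov:  (batch, context_len, n_covariates) past-observed covariate values
        # Residual update: the pretrained embedding is kept and nudged by covariate information.
        return token_emb + self.W_cov(past_cov)
```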
Output Injection Block
The Output Injection Block (OIB) is displayed in Figure 3:
The equations below describe the operations inside OIB (Figure 3) in more detail:
Again, the adapter is simple—the goal is to adjust the model’s output based on future-known variables.
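As a similarly hedged sketch: the same residual idea, applied to the model's output logits using the future-known covariates. Chronos predicts a categorical distribution over discretized value bins, so the adjustment lives in logit space. The real block contains additional linear layers not shown here.

```python
import torch
import torch.nn as nn

class OutputInjectionBlock(nn.Module):
    """Sketch of an OIB-style adapter (simplified)."""

    def __init__(self, vocab_size: int, n_covariates: int):
        super().__init__()
        # Linear projection of future covariates into logit space
        # (stands in for the W_cov^OIB matrix discussed in the ablations).
        self.W_cov = nn.Linear(n_covariates, vocab_size)

    def forward(self, logits: torch.Tensor, future_cov: torch.Tensor) -> torch.Tensor:
        # logits:     (batch, horizon, vocab_size) output distribution from the pretrained backbone
        # future_cov: (batch, horizon, n_covariates) future-known covariate values
        return logits + self.W_cov(future_cov)
```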
Benchmarks
Next, the authors put their adapter module to the test. Because few public time-series datasets include covariates, they also created a synthetic benchmark (in addition to the real datasets discussed later).
Setup
Dataset Configuration
The synthetic dataset has 2 variants:
Simple: Contains only simple sinusoidal signals.
Complex: Contains combined sinusoidal signals, with added noise.
Both variants are augmented using external covariates, applied via addition or multiplication. The 4 covariate types are:
Spikes – Model abrupt, short-lived events like strikes or power outages.
Steps – Capture sudden, lasting shifts such as promotional discounts in retail.
Bells – Represent smooth, temporary fluctuations, for example, during holiday seasons.
Autoregressive – Simulate scenarios where covariate values depend on their own historical patterns.
Figure 4 shows an example of this augmentation:

The model receives both the augmented signal and its covariates as input, and ideally it should forecast the augmented signal accurately by understanding the context (the covariates).
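To make the four covariate types concrete, here is a small sketch of how such signals could be generated and applied to a target. The shapes and parameters are illustrative, not the paper's exact synthetic-data recipe.

```python
import numpy as np

def make_covariate(kind: str, length: int, rng=np.random.default_rng(0)):
    """Generate one of the four covariate types described above (illustrative parameters)."""
    t = np.arange(length)
    cov = np.zeros(length)
    if kind == "spikes":            # abrupt, short-lived events (e.g. outages)
        cov[rng.choice(length, size=3, replace=False)] = rng.uniform(2, 5, size=3)
    elif kind == "steps":           # sudden, lasting shifts (e.g. promotions)
        cov[rng.integers(length // 2, length):] = rng.uniform(1, 3)
    elif kind == "bells":           # smooth, temporary fluctuations (e.g. holidays)
        center, width = rng.integers(length), length / 20
        cov = np.exp(-0.5 * ((t - center) / width) ** 2)
    elif kind == "autoregressive":  # values that depend on their own past (AR(1) process)
        for i in range(1, length):
            cov[i] = 0.9 * cov[i - 1] + rng.normal(scale=0.1)
    return cov

# Augment a simple sinusoidal target, either additively or multiplicatively.
target = np.sin(2 * np.pi * np.arange(200) / 24)
augmented_add = target + make_covariate("steps", 200)
augmented_mul = target * (1 + make_covariate("bells", 200))
```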
Modeling Configuration
Next, the authors configured how the adapter model is used. There are 2 modes:
Adapter-only: Only the adapters are trained — the backbone architecture is frozen.
Fully Fine-Tuned (FF): All parameters, including the pretrained model’s, are trained.
For example, ChronosX refers to the adapter-only trained model, and ChronosX (FF) refers to the fully fine-tuned variant with the adapters.
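In PyTorch terms, the difference between the two modes boils down to which parameters receive gradients. A minimal sketch (the function name is mine):

```python
def configure_training(backbone, adapters, full_finetune: bool = False):
    """Adapter-only mode freezes the pretrained backbone; FF mode trains everything."""
    for p in backbone.parameters():
        p.requires_grad = full_finetune   # frozen unless fully fine-tuning
    for p in adapters.parameters():
        p.requires_grad = True            # adapter modules are always trained
```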
Regarding metrics, the authors use the following:
Weighted quantile loss (WQL): Evaluates how well predicted quantiles align with actual values, computed over quantiles {0.1, 0.2, ..., 0.9}.
Mean absolute scaled error (MASE): Measures the absolute deviation of the median forecast from the actual values, scaled by the in-sample error of a Seasonal Naive forecast.
Aggregated metrics: Both WQL and MASE are reported as geometric means, normalized by baseline performance.
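For reference, here is a sketch of both metrics under their standard definitions (the usual GluonTS-style formulation, not code taken from the paper):

```python
import numpy as np

def mase(y_true, y_pred_median, y_insample, season: int = 1):
    """Mean Absolute Scaled Error: median-forecast error scaled by the
    in-sample error of a Seasonal Naive forecast."""
    naive_error = np.mean(np.abs(y_insample[season:] - y_insample[:-season]))
    return np.mean(np.abs(y_true - y_pred_median)) / naive_error

def wql(y_true, quantile_preds, quantiles=np.arange(0.1, 1.0, 0.1)):
    """Weighted Quantile Loss: pinball losses over the 0.1..0.9 quantiles,
    normalized by the total absolute value of the target."""
    total = 0.0
    for q, y_q in zip(quantiles, quantile_preds):   # y_q: forecast of quantile q
        diff = y_true - y_q
        total += np.sum(np.maximum(q * diff, (q - 1) * diff))
    return 2 * total / np.sum(np.abs(y_true))
```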
Benchmark on Synthetic Datasets
The authors benchmarked multiple model types in this setup — statistical, DL and foundation models. Foundation models fall into 3 categories:
Zero-shot: e.g. Chronos-small
Zero-shot where only the adapters are trained: e.g. ChronosX
Fully Finetuned with adapters: e.g. ChronosX(FF)
Aside from Chronos, the authors applied their covariate adapter mechanism to TimesFM and MOMENT as well.
Figure 5 summarizes the results:
We notice the following:
Covariate incorporation enhances performance: Adapter-only approaches (including ChronosX) and their fine-tuned variants, which integrate covariates, significantly outperform their zero-shot pretrained counterparts— highlighting the value of covariate integration.
ChronosX’s notable improvement: ChronosX improves ~22% over Chronos-Small on both WQL and MASE — showing its strength as a forecasting model.
Full finetuning helps: Full fine-tuning generally boosts performance further, as demonstrated by ChronosX (FF) and MOMENTX (FF), although slight performance decreases can occur in specific cases, such as with TimesFMX (FF) on complex datasets.
Contextual Performance of Baselines: Methods like TFT, DeepAR, and PatchTSTx perform well on simple data, but their advantage decreases on complex data.
There are some things to consider here:
ChronosX uses Chronos-Small (46M) — a less accurate base model.
The authors didn’t use newer (and stronger) foundation models: e.g., TimesFM-2.0 instead of TimesFM-1.0, Chronos-Bolt instead of Chronos, MOIRAI-MOE instead of MOIRAI.
They do use the newer TTM-R2 variant, but with a limited context length of 512, which is less performant than the stronger TTM-A version (as explained here).
In most cases, they could’ve used larger context lengths — foundation models often perform better at max context length.
So every foundation model is at a disadvantage in this benchmark. Still, the experiments show that the covariates adapter is useful.
Benchmark on Real Datasets
The models are then tested on real-world datasets with covariates — carefully selected to avoid data leakage from pretraining.
The results are shown in Figure 6:
Let’s discuss the results:
Competitive Forecast Accuracy of ChronosX: ChronosX scored best on WQL and ranked top-5 on MASE — showing strong overall forecasting performance.
Chronos vs. other pretrained models: ChronosX consistently beats its zero-shot version and other adapters like TimesFMX and MOMENTX.
Improvement over original models: Even weak performers like MOMENT improved significantly with covariate adaptation (e.g., MOMENTX) — highlighting the effectiveness of these extensions.
Fine-tuned adapters perform worse: This seems strange at first. However, the authors explain that some datasets contain only a single, sparse sequence; in such cases, pretrained models with more trainable parameters suffer.
Overall, the benchmark supports ChronosX’s broad applicability and performance across frequencies and forecast horizons. More details on the dataset and the benchmark configurations can be found in the original paper [1].
Ablation Studies
Finally, the authors conduct several ablation studies to assess the impact of using covariates.
In one experiment, they test whether the performance gains come from the covariates themselves rather than from the extra linear layers in the adapters. To isolate this, they remove the weight matrices tied to covariates: specifically, the W_cov^IIB and W_cov^OIB matrices mentioned earlier.
Thus, they create 2 models: ChronosX (NC) and ChronosX (FF) (NC)—no-covariate versions of ChronosX and ChronosX (FF). Benchmarks are shown in Figure 7:
As expected, ChronosX (NC) and ChronosX (FF) (NC) perform worse, highlighting the value of the covariate adapter modules.
Feel free to read the original paper for more ablation experiments — they also contain experiments regarding the architecture of the adapter, the model size, and so on.
Note: ChronosX will be open-sourced (the Github repo is here). When that happens, I’ll add it to the AI Projects Folder—stay tuned!
Closing Remarks
This paper offers a simple, effective way to boost foundation model performance universally by adding covariates.
Even though older foundation model versions were used (a disadvantage versus newer ones), the results clearly show that adapter-augmented models outperform the simpler versions.
One limitation of this approach is that the adapter requires training—even if it's lightweight. This technically breaks the idea of zero-shot inference. Still, I don’t see this as a major issue. In practice, any competitive zero-shot model ends up needing some finetuning.
The next milestone for pretrained time series models is to bring covariates into the inference step—similar to how LLMs handle context through prompts. We've already seen this with the Context-In-Key model, which uses LLaMA to integrate additional covariates.
We’ll explore these approaches in the future — so stay tuned!
Thank you for reading!
Horizon AI Forecast is a reader-supported newsletter, featuring occasional bonus posts for supporters. Your support through a paid subscription would be greatly appreciated.
Also, check the AI Projects folder for some cool hands-on projects!
References
[1] Arango et al., ChronosX: Adapting Pretrained Time Series Models with Exogenous Variables (2025)