11 Comments
Konrad Banachewicz:

Love the analysis, got me thinking though: is there any way we can check whether the Kaggle data was used in training? For a regular LLM, one way would be to try to trick the model into generating specific content (like the opening paragraph of a book we suspect was used), but I can't think of a reasonable analogy for time series.

Nikos Kafritsas:

Thank you, Konrad! The authors clearly list in the paper the datasets they used. This one was not among them, and one of the authors saw my benchmark (he would have pointed it out if there had been any leakage).

I tried a trick in the past with the first generation of TSFMs, and it often worked: select a dataset and compute the ratio MSE(dataset_predictions) / MSE(dataset+noise_predictions). If the ratio drops significantly below 1, that's an indication of data leakage. I repeated this test recently with MOIRAI-2, and it seems it no longer works. I suspect this is because TSFMs use extensive data augmentation, sampling, subsetting, etc., so the original time series may never be used directly during pretraining.
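
For reference, a minimal sketch of that ratio test, assuming a generic `model.predict(context, horizon)` forecasting interface rather than any specific library's API:

```python
import numpy as np

def leakage_ratio(model, series, context_len, horizon, noise_scale=0.1, seed=0):
    """Memorization heuristic: compare forecast error on the original
    context vs. a noise-perturbed copy. A ratio far below 1 suggests
    the model may have seen this exact series during pretraining."""
    rng = np.random.default_rng(seed)
    context = series[:context_len]
    target = series[context_len:context_len + horizon]

    def mse(ctx):
        pred = np.asarray(model.predict(ctx, horizon))  # assumed interface
        return float(np.mean((pred - target) ** 2))

    noisy = context + rng.normal(0.0, noise_scale * np.std(context), size=context.shape)
    return mse(context) / mse(noisy)
```

A ratio well below 1 on the suspect dataset, but close to 1 on datasets the model definitely never saw, would be the leakage signal.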

This brings me to my next point. The Chronos variant trained only on synthetic data performed just slightly worse than the released version. Last week TabPFN-2.5 also showed decent scores despite being trained on synthetic data as well. Of course, these are benchmark and not real-world results, but is synthetic data the key to structured models?

Konrad Banachewicz:

1. The ratio is a nice one - hadn't thought of that.

2. Could very well be, although I always felt like synthetic data was just kicking the can down the road - at some point you're going to run out of it, and the tea does not get sweeter from stirring alone.

Nikos Kafritsas:

Certainly, that's true. But it's still amazing how much synthetic-only data improves an attention-based model on "structured" modalities like tabular data and time series. There might be a new paradigm here.

Graeme:

Hi Nikos, this is an excellent article in your current run on LLM-based models. In a previous post you mentioned that the number of observations dictates the utility of LLM-based forecasters (I can't remember the suggested length; 1,200 observations?). I wondered whether, given Chronos-2's cross-learning capability with panels of related data, it could work better with shorter series than other LLMs?

Nikos Kafritsas:

Thank you, Graeme! That observation of mine is a rule of thumb based on my experience so far. In general, increasing the context length to around 1,000 observations also improves performance. This is especially true for data with clear, multiple, or hidden seasonalities.

However, this isn’t always the case. If you have intermittent or irregular data where part of the signal is actually noise, increasing the context length might not help (I mention this in my previous article). You can also check my MOIRAI-2 tutorial, where I test it on the BOOM dataset (observability data). You’ll see that if you increase the context length, the model can actually perform worse:

https://aihorizonforecast.substack.com/p/using-moirai-2-to-outperform-statistical
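
If you want to check this on your own data, here is a minimal sketch of a context-length sweep, again assuming a generic `model.predict(context, horizon)` interface rather than any specific library's API:

```python
import numpy as np

def context_length_sweep(model, series, horizon, lengths=(256, 512, 1024, 2048)):
    """Hold out the last `horizon` points and measure MAE for several
    context lengths, to check whether a longer context actually helps."""
    target = series[-horizon:]
    scores = {}
    for length in lengths:
        context = series[-(horizon + length):-horizon]  # last `length` points before the holdout
        pred = np.asarray(model.predict(context, horizon))  # assumed interface
        scores[length] = float(np.mean(np.abs(pred - target)))
    return scores
```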

Shorter series are not a problem either. Cross-learning compensates by learning dependencies across multiple time series: if there is useful shared signal, cross-learning will capture it. The Chronos Benchmark II was built specifically to test this behaviour.

One more note: Chronos-2 is not an LLM (although it borrows some architectural elements from LLMs). LLMs learn the conditional probability of the next word/token given the input context, selecting the most probable token from a fixed vocabulary and minimizing a cross-entropy loss.
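
Schematically, that objective is the standard autoregressive cross-entropy over a fixed vocabulary V:

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right), \qquad x_t \in V
$$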

I mention this because there are native LLM-based foundation models, like Time-LLM and LLMTime, that use an LLM as a backbone. These have not been successful, and they are generally not worth spending time testing ;)

Graeme:

Thank you for the thorough response, Nikos. Now that you've mentioned it, I'm not sure why I thought cross-learning would help with shorter series. I had largely neglected foundation models; this article has drawn me to them, as exogenous variables and cross-learning are critical in a current project, and I wasn't certain before that they were an option with foundation models, or enough of one to make them useful. Glad to see how much progress is happening. It's great to have someone like yourself providing material on this. Much appreciated.

Nikos Kafritsas:

Thank you, Graeme, for your kind words! I put together a mini-example to show how cross-learning can help with short time series. You can find it in Project 27 in the AI Projects folder (I'll soon publish a companion article with more details):

https://aihorizonforecast.substack.com/p/ai-projects

In short, cross-learning (essentially multivariate forecasting) is especially useful for cold-start forecasting. For example, imagine you have a dataset with 50 time series, but one of them has very little history, making accurate forecasting difficult. With cross-learning, Chronos-2 leverages information from the other time series in the dataset to improve predictions for the short-history series.
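
Here is a minimal sketch of that cold-start setup with a long-format panel; the final `pipeline.predict` call is hypothetical, so check the library's docs for the actual API:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
LONG, SHORT = 500, 30  # observations per series; series_0 is the cold-start case

# Build a long-format panel of 50 related series sharing a daily pattern.
frames = []
for i in range(50):
    n = SHORT if i == 0 else LONG
    t = np.arange(n)
    frames.append(pd.DataFrame({
        "item_id": f"series_{i}",
        "timestamp": pd.date_range("2024-01-01", periods=n, freq="h"),
        "target": np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.1, n),
    }))
panel = pd.concat(frames, ignore_index=True)

# A cross-learning model forecasts the panel jointly, so series_0 can
# borrow the shared daily pattern from its 49 neighbours, e.g.:
# forecast = pipeline.predict(panel, prediction_length=24)  # hypothetical call
```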

Meenakshi NavamaniAvadaiappan:

Thanks for the good article 😊

Nikos Kafritsas:

Glad you liked it!

Ioannis Lazaridis:

I'm always glad to see such keen interest in research and in getting results right. The discussions here are very interesting and substantive.

I wish I knew the field myself; I would devote myself wholeheartedly to research and improvement.

Bravo, Nikolaos.
