Toto Part 2: A Hands-On Guide to Zero-Shot Forecasting
Two practical examples: electricity demand and sparse-data forecasting
Part 1 explored Toto and highlighted its unique features.
To recap:
Toto is a 151M-parameter model pretrained on 2.36 trillion tokens, with ~70% coming from Datadog’s private telemetry dataset.
Alongside Toto, Datadog released BOOM, a new dataset with 350M observations across 2,807 distinct multivariate time series, twice the size of the GIFT-Eval benchmark.
In this second part, we’ll walk through two tutorials that use Toto for:
Long-context forecasting on the Electricity dataset — with rolling forecasts across the full test series for more rigorous evaluation.
Zero-shot forecasting on sparse time series — using an example from the BOOM dataset.
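To make "rolling forecasts across the full test series" concrete, here is a minimal sketch of the evaluation loop in plain Python. It is not Toto's API: `last_value_forecaster` is a hypothetical stand-in for any model that maps a context window to a forecast, and the names `rolling_forecast`, `context_length`, and `horizon` are assumptions chosen for illustration. The idea is to slide a window over the series, forecast the next `horizon` steps from the preceding `context_length` observations, and score every window rather than only the last one.

```python
from typing import Callable, List, Sequence, Tuple

# A forecaster maps (context, horizon) to `horizon` predicted values.
Forecaster = Callable[[Sequence[float], int], List[float]]


def last_value_forecaster(context: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical stand-in for a real model: repeat the last observed value."""
    return [context[-1]] * horizon


def rolling_forecast(
    series: Sequence[float],
    context_length: int,
    horizon: int,
    forecaster: Forecaster,
) -> List[Tuple[List[float], List[float]]]:
    """Collect (predicted, actual) pairs over non-overlapping forecast windows."""
    windows = []
    start = context_length  # first index that gets forecast
    while start + horizon <= len(series):
        context = series[start - context_length:start]
        actual = list(series[start:start + horizon])
        predicted = forecaster(context, horizon)
        windows.append((predicted, actual))
        start += horizon  # advance by one full horizon
    return windows


def mean_absolute_error(windows: List[Tuple[List[float], List[float]]]) -> float:
    """Average absolute error over every forecast step in every window."""
    errors = [
        abs(p - a)
        for predicted, actual in windows
        for p, a in zip(predicted, actual)
    ]
    if not errors:
        raise ValueError("series too short for even one forecast window")
    return sum(errors) / len(errors)


# Toy series 0, 1, ..., 19: three windows of four steps each.
series = [float(t) for t in range(20)]
windows = rolling_forecast(series, context_length=8, horizon=4,
                           forecaster=last_value_forecaster)
print(len(windows), mean_absolute_error(windows))  # prints: 3 2.5
```

Swapping the stand-in for a real model only requires a function with the same `(context, horizon)` signature; the loop and the metric stay unchanged.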
Let’s get started!