Timer-S1

Timer-S1 is a time series foundation model with 8.3B total parameters, 0.75B activated parameters per token, and a context length of 11,520.

The model supports zero-shot forecasting (predicting without dataset-specific training) at different quantile levels.

For more details, please refer to our technical report.

Architecture: Timer-S1 is a decoder-only Mixture-of-Experts (MoE) Transformer. For time series forecasting, a sequential problem where each step depends on previous ones, we propose TimeSTP, which enables multi-step prediction with cost-effective serial computation.
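
To make the serial nature of decoding concrete, here is a minimal pure-Python sketch of autoregressive patch-wise generation. It is schematic only: `model_step` is a hypothetical stand-in for one decoder forward pass, not the actual TimeSTP mechanism described in the report.

```python
# Schematic only: generic autoregressive patch-wise decoding, NOT the
# actual TimeSTP mechanism. `model_step` is a stand-in for one decoder
# forward pass that emits the next patch of values.
PATCH = 16  # Timer-S1 uses a patch length of 16

def model_step(patches):
    # Stand-in: a real model would run the MoE Transformer over all
    # previous patches; here we naively repeat the last patch.
    return list(patches[-1])

def generate(series, horizon):
    """Serially predict `horizon` values, one patch at a time: each
    step conditions on every previously generated patch."""
    patches = [series[i:i + PATCH] for i in range(0, len(series), PATCH)]
    out = []
    while len(out) < horizon:
        nxt = model_step(patches)  # depends on all prior steps
        patches.append(nxt)
        out.extend(nxt)
    return out[:horizon]

pred = generate([float(i) for i in range(64)], horizon=40)
print(len(pred))  # 40
```

The point of the sketch is the loop structure: each new patch is appended to the token sequence before the next step, so computation is inherently serial in the prediction horizon.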

Performance: Timer-S1 achieves state-of-the-art results on GIFT-Eval. The model excels particularly at medium-term and long-term forecasting tasks.

Post Training: Timer-S1 undergoes post-training, including continued pre-training (CPT) and long-context extension (LCE), which improves short-term and long-context performance.

Quickstart

pip install torch accelerate transformers~=4.57.1
import torch
from transformers import AutoModelForCausalLM

# load the pretrained model
# supports different lookback/forecast lengths
model = AutoModelForCausalLM.from_pretrained(
    'bytedance-research/Timer-S1',
    trust_remote_code=True,
    device_map="auto"
)

# use local model
# model = AutoModelForCausalLM.from_pretrained(
#     'path_to_timer_s1',
#     trust_remote_code=True,
#     device_map="auto"
# )

# prepare input
batch_size, lookback_length = 64, 11520 
seqs = torch.randn(batch_size, lookback_length).to(model.device)

# Note that Timer-S1 generates predictions at fixed quantile levels
forecast_length = 256

output = model.generate(seqs, max_new_tokens=forecast_length, revin=True)

# produces quantile forecasts at levels [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(output.shape) # batch_size x quantile_num(9) x forecast_length

# the median (0.5 quantile) forecast of the first sample
print(output[0][4])
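
The nine quantile tracks can also be combined into central prediction intervals, e.g. an 80% interval from the 0.1 and 0.9 quantiles. A minimal sketch below uses synthetic nested lists with the documented output shape in place of a real model forecast:

```python
# Sketch: read an 80% central prediction interval off the nine quantile
# tracks. `output` is a synthetic stand-in with the documented shape
# batch_size x quantile_num(9) x forecast_length (nested lists here).
QUANTILE_LEVELS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
batch_size, forecast_length = 2, 4
output = [[[q] * forecast_length for q in QUANTILE_LEVELS]
          for _ in range(batch_size)]

lower = [sample[0] for sample in output]   # 0.1 quantile track
median = [sample[4] for sample in output]  # 0.5 quantile (point forecast)
upper = [sample[8] for sample in output]   # 0.9 quantile track

print(lower[0][0], median[0][0], upper[0][0])  # 0.1 0.5 0.9
```

The same indexing applies to the real tensor returned by `model.generate`: index 0 along the quantile axis is the 0.1 quantile, index 4 the median, index 8 the 0.9 quantile.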

This model supports inference on either CPU or GPU. To load the model on a GPU, we recommend at least 40GB of VRAM (e.g., A100 40GB/80GB, or H100). If you encounter out-of-memory errors at runtime, try the following options:

# Option 1: reduce batch size or context length
batch_size, lookback_length = 1, 2880

# Option 2: disable KV cache at runtime (or edit it in config.json for a permanent change)
model.config.use_cache = False # no efficiency impact when the prediction horizon does not exceed 256
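
A further manual option is to split a large batch into smaller chunks and run them sequentially, trading throughput for peak memory. A sketch, where `run_model` is a hypothetical placeholder standing in for a call to `model.generate`:

```python
# Option 3 (sketch): process a large batch in smaller sequential chunks
# to lower peak memory. `run_model` is a placeholder standing in for
# model.generate; it returns one 256-step forecast per input series.
def run_model(chunk):
    return [[0.0] * 256 for _ in chunk]  # placeholder forecasts

def generate_in_chunks(seqs, chunk_size=8):
    """Run inference chunk by chunk and concatenate the results."""
    out = []
    for i in range(0, len(seqs), chunk_size):
        out.extend(run_model(seqs[i:i + chunk_size]))
    return out

preds = generate_in_chunks([[0.0] * 2880 for _ in range(20)], chunk_size=8)
print(len(preds))  # 20
```

Smaller chunks lower peak memory at the cost of more forward passes; `chunk_size` can be tuned to the available VRAM.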

Specification

  • Architecture: decoder-only Transformer with MoE
  • Context Length: up to 11,520
  • ReNorm: default=True
  • KV Cache: default=True
  • Patch Length: 16
  • Total Parameters: 8.3B
  • Activated Parameters: 0.75B
  • Number of Layers: 40
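
Since values are grouped into patches of 16 before entering the decoder, the maximum context of 11,520 points corresponds to 720 patch tokens. A quick sanity check:

```python
# Sanity check: context length divided by patch length gives the number
# of patch tokens the decoder actually attends over.
context_length, patch_length = 11_520, 16
num_tokens = context_length // patch_length
print(num_tokens)  # 720
```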

License Agreement

This model is licensed under the Apache-2.0 License.

Citation

If you find Timer-S1 helpful for your research, please cite our paper:

@article{liu2026timer,
  title={Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling},
  author={Liu, Yong and Su, Xingjian and Wang, Shiyu and Zhang, Haoran and Liu, Haixuan and Wang, Yuxuan and Ye, Zhou and Xiang, Yang and Wang, Jianmin and Long, Mingsheng},
  journal={arXiv preprint arXiv:2603.04791},
  year={2026}
}
