DIAL Checkpoints

Project Page | Paper | Code

Model weights for DIAL (Decoupling Intent and Action via Latent World Modeling), an end-to-end Vision-Language-Action (VLA) framework built on NVIDIA Isaac GR00T N1.5 with a Qwen2.5-VL-3B-Instruct backbone.

Available Checkpoints

| Checkpoint | Training data | Training steps | Description |
|---|---|---|---|
| DIAL-3B-fewshot | EgoDex human data + 10% GR1 simulation data | 20K per stage (3 stages) | Co-trained with heterogeneous human demonstrations |
| DIAL-3B-fulldata | All GR1 simulation data (~24,000 demos) | 40K per stage (2 stages) | Trained on full teleoperation trajectories in simulation |
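As a quick check on the training budgets, the checkpoint rows can be written down as plain data. This is a minimal Python sketch, not part of the DIAL codebase: the `CheckpointInfo` class and `CHECKPOINTS` dictionary are hypothetical names, and `total_steps` assumes every stage runs its full per-stage step count, which gives 60K steps for DIAL-3B-fewshot (3 × 20K) and 80K steps for DIAL-3B-fulldata (2 × 40K).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointInfo:
    """One row of the checkpoint table (hypothetical helper, values copied from the card)."""
    name: str
    training_data: str
    steps_per_stage: int
    num_stages: int

    @property
    def total_steps(self) -> int:
        # Assumption: each stage runs the full per-stage budget.
        return self.steps_per_stage * self.num_stages


# Registry keyed by checkpoint name.
CHECKPOINTS = {
    c.name: c
    for c in (
        CheckpointInfo(
            name="DIAL-3B-fewshot",
            training_data="EgoDex human data + 10% GR1 simulation data",
            steps_per_stage=20_000,
            num_stages=3,
        ),
        CheckpointInfo(
            name="DIAL-3B-fulldata",
            training_data="All GR1 simulation data (~24,000 demos)",
            steps_per_stage=40_000,
            num_stages=2,
        ),
    )
}

if __name__ == "__main__":
    for ckpt in CHECKPOINTS.values():
        print(f"{ckpt.name}: {ckpt.total_steps:,} total steps")
```

Despite the smaller per-stage budget, the few-shot recipe spreads training over one more stage than the full-data recipe.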

For installation, training, and evaluation instructions, please refer to the GitHub repository.
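The GitHub repository remains the reference for setup and inference. As a hedged sketch, a single checkpoint could be fetched with the `huggingface_hub` library, assuming (hypothetically) that each checkpoint is stored in a top-level folder named after it. Verify the actual file layout on the repository page before relying on this. `REPO_ID` comes from the model page; `checkpoint_patterns` and `download_checkpoint` are illustrative helpers, not part of DIAL.

```python
from pathlib import Path
from typing import List

# Repository ID as shown on the model page.
REPO_ID = "xpeng-robotics/DIAL_checkpoints"


def checkpoint_patterns(checkpoint_name: str) -> List[str]:
    """Glob patterns selecting one checkpoint's files.

    Hypothetical layout: each checkpoint lives in a top-level folder
    named after it, e.g. "DIAL-3B-fewshot/".
    """
    return [f"{checkpoint_name}/*"]


def download_checkpoint(checkpoint_name: str, local_dir: str = "checkpoints") -> Path:
    """Download a single checkpoint folder from the Hugging Face Hub."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=checkpoint_patterns(checkpoint_name),
        local_dir=local_dir,
    )
    # Assumes the hypothetical per-checkpoint folder layout described above.
    return Path(root) / checkpoint_name
```

For example, `download_checkpoint("DIAL-3B-fewshot")` would fetch only the files matching `DIAL-3B-fewshot/*` into `./checkpoints`. Filtering with `allow_patterns` avoids downloading both 3B checkpoints when only one is needed.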

Hugging Face repository: xpeng-robotics/DIAL_checkpoints