DIAL Checkpoints

Project Page | Paper | Code

Model weights for DIAL (Decoupling Intent and Action via Latent World Modeling), an end-to-end Vision-Language-Action (VLA) framework built on NVIDIA Isaac GR00T N1.5 with a Qwen2.5-VL-3B-Instruct backbone.

Available Checkpoints

| Checkpoint | Training Data | Steps | Description |
|---|---|---|---|
| `DIAL-3B-fewshot` | EgoDex human data + 10% GR1 simulation data | 20K per stage (3-stage) | Co-trained with heterogeneous human demonstrations |
| `DIAL-3B-fulldata` | All GR1 simulation data (~24,000 demos) | 40K per stage (2-stage) | Trained on full teleoperation trajectories in simulation |

For installation, training, and evaluation instructions, please refer to the GitHub repository.
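
As a convenience, the following is a minimal sketch of how one of the listed checkpoints might be fetched with the `huggingface_hub` library. The repository ID `xpeng-robotics/DIAL_checkpoints` is the one shown on the model page. The folder layout (one top-level directory per checkpoint name) and the helper names `checkpoint_patterns` and `download_checkpoint` are assumptions made for illustration, so check the repository's file listing before relying on them.

```python
from pathlib import Path

# Repository ID as shown on the model page.
REPO_ID = "xpeng-robotics/DIAL_checkpoints"
# Checkpoint names taken from the "Available Checkpoints" table.
CHECKPOINTS = ("DIAL-3B-fewshot", "DIAL-3B-fulldata")


def checkpoint_patterns(name: str) -> list[str]:
    """Return glob patterns that select the files of a single checkpoint.

    ASSUMPTION: each checkpoint is stored in a top-level folder named
    after it. Adjust the pattern if the repository is laid out differently.
    """
    if name not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint {name!r}; expected one of {CHECKPOINTS}")
    return [f"{name}/*"]


def download_checkpoint(name: str, local_dir: str = "checkpoints") -> Path:
    """Download one checkpoint and return its (assumed) local folder."""
    # Imported lazily so the helpers above work without the library installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=checkpoint_patterns(name),  # skip the other checkpoint
        local_dir=local_dir,
    )
    return Path(root) / name
```

Calling `download_checkpoint("DIAL-3B-fulldata")` would then download only that checkpoint's files into `checkpoints/`. Filtering with `allow_patterns` avoids pulling both sets of weights when only one is needed.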
