DIAL Checkpoints
Project Page | Paper | Code
Model weights for DIAL (Decoupling Intent and Action via Latent World Modeling), an end-to-end Vision-Language-Action (VLA) framework built on NVIDIA Isaac GR00T N1.5 with a Qwen2.5-VL-3B-Instruct backbone.
Available Checkpoints
| Checkpoint | Training Data | Steps | Description |
|---|---|---|---|
| DIAL-3B-fewshot | EgoDex human data + 10% GR1 simulation data | 20K per stage (3-stage) | Co-trained with heterogeneous human demonstrations |
| DIAL-3B-fulldata | All GR1 simulation data (~24,000 demos) | 40K per stage (2-stage) | Trained on full teleoperation trajectories in simulation |
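
As a rough guide to the training budgets listed for the two checkpoints, the short Python sketch below works out the total optimizer steps for each. It rests on two assumptions that the model card does not state explicitly: that the stages run one after another with the listed step count each, and that "10% GR1 simulation data" means roughly one tenth of the ~24,000 demonstrations. The `Checkpoint` class and the variable names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical record holding the step figures from the checkpoint table."""
    name: str
    steps_per_stage: int
    num_stages: int

    @property
    def total_steps(self) -> int:
        # Assumption: stages are sequential, each using the listed step count.
        return self.steps_per_stage * self.num_stages


CHECKPOINTS = [
    Checkpoint("DIAL-3B-fewshot", steps_per_stage=20_000, num_stages=3),
    Checkpoint("DIAL-3B-fulldata", steps_per_stage=40_000, num_stages=2),
]

# Approximate size of the full GR1 simulation set, as given in the table.
FULL_GR1_DEMOS = 24_000

# Assumption: "10%" is a fraction of the demonstration count.
fewshot_gr1_demos = FULL_GR1_DEMOS // 10

for ckpt in CHECKPOINTS:
    print(f"{ckpt.name}: {ckpt.total_steps:,} total steps")
print(f"Approximate GR1 demos in the few-shot setting: {fewshot_gr1_demos:,}")
```

Under these assumptions the few-shot checkpoint sees fewer total steps than the full-data checkpoint, and only a small share of the simulation demonstrations.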
For installation, training, and evaluation instructions, please refer to the GitHub repository.