Add comprehensive model card for dUltra
#1 by nielsr (HF Staff) - opened

README.md (ADDED)
---
library_name: transformers
pipeline_tag: text-generation
---

# dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning

dUltra is an on-policy reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that learns unmasking strategies for efficient parallel decoding in Masked Diffusion Language Models (MDLMs). By training an unmasking planner head, dUltra enables diffusion language models to achieve state-of-the-art accuracy-efficiency trade-offs.

- **Paper:** [dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning](https://huggingface.co/papers/2512.21446)
- **GitHub Repository:** [https://github.com/chinsengi/dUltra-os](https://github.com/chinsengi/dUltra-os)

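The paper's exact objective is not reproduced here, but the core GRPO idea is standard: sample a group of rollouts per prompt and compute each rollout's advantage relative to the group, with no learned value network. A minimal sketch (the function name and toy rewards are illustrative, not from the dUltra codebase):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages as in GRPO (illustrative sketch).

    For a group of G rollouts sampled from the same prompt, each
    rollout's advantage is its reward standardized against the
    group's mean and standard deviation.
    """
    mean = rewards.mean()
    std = rewards.std(unbiased=False)
    return (rewards - mean) / (std + eps)

# Toy group of 4 rollout rewards for one prompt.
adv = grpo_advantages(torch.tensor([1.0, 0.0, 1.0, 0.0]))
```

Rollouts that beat the group average get positive advantages and are reinforced; below-average rollouts are suppressed.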
## Model Description

Masked diffusion language models offer the potential for parallel token generation. dUltra introduces an unmasking planner head that predicts per-token unmasking likelihoods as independent Bernoulli distributions. The framework jointly optimizes the base diffusion LLM and the unmasking planner using a reward signal that combines a verifiable reward, a distillation reward, and the number of unmasking steps, achieving superior accuracy-efficiency trade-offs across mathematical reasoning and code generation tasks.

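To make the planner idea concrete, here is a minimal sketch of one decoding step under the independent-Bernoulli view: given planner logits over the sequence, each currently masked position is independently unmasked with its predicted probability. This is an assumption-laden illustration, not the dUltra implementation (the function name and tensors are hypothetical):

```python
import torch

def sample_unmask_decisions(planner_logits: torch.Tensor,
                            mask: torch.Tensor) -> torch.Tensor:
    """Decide which masked tokens to reveal this step (illustrative sketch).

    planner_logits: per-token logits from a planner head, shape (seq_len,)
    mask: True where the token is still masked, shape (seq_len,)
    """
    probs = torch.sigmoid(planner_logits)   # per-token unmasking probability
    draws = torch.bernoulli(probs).bool()   # independent Bernoulli samples
    return draws & mask                     # only masked positions may be revealed

# Toy example: 6 positions, 4 still masked.
torch.manual_seed(0)
logits = torch.randn(6)
mask = torch.tensor([True, True, False, True, True, False])
decisions = sample_unmask_decisions(logits, mask)
```

Because the draws are independent, several tokens can be revealed in a single step, which is what enables the parallel-decoding speedups the reward (via the step count) encourages.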
## Usage

To use the dUltra model, you can load it with the `transformers` library. Note that `trust_remote_code=True` is required to load the custom model architecture.

```python
import torch
from transformers import AutoTokenizer

# Custom model class provided by the GitHub repository.
from model.llada.lladou import LLaDOUModelLM

model = LLaDOUModelLM.from_pretrained(
    "sengi/dUltra-math",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("sengi/dUltra-math")
```

## Citation

```bibtex
@misc{chen2025dultraultrafastdiffusionlanguage,
  title={dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning},
  author={Shirui Chen and Jiantao Jiao and Lillian J. Ratliff and Banghua Zhu},
  year={2025},
  eprint={2512.21446},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2512.21446},
}
```