Update README.md (#8)
Update README.md (21c479517e333bbcb2e6ae0297b53d321b60bf59)
Co-authored-by: Jian Hu <jianh-nvidia@users.noreply.huggingface.co>
README.md
CHANGED
@@ -18,6 +18,7 @@ library_name: transformers
 
 
 ## News
+- [2025-11-20] [Nemotron-Research-Reasoning-Qwen-1.5B-BroRL](https://huggingface.co/nvidia/Nemotron-Research-Reasoning-Qwen-1.5B/tree/brorl) is released.
 - [2025-08-11] ProRL V2 blog post is released: [ProRL V2 - Prolonged Training Validates RL Scaling Laws](https://research.nvidia.com/labs/lpr/prorlv2/).
 - [2025-07-23] Nemotron-Research-Reasoning-Qwen-1.5B-v2 is released.
 - [2025-05-29] Nemotron-Research-Reasoning-Qwen-1.5B is released.
@@ -59,7 +60,8 @@ Table 1: Performance (pass@1) comparison for benchmarks across Math domain.
 | DeepScaleR-1.5B | 40.21 | 31.46 | 73.04 | 89.36 | 41.57 | 51.63 | 54.54 |
 | *DeepSeek-R1-Distill-Qwen-7B* | 53.54 | 40.83 | 82.83 | 93.68 | 50.60 | 57.66 | 63.19 |
 | **Nemotron-Research-Reasoning-Qwen-1.5B** | 48.13 | 33.33 | 79.29 | 91.89 | 47.98 | 60.22 | 60.14 |
-| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** | 49.58 | **36.04** | 82.53 | **92.49** | **49.03** | 60.44 | 61.69 |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-BroRL** | **60.42** | 35.63 | **83.06** | 92.20 | 48.58 | **62.11** | **63.66** |
 
 Table 2: Performance (pass@1) comparison across benchmarks for Code. We abbreviate benchmarks names for codecontests (cc), codeforces (cf), humanevalplus (human), and livecodebench (LCB).
 | Model | apps | cc | cf | taco | human | LCB | Avg |
@@ -68,7 +70,8 @@ Table 2: Performance (pass@1) comparison across benchmarks for Code. We abbrevia
 | DeepCoder-1.5B | 30.37 | 23.76 | 21.70 | 13.76 | 73.40 | 22.76 | 30.96 |
 | *DeepSeek-R1-Distill-Qwen-7B* | 42.08 | 32.76 | 33.08 | 19.08 | 83.32 | 38.04 | 41.39 |
 | **Nemotron-Research-Reasoning-Qwen-1.5B** | 41.99 | 31.80 | 34.50 | 20.81 | 72.05 | 23.81 | 37.49 |
-| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** | 46.39 | 35.59 | 40.75 | 22.89 | 72.89 | **27.69** | **41.03** |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-BroRL** | **50.61** | **38.71** | **45.88** | **25.90** | - | - | - |
 
 Table 3: Performance comparison on STEM reasoning (GPQA Diamond), instruction following (IFEval), and logic puzzles (Reasoning Gym) tasks. We also present results on OOD tasks: acre, boxnet, and game_of_life_halting (game).
 | Model | GPQA | IFEval | Reasoning | acre | boxnet | game |
@@ -102,6 +105,10 @@ tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-Research-Reasoning-Qw
 model = AutoModelForCausalLM.from_pretrained("nvidia/Nemotron-Research-Reasoning-Qwen-1.5B", revision="v1")
 ```
 
+## BroRL
+In BroRL, we continued training for 419 steps from a nearly fully trained ProRL V2 checkpoint, increasing the number of samples per prompt from 16 to 512. We found that the improvement of BroRL over ProRL V2 was greater than that of ProRL V2 over ProRL V1.
+
+Link to the [BroRL 419-step checkpoint](https://huggingface.co/nvidia/Nemotron-Research-Reasoning-Qwen-1.5B/tree/brorl).
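The intuition behind the rollout broadening described in the BroRL note above can be sketched with a toy calculation (our own illustration, not the BroRL algorithm itself): if each sampled rollout solves a hard prompt with some small probability p, the chance that at least one of N rollouts succeeds is 1 - (1 - p)^N, which grows rapidly as N goes from 16 to 512.

```python
# Toy illustration (not the actual BroRL training loop): with a per-rollout
# success probability p, the chance that at least one of N sampled rollouts
# solves the prompt is 1 - (1 - p)^N. Broadening N from 16 to 512 makes
# rare correct solutions far more likely to appear in the batch.
def hit_rate(p: float, n: int) -> float:
    """Probability that at least one of n independent rollouts succeeds."""
    return 1.0 - (1.0 - p) ** n

for p in (0.001, 0.01):
    print(f"p={p}: N=16 -> {hit_rate(p, 16):.3f}, N=512 -> {hit_rate(p, 512):.3f}")
```

For p = 0.001 the hit rate rises from roughly 1.6% at N=16 to about 40% at N=512, which is one way to see why broadened exploration can keep supplying useful learning signal on prompts the 16-sample regime almost never solves.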
 
 ## License/Terms of Use
 cc-by-nc-4.0
@@ -112,7 +119,7 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
 Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
 
 ## Citation
-If you find our dataset helpful, please cite the following [paper](https://arxiv.org/abs/2505.24864):
+If you find our model helpful, please cite the following [ProRL paper](https://arxiv.org/abs/2505.24864) and [BroRL paper](https://arxiv.org/abs/2510.01180):
 
 ```
 @article{liu2025prorl,
@@ -124,4 +131,11 @@ If you find our dataset helpful, please cite the following [paper](https://arxiv
 primaryClass = {cs.CL},
 url={https://arxiv.org/abs/2505.24864},
 }
+
+@article{hu2025brorl,
+  title={BroRL: Scaling Reinforcement Learning via Broadened Exploration},
+  author={Hu, Jian and Liu, Mingjie and Lu, Ximing and Wu, Fang and Harchaoui, Zaid and Diao, Shizhe and Choi, Yejin and Molchanov, Pavlo and Yang, Jun and Kautz, Jan and others},
+  journal={arXiv preprint arXiv:2510.01180},
+  year={2025}
+}
 ```
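The model card's existing snippet pins `revision="v1"`, and the BroRL checkpoint lives on the `brorl` branch linked above. Assuming that branch name can be passed as the `revision` (standard Hugging Face Hub behavior), loading the BroRL weights would look like this sketch; the `transformers` import is deferred so nothing is downloaded until the function is called.

```python
# Sketch: load the BroRL checkpoint from the "brorl" branch of the repo,
# mirroring the card's existing snippet that pins revision="v1". The branch
# name is taken from the BroRL checkpoint link; verify it before relying on it.
REPO_ID = "nvidia/Nemotron-Research-Reasoning-Qwen-1.5B"
BRORL_REVISION = "brorl"

def load_brorl():
    # Deferred import: transformers (and the model download) are only
    # needed when this function is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=BRORL_REVISION)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, revision=BRORL_REVISION)
    return tokenizer, model
```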
|