shizhediao2, jianh-nvidia committed
Commit 147cf36 · verified · 1 Parent(s): 9050857

Update README.md (#8)


- Update README.md (21c479517e333bbcb2e6ae0297b53d321b60bf59)


Co-authored-by: Jian Hu <jianh-nvidia@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +17 -3
README.md CHANGED
@@ -18,6 +18,7 @@ library_name: transformers
 ![Comparison between DeepSeek-R1-1.5B and Nemotron-Research-Reasoning-Qwen-1.5B](./assets/deepseek_vs_nvidia102.png)

 ## News
+- [2025-11-20] [Nemotron-Research-Reasoning-Qwen-1.5B-BroRL](https://huggingface.co/nvidia/Nemotron-Research-Reasoning-Qwen-1.5B/tree/brorl) is released.
 - [2025-08-11] ProRL V2 blog post is released: [ProRL V2 - Prolonged Training Validates RL Scaling Laws](https://research.nvidia.com/labs/lpr/prorlv2/).
 - [2025-07-23] Nemotron-Research-Reasoning-Qwen-1.5B-v2 is released.
 - [2025-05-29] Nemotron-Research-Reasoning-Qwen-1.5B is released.
@@ -59,7 +60,8 @@ Table 1: Performance (pass@1) comparison for benchmarks across Math domain.
 | DeepScaleR-1.5B | 40.21 | 31.46 | 73.04 | 89.36 | 41.57 | 51.63 | 54.54 |
 | *DeepSeek-R1-Distill-Qwen-7B* | 53.54 | 40.83 | 82.83 | 93.68 | 50.60 | 57.66 | 63.19 |
 | **Nemotron-Research-Reasoning-Qwen-1.5B** | 48.13 | 33.33 | 79.29 | 91.89 | 47.98 | 60.22 | 60.14 |
-| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** | **49.58** | **36.04** | **82.53** | **92.49** | **49.03** | **60.44** | **61.69** |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** | 49.58 | **36.04** | 82.53 | **92.49** | **49.03** | 60.44 | 61.69 |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-BroRL** | **60.42** | 35.63 | **83.06** | 92.20 | 48.58 | **62.11** | **63.66** |

 Table 2: Performance (pass@1) comparison across benchmarks for Code. We abbreviate benchmark names for codecontests (cc), codeforces (cf), humanevalplus (human), and livecodebench (LCB).
 | Model | apps | cc | cf | taco | human | LCB | Avg |
@@ -68,7 +70,8 @@ Table 2: Performance (pass@1) comparison across benchmarks for Code. We abbrevia
 | DeepCoder-1.5B | 30.37 | 23.76 | 21.70 | 13.76 | 73.40 | 22.76 | 30.96 |
 | *DeepSeek-R1-Distill-Qwen-7B* | 42.08 | 32.76 | 33.08 | 19.08 | 83.32 | 38.04 | 41.39 |
 | **Nemotron-Research-Reasoning-Qwen-1.5B** | 41.99 | 31.80 | 34.50 | 20.81 | 72.05 | 23.81 | 37.49 |
-| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** | **46.39** | **35.59** | **40.75** | **22.89** | 72.89 | **27.69** | **41.03** |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** | 46.39 | 35.59 | 40.75 | 22.89 | 72.89 | **27.69** | **41.03** |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-BroRL** | **50.61** | **38.71** | **45.88** | **25.90** | - | - | - |

 Table 3: Performance comparison on STEM reasoning (GPQA Diamond), instruction following (IFEval), and logic puzzles (Reasoning Gym) tasks. We also present results on OOD tasks: acre, boxnet, and game_of_life_halting (game).
 | Model | GPQA | IFEval | Reasoning | acre | boxnet | game |
@@ -102,6 +105,10 @@ tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-Research-Reasoning-Qw
 model = AutoModelForCausalLM.from_pretrained("nvidia/Nemotron-Research-Reasoning-Qwen-1.5B", revision="v1")
 ```

+## BroRL
+In BroRL, we continued training for 419 steps from a nearly converged ProRLv2 checkpoint, increasing the number of rollouts sampled per prompt from 16 to 512. The improvement of BroRL over ProRLv2 is larger than that of ProRLv2 over ProRLv1.
+
+Link to the [BroRL 419-step checkpoint](https://huggingface.co/nvidia/Nemotron-Research-Reasoning-Qwen-1.5B/tree/brorl).

 ## License/Terms of Use
 cc-by-nc-4.0
@@ -112,7 +119,7 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
 Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

 ## Citation
-If you find our dataset helpful, please cite the following [paper](https://arxiv.org/abs/2505.24864):
+If you find our work helpful, please cite the following [ProRL paper](https://arxiv.org/abs/2505.24864) and [BroRL paper](https://arxiv.org/abs/2510.01180):

 ```
 @article{liu2025prorl,
@@ -124,4 +131,11 @@ If you find our dataset helpful, please cite the following [paper](https://arxiv
 primaryClass = {cs.CL},
 url={https://arxiv.org/abs/2505.24864},
 }
+
+@article{hu2025brorl,
+  title={BroRL: Scaling Reinforcement Learning via Broadened Exploration},
+  author={Hu, Jian and Liu, Mingjie and Lu, Ximing and Wu, Fang and Harchaoui, Zaid and Diao, Shizhe and Choi, Yejin and Molchanov, Pavlo and Yang, Jun and Kautz, Jan and others},
+  journal={arXiv preprint arXiv:2510.01180},
+  year={2025}
+}
 ```
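
The BroRL change in this diff raises the number of rollouts sampled per prompt from 16 to 512. As a rough, back-of-the-envelope illustration of why broader sampling helps (this simplified model — independent rollouts with a fixed per-rollout success probability `p` — is an assumption for illustration, not a claim from the BroRL paper), the chance that at least one rollout solves a prompt is 1 − (1 − p)^n:

```python
# Probability that at least one of n rollouts solves a prompt, assuming
# (illustratively) independent rollouts with fixed success probability p.
# Real rollouts from the same policy are correlated and p varies per prompt.
def p_at_least_one_success(p: float, n: int) -> float:
    return 1.0 - (1.0 - p) ** n

if __name__ == "__main__":
    p = 0.01  # a hypothetical hard prompt: 1% success per rollout
    for n in (16, 512):  # ProRLv2 vs. BroRL samples per prompt
        print(f"n={n:3d}: P(>=1 success) = {p_at_least_one_success(p, n):.3f}")
```

With p = 0.01 this gives roughly 0.149 at n = 16 versus 0.994 at n = 512, i.e. hard prompts that almost never yield a positive learning signal at 16 samples almost always do at 512.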