Update README.md (#8)
Update README.md (21c479517e333bbcb2e6ae0297b53d321b60bf59)
Co-authored-by: Jian Hu <jianh-nvidia@users.noreply.huggingface.co>
README.md
CHANGED
@@ -18,6 +18,7 @@ library_name: transformers
 
 
 ## News
+- [2025-11-20] [Nemotron-Research-Reasoning-Qwen-1.5B-BroRL](https://huggingface.co/nvidia/Nemotron-Research-Reasoning-Qwen-1.5B/tree/brorl) is released.
 - [2025-08-11] ProRL V2 blog post is released: [ProRL V2 - Prolonged Training Validates RL Scaling Laws](https://research.nvidia.com/labs/lpr/prorlv2/).
 - [2025-07-23] Nemotron-Research-Reasoning-Qwen-1.5B-v2 is released.
 - [2025-05-29] Nemotron-Research-Reasoning-Qwen-1.5B is released.
@@ -59,7 +60,8 @@ Table 1: Performance (pass@1) comparison for benchmarks across Math domain.
 | DeepScaleR-1.5B | 40.21 | 31.46 | 73.04 | 89.36 | 41.57 | 51.63 | 54.54 |
 | *DeepSeek-R1-Distill-Qwen-7B* | 53.54 | 40.83 | 82.83 | 93.68 | 50.60 | 57.66 | 63.19 |
 | **Nemotron-Research-Reasoning-Qwen-1.5B** | 48.13 | 33.33 | 79.29 | 91.89 | 47.98 | 60.22 | 60.14 |
-| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** | 49.58 | **36.04** | 82.53 | **92.49** | **49.03** | 60.44 | 61.69 |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-BroRL** | **60.42** | 35.63 | **83.06** | 92.20 | 48.58 | **62.11** | **63.66** |
 
 Table 2: Performance (pass@1) comparison across benchmarks for Code. We abbreviate benchmarks names for codecontests (cc), codeforces (cf), humanevalplus (human), and livecodebench (LCB).
 | Model | apps | cc | cf | taco | human | LCB | Avg |
@@ -68,7 +70,8 @@ Table 2: Performance (pass@1) comparison across benchmarks for Code. We abbrevia
 | DeepCoder-1.5B | 30.37 | 23.76 | 21.70 | 13.76 | 73.40 | 22.76 | 30.96 |
 | *DeepSeek-R1-Distill-Qwen-7B* | 42.08 | 32.76 | 33.08 | 19.08 | 83.32 | 38.04 | 41.39 |
 | **Nemotron-Research-Reasoning-Qwen-1.5B** | 41.99 | 31.80 | 34.50 | 20.81 | 72.05 | 23.81 | 37.49 |
-| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-v2** | 46.39 | 35.59 | 40.75 | 22.89 | 72.89 | **27.69** | **41.03** |
+| **Nemotron-Research-Reasoning-Qwen-1.5B-BroRL** | **50.61** | **38.71** | **45.88** | **25.90** | - | - | - |
 
 Table 3: Performance comparison on STEM reasoning (GPQA Diamond), instruction following (IFEval), and logic puzzles (Reasoning Gym) tasks. We also present results on OOD tasks: acre, boxnet, and game_of_life_halting (game).
 | Model | GPQA | IFEval | Reasoning | acre | boxnet | game |
@@ -102,6 +105,10 @@ tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-Research-Reasoning-Qw
 model = AutoModelForCausalLM.from_pretrained("nvidia/Nemotron-Research-Reasoning-Qwen-1.5B", revision="v1")
 ```
 
+## BroRL
+In BroRL, we continued training for 419 steps from a nearly fully trained ProRL V2 checkpoint, increasing the number of samples per prompt from 16 to 512. We found that the improvement of BroRL over ProRL V2 was greater than that of ProRL V2 over ProRL V1.
+
+Link to the [BroRL 419-step checkpoint](https://huggingface.co/nvidia/Nemotron-Research-Reasoning-Qwen-1.5B/tree/brorl).
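The intuition behind the rollout broadening described in the BroRL note above can be sketched with a toy calculation (our own illustration, not the BroRL algorithm itself): if each sampled rollout solves a hard prompt with some small probability p, the chance that at least one of N rollouts succeeds is 1 - (1 - p)^N, which grows rapidly as N goes from 16 to 512.

```python
# Toy illustration (not the actual BroRL training loop): with a per-rollout
# success probability p, the chance that at least one of N sampled rollouts
# solves the prompt is 1 - (1 - p)^N. Broadening N from 16 to 512 makes
# rare correct solutions far more likely to appear in the batch.
def hit_rate(p: float, n: int) -> float:
    """Probability that at least one of n independent rollouts succeeds."""
    return 1.0 - (1.0 - p) ** n

for p in (0.001, 0.01):
    print(f"p={p}: N=16 -> {hit_rate(p, 16):.3f}, N=512 -> {hit_rate(p, 512):.3f}")
```

For p = 0.001 the hit rate rises from roughly 1.6% at N=16 to about 40% at N=512, which is one way to see why broadened exploration can keep supplying useful learning signal on prompts the 16-sample regime almost never solves.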
 
 ## License/Terms of Use
 cc-by-nc-4.0
@@ -112,7 +119,7 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
 Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
 
 ## Citation
-If you find our dataset helpful, please cite the following [paper](https://arxiv.org/abs/2505.24864):
+If you find our model helpful, please cite the following [ProRL paper](https://arxiv.org/abs/2505.24864) and [BroRL paper](https://arxiv.org/abs/2510.01180):
 
 ```
 @article{liu2025prorl,
@@ -124,4 +131,11 @@ If you find our dataset helpful, please cite the following [paper](https://arxiv
 primaryClass = {cs.CL},
 url={https://arxiv.org/abs/2505.24864},
 }
+
+@article{hu2025brorl,
+  title={BroRL: Scaling Reinforcement Learning via Broadened Exploration},
+  author={Hu, Jian and Liu, Mingjie and Lu, Ximing and Wu, Fang and Harchaoui, Zaid and Diao, Shizhe and Choi, Yejin and Molchanov, Pavlo and Yang, Jun and Kautz, Jan and others},
+  journal={arXiv preprint arXiv:2510.01180},
+  year={2025}
+}
 ```
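The model card's existing snippet pins `revision="v1"`, and the BroRL checkpoint lives on the `brorl` branch linked above. Assuming that branch name can be passed as the `revision` (standard Hugging Face Hub behavior), loading the BroRL weights would look like this sketch; the `transformers` import is deferred so nothing is downloaded until the function is called.

```python
# Sketch: load the BroRL checkpoint from the "brorl" branch of the repo,
# mirroring the card's existing snippet that pins revision="v1". The branch
# name is taken from the BroRL checkpoint link; verify it before relying on it.
REPO_ID = "nvidia/Nemotron-Research-Reasoning-Qwen-1.5B"
BRORL_REVISION = "brorl"

def load_brorl():
    # Deferred import: transformers (and the model download) are only
    # needed when this function is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=BRORL_REVISION)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, revision=BRORL_REVISION)
    return tokenizer, model
```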
|