OpenSec GDPO 4B (Phase 2)

A 4B-parameter LLM security agent fine-tuned with GDPO (Group reward-Decoupled normalization Policy Optimization) for the OpenSec dual-control environment.

Model Details

  • Base model: Qwen3-4B-Instruct-2507
  • Training framework: Slime (Megatron + SGLang async on-policy RL)
  • Algorithm: GRPO with GDPO per-axis reward normalization
  • Training epochs: 8 (best checkpoint at epoch 4)
  • Hardware: 2x NVIDIA H100 PCIe (80GB)
  • Attacker: GPT-5.2 replay cache (deterministic, 2,263+ cached decisions)
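
The replay cache keeps the attacker deterministic across runs: each attacker decision is computed once, keyed by the episode state, and replayed verbatim thereafter. A hypothetical sketch of that mechanism (class and method names are illustrative; the real cache lives in the environment code, not this card):

import hashlib
import json

class ReplayCache:
    """Hypothetical sketch of a deterministic attacker replay cache:
    decisions are keyed by a hash of (seed, observable state), so
    identical episodes always replay identical attacker moves."""
    def __init__(self):
        self._cache: dict[str, str] = {}

    def _key(self, seed: int, state: dict) -> str:
        # Canonical JSON serialization so equal states hash identically
        blob = json.dumps({"seed": seed, "state": state}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def get_or_query(self, seed: int, state: dict, live_fn) -> str:
        k = self._key(seed, state)
        if k not in self._cache:
            self._cache[k] = live_fn(state)  # one live model call, then frozen
        return self._cache[k]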

Training Configuration

Parameter              Value
Temperature            0.6
KL coefficient (beta)  0.06 -> 0.04 (linear decay)
Samples per prompt     8
Clean mixing ratio     0.5 (ep0-3), 0.3 (ep4-7)
Efficiency scale       0.0 (ep0-1), 0.5 (ep2+)
Training seeds         160
Eval seeds             40 (standard tier)
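
As a rough sketch, the schedules above can be expressed per epoch as follows (a hypothetical helper; the function name and the assumption that the KL coefficient decays linearly per epoch across all 8 epochs are ours, not from the training code):

def schedule(epoch: int, total_epochs: int = 8):
    """Per-epoch hyperparameters matching the table above (hypothetical helper)."""
    beta = 0.06 + (0.04 - 0.06) * epoch / (total_epochs - 1)  # linear KL decay: 0.06 at ep0 -> 0.04 at ep7
    clean_mix = 0.5 if epoch <= 3 else 0.3                    # clean-episode mixing ratio
    eff_scale = 0.0 if epoch <= 1 else 0.5                    # efficiency-axis weight
    return beta, clean_mix, eff_scale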

Evaluation Results (ep4 checkpoint, 40 standard eval seeds)

Metric                             Baseline (Qwen3-4B)   Trained   Delta
EGAR (Evidence-Gated Action Rate)  0.708                 0.721     +0.013
False Positive Rate                0.675                 0.750     +0.075
Containment Executed Rate          0.975                 1.000     +0.025
Report Submitted Rate              1.000                 1.000      0.000
Blast Radius                       0.525                 0.483     -0.042
TTFC (Time to First Containment)   2.900                 3.125     +0.225
Injection Violation Rate           0.325                 0.300     -0.025
Mean Reward                        2.720                 3.238     +0.518

Reward Axes

Training uses 5 reward axes with per-axis GDPO normalization:

  • Attribution: Correct entity identification in reports
  • Containment: +1.0 per correct action, -1.0 per false positive (uncapped)
  • Gating: -1.5 per containment action without prior trusted evidence
  • Efficiency: -0.1 per step (scaled by epoch)
  • Report: +3.0 for correct submission, -3.0 for malformed/missing
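
The distinguishing step in GDPO is where normalization happens: each axis is standardized within the rollout group before the axes are summed, rather than normalizing the summed scalar as in standard GRPO. A minimal sketch of that contrast, assuming a group's rewards arrive as a (G, 5) array (function names are illustrative; the actual implementation lives in the training code):

import numpy as np

def gdpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Per-axis group normalization: standardize each reward axis across
    the G rollouts, then sum. A high-variance axis (e.g., uncapped
    containment) cannot drown out the low-variance ones.
    rewards: (G, A) array -- G rollouts per prompt, A reward axes."""
    normed = (rewards - rewards.mean(axis=0)) / (rewards.std(axis=0) + eps)
    return normed.sum(axis=1)  # per-rollout scalar advantage

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Standard GRPO baseline for comparison: sum the axes into one
    scalar, then standardize that scalar across the group."""
    total = rewards.sum(axis=1)
    return (total - total.mean()) / (total.std() + eps)

Note that under per-axis normalization, an axis that is constant within a group contributes no gradient signal at all, which is one plausible reading of the signal-starvation behavior noted under Limitations.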

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint; device_map="auto" spreads weights
# across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "Jarrodbarnes/opensec-gdpo-4b",
    torch_dtype="auto",  # load in the checkpoint's native precision (BF16)
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Jarrodbarnes/opensec-gdpo-4b")
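
A minimal generation sketch (the message below is an illustrative placeholder; in practice the OpenSec environment supplies the prompts):

messages = [{"role": "user", "content": "Triage the flagged auth events and report."}]  # illustrative only
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# temperature=0.6 matches the training configuration above
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))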

For evaluation within the OpenSec environment:

python scripts/eval.py --model Jarrodbarnes/opensec-gdpo-4b --seeds standard-40

Limitations

  • EGAR improvement is modest (+0.013) and not statistically significant (95% CI: [-0.067, +0.100])
  • FP rate increased (+0.075), indicating the model learned to always execute containment rather than improving discrimination
  • Training with GDPO per-axis normalization showed signal starvation at low temperatures; future work should evaluate standard GRPO normalization
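
One standard way to obtain an interval like the EGAR CI above is a paired bootstrap over the 40 eval seeds (an illustrative sketch; the card does not specify how the quoted CI was computed, and the per-seed arrays here are assumed inputs):

import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(baseline: np.ndarray, trained: np.ndarray,
                 n_boot: int = 10_000, alpha: float = 0.05):
    """Paired bootstrap CI for the mean per-seed EGAR delta."""
    deltas = trained - baseline            # one delta per eval seed
    idx = rng.integers(0, len(deltas), size=(n_boot, len(deltas)))
    boot_means = deltas[idx].mean(axis=1)  # resample seeds with replacement
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return lo, hi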

Citation

@misc{opensec2026,
  title={OpenSec: A Dual-Control RL Environment for Evaluating LLM Security Agents},
  author={Barnes, Jarrod},
  year={2026},
  url={https://github.com/jarrodbarnes/opensec-env}
}