CaRR & C-GRPO

updated Mar 25

Data and models for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards".

  • THU-KEG/CaRR-DeepDive

    Preview • Updated Mar 25 • 393 • 1

  • THU-KEG/DeepDive-4B-SFT

    4B • Updated Mar 25 • 10

  • THU-KEG/DeepDive-4B-C-GRPO

    4B • Updated Mar 25 • 3

  • THU-KEG/DeepDive-30B-A3B-SFT

    31B • Updated Mar 25 • 4

  • THU-KEG/DeepDive-30B-A3B-C-GRPO

    31B • Updated Mar 25 • 1

  • Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards

    Paper • 2601.06021 • Published Jan 9 • 48