SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model Paper • 2501.18636 • Published Jan 28, 2025
MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models Paper • 2505.22101 • Published May 28, 2025
GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning Paper • 2505.22661 • Published May 28, 2025
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published Apr 14, 2025