Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published Jan 17 • 32
Temporal Reasoning with Large Language Models Augmented by Evolving Knowledge Graphs Paper • 2509.15464 • Published Sep 18, 2025
How Much Reasoning Do Retrieval-Augmented Models Add beyond LLMs? A Benchmarking Framework for Multi-Hop Inference over Hybrid Knowledge Paper • 2602.10210 • Published 10 days ago • 1
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection Paper • 2505.03793 • Published May 1, 2025 • 1
Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning Paper • 2505.16122 • Published May 22, 2025 • 5
Reasoning of Large Language Models over Knowledge Graphs with Super-Relations Paper • 2503.22166 • Published Mar 28, 2025 • 3
When Heterophily Meets Heterogeneity: New Graph Benchmarks and Effective Methods Paper • 2407.10916 • Published Jul 15, 2024 • 1