Enterprise Agents and Benchmarks

ibm-research's Collections

updated 21 days ago

Enterprise agent ecosystem featuring AssetOpsBench (industrial), ITBench (SRE, FinOps, CISO), and CUGA to accelerate AI automation

AssetOpsBench 🚀

Running • 19
Generate and benchmark machine learning models with ease
CUGA Agent 🤖

Running • Featured • 99
Configurable Generalist Agent, leader in AppWorld Benchmark
ITBench-Lite-Space 🚀

Running • 7
Develop and run interactive code notebooks with JupyterLab
VAKRA Leaderboard 🏆

Running • 18
Evaluate AI agents on multi-hop, multi-source enterprise tasks
ibm-research/AssetOpsBench

Viewer • Updated 23 days ago • 467 • 697 • 24
ibm-research/ITBench-Lite

Updated 9 days ago • 1.13k • 5
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance

Paper • 2506.03828 • Published Jun 4, 2025 • 20
ibm-research/ITBench-Trajectories

Updated Jan 19 • 254 • 3
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

Paper • 2603.22386 • Published Mar 23 • 57
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks

Paper • 2502.05352 • Published Feb 7, 2025 • 2
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20, 2025 • 97
SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search

Paper • 2512.23167 • Published Dec 29, 2025 • 1
ibm-research/VAKRA

Viewer • Updated 29 days ago • 1.33k • 1.09k • 43
General Agent Evaluation

Paper • 2602.22953 • Published Feb 26 • 11
ibm-research/ScarfBench

Updated 21 days ago • 986 • 6
ScarfBench 🐠

Sleeping • 1
Java framework migration
