From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents Paper • 2603.22386 • Published 11 days ago • 54
SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search Paper • 2512.23167 • Published Dec 29, 2025 • 1
Article OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments Feb 12 • 31
Article Community Evals: Because we're done trusting black-box leaderboards over the community Feb 4 • 88
Enterprise Agents and Benchmarks Collection Enterprise agent ecosystem featuring AssetOpsBench (industrial) and ITBench (SRE, FinOps, CISO), CUGA to accelerate AI Automation • 14 items • Updated 10 days ago • 15
Article AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality Jan 21 • 31
AI-Agent-4-Industry-4.0 Collection This category highlights the collective efforts of the AI Automation team in advancing Industry 4.0 applications and exploring innovations beyond it. • 6 items • Updated Oct 8, 2025 • 7
Granite Docling Models Collection Models for parsing complex PDFs and structured documents, designed to complement Docling. • 4 items • Updated 2 days ago • 60
Granite Time Series Models Collection Time series models for forecasting, anomaly detection, classification, and more. • 9 items • Updated 2 days ago • 47
Granite Embedding Models Collection Embedding models (bi‑encoders and rerankers) for RAG, semantic search, and retrieval tasks. • 7 items • Updated 2 days ago • 33
Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies Paper • 2502.02533 • Published Feb 4, 2025 • 4
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance Paper • 2506.03828 • Published Jun 4, 2025 • 19
Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions Paper • 2506.08234 • Published Jun 9, 2025 • 9
SmartPilot: A Multiagent CoPilot for Adaptive and Intelligent Manufacturing Paper • 2505.06492 • Published May 10, 2025 • 2
FailureSensorIQ: A Multi-Choice QA Dataset for Understanding Sensor Relationships and Failure Modes Paper • 2506.03278 • Published Jun 3, 2025 • 7