Catwalk: A Unified Language Model Evaluation Framework for Many Datasets Paper • 2312.10253 • Published Dec 15, 2023