SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
Paper: arXiv:2503.09532 • Published
This repository contains the models described in the paper *SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability*. SAEBench is an evaluation suite that measures SAE performance across seven diverse metrics, spanning interpretability, feature disentanglement, and practical applications such as unlearning.