OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models Paper • 2306.02272 • Published Jun 4, 2023
INSTA-BNN: Binary Neural Network with INSTAnce-aware Threshold Paper • 2204.07439 • Published Apr 15, 2022
AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models Paper • 2509.12019 • Published Sep 15, 2025