RetMask Collection Trained checkpoints for the paper "From Interpretability to Performance: Optimizing Retrieval Heads for Long-Context Language Models" • 4 items • Updated 23 days ago
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.5 Text Generation • 8B • Updated Jun 25, 2025 • 1.86k • 19
tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4 Text Generation • 71B • Updated Jul 1, 2025 • 215 • 13
tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3 Text Generation • 71B • Updated Apr 2, 2025 • 460 • 13
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3 Text Generation • 8B • Updated Apr 2, 2025 • 3.76k • 24
tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.2 Text Generation • 8B • Updated Apr 2, 2025 • 117 • 16