DevQuasar/NovaSky-AI.Sky-T1-32B-Flash-GGUF Text Generation • 33B • Updated Feb 21, 2025 • 9 • 1
Post
New Research Alert: Making Language Models Smaller & Smarter!
Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance. The secret? Grouped pointwise convolutions. Yes, we brought a method from computer vision to the transformer arena.
Key Findings:
• 77% parameter reduction
• Maintained model capabilities
• Improved generalization
Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm
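The post does not show the implementation, so the following is only a rough sketch (not the authors' code) of the general idea: a grouped pointwise (1x1) convolution mixes channels only within each group, so its weight count shrinks by the group factor compared with a dense projection. The hidden size and group count below are arbitrary placeholders, and the exact 77% figure depends on choices made in the paper, not on this toy example.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
d_model = 1024   # hidden size
groups  = 4      # number of channel groups

# Dense projection vs. grouped pointwise (1x1) convolution.
dense   = nn.Linear(d_model, d_model, bias=False)
grouped = nn.Conv1d(d_model, d_model, kernel_size=1, groups=groups, bias=False)

print(sum(p.numel() for p in dense.parameters()))    # 1,048,576 weights
print(sum(p.numel() for p in grouped.parameters()))  # 262,144 weights (groups x fewer)

# Applying the grouped projection to a transformer-style activation
# of shape (batch, seq_len, d_model); Conv1d expects channels first.
x = torch.randn(2, 16, d_model)
y = grouped(x.transpose(1, 2)).transpose(1, 2)
print(y.shape)  # torch.Size([2, 16, 1024])
```

Because each group sees only its own slice of channels, such layers are usually combined with some form of cross-group mixing (as in computer-vision architectures that use grouped convolutions) to avoid losing representational capacity; see the linked report and repository for the actual design.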
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer Paper • 2501.15570 • Published Jan 26, 2025 • 25
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published Oct 14, 2024 • 18
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 151
upstage/solar-pro-preview-instruct Text Generation • 22B • Updated Sep 20, 2024 • 11.5k • 456
mattshumer/Reflection-Llama-3.1-70B Text Generation • 71B • Updated Sep 24, 2024 • 244 • 1.71k