Nanbeige/Nanbeige4.1-3B • Text Generation • 4B • 365k • 924
datatrove for all things web-scale data preparation: https://github.com/huggingface/datatrove
nanotron for lightweight 4D parallelism LLM training: https://github.com/huggingface/nanotron
lighteval for in-training fast parallel LLM evaluations: https://github.com/huggingface/lighteval