fiNERweb Collection A multilingual dataset for NER covering 91 languages and 25 scripts • 3 items • Updated Dec 16, 2025 • 3
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World Paper • 2603.19223 • Published 11 days ago • 30
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano and Super v3. • 28 items • Updated 6 days ago • 104
Nemotron-Cascade 2 Collection Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation • 4 items • Updated 6 days ago • 41
Omnilingual MT: Machine Translation for 1,600 Languages Paper • 2603.16309 • Published 14 days ago • 20
Article Efficient LLM Pretraining: Packed Sequences and Masked Attention Oct 7, 2024 • 69
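The packing idea in that article is easy to show in code: concatenate several documents into one training row, then use a block-diagonal causal mask so tokens never attend across document boundaries. A minimal sketch of that idea (not the article's code; `packed_attention_mask` is a hypothetical helper):

```python
# Sketch of sequence packing with a block-diagonal causal attention
# mask: tokens from different documents packed into one row cannot
# attend to each other. Illustrative only.
import torch

def packed_attention_mask(seq_lens: list[int]) -> torch.Tensor:
    """Build a (T, T) boolean mask for documents of the given lengths
    packed back-to-back into one sequence of length T = sum(seq_lens).
    True = attention allowed (causal, within the same document)."""
    T = sum(seq_lens)
    doc_ids = torch.repeat_interleave(
        torch.arange(len(seq_lens)), torch.tensor(seq_lens)
    )  # (T,) document index of each token
    same_doc = doc_ids[:, None] == doc_ids[None, :]        # block-diagonal
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
    return same_doc & causal

# Two documents of lengths 3 and 2 packed into one length-5 row:
mask = packed_attention_mask([3, 2])
# mask[3, 2] is False: the first token of doc 2 cannot see doc 1.
```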
Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA Paper • 2603.14782 • Published 15 days ago • 1
Indirect Question Answering in English, German and Bavarian: A Challenging Task for High- and Low-Resource Languages Alike Paper • 2603.15130 • Published 15 days ago • 1
Article Ulysses Sequence Parallelism: Training with Million-Token Contexts 22 days ago • 24
Article FlashHead: Accelerating Language Model Inference *An efficient drop-in replacement for the classification head* 19 days ago • 2
Flash-KMeans: Fast and Memory-Efficient Exact K-Means Paper • 2603.09229 • Published 21 days ago • 81
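One standard way to make exact k-means memory-efficient is to chunk the assignment step and expand ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2, so the full N × K distance matrix is never materialized. The sketch below shows that generic trick under assumed shapes; it is not the specific Flash-KMeans algorithm from the paper:

```python
# One memory-efficient *exact* k-means assignment step: distances are
# computed chunk by chunk, and the per-row constant ||x||^2 is dropped
# since it does not affect the argmin. Generic trick, not Flash-KMeans.
import torch

def assign_exact(x: torch.Tensor, centroids: torch.Tensor,
                 chunk: int = 4096) -> torch.Tensor:
    """x: (N, D) points, centroids: (K, D). Returns (N,) labels."""
    c_sq = (centroids ** 2).sum(dim=1)             # (K,) squared norms
    labels = torch.empty(x.shape[0], dtype=torch.long)
    for start in range(0, x.shape[0], chunk):
        xc = x[start:start + chunk]                # (B, D) chunk of points
        d = c_sq - 2.0 * (xc @ centroids.T)        # (B, K), exact up to ||x||^2
        labels[start:start + chunk] = d.argmin(dim=1)
    return labels

labels = assign_exact(torch.randn(10_000, 64), torch.randn(256, 64))
```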
Nemotron-Pre-Training-Datasets Collection Large-scale pre-training datasets used in the Nemotron family of models. • 12 items • Updated 6 days ago • 133
Lost in Backpropagation: The LM Head is a Gradient Bottleneck Paper • 2603.10145 • Published 20 days ago • 11
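A quick back-of-envelope calculation shows why the LM head can bottleneck backpropagation: the logits tensor and its gradient both scale with vocabulary size, dwarfing a layer's hidden-state activations. The numbers below are illustrative assumptions, not figures from the paper:

```python
# Hypothetical config: the logits and their gradient alone far exceed
# one layer's hidden states, so the vocab projection dominates memory.
batch, seq, hidden, vocab = 8, 4096, 4096, 128_000

logits_elems = batch * seq * vocab           # forward logits
grad_elems = logits_elems                    # dL/dlogits, same shape
bytes_fp32 = 4 * (logits_elems + grad_elems)
print(f"logits + grad: {bytes_fp32 / 2**30:.1f} GiB in fp32")
# ~31.3 GiB here, versus ~0.5 GiB for one layer's hidden states
# (batch * seq * hidden elements in fp32).
```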
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 15 items • Updated 6 days ago • 248
MixtureVitae study models and datasets Collection Collection of models and datasets related to MixtureVitae, an open and fully reproducible pretraining dataset built from permissive sources • 16 items • Updated Feb 13 • 2
Article Scaling Pedagogical Pre-training: From Optimal Mixing to 10 Billion Tokens 25 days ago • 4