Bolmo: Byteifying the Next Generation of Language Models Paper • 2512.15586 • Published Dec 17, 2025 • 17
Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem Paper • 2512.03073 • Published Nov 27, 2025 • 6
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 20
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published Oct 15, 2025 • 9
Post • Something very cool is cooking at Lichess
RewardBench 2: Advancing Reward Model Evaluation Paper • 2506.01937 • Published Jun 2, 2025 • 7
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models Paper • 2405.05417 • Published May 8, 2024 • 1
Command A: An Enterprise-Ready Large Language Model Paper • 2504.00698 • Published Apr 1, 2025 • 29
Retrofitting (Large) Language Models with Dynamic Tokenization Paper • 2411.18553 • Published Nov 27, 2024 • 2
Cross-Tokenizer Distillation via Approximate Likelihood Matching Paper • 2503.20083 • Published Mar 25, 2025 • 1
Post • The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places), and here's all of them on a plot
Post • The Lichess database of games, puzzles, and engine evaluations is now on the Hub: Lichess. Billions of chess data points to download, query, and stream, and we're excited to see what you'll build with it!
- https://huggingface.co/collections/Lichess/positions-datasets-66f50837db5cd3287d60d489
- https://huggingface.co/collections/Lichess/games-datasets-66f508df78f4b43e1bb2d353
Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation Paper • 2406.16678 • Published Jun 24, 2024 • 16
Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation Paper • 2305.18893 • Published May 30, 2023 • 2
CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models Paper • 2305.14214 • Published May 23, 2023