In a Training Loop 🔄

Lj V. Miranda PRO

ljvmiranda921

https://ljvmiranda921.github.io

AI & ML interests

NLP - multilinguality, data-centric AI, post-training

Recent Activity

liked a dataset about 15 hours ago

etdvprg/PHMartialLaw-NER_final

updated a dataset about 24 hours ago

ljvmiranda921/gsd-smith-Indonesian

updated a dataset about 24 hours ago

ljvmiranda921/gsd-smith-Cebuano

Organizations

upvoted a paper 29 days ago

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

Paper • 2604.08224 • Published Apr 9 • 51

upvoted an article about 2 months ago

Article

Compute and Competition in AI: Different FlOPs for Different Folks

sasha • Feb 12 • 15

upvoted a collection 3 months ago

Tiny Aya

Collection

Bridging Scale and Multilingual Depth • 10 items • Updated Feb 17 • 69

upvoted a paper 3 months ago

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Paper • 2601.22975 • Published Jan 30 • 111

upvoted 2 articles 8 months ago

Article

There is no such thing as a tokenizer-free lunch

catherinearnett • Sep 25, 2025 • 98

Article

An Analysis of Multilingual Models on Hugging Face

catherinearnett • Sep 18, 2025 • 5

upvoted an article 9 months ago

Article

🇵🇭 FilBench - Can LLMs Understand and Generate Filipino?

ljvmiranda921, acocodes, connermanuel, jcblaise, josephimperial, davanstrien, SaylorTwift, clefourrier • Aug 12, 2025 • 23

upvoted a collection 11 months ago

Reward Bench 2

Collection

Datasets, spaces, and models for Reward Bench 2 benchmark and paper! • 11 items • Updated Dec 23, 2025 • 16

upvoted a paper 12 months ago

R3: Robust Rubric-Agnostic Reward Models

Paper • 2505.13388 • Published May 19, 2025 • 11

upvoted 2 papers about 1 year ago

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29, 2025 • 99

The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks

Paper • 2504.15521 • Published Apr 22, 2025 • 64

upvoted a collection about 1 year ago

SEA-VL: Multicultural VL Dataset for Southeast Asia

Collection

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia • 3 items • Updated Apr 12, 2025 • 21

upvoted a paper about 1 year ago

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published Mar 10, 2025 • 101

upvoted 3 papers over 1 year ago

upvoted 3 collections over 1 year ago

Multilingual LLM Evaluation

Collection

Multilingual Evaluation Benchmarks • 8 items • Updated Jul 31, 2025 • 34

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark S

Collection

SEACrowd is a community movement project aimed at centralizing and standardizing AI resources for Southeast Asian languages, cultures, and/or regions. • 3 items • Updated Jun 18, 2024 • 8

OLMo 2

Collection

Artifacts for the OLMo 2 release. • 35 items • Updated Mar 3 • 155

upvoted a paper over 1 year ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 68
