
Jihan Yang PRO

jihanyang

https://jihanyang.github.io/

AI & ML interests

Computer Vision, Multimodality, Embodied AI

Recent Activity



upvoted a paper 29 days ago

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Paper • 2603.03276 • Published 30 days ago • 102

upvoted a paper about 1 month ago

Solaris: Building a Multiplayer Video World Model in Minecraft

Paper • 2602.22208 • Published Feb 25 • 28

liked a dataset about 1 month ago

nyu-visionx/scale-rae-data

Updated Jan 24 • 42.2k • 2

liked 2 datasets about 2 months ago

allenai/Molmo2-VideoCapQA

Viewer • Updated Feb 11 • 951k • 266 • 6

jasonzhango/SPAR-7M

Viewer • Updated Sep 28, 2025 • 16.3M • 291 • 6

authored a paper 2 months ago

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published Jan 22 • 55

upvoted a paper 2 months ago

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published Jan 22 • 55

upvoted 2 collections 3 months ago

Cambrian-S Models

Collection

18 items • Updated about 1 month ago • 8

Cambrian-S-Data

Collection

Data used during Cambrian-S's 4-stage training • 4 items • Updated Feb 27 • 5

authored 2 papers 5 months ago

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published Nov 6, 2025 • 39

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts

Paper • 2511.04655 • Published Nov 6, 2025 • 10

upvoted a collection 5 months ago

VSI-SUPER

Collection

VSI-SUPER benchmark proposed in Cambrian-S • 2 items • Updated about 1 month ago • 3

upvoted 2 papers 5 months ago

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts

Paper • 2511.04655 • Published Nov 6, 2025 • 10

Cambrian-S: Towards Spatial Supersensing in Video

Paper • 2511.04670 • Published Nov 6, 2025 • 39

commented on a paper 5 months ago

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts

Paper • 2511.04655 • Published Nov 6, 2025 • 10

upvoted a paper 5 months ago

Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Paper • 2510.23607 • Published Oct 27, 2025 • 181

upvoted 3 papers 6 months ago

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13, 2025 • 182

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13, 2025 • 170

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26, 2025 • 189

upvoted a paper 8 months ago

MetaCLIP 2: A Worldwide Scaling Recipe

Paper • 2507.22062 • Published Jul 29, 2025 • 37
