oguzhanercan's Collections: Representation Learning
End-to-End Vision Tokenizer Tuning
Paper
• 2505.10562
• Published • 22
Global and Local Entailment Learning for Natural World Imagery
Paper
• 2506.21476
• Published • 1
Paper
• 2508.10104
• Published • 301
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
Paper
• 2509.01363
• Published • 60
AToken: A Unified Tokenizer for Vision
Paper
• 2509.14476
• Published • 37
Lost in Embeddings: Information Loss in Vision-Language Models
Paper
• 2509.11986
• Published • 29
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification
Paper
• 2509.15591
• Published • 45
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
Paper
• 2510.12709
• Published • 13
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models
Paper
• 2510.08492
• Published • 10
GRACE: Generative Representation Learning via Contrastive Policy Optimization
Paper
• 2510.04506
• Published • 12
Latent Diffusion Model without Variational Autoencoder
Paper
• 2510.15301
• Published • 50
Model Merging with Functional Dual Anchors
Paper
• 2510.21223
• Published • 13
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
Paper
• 2510.20479
• Published • 12
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats
Paper
• 2510.25602
• Published • 79
Defeating the Training-Inference Mismatch via FP16
Paper
• 2510.26788
• Published • 31
UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
Paper
• 2511.00405
• Published • 6
Φeat: Physically-Grounded Feature Representation
Paper
• 2511.11270
• Published • 11
Qwen3-VL Technical Report
Paper
• 2511.21631
• Published • 161
Next-Embedding Prediction Makes Strong Vision Learners
Paper
• 2512.16922
• Published • 88
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking
Paper
• 2601.04720
• Published • 57
Nested Learning: The Illusion of Deep Learning Architectures
Paper
• 2512.24695
• Published • 45
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
Paper
• 2602.08683
• Published • 52
Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs
Paper
• 2602.02338
• Published • 42
Unified Vision-Language Modeling via Concept Space Alignment
Paper
• 2603.01096
• Published • 6
InfoNCE Induces Gaussian Distribution
Paper
• 2602.24012
• Published • 13
Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models
Paper
• 2602.24264
• Published • 14