Large Model Ideas
Instruction Following without Instruction Tuning (arXiv:2409.14254, 29 upvotes)
Baichuan Alignment Technical Report (arXiv:2410.14940, 51 upvotes)
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution (arXiv:2410.16256, 61 upvotes)
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data (arXiv:2410.18558, 18 upvotes)
Self-Consistency Preference Optimization (arXiv:2411.04109, 19 upvotes)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948, 441 upvotes)
Demystifying Long Chain-of-Thought Reasoning in LLMs (arXiv:2502.03373, 58 upvotes)
Qwen2.5-VL Technical Report (arXiv:2502.13923, 213 upvotes)
Chain of Draft: Thinking Faster by Writing Less (arXiv:2502.18600, 50 upvotes)
URECA: Unique Region Caption Anything (arXiv:2504.05305, 35 upvotes)
An Empirical Study of Qwen3 Quantization (arXiv:2505.02214, 25 upvotes)
BLIP3-o: A Family of Fully Open Unified Multimodal Models - Architecture, Training and Dataset (arXiv:2505.09568, 99 upvotes)
WorldPM: Scaling Human Preference Modeling (arXiv:2505.10527, 34 upvotes)
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning (arXiv:2507.00432, 79 upvotes)
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful (arXiv:2507.07101, 4 upvotes)
Scaling Laws for Optimal Data Mixtures (arXiv:2507.09404, 37 upvotes)
Deep Think with Confidence (arXiv:2508.15260, 90 upvotes)
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforcement Learning (arXiv:2508.21113, 110 upvotes)