Interesting Papers
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Paper
• 2411.02959
• Published
• 71
GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
Paper
• 2411.03047
• Published
• 9
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Paper
• 2411.02336
• Published
• 24
GenXD: Generating Any 3D and 4D Scenes
Paper
• 2411.02319
• Published
• 20
Fashion-VDM: Video Diffusion Model for Virtual Try-On
Paper
• 2411.00225
• Published
• 11
Face Anonymization Made Simple
Paper
• 2411.00762
• Published
• 9
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
Paper
• 2410.22901
• Published
• 8
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Paper
• 2410.22366
• Published
• 84
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
Paper
• 2410.18666
• Published
• 19
Emu3: Next-Token Prediction is All You Need
Paper
• 2409.18869
• Published
• 97
Hymba: A Hybrid-head Architecture for Small Language Models
Paper
• 2411.13676
• Published
• 47
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
Paper
• 2411.10818
• Published
• 26
RedPajama: an Open Dataset for Training Large Language Models
Paper
• 2411.12372
• Published
• 56
Generative World Explorer
Paper
• 2411.11844
• Published
• 77
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Paper
• 2411.10640
• Published
• 46
AnimateAnything: Consistent and Controllable Animation for Video Generation
Paper
• 2411.10836
• Published
• 24
SlimLM: An Efficient Small Language Model for On-Device Document Assistance
Paper
• 2411.09944
• Published
• 12
FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on
Paper
• 2411.10499
• Published
• 13
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
Paper
• 2411.11045
• Published
• 11
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
Paper
• 2411.06558
• Published
• 36
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper
• 2411.10440
• Published
• 129
Cut Your Losses in Large-Vocabulary Language Models
Paper
• 2411.09009
• Published
• 49
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper
• 2411.10442
• Published
• 87
From CISC to RISC: language-model guided assembly transpilation
Paper
• 2411.16341
• Published
• 14
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper
• 2411.15124
• Published
• 67
MoViE: Mobile Diffusion for Video Editing
Paper
• 2412.06578
• Published
• 18
GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis
Paper
• 2412.06089
• Published
• 4
Mogo: RQ Hierarchical Causal Transformer for High-Quality 3D Human Motion Generation
Paper
• 2412.07797
• Published
• 11
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper
• 2412.11768
• Published
• 43
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
Paper
• 2412.11919
• Published
• 36