LLM - a linxi Collection

linxi 's Collections

LLM

updated May 24, 2025

Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 82
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

Paper • 2309.09958 • Published Sep 18, 2023 • 20
Noise-Aware Training of Layout-Aware Language Models

Paper • 2404.00488 • Published Mar 30, 2024 • 10
Streaming Dense Video Captioning

Paper • 2404.01297 • Published Apr 1, 2024 • 13
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Paper • 2404.00399 • Published Mar 30, 2024 • 42
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

Paper • 2404.03413 • Published Apr 4, 2024 • 27
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models

Paper • 2404.03118 • Published Apr 3, 2024 • 25
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?

Paper • 2404.03411 • Published Apr 4, 2024 • 10
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 107
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 74
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

Paper • 2404.02733 • Published Apr 3, 2024 • 22
FlowMind: Automatic Workflow Generation with LLMs

Paper • 2404.13050 • Published Mar 17, 2024 • 34
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Paper • 2404.13686 • Published Apr 21, 2024 • 29
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

Paper • 2404.14396 • Published Apr 22, 2024 • 19
LAMBDA: A Large Model Based Data Agent

Paper • 2407.17535 • Published Jul 24, 2024 • 37
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents

Paper • 2407.17490 • Published Jul 3, 2024 • 31
Very Large-Scale Multi-Agent Simulation in AgentScope

Paper • 2407.17789 • Published Jul 25, 2024 • 40
Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Paper • 2407.18121 • Published Jul 25, 2024 • 17
VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 41
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

Paper • 2407.16224 • Published Jul 23, 2024 • 29
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Paper • 2407.15841 • Published Jul 22, 2024 • 39
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion

Paper • 2408.03178 • Published Aug 6, 2024 • 40
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation

Paper • 2408.02629 • Published Aug 5, 2024 • 15
MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3, 2024 • 93
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts

Paper • 2409.00447 • Published Aug 31, 2024 • 3
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

Paper • 2505.16175 • Published May 22, 2025 • 42