Papers - a jxtngx Collection


Papers

updated Oct 19, 2025

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017
LLaMA: Open and Efficient Foundation Language Models

Paper • 2302.13971 • Published Feb 27, 2023
Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30, 2024
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Paper • 2407.21770 • Published Jul 31, 2024
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Paper • 2205.14135 • Published May 27, 2022
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023
8-bit Optimizers via Block-wise Quantization

Paper • 2110.02861 • Published Oct 6, 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021
Efficiently Modeling Long Sequences with Structured State Spaces

Paper • 2111.00396 • Published Oct 31, 2021
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Paper • 2210.17323 • Published Oct 31, 2022
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023
The Unreasonable Ineffectiveness of the Deeper Layers

Paper • 2403.17887 • Published Mar 26, 2024
RoBERTa: A Robustly Optimized BERT Pretraining Approach

Paper • 1907.11692 • Published Jul 26, 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018
Universal Language Model Fine-tuning for Text Classification

Paper • 1801.06146 • Published Jan 18, 2018
Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs

Paper • 1603.09320 • Published Mar 30, 2016
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework

Paper • 2308.08155 • Published Aug 16, 2023
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Paper • 2306.05685 • Published Jun 9, 2023
The Perfect Blend: Redefining RLHF with Mixture of Judges

Paper • 2409.20370 • Published Sep 30, 2024
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Paper • 1909.08053 • Published Sep 17, 2019
ReAct: Synergizing Reasoning and Acting in Language Models

Paper • 2210.03629 • Published Oct 6, 2022
Agent-as-a-Judge: Evaluate Agents with Agents

Paper • 2410.10934 • Published Oct 14, 2024
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

Paper • 2405.17428 • Published May 27, 2024
Large Concept Models: Language Modeling in a Sentence Representation Space

Paper • 2412.08821 • Published Dec 11, 2024
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks

Paper • 2503.15478 • Published Mar 19, 2025
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization

Paper • 2502.02631 • Published Feb 4, 2025
Revisiting Feature Prediction for Learning Visual Representations from Video

Paper • 2404.08471 • Published Feb 15, 2024
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025
FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published Dec 17, 2024
google-research-datasets/conceptual_captions

Dataset • Updated Jun 17, 2024
Planning with Reasoning using Vision Language World Model

Paper • 2509.02722 • Published Sep 2, 2025