Yichen

YichenLLM

AI & ML interests

None yet

Recent Activity

upvoted a paper about 18 hours ago

Mixture of Universal Experts: Scaling Virtual Width via Depth-Width Transformation

authored a paper 2 days ago

NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

authored a paper 2 days ago

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion

Organizations

None yet

authored 4 papers 2 days ago

NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

Paper • 2408.03675 • Published Aug 7, 2024

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion

Paper • 2406.06567 • Published Jun 3, 2024

Mixture of Hidden-Dimensions Transformer

Paper • 2412.05644 • Published Dec 7, 2024 • 1

Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking

Paper • 2502.13842 • Published Feb 19, 2025