TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Paper • 2410.23168 • Published Oct 30, 2024 • 24 • 5
nGPT: Normalized Transformer with Representation Learning on the Hypersphere Paper • 2410.01131 • Published Oct 1, 2024 • 10 • 1
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling Paper • 2410.07145 • Published Oct 9, 2024 • 2 • 3
Round and Round We Go! What makes Rotary Positional Encodings useful? Paper • 2410.06205 • Published Oct 8, 2024 • 2 • 1
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published Oct 1, 2024 • 33 • 8
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 111 • 7
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27, 2024 • 42 • 6
KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2, 2024 • 21 • 3
Planning In Natural Language Improves LLM Search For Code Generation Paper • 2409.03733 • Published Sep 5, 2024 • 1
OLMoE: Open Mixture-of-Experts Language Models Paper • 2409.02060 • Published Sep 3, 2024 • 80 • 4
FocusLLM: Scaling LLM's Context by Parallel Decoding Paper • 2408.11745 • Published Aug 21, 2024 • 25 • 3
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale Paper • 2408.12570 • Published Aug 22, 2024 • 32 • 3
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21, 2024 • 58 • 4