
Aidy Osu

aidystark

AI & ML interests

Vision;Language;Speech

Recent Activity

updated a model about 20 hours ago

aidystark/function

Text Generation • 0.3B • Updated about 20 hours ago • 47

published a model about 20 hours ago

aidystark/function

Text Generation • 0.3B • Updated about 20 hours ago • 47

New activity in fixie-ai/ultravox-v0_6-gemma-3-27b 6 months ago

AttributeError: 'Gemma3Config' object has no attribute 'num_hidden_layers'

🔥 2

#1 opened 6 months ago by aidystark

liked 2 models 6 months ago

ResembleAI/chatterbox

Text-to-Speech • Updated Sep 23, 2025 • 1.73M • 1.53k

decart-ai/Lucy-Edit-Dev

Video-to-Video • Updated Nov 20, 2025 • 178 • 332

upvoted an article 7 months ago

Article

mmBERT: ModernBERT goes Multilingual

Sep 9, 2025 • 140

liked a model 7 months ago

microsoft/VibeVoice-1.5B

Text-to-Speech • 3B • Updated Jan 22 • 62.6k • 2.28k

reacted to chansung's post with 👍 9 months ago

Post

4707

YAML engineering becomes more and more important than ever from infra provisioning to model training (recipes).

Here, I built a simple editor first for @dstackai, and I will share the live endpoint this week. Let me know what you think about this approach.

Based on this approach, if people think this is useful, I am going to do the same thing for the LLM training recipes for popular frameworks such as Hugging Face open-r1, Axolotl, and so on. Let me hear.

upvoted a collection about 1 year ago

Orpheus TTS

Collection

TTS Towards Human-Sounding Speech • 2 items • Updated Mar 18, 2025 • 77

liked 2 models about 1 year ago

slprl/slam

Audio-to-Audio • Updated Feb 25, 2025 • 8 • 11

saheedniyi/YarnGPT

Text-to-Speech • 0.4B • Updated Mar 14, 2025 • 123 • 47

commented a paper about 1 year ago

Distilling an End-to-End Voice Assistant Without Instruction Training Data

Paper • 2410.02678 • Published Oct 3, 2024 • 23

liked a Space about 1 year ago

The Ultra-Scale Playbook

🌌

3.76k

The ultimate guide to training LLM on large GPU Clusters

commented a paper about 1 year ago

SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

Paper • 2502.12115 • Published Feb 17, 2025 • 46

liked a model about 1 year ago

fishaudio/fish-speech-1.5

Text-to-Speech • Updated Mar 25, 2025 • 7.25k • 721

liked 2 Spaces over 1 year ago

Talk To Ultravox

⚡

Talk to Fixie.ai's Ultravox with WebRTC ⚡️

Fish Audio S1

🏆

695

Convert text to natural-sounding speech audio

liked 2 models over 1 year ago

Qwen/Qwen2-VL-7B-Instruct

Image-Text-to-Text • 8B • Updated Feb 6, 2025 • 1.36M • 1.27k

Qwen/Qwen2-VL-2B

Image-Text-to-Text • 2B • Updated Dec 6, 2024 • 3.52k • 61

reacted to merve's post with 🚀 over 1 year ago

Post

2704

small but mighty 🔥
you can fine-tune SmolVLM on an L4 with batch size of 4 and it will only take 16.4 GB VRAM 🫰🏻 also with gradient accumulation simulated batch size is 16 ✨
I made a notebook that includes all the goodies: QLoRA, gradient accumulation, gradient checkpointing with explanations on how they work 💝 https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
