
Recent Activity

fffiloni posted an update about 10 hours ago
✅ Back up and running!

My TIGER app is now fully working again, with fixes and full compatibility with Gradio 6 🚀

It lets you:
- 🎙️ Separate multiple speakers from an audio file
- 🎬 Extract each speaker directly from a video
- 🎧 Split audio into dialog, music, and sound effects (DnR)
- 🎥 Apply DnR separation directly on videos

All powered by lightweight TIGER models for fast and efficient speech separation.

Try it here 👉 fffiloni/TIGER-audio-extraction
fffiloni posted an update 1 day ago
AniDoc is back 🎉

I've fixed the Space and brought it back to life:
- ✅ Working again after being broken for a while
- ✅ Updated to Gradio 6
- ✅ Compatible with ZeroGPU
- ✅ Output videos now preserve original resolution and FPS

I also added advanced controls so you can experiment more (tracking, seed, motion, sketch).

Try it here: fffiloni/AniDoc
fffiloni posted an update 15 days ago
I brought DALL·E mini back to life 🤖🎨

You can try it here:
fffiloni/dalle-mini-reboot

And I also built a batch version using Hugging Face Jobs (up to 50 images per prompt):
fffiloni/dalle-mini-via-jobs

The goal was to stay close to the original JAX/Flax pipeline, while integrating it with modern tooling (Gradio + Jobs).

It ended up being a fun way to revisit this model: still weird, still fun 😄
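The batch version caps each run at 50 images per prompt, which suggests a simple chunking scheme: split the requested total into job-sized batches and submit one job per chunk. A minimal, library-free sketch of that idea (the `submit_job` callback and the chunking logic are my assumptions for illustration, not the Space's actual code):

```python
def chunk_counts(total_images: int, max_per_job: int = 50) -> list[int]:
    """Split a requested image count into per-job batch sizes."""
    full, rest = divmod(total_images, max_per_job)
    return [max_per_job] * full + ([rest] if rest else [])

def submit_batches(prompt: str, total_images: int, submit_job) -> list:
    """Submit one job per chunk; `submit_job` is a user-supplied callback."""
    return [submit_job(prompt, n) for n in chunk_counts(total_images)]
```

For 120 requested images this would yield three jobs of 50, 50, and 20.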
fffiloni posted an update 20 days ago
A clearer demo for TADA (now multilingual) 🔊🌍

I improved the public demo for TADA, a generative framework for speech modeling via text–acoustic dual alignment.

TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.

The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.

This updated demo makes the process clearer:

• load the model
• prepare a reference voice (optionally with a transcript or Whisper auto-transcription)
• generate speech conditioned on that reference

It also adds multilingual support.

Presets are included for a few languages, but the model supports more: English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese.

Feel free to try different voices, accents, or languages and see how the alignment behaves.

👉 fffiloni/tada-dual-alignment-tts-demo

Paper
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)
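As a toy illustration only of the joint-sequence idea described above (not the paper's actual tokenization), two token streams can be interleaved position by position so text and audio stay in lockstep:

```python
from itertools import zip_longest

def joint_sequence(text_tokens, acoustic_tokens, pad="<pad>"):
    """Interleave text and acoustic tokens into one aligned sequence,
    padding the shorter stream so positions stay synchronized."""
    pairs = zip_longest(text_tokens, acoustic_tokens, fillvalue=pad)
    return [tok for pair in pairs for tok in pair]
```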
victor posted an update 2 months ago
Interesting article: use Claude Code to help open models write CUDA kernels (for example) by turning CC traces into Skills. They made a library out of it 👀

https://huggingface.co/blog/upskill
victor posted an update 3 months ago
Nvidia is on a roll lately. Nemotron 3 Nano is my new fav local model, but here's the real flex: they published the entire evaluation setup. Configs, prompts, logs, all of it. This is how you do open models 🔥

https://huggingface.co/blog/nvidia/nemotron-3-nano-evaluation-recipe

multimodalart posted an update 6 months ago
Want to iterate on a Hugging Face Space with an LLM?

Now you can easily convert an entire HF repo (Model, Dataset or Space) to a text file and feed it to a language model!

multimodalart/repo2txt
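The core trick behind such a converter is walking a file tree and concatenating every text file, each prefixed with its path so the LLM can tell files apart. A minimal local-directory sketch of the idea (repo2txt itself targets HF repos; this is an illustration, not its code):

```python
from pathlib import Path

def repo_to_text(root: str, extensions=(".py", ".md", ".txt")) -> str:
    """Concatenate all matching files under `root` into one LLM-friendly
    string, prefixing each file with its relative path as a separator."""
    root_path = Path(root)
    parts = []
    for path in sorted(root_path.rglob("*")):
        if path.is_file() and path.suffix in extensions:
            rel = path.relative_to(root_path)
            parts.append(f"===== {rel} =====\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

The resulting string can then be pasted into any chat model's context window.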
multimodalart posted an update 10 months ago
Self-Forcing - a real-time video distilled model from Wan 2.1 by @adobe is out, and they open-sourced it 🐐

I've built a live real-time demo on Spaces 📹💨

multimodalart/self-forcing
victor posted an update 10 months ago
Open Source Avengers, Assemble! Ask an expert AI agent team to solve complex problems together 🔥

Consilium brings together multiple agents that debate and use live research (web, arXiv, SEC) to reach a consensus. You set the strategy, they find the answer.

Credit to @azettl for this awesome demo: Agents-MCP-Hackathon/consilium_mcp
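Stripped of the debate rounds and live research, the final aggregation step of a multi-agent consensus can be sketched as a simple majority vote over agent answers (hypothetical agent callables, not Consilium's actual implementation):

```python
from collections import Counter

def reach_consensus(agents, question):
    """Ask every agent the same question and return the most common
    answer plus the vote tally. `agents` maps question -> answer."""
    answers = [agent(question) for agent in agents]
    tally = Counter(answers)
    winner, _votes = tally.most_common(1)[0]
    return winner, dict(tally)
```

The real Space replaces this blunt vote with moderated LLM debate, but the interface (many opinions in, one answer out) is the same.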
victor posted an update 11 months ago
DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B
fffiloni posted an update about 1 year ago
victor posted an update about 1 year ago
Hey everyone, we've given the https://hf.co/spaces page a fresh update!

Smart Search: now just type what you want to do, like "make a viral meme" or "generate music", and our search gets it.

New Categories: check out the cool new filter bar with icons to help you pick a category fast.

Redesigned Space Cards: reworked a bit to really show off the app descriptions, so you know what each Space does at a glance.

Random Prompt: need ideas? Hit the dice button for a burst of inspiration.

We'd love to hear what you think, so drop us some feedback!
fffiloni posted an update about 1 year ago
Explain like I'm 5 the latest take from @thomwolf on X about Dario's essay on DeepSeek:

→ Open-source AI is like a big cookbook that everyone can read and improve. Instead of a few chefs keeping their recipes secret, anyone can cook, test, and invent new things.

If only one company controls AI, everything stops if they have a problem, like when the internet goes down. With open-source, many people can help, making sure it keeps running smoothly.

AI isn't just a race between two countries; it's a team effort around the world. By sharing, we move faster and create safer technology for everyone.

🤗
victor posted an update about 1 year ago
Finally, an open-source AI that turns your lyrics into full songs is here: meet YuE! Unlike other tools that only create short clips, YuE can make entire songs (up to 5 minutes) with vocals, melody, and instruments all working together. Let's go!

m-a-p/YuE-s1-7B-anneal-en-cot
victor posted an update over 1 year ago
victor posted an update over 1 year ago
Want a perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane?

Introducing: AI Video Composer 🔥
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFmpeg command. These commands can get quite complex, but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT-4; back then (~1.5 years ago) it was almost impossible to make it work with open models, but not anymore. Let's go open weights 🚀.
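Since the model's output ends up being run as a command, a pipeline like this typically validates it before execution. A minimal sketch of that validation step (illustrative only; the allowed-flag list is an assumption, not the Space's actual code):

```python
import shlex

# A conservative whitelist of FFmpeg flags the app is known to need.
ALLOWED_FLAGS = {"-i", "-filter_complex", "-map", "-c:v", "-c:a", "-t", "-y"}

def parse_ffmpeg_command(model_output: str) -> list[str]:
    """Split a model-generated FFmpeg command into argv and reject
    anything that is not an ffmpeg call using whitelisted flags."""
    argv = shlex.split(model_output)
    if not argv or argv[0] != "ffmpeg":
        raise ValueError("expected a command starting with 'ffmpeg'")
    for token in argv[1:]:
        if token.startswith("-") and token not in ALLOWED_FLAGS:
            raise ValueError(f"unexpected flag: {token}")
    return argv
```

The returned argv list can then be passed to `subprocess.run` without shell interpolation, which keeps arbitrary model text from reaching the shell.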
victor posted an update over 1 year ago
Qwen2.5-72B is now the default HuggingChat model.
This model is so good that you must try it! I often get better results on rephrasing with it than with Sonnet or GPT-4!
fffiloni posted an update over 1 year ago
victor posted an update over 1 year ago