FunAudioLLM/PrismAudio

liuhuadai committed 3 days ago

Commit aae5dbe · verified · 1 parent: 1a2736b

Update README.md

Files changed (1)

README.md +26 -0

@@ -11,6 +11,32 @@ tags:
 ---
 <h1 align="center">PrismAudio</h1>
+<p align="center">
+  <img src="https://img.shields.io/badge/ICLR%202026-Main%20Conference-blue.svg" alt="ICLR 2026"/>
+  &nbsp;
+  <a href="https://arxiv.org/abs/2511.18833">
+    <img src="https://img.shields.io/badge/arXiv-2511.18833-b31b1b.svg" alt="arXiv"/>
+  </a>
+  &nbsp;
+  <a href="http://prismaudio-project.github.io/">
+    <img src="https://img.shields.io/badge/Online%20Demo-🌐-blue" alt="Online Demo"/>
+  </a>
+  &nbsp;
+  <a href="https://huggingface.co/spaces/FunAudioLLM/PrismAudio">
+    <img src="https://img.shields.io/badge/HuggingFace-Spaces-orange?logo=huggingface" alt="Hugging Face"/>
+  </a>
+  &nbsp;
+  <a href="https://www.modelscope.cn/studios/iic/PrismAudio">
+    <img src="https://img.shields.io/badge/ModelScope-在线体验-green" alt="ModelScope"/>
+  </a>
+</p>
+<p align="center">
+  If you find this project useful,<br>
+  a star ⭐ on GitHub would be greatly appreciated!
+</p>
 ---
 **PrismAudio** is the first framework to integrate Reinforcement Learning into Video-to-Audio (V2A) generation with specialized Chain-of-Thought (CoT) planning. Building upon [ThinkSound](https://arxiv.org/pdf/2506.21448)'s pioneering CoT-based V2A framework, PrismAudio further decomposes monolithic reasoning into four specialized CoT modules (Semantic, Temporal, Aesthetic, and Spatial), each paired with targeted reward functions, enabling multi-dimensional RL optimization that jointly improves reasoning across all perceptual dimensions.