Tags: audio, music, generation, video2audio
liuhuadai committed · Commit aae5dbe · verified · 1 Parent(s): 1a2736b

Update README.md

  1. README.md +26 -0
README.md CHANGED
@@ -11,6 +11,32 @@ tags:
  ---
  <h1 align="center">PrismAudio</h1>
  
+ <p align="center">
+ <img src="https://img.shields.io/badge/ICLR 2026-Main Conference-blue.svg" alt="ICLR 2026"/>
+ </p>
+ <a href="https://arxiv.org/abs/2511.18833">
+ <img src="https://img.shields.io/badge/arXiv-2511.18833-b31b1b.svg" alt="arXiv"/>
+ </a>
+ &nbsp;
+ <a href="http://prismaudio-project.github.io/">
+ <img src="https://img.shields.io/badge/Online%20Demo-🌐-blue" alt="Online Demo"/>
+ </a>
+ &nbsp;
+ <a href="https://huggingface.co/spaces/FunAudioLLM/PrismAudio">
+ <img src="https://img.shields.io/badge/HuggingFace-Spaces-orange?logo=huggingface" alt="Hugging Face"/>
+ </a>
+ &nbsp;
+ <a href="https://www.modelscope.cn/studios/iic/PrismAudio">
+ <img src="https://img.shields.io/badge/ModelScope-在线体验-green" alt="ModelScope"/>
+ </a>
+ </p>
+
+ <p align="center">
+ If you find this project useful,<br>
+ a star ⭐ on GitHub would be greatly appreciated!
+ </p>
+
+
  ---
  
  **PrismAudio** is the first framework to integrate Reinforcement Learning into Video-to-Audio (V2A) generation with specialized Chain-of-Thought (CoT) planning. Building upon [ThinkSound](https://arxiv.org/pdf/2506.21448)'s pioneering CoT-based V2A framework, PrismAudio further decomposes monolithic reasoning into four specialized CoT modules (Semantic, Temporal, Aesthetic, and Spatial), each paired with targeted reward functions, enabling multi-dimensional RL optimization that jointly improves reasoning across all perceptual dimensions.