Duplicate from neuphonic/neucodec

Browse files

Co-authored-by: harry julian <harryjulian@users.noreply.huggingface.co>

Files changed (5) hide show

.gitattributes +35 -0
NeuCodec-Thumbnail.jpg +0 -0
README.md +111 -0
meta.yaml +2 -0
pytorch_model.bin +3 -0

.gitattributes ADDED Viewed

	@@ -0,0 +1,35 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

NeuCodec-Thumbnail.jpg ADDED Viewed

README.md ADDED Viewed

	@@ -0,0 +1,111 @@

+---
+license: apache-2.0
+tags:
+- audio
+- speech
+- audio-to-audio
+- speech-language-models
+datasets:
+- amphion/Emilia-Dataset
+- facebook/multilingual_librispeech
+- CSTR-Edinburgh/vctk
+- google/fleurs
+- mozilla-foundation/common_voice_13_0
+- mythicinfinity/libritts_r
+---
+# NeuCodec 🎧
+[![NeuCodec Intro](NeuCodec-Thumbnail.jpg)](https://www.youtube.com/watch?v=O7XH1lGZyYY)
+*Click the image above to see NeuCodec in action on Youtube!*
+*Created by Neuphonic - building faster, smaller, on-device voice AI*
+A lightweight neural codec that encodes audio at just 0.8 kbps - perfect for researchers and builders who need something that *just works* for training high quality text-to-speech models.
+# Key Features
+* 🔊 Low bit-rate compression - a speech codec that compresses and reconstructs audio with near-inaudible reconstruction loss
+<br>
+* 🎼 Upsamples from 16kHz → 24kHz
+<br>
+* 🌍 Ready for real-world use - train your own SpeechLMs without needing to build your own codec
+<br>
+* 🏢 Commercial use permitted - use it in your own tools or products
+<br>
+* 📊 Released with large pre-encoded datasets - we’ve compressed Emilia-YODAS from 1.7TB to 41GB using NeuCodec, significantly reducing the compute requirements needed for training
+<br>
+# Model Details
+NeuCodec is a Finite Scalar Quantisation (FSQ) based 0.8kbps audio codec for speech tokenization.
+It takes advantage of the following features:
+* FSQ quantisation resulting in a single codebook, making it ideal for downstream modeling with Speech Language Models.
+* Trained with CC data such that there are no Non-Commercial data restrictions.
+* At 50 tokens/sec and 16 bits per token, the overall bit-rate is 0.8kbps.
+* The codec takes in 16kHz input and outputs 24kHz using an upsampling decoder.
+* The FSQ encoding scheme allows for bit-level error resistance suitable for unreliable and noisy channels.
+NeuCodec is largely based on extending the work of [X-Codec2.0](https://huggingface.co/HKUSTAudio/xcodec2).
+- **Developed by:** Neuphonic
+- **Model type:** Neural Audio Codec
+- **License:** apache-2.0
+- **Repository:** https://github.com/neuphonic/neucodec
+- **Paper:** [arXiv](https://arxiv.org/abs/2509.09550)
+- **Pre-encoded Datasets:**
+  - [Emilia-YODAS-EN](https://huggingface.co/datasets/neuphonic/emilia-yodas-english-neucodec)
+  - *More coming soon!*
+# Get Started
+Use the code below to get started with the model.
+To install from pypi in a dedicated environment, using Python 3.10 or above:
+```bash
+conda create -n neucodec python=3.10
+conda activate neucodec
+pip install neucodec
+```
+Then, to use in python:
+```python
+import librosa
+import torch
+import torchaudio
+from torchaudio import transforms as T
+from neucodec import NeuCodec
+model = NeuCodec.from_pretrained("neuphonic/neucodec")
+model.eval().cuda()
+y, sr = torchaudio.load(librosa.ex("libri1"))
+if sr != 16_000:
+    y = T.Resample(sr, 16_000)(y)[None, ...] # (B, 1, T_16)
+with torch.no_grad():
+    fsq_codes = model.encode_code(y)
+    # fsq_codes = model.encode_code(librosa.ex("libri1")) # or directly pass your filepath!
+    print(f"Codes shape: {fsq_codes.shape}")
+    recon = model.decode_code(fsq_codes).cpu() # (B, 1, T_24)
+torchaudio.save("reconstructed.wav", recon[0, :, :], 24_000)
+```
+# Training Details
+The model was trained using the following data:
+* Emilia-YODAS
+* MLS
+* LibriTTS
+* Fleurs
+* CommonVoice
+* HUI
+* Additional proprietary set
+All publically available data was covered by either the CC-BY-4.0 or CC0 license.

meta.yaml ADDED Viewed

	@@ -0,0 +1,2 @@


1	+ author: neuphonic
2	+ license: apache-2.0

pytorch_model.bin ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:30c3ea13ceeb2de693c56e5e33a1b7e00d44c95dcdd08a4ed0d552d0bf59ebdf
+size 1160509432