eugenehp harryjulian committed on
Commit cc40a93 · 0 Parent(s)

Duplicate from neuphonic/neucodec

Co-authored-by: harry julian <harryjulian@users.noreply.huggingface.co>

Files changed (5)
  1. .gitattributes +35 -0
  2. NeuCodec-Thumbnail.jpg +0 -0
  3. README.md +111 -0
  4. meta.yaml +2 -0
  5. pytorch_model.bin +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
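As a rough illustration of what the `.gitattributes` entries above do, the sketch below checks whether a path would be routed through Git LFS. It is an approximation only: it assumes slash-free patterns are compared against the file's basename and uses Python's `fnmatchcase` as a stand-in for Git's own matcher, and it skips slash-containing patterns such as `saved_model/**/*`. The names `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical, and only a subset of the patterns is listed.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns from the .gitattributes hunk above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.tar.*", "*.zip", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match the basename of `path` against each glob.

    Assumption: this mimics how Git treats patterns without a slash;
    it is not Git's actual matching implementation.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ("pytorch_model.bin", "README.md", "meta.yaml"):
    print(candidate, is_lfs_tracked(candidate))
```

For an authoritative answer inside a checkout, `git check-attr filter -- <path>` reports the attribute Git actually resolves.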
NeuCodec-Thumbnail.jpg ADDED
README.md ADDED
@@ -0,0 +1,111 @@
+
+
+ ---
+ license: apache-2.0
+ tags:
+ - audio
+ - speech
+ - audio-to-audio
+ - speech-language-models
+ datasets:
+ - amphion/Emilia-Dataset
+ - facebook/multilingual_librispeech
+ - CSTR-Edinburgh/vctk
+ - google/fleurs
+ - mozilla-foundation/common_voice_13_0
+ - mythicinfinity/libritts_r
+ ---
+
+ # NeuCodec 🎧
+
+ [![NeuCodec Intro](NeuCodec-Thumbnail.jpg)](https://www.youtube.com/watch?v=O7XH1lGZyYY)
+
+ *Click the image above to see NeuCodec in action on YouTube!*
+
+ *Created by Neuphonic - building faster, smaller, on-device voice AI*
+
+ A lightweight neural codec that encodes audio at just 0.8 kbps - perfect for researchers and builders who need something that *just works* for training high-quality text-to-speech models.
+
+ # Key Features
+
+ * 🔊 Low bit-rate compression - a speech codec that compresses and reconstructs audio with near-inaudible reconstruction loss
+ <br>
+ * 🎼 Upsamples from 16 kHz → 24 kHz
+ <br>
+ * 🌍 Ready for real-world use - train your own SpeechLMs without needing to build your own codec
+ <br>
+ * 🏢 Commercial use permitted - use it in your own tools or products
+ <br>
+ * 📊 Released with large pre-encoded datasets - we’ve compressed Emilia-YODAS from 1.7 TB to 41 GB using NeuCodec, significantly reducing the compute needed for training
+ <br>
+
+ # Model Details
+
+ NeuCodec is a Finite Scalar Quantisation (FSQ) based 0.8 kbps audio codec for speech tokenization.
+ It has the following properties:
+
+ * FSQ yields a single codebook, which makes the codec well suited to downstream modeling with Speech Language Models.
+ * Trained on Creative Commons (CC) data, so no non-commercial data restrictions apply.
+ * At 50 tokens/sec and 16 bits per token, the overall bit-rate is 0.8 kbps.
+ * The codec takes 16 kHz input and outputs 24 kHz audio using an upsampling decoder.
+ * The FSQ encoding scheme provides bit-level error resistance, making it suitable for unreliable and noisy channels.
+
+ NeuCodec largely extends the work of [X-Codec2.0](https://huggingface.co/HKUSTAudio/xcodec2).
+
+ - **Developed by:** Neuphonic
+ - **Model type:** Neural Audio Codec
+ - **License:** apache-2.0
+ - **Repository:** https://github.com/neuphonic/neucodec
+ - **Paper:** [arXiv](https://arxiv.org/abs/2509.09550)
+ - **Pre-encoded Datasets:**
+   - [Emilia-YODAS-EN](https://huggingface.co/datasets/neuphonic/emilia-yodas-english-neucodec)
+   - *More coming soon!*
+
+ # Get Started
+
+ Use the code below to get started with the model.
+
+ To install from PyPI in a dedicated environment, using Python 3.10 or above:
+
+ ```bash
+ conda create -n neucodec python=3.10
+ conda activate neucodec
+ pip install neucodec
+ ```
+ Then, to use it in Python:
+
+ ```python
+ import librosa
+ import torch
+ import torchaudio
+ from torchaudio import transforms as T
+ from neucodec import NeuCodec
+
+ model = NeuCodec.from_pretrained("neuphonic/neucodec")
+ model.eval().cuda()
+
+ y, sr = torchaudio.load(librosa.ex("libri1"))  # y: (channels, T)
+ if sr != 16_000:
+     y = T.Resample(sr, 16_000)(y)  # the codec expects 16 kHz input
+ y = y[None, ...]  # add the batch dimension in every case: (B, 1, T_16)
+
+ with torch.no_grad():
+     fsq_codes = model.encode_code(y)
+     # fsq_codes = model.encode_code(librosa.ex("libri1")) # or directly pass your filepath!
+     print(f"Codes shape: {fsq_codes.shape}")
+     recon = model.decode_code(fsq_codes).cpu()  # (B, 1, T_24)
+
+ torchaudio.save("reconstructed.wav", recon[0, :, :], 24_000)
+ ```
+
+ # Training Details
+
+ The model was trained using the following data:
+ * Emilia-YODAS
+ * MLS
+ * LibriTTS
+ * Fleurs
+ * CommonVoice
+ * HUI
+ * Additional proprietary set
+
+ All publicly available data was covered by either the CC-BY-4.0 or CC0 license.
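The bit-rate figure quoted in the README's Model Details can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the README (50 tokens/sec, 16 bits per token, 16 kHz in, 24 kHz out, 1.7 TB to 41 GB). The samples-per-token values and the code-space size are derived here rather than stated by the authors, and they assume a constant token rate and decimal TB/GB units. All variable and function names are illustrative.

```python
TOKENS_PER_SECOND = 50      # stated in the README
BITS_PER_TOKEN = 16         # stated in the README
INPUT_SAMPLE_RATE = 16_000  # Hz, stated in the README
OUTPUT_SAMPLE_RATE = 24_000 # Hz, stated in the README

# Bit-rate: tokens per second times bits per token.
bitrate_bps = TOKENS_PER_SECOND * BITS_PER_TOKEN   # 800 bits per second
bitrate_kbps = bitrate_bps / 1000                  # 0.8 kbps

# Derived (not stated in the README): 16 bits can index 2**16 distinct codes.
code_space_size = 2 ** BITS_PER_TOKEN

# Derived: audio samples covered by one token, assuming a constant token rate.
input_samples_per_token = INPUT_SAMPLE_RATE // TOKENS_PER_SECOND
output_samples_per_token = OUTPUT_SAMPLE_RATE // TOKENS_PER_SECOND


def approx_token_count(duration_s: float) -> int:
    """Rough token count for a clip; ignores any padding the codec may apply."""
    return int(duration_s * TOKENS_PER_SECOND)


# Dataset figure from the README: 1.7 TB compressed to 41 GB (decimal units assumed).
dataset_compression_ratio = 1.7e12 / 41e9

print(f"{bitrate_kbps} kbps, {code_space_size} codes per token")
print(f"{input_samples_per_token} input / {output_samples_per_token} output samples per token")
print(f"10 s of audio is about {approx_token_count(10)} tokens")
print(f"dataset compression is roughly {dataset_compression_ratio:.0f}x")
```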
meta.yaml ADDED
@@ -0,0 +1,2 @@
+ author: neuphonic
+ license: apache-2.0
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:30c3ea13ceeb2de693c56e5e33a1b7e00d44c95dcdd08a4ed0d552d0bf59ebdf
+ size 1160509432
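The three added lines for `pytorch_model.bin` are a Git LFS pointer, not the weights themselves: they record the spec version, a SHA-256 digest, and the size in bytes of the real file. A minimal sketch of reading such a pointer and checking a downloaded file against it follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the parser assumes the simple `key value` line layout shown in the diff.

```python
import hashlib

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:30c3ea13ceeb2de693c56e5e33a1b7e00d44c95dcdd08a4ed0d552d0bf59ebdf
size 1160509432
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line, then separate the hash algorithm from the digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream the file in chunks and compare its size and digest with the pointer."""
    hasher = hashlib.new(pointer["hash_algo"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size_bytes"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"{pointer['hash_algo']} digest, {pointer['size_bytes'] / 1e9:.2f} GB")  # sha256 digest, 1.16 GB
```

Calling `matches_pointer("pytorch_model.bin", pointer)` on a local copy would confirm that the download is complete and uncorrupted; the path here is only an example.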