juheon2 committed · Commit 724fb5a · 0 Parent(s)

Initial Supertonic 3 release
.gitattributes ADDED
@@ -0,0 +1,2 @@
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,3 @@
+ .DS_Store
+ dev
+ .vscode
LICENSE ADDED
@@ -0,0 +1,209 @@
+ BigScience Open RAIL-M License
+ dated August 18, 2022
+
+ Section I: PREAMBLE
+
+ This Open RAIL-M License was created by BigScience, a collaborative open innovation project aimed at
+ the responsible development and use of large multilingual datasets and Large Language Models
+ (“LLMs”). While a similar license was originally designed for the BLOOM model, we decided to adapt it
+ and create this license in order to propose a general open and responsible license applicable to other
+ machine learning based AI models (e.g. multimodal generative models).
+ In short, this license strives for both the open and responsible downstream use of the accompanying
+ model. When it comes to the open character, we took inspiration from open source permissive licenses
+ regarding the grant of IP rights. Referring to the downstream responsible use, we added use-based
+ restrictions not permitting the use of the Model in very specific scenarios, in order for the licensor to be
+ able to enforce the license in case potential misuses of the Model may occur. Even though downstream
+ derivative versions of the model could be released under different licensing terms, the latter will always
+ have to include - at minimum - the same use-based restrictions as the ones in the original license (this
+ license).
+ The development and use of artificial intelligence (“AI”), does not come without concerns. The world has
+ witnessed how AI techniques may, in some instances, become risky for the public in general. These risks
+ come in many forms, from racial discrimination to the misuse of sensitive information.
+ BigScience believes in the intersection between open and responsible AI development; thus, this License
+ aims to strike a balance between both in order to enable responsible open-science in the field of AI.
+ This License governs the use of the model (and its derivatives) and is informed by the model card
+ associated with the model.
+
+ NOW THEREFORE, You and Licensor agree as follows:
+
+ 1. Definitions
+ (a) "License" means the terms and conditions for use, reproduction, and Distribution as defined in
+ this document.
+ (b) “Data” means a collection of information and/or content extracted from the dataset used with the
+ Model, including to train, pretrain, or otherwise evaluate the Model. The Data is not licensed under
+ this License.
+ (c) “Output” means the results of operating a Model as embodied in informational content resulting
+ therefrom.
+ (d) “Model” means any accompanying machine-learning based assemblies (including checkpoints),
+ consisting of learnt weights, parameters (including optimizer states), corresponding to the model
+ architecture as embodied in the Complementary Material, that have been trained or tuned, in whole or
+ in part on the Data, using the Complementary Material.
+ (e) “Derivatives of the Model” means all modifications to the Model, works based on the Model, or any
+ other model which is created or initialized by transfer of patterns of the weights, parameters,
+ activations or output of the Model, to the other model, in order to cause the other model to perform
+ similarly to the Model, including - but not limited to - distillation methods entailing the use of
+ intermediate data representations or methods based on the generation of synthetic data by the Model
+ for training the other model.
+ (f) “Complementary Material” means the accompanying source code and scripts used to define,
+ run, load, benchmark or evaluate the Model, and used to prepare data for training or evaluation, if
+ any. This includes any accompanying documentation, tutorials, examples, etc, if any.
+ (g) “Distribution” means any transmission, reproduction, publication or other sharing of the Model or
+ Derivatives of the Model to a third party, including providing the Model as a hosted service made
+ available by electronic or other remote means - e.g. API-based or web access.
+ (h) “Licensor” means the copyright owner or entity authorized by the copyright owner that is
+ granting the License, including the persons or entities that may have rights in the Model and/or
+ distributing the Model.
+ (i) "You" (or "Your") means an individual or Legal Entity exercising permissions granted by this
+ License and/or making use of the Model for whichever purpose and in any field of use, including
+ usage of the Model in an end-use application - e.g. chatbot, translator, image generator.
+ (j) “Third Parties” means individuals or legal entities that are not under common control with
+ Licensor or You.
+ (k) "Contribution" means any work of authorship, including the original version of the Model and
+ any modifications or additions to that Model or Derivatives of the Model thereof, that is
+ intentionally submitted to Licensor for inclusion in the Model by the copyright owner or by an
+ individual or Legal Entity authorized to submit on behalf of the copyright owner. For the
+ purposes of this definition,
+ “submitted” means any form of electronic, verbal, or written
+ communication sent to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems, and issue tracking
+ systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and
+ improving the Model, but excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+ (l) "Contributor" means Licensor and any individual or Legal Entity on behalf of whom a
+ Contribution has been received by Licensor and subsequently incorporated within the Model.
+
+
+ Section II: INTELLECTUAL PROPERTY RIGHTS
+
+ Both copyright and patent grants apply to the Model, Derivatives of the Model and Complementary
+ Material. The Model and Derivatives of the Model are subject to additional terms as described in Section III.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor
+ hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare, publicly display, publicly perform, sublicense, and distribute the
+ Complementary Material, the Model, and Derivatives of the Model.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of this License and where and as
+ applicable, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge,
+ royalty-free, irrevocable (except as stated in this paragraph) patent license to make, have made, use, offer
+ to sell, sell, import, and otherwise transfer the Model and the Complementary Material, where such
+ license applies only to those patent claims licensable by such Contributor that are necessarily infringed by
+ their Contribution(s) alone or by combination of their Contribution(s) with the Model to which such
+ Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim
+ or counterclaim in a lawsuit) alleging that the Model and/or Complementary Material or a Contribution
+ incorporated within the Model and/or Complementary Material constitutes direct or contributory patent
+ infringement, then any patent licenses granted to You under this License for the Model and/or Work shall
+ terminate as of the date such litigation is asserted or filed.
+ Section III: CONDITIONS OF USAGE, DISTRIBUTION AND REDISTRIBUTION
+
+ 4. Distribution and Redistribution. You may host for Third Party remote access purposes (e.g.
+ software-as-a-service), reproduce and distribute copies of the Model or Derivatives of the Model thereof
+ in any medium, with or without modifications, provided that You meet the following conditions:
+
+ a. Use-based restrictions as referenced in paragraph 5 MUST be included as an enforceable provision
+ by You in any type of legal agreement (e.g. a license) governing the use and/or distribution of the
+ Model or Derivatives of the Model, and You shall give notice to subsequent users You Distribute to,
+ that the Model or Derivatives of the Model are subject to paragraph 5. This provision does not apply
+ to the use of Complementary Material.
+
+ b. You must give any Third Party recipients of the Model or Derivatives of the Model a copy of this
+ License;
+
+ c. You must cause any modified files to carry prominent notices stating that You changed the files;
+
+ d. You must retain all copyright, patent, trademark, and attribution notices excluding those notices
+ that do not pertain to any part of the Model, Derivatives of the Model.
+ You may add Your own copyright statement to Your modifications and may provide additional or
+ different license terms and conditions - respecting paragraph 4.a.
+ - for use, reproduction, or Distribution
+ of Your modifications, or for any such Derivatives of the Model as a whole, provided Your use,
+ reproduction, and Distribution of the Model otherwise complies with the conditions stated in this License.
+
+ 5. Use-based restrictions. The restrictions set forth in Attachment A are considered Use-based restrictions.
+ Therefore You cannot use the Model and the Derivatives of the Model for the specified restricted uses. You
+ may use the Model subject to this License, including only for lawful purposes and in accordance with the
+ License. Use may include creating any content with, finetuning, updating, running, training, evaluating and/or
+ reparametrizing the Model. You shall require all of Your users who use the Model or a Derivative of the Model
+ to comply with the terms of this paragraph (paragraph 5).
+
+ 6. The Output You Generate. Except as set forth herein, Licensor claims no rights in the Output You
+ generate using the Model. You are accountable for the Output you generate and its subsequent uses. No
+ use of the output can contravene any provision as stated in the License.
+
+ Section IV: OTHER PROVISIONS
+
+ 7. Updates and Runtime Restrictions. To the maximum extent permitted by law, Licensor reserves the
+ right to restrict (remotely or otherwise) usage of the Model in violation of this License, update the Model
+ through electronic means, or modify the Output of the Model based on updates. You shall undertake
+ reasonable efforts to use the latest version of the Model.
+
+ 8. Trademarks and related. Nothing in this License permits You to make use of Licensors’ trademarks,
+ trade names, logos or to otherwise suggest endorsement or misrepresent the relationship between the
+ parties; and any rights not expressly granted herein are reserved by the Licensors.
+
+ 9. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides
+ the Model and the Complementary Material (and each Contributor provides its Contributions) on an "AS
+ IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied,
+ including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT,
+ MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for
+ determining the appropriateness of using or redistributing the Model, Derivatives of the Model, and the
+ Complementary Material and assume any risks associated with Your exercise of permissions under this
+ License.
+
+ 10. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence),
+ contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or
+ agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect,
+ special, incidental, or consequential damages of any character arising as a result of this License or out of
+ the use or inability to use the Model and the Complementary Material (including but not limited to
+ damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other
+ commercial damages or losses), even if such Contributor has been advised of the possibility of such
+ damages.
+
+ 11. Accepting Warranty or Additional Liability. While redistributing the Model, Derivatives of the
+ Model and the Complementary Material thereof, You may choose to offer, and charge a fee for, acceptance
+ of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License.
+ However, in accepting such obligations, You may act only on Your own behalf and on Your sole
+ responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and
+ hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor
+ by reason of your accepting any such warranty or additional liability.
+
+ 12. If any provision of this License is held to be invalid, illegal or unenforceable, the remaining
+ provisions shall be unaffected thereby and remain valid as if such provision had not been set forth herein.
+
+ END OF TERMS AND CONDITIONS
+
+ Attachment A
+
+ Use Restrictions
+
+ You agree not to use the Model or Derivatives of the Model:
+ (a) In any way that violates any applicable national, federal, state, local or international law
+ or regulation;
+ (b) For the purpose of exploiting, harming or attempting to exploit or harm minors in any
+ way;
+ (c) To generate or disseminate verifiably false information and/or content with the purpose of
+ harming others;
+ (d) To generate or disseminate personal identifiable information that can be used to harm an
+ individual;
+ (e) To generate or disseminate information and/or content (e.g. images, code, posts, articles),
+ and place the information and/or content in any context (e.g. bot generating tweets)
+ without expressly and intelligibly disclaiming that the information and/or content is
+ machine generated;
+ (f) To defame, disparage or otherwise harass others;
+ (g) To impersonate or attempt to impersonate (e.g. deepfakes) others without their consent;
+ (h) For fully automated decision making that adversely impacts an individual’s legal rights or
+ otherwise creates or modifies a binding, enforceable obligation;
+ (i) For any use intended to or which has the effect of discriminating against or harming
+ individuals or groups based on online or offline social behavior or known or predicted
+ personal or personality characteristics;
+ (j) To exploit any of the vulnerabilities of a specific group of persons based on their age,
+ social, physical or mental characteristics, in order to materially distort the behavior of a
+ person pertaining to that group in a manner that causes or is likely to cause that person or
+ another person physical or psychological harm;
+ (k) For any use intended to or which has the effect of discriminating against individuals or
+ groups based on legally protected characteristics or categories;
+ (l) To provide medical advice and medical results interpretation;
+ (m) To generate or disseminate information for the purpose to be used for administration of
+ justice, law enforcement, immigration or asylum processes, such as predicting an
+ individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal
+ relationships between assertions made in documents, indiscriminate and
+ arbitrarily-targeted use).
README.md ADDED
@@ -0,0 +1,145 @@
+ ---
+ license: openrail
+ language:
+ - en
+ - ko
+ - ja
+ - ar
+ - bg
+ - cs
+ - da
+ - de
+ - el
+ - es
+ - et
+ - fi
+ - fr
+ - hi
+ - hr
+ - hu
+ - id
+ - it
+ - lt
+ - lv
+ - nl
+ - pl
+ - pt
+ - ro
+ - ru
+ - sk
+ - sl
+ - sv
+ - tr
+ - uk
+ - vi
+ pipeline_tag: text-to-speech
+ tags:
+ - text-to-speech
+ - speech-synthesis
+ - tts
+ - onnx
+ - multilingual
+ - on-device
+ library_name: supertonic
+ ---
+
+ # Supertonic 3 | Lightning Fast, On-Device, Accurate TTS
+
+ ![Supertonic 3 Preview](img/Supertonic3_HeroImage.png)
+
+ <p align="center">
+ <a href="https://huggingface.co/spaces/Supertone/supertonic-3"><img src="https://img.shields.io/badge/Demo-Hugging_Face-yellow?style=for-the-badge" alt="Demo"></a>
+ <a href="https://github.com/supertone-inc/supertonic"><img src="https://img.shields.io/badge/Code-GitHub-black?style=for-the-badge&logo=github" alt="Code"></a>
+ <a href="https://pypi.org/project/supertonic/"><img src="https://img.shields.io/badge/Python-SDK-blue?style=for-the-badge&logo=python" alt="Python SDK"></a>
+ </p>
+
+ **Supertonic** is a lightweight text-to-speech system for local inference. It runs with ONNX Runtime entirely on your device, with no cloud call required for synthesis.
+
+ **Supertonic 3** expands the open-weight release from 5 to **31 languages**, improves reading stability, and reduces repeat/skip failures.
+
+ ## Quick Start
+
+ Install the Python SDK and generate speech immediately. On first run, the SDK downloads the model assets from Hugging Face.
+
+ ```bash
+ pip install supertonic
+ ```
+
+ ```python
+ from supertonic import TTS
+
+ tts = TTS(auto_download=True)
+ style = tts.get_voice_style(voice_name="M1")
+
+ text = "A gentle breeze moved through the open window while everyone listened to the story."
+ wav, duration = tts.synthesize(text, voice_style=style, lang="en")
+
+ tts.save_audio(wav, "output.wav")
+ print(f"Generated {duration:.2f}s of audio")
+ ```
+
+ ## What's New in Supertonic 3
+
+ - **31 languages**: expanded from the 5-language Supertonic 2 release.
+ - **More stable reading**: fewer repeat and skip failures, especially on short and long utterances.
+ - **Higher speaker similarity**: improved similarity across the shared-language set compared with Supertonic 2.
+ - **Expression tags**: supports simple tags such as `<laugh>`, `<breath>`, and `<sigh>`.
+
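Expression tags share the transcript with ordinary text, so downstream tooling (for example, estimating spoken-text length or comparing synthesized output against a reference transcript) may want to strip them first. A minimal sketch, assuming only the three tags documented above; this helper is illustrative and not part of the `supertonic` SDK:

```python
import re

# Tags documented for Supertonic 3; treating this as the full set is an assumption.
EXPRESSION_TAGS = {"laugh", "breath", "sigh"}

def strip_expression_tags(text: str) -> str:
    """Remove known expression tags and collapse the whitespace they leave behind."""
    pattern = r"<(" + "|".join(sorted(EXPRESSION_TAGS)) + r")>"
    stripped = re.sub(pattern, "", text)
    return re.sub(r"\s{2,}", " ", stripped).strip()

print(strip_expression_tags("Well <laugh> that was unexpected. <breath> Let's continue."))
# -> Well that was unexpected. Let's continue.
```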
+ ## Performance Highlights
+
+ Supertonic 3 is designed for practical on-device inference: compact enough to run locally, while staying competitive with much larger open TTS systems.
+
+ ### Reading Accuracy
+
+ <p align="center">
+ <img src="img/metrics/s3_vs_measured_wer_range_voxcpm2.png" alt="Supertonic 3 reading accuracy compared with measured model ranges and VoxCPM2">
+ </p>
+
+ Across measured languages, Supertonic 3 stays within a competitive WER/CER range against much larger open TTS models such as VoxCPM2, while preserving a lightweight on-device deployment path. Asterisked languages use CER; the others use WER.
+
+ ### Supertonic 2 to Supertonic 3
+
+ <p align="center">
+ <img src="img/metrics/supertonic2_vs_3_comparison.png" alt="Supertonic 2 and Supertonic 3 comparison">
+ </p>
+
+ Compared with Supertonic 2, Supertonic 3 reduces repeat and skip failures, improves speaker similarity across the shared-language set, and expands language coverage from 5 to 31 languages.
+
+ ### Runtime Footprint
+
+ <p align="center">
+ <img src="img/metrics/runtime_cpu_gpu_latency_memory.png" alt="Supertonic CPU runtime compared with GPU baselines">
+ </p>
+
+ Supertonic 3 runs fast on CPU, even compared with larger baselines measured on an A100 GPU, and uses substantially less memory. It does not require a GPU, which makes local, browser, and edge deployment much easier.
+
+ ### Model Size
+
+ <p align="center">
+ <img src="img/metrics/model_size_comparison.png" alt="Model size comparison">
+ </p>
+
+ At about 99M parameters across the public ONNX assets, Supertonic 3 is much smaller than 0.7B to 2B class open TTS systems. The smaller model size is a practical advantage for download size, startup time, and on-device inference.
+
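The ~99M figure can be sanity-checked against the ONNX file sizes listed in this repository's Git LFS pointers, under the assumption that float32 weights (4 bytes each) dominate the file size:

```python
# Byte sizes of the public ONNX assets, taken from the Git LFS pointers in this repo.
onnx_bytes = {
    "duration_predictor.onnx": 3_700_147,
    "text_encoder.onnx": 36_416_150,
    "vector_estimator.onnx": 256_534_781,
    "vocoder.onnx": 101_424_195,
}

total_bytes = sum(onnx_bytes.values())
# Assuming float32 storage (4 bytes per parameter) dominates:
approx_params = total_bytes / 4
print(f"{total_bytes / 1e6:.1f} MB total, ~{approx_params / 1e6:.1f}M parameters")
# -> 398.1 MB total, ~99.5M parameters
```

The back-of-envelope result (~99.5M) lines up with the stated 99M parameter count.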
+ ## Supported Languages
+
+ | Code | Language | Code | Language | Code | Language | Code | Language |
+ |------|----------|------|----------|------|----------|------|----------|
+ | `en` | English | `ko` | Korean | `ja` | Japanese | `ar` | Arabic |
+ | `bg` | Bulgarian | `cs` | Czech | `da` | Danish | `de` | German |
+ | `el` | Greek | `es` | Spanish | `et` | Estonian | `fi` | Finnish |
+ | `fr` | French | `hi` | Hindi | `hr` | Croatian | `hu` | Hungarian |
+ | `id` | Indonesian | `it` | Italian | `lt` | Lithuanian | `lv` | Latvian |
+ | `nl` | Dutch | `pl` | Polish | `pt` | Portuguese | `ro` | Romanian |
+ | `ru` | Russian | `sk` | Slovak | `sl` | Slovenian | `sv` | Swedish |
+ | `tr` | Turkish | `uk` | Ukrainian | `vi` | Vietnamese | | |
+
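The table can double as a validation set for the `lang` argument shown in the Quick Start. A small illustrative helper (not part of the SDK) that fails fast on unsupported codes:

```python
# The 31 language codes from the table above.
SUPPORTED_LANGS = {
    "en", "ko", "ja", "ar", "bg", "cs", "da", "de", "el", "es", "et",
    "fi", "fr", "hi", "hr", "hu", "id", "it", "lt", "lv", "nl", "pl",
    "pt", "ro", "ru", "sk", "sl", "sv", "tr", "uk", "vi",
}

def check_lang(code: str) -> str:
    """Normalize a language code and raise if it is not supported."""
    code = code.lower()
    if code not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {code!r}")
    return code

print(check_lang("EN"))  # -> en
```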
+ ## License
+
+ This project's sample code is released under the MIT License. See the [GitHub repository](https://github.com/supertone-inc/supertonic) for details.
+
+ The accompanying model is released under the OpenRAIL-M License. See the [LICENSE](https://huggingface.co/Supertone/supertonic-3/blob/main/LICENSE) file in this repository for details.
+
+ This model was trained using PyTorch, which is licensed under the BSD 3-Clause License but is not redistributed with this project. See the [PyTorch license](https://docs.pytorch.org/FBGEMM/general/License.html) for details.
+
+ Copyright (c) 2026 Supertone Inc.
config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "model_name": "Supertonic 3",
+   "model_type": "onnx",
+   "description": "This is a stub config for Hugging Face download counting. The actual model is located at onnx/"
+ }
img/Supertonic3_HeroImage.png ADDED

Git LFS Details

  • SHA256: 7498880b738494d758a6e37b0c730c46ad0f22775c5076cb0e3ad4062ef8e8be
  • Pointer size: 132 Bytes
  • Size of remote file: 1.5 MB
img/metrics/model_size_comparison.png ADDED

Git LFS Details

  • SHA256: c7fcffffdb70b3850f3e3e9552eaf2c263749314f4e39ff9d88e189ce3593322
  • Pointer size: 130 Bytes
  • Size of remote file: 94.5 kB
img/metrics/runtime_cpu_gpu_latency_memory.png ADDED

Git LFS Details

  • SHA256: a80c0def839ea570b207e06cc1d3d5aa99e58a24dc85b3dda2688b2dd2c79ec0
  • Pointer size: 131 Bytes
  • Size of remote file: 262 kB
img/metrics/s3_vs_measured_wer_range_voxcpm2.png ADDED

Git LFS Details

  • SHA256: b04a427ca1f7a97b6021ba4c518f3318104d2e85023ac0af3373b844021a1db1
  • Pointer size: 131 Bytes
  • Size of remote file: 198 kB
img/metrics/supertonic2_vs_3_comparison.png ADDED

Git LFS Details

  • SHA256: b40da99bfe032f8ce4713bb18305842a1a9a23b01a9642992721a4172949fb28
  • Pointer size: 131 Bytes
  • Size of remote file: 162 kB
onnx/duration_predictor.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c3eb91414d5ff8a7a239b7fe9e34e7e2bf8a8140d8375ffb14718b1c639325db
+ size 3700147
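Each `.onnx` entry in this commit is stored as a Git LFS pointer rather than the binary itself: a small text file of space-separated key/value lines (`version`, `oid`, `size`). A minimal parser sketch, using the pointer text above; a production tool should follow the full Git LFS pointer-file specification:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (one 'key value' pair per line)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer contents copied from onnx/duration_predictor.onnx in this repo.
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:c3eb91414d5ff8a7a239b7fe9e34e7e2bf8a8140d8375ffb14718b1c639325db\n"
    "size 3700147\n"
)

info = parse_lfs_pointer(pointer)
print(info["oid"], int(info["size"]))
```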
onnx/text_encoder.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7befd5ea8c3119769e8a6c1486c4edc6a3bc8365c67621c881bbb774b9902ff
+ size 36416150
onnx/tts.json ADDED
@@ -0,0 +1,311 @@
+ {
+   "tts_version": "v1.7.3",
+   "split": "opensource-multilingual",
+   "ttl": {
+     "latent_dim": 24,
+     "chunk_compress_factor": 6,
+     "batch_expander": {
+       "n_batch_expand": 6
+     },
+     "normalizer": {
+       "scale": 0.25
+     },
+     "text_encoder": {
+       "n_langs": 0,
+       "lang_emb_dim": 0,
+       "text_embedder": {
+         "char_emb_dim": 256
+       },
+       "convnext": {
+         "idim": 256,
+         "ksz": 5,
+         "intermediate_dim": 1024,
+         "num_layers": 6,
+         "dilation_lst": [
+           1,
+           1,
+           2,
+           2,
+           4,
+           4
+         ]
+       },
+       "attn_encoder": {
+         "hidden_channels": 256,
+         "filter_channels": 1024,
+         "n_heads": 4,
+         "n_layers": 4,
+         "p_dropout": 0.0
+       },
+       "proj_out": {
+         "idim": 256,
+         "odim": 256
+       }
+     },
+     "flow_matching": {
+       "sig_min": 1e-08
+     },
+     "style_encoder": {
+       "proj_in": {
+         "ldim": 24,
+         "chunk_compress_factor": 6,
+         "odim": 256
+       },
+       "convnext": {
+         "idim": 256,
+         "ksz": 5,
+         "intermediate_dim": 1024,
+         "num_layers": 6,
+         "dilation_lst": [
+           1,
+           1,
+           1,
+           1,
+           1,
+           1
+         ]
+       },
+       "style_token_layer": {
+         "input_dim": 256,
+         "n_style": 50,
+         "style_key_dim": 256,
+         "style_value_dim": 256,
+         "prototype_dim": 256,
+         "n_units": 256,
+         "n_heads": 2
+       }
+     },
+     "speech_prompted_text_encoder": {
+       "text_dim": 256,
+       "style_dim": 256,
+       "n_units": 256,
+       "n_heads": 2
+     },
+     "uncond_masker": {
+       "prob_both_uncond": 0.04,
+       "prob_text_uncond": 0.01,
+       "std": 0.1,
+       "text_dim": 256,
+       "n_style": 50,
+       "style_key_dim": 256,
+       "style_value_dim": 256
+     },
+     "vector_field": {
+       "n_langs": 0,
+       "lang_emb_dim": 0,
+       "proj_in": {
+         "ldim": 24,
+         "chunk_compress_factor": 6,
+         "odim": 512
+       },
+       "time_encoder": {
+         "time_dim": 64,
+         "hdim": 256
+       },
+       "main_blocks": {
+         "n_blocks": 4,
+         "time_cond_layer": {
+           "idim": 512,
+           "time_dim": 64
+         },
+         "style_cond_layer": {
+           "idim": 512,
+           "style_dim": 256
+         },
+         "text_cond_layer": {
+           "idim": 512,
+           "text_dim": 256,
+           "n_heads": 8,
+           "n_units": 512,
+           "use_residual": true,
+           "rotary_base": 10000,
+           "rotary_scale": 10
+         },
+         "convnext_0": {
+           "idim": 512,
+           "ksz": 5,
+           "intermediate_dim": 2048,
+           "num_layers": 4,
+           "dilation_lst": [
+             1,
+             2,
+             4,
+             8
+           ]
+         },
+         "convnext_1": {
+           "idim": 512,
+           "ksz": 5,
+           "intermediate_dim": 2048,
+           "num_layers": 1,
+           "dilation_lst": [
+             1
+           ]
+         },
+         "convnext_2": {
+           "idim": 512,
+           "ksz": 5,
+           "intermediate_dim": 2048,
+           "num_layers": 1,
+           "dilation_lst": [
+             1
+           ]
+         }
+       },
+       "last_convnext": {
+         "idim": 512,
+         "ksz": 5,
+         "intermediate_dim": 2048,
+         "num_layers": 4,
+         "dilation_lst": [
+           1,
+           1,
+           1,
+           1
+         ]
+       },
+       "proj_out": {
+         "idim": 512,
+         "chunk_compress_factor": 6,
+         "ldim": 24
+       }
+     }
+   },
+   "ae": {
+     "sample_rate": 44100,
+     "n_delay": 0,
+     "base_chunk_size": 512,
+     "chunk_compress_factor": 1,
+     "ldim": 24,
+     "encoder": {
+       "spec_processor": {
+         "n_fft": 2048,
+         "win_length": 2048,
+         "hop_length": 512,
+         "n_mels": 228,
+         "sample_rate": 44100,
+         "eps": 1e-05,
+         "norm_mean": 0.0,
+         "norm_std": 1.0
+       },
+       "ksz_init": 7,
+       "ksz": 7,
+       "num_layers": 10,
+       "dilation_lst": [
+         1,
+         1,
+         1,
+         1,
+         1,
+         1,
+         1,
+         1,
+         1,
+         1
+       ],
+       "intermediate_dim": 2048,
+       "idim": 1253,
+       "hdim": 512,
+       "odim": 24
+     },
+     "decoder": {
+       "ksz_init": 7,
+       "ksz": 7,
+       "num_layers": 10,
+       "dilation_lst": [
+         1,
+         2,
+         4,
+         1,
+         2,
+         4,
+         1,
+         1,
+         1,
+         1
+       ],
+       "intermediate_dim": 2048,
+       "idim": 24,
+       "hdim": 512,
+       "head": {
+         "idim": 512,
+         "hdim": 2048,
+         "odim": 512,
+         "ksz": 3
+       }
+     }
+   },
+   "dp": {
+     "latent_dim": 24,
+     "chunk_compress_factor": 6,
+     "normalizer": {
+       "scale": 1.0
+     },
+     "sentence_encoder": {
+       "char_emb_dim": 64,
+       "text_embedder": {
+         "char_emb_dim": 64
+       },
+       "convnext": {
+         "idim": 64,
+         "ksz": 5,
+         "intermediate_dim": 256,
+         "num_layers": 6,
+         "dilation_lst": [
+           1,
+           1,
+           1,
+           1,
+           1,
+           1
+         ]
+       },
+       "attn_encoder": {
+         "hidden_channels": 64,
+         "filter_channels": 256,
+         "n_heads": 2,
+         "n_layers": 2,
+         "p_dropout": 0.0
+       },
+       "proj_out": {
+         "idim": 64,
+         "odim": 64
+       }
+     },
+     "style_encoder": {
+       "proj_in": {
+         "ldim": 24,
+         "chunk_compress_factor": 6,
+         "odim": 64
+       },
+       "convnext": {
+         "idim": 64,
+         "ksz": 5,
+         "intermediate_dim": 256,
+         "num_layers": 4,
+         "dilation_lst": [
+           1,
+           1,
+           1,
+           1
+         ]
+       },
+       "style_token_layer": {
+         "input_dim": 64,
+         "n_style": 8,
+         "style_key_dim": 0,
+         "style_value_dim": 16,
+         "prototype_dim": 64,
+         "n_units": 64,
+         "n_heads": 2
+       }
+     },
+     "predictor": {
+       "sentence_dim": 64,
+       "n_style": 8,
+       "style_dim": 16,
+       "hdim": 128,
+       "n_layer": 2
+     }
+   }
+ }
onnx/unicode_indexer.json ADDED
The diff for this file is too large to render. See raw diff
 
onnx/vector_estimator.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:883ac868ea0275ef0e991524dc64f16b3c0376efd7c320af6b53f5b780d7c61c
+ size 256534781
onnx/vocoder.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:085de76dd8e8d5836d6ca66826601f615939218f90e519f70ee8a36ed2a4c4ba
+ size 101424195
voice_styles/F1.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/F2.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/F3.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/F4.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/F5.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/M1.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/M2.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/M3.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/M4.json ADDED
The diff for this file is too large to render. See raw diff
 
voice_styles/M5.json ADDED
The diff for this file is too large to render. See raw diff