pgits committed
Commit d4fb4a2 · verified · 1 Parent(s): b1e2ad8

Migrate working code: Deploy README.md v1.0.1 to correct Space

Files changed (1): README.md (+6, −199)
README.md CHANGED
@@ -1,206 +1,13 @@
  ---
  title: STT GPU Service Python v4
  emoji: 🎙️
- colorFrom: blue
  colorTo: green
- sdk: docker
- app_port: 7860
- hardware: t4-small
- sleep_time_timeout: 1800
- suggested_storage: small
- force_rebuild: true
  ---

- # 🎙️ STT GPU Service Python v4

- Real-time Speech-to-Text service using Kyutai's Moshi model with ultra-low latency streaming.
-
- ## Features
-
- - **Real-time streaming**: 80ms audio chunk processing with WebSocket interface
- - **Low latency**: ~200ms end-to-end transcription latency
- - **Multi-language**: English and French support via `kyutai/stt-1b-en_fr`
- - **Dual interface**: WebSocket streaming + REST API for testing
- - **Production ready**: Optimized Docker image with pre-cached model
- - **Resource efficient**: Designed for T4 Small GPU with auto-sleep
-
- ## API Endpoints
-
- ### 🌐 WebSocket Streaming `/ws/stream`
- Primary interface for real-time speech recognition.
-
- **Expected format**: 16kHz mono PCM audio in 80ms chunks (2560 bytes per chunk)
-
- ```javascript
- const ws = new WebSocket('wss://your-space-url/ws/stream');
-
- ws.onopen = function() {
-   console.log('Connected to STT service');
- };
-
- ws.onmessage = function(event) {
-   const data = JSON.parse(event.data);
-   if (data.type === 'transcription') {
-     console.log('Transcription:', data.text);
-     console.log('Chunks with timestamps:', data.chunks);
-   }
- };
-
- // Send 80ms audio chunks
- ws.send(audioChunk); // 2560 bytes of 16-bit PCM data
- ```
-
- ### 📡 REST API `/transcribe`
- Testing endpoint for complete audio file processing.
-
- ```bash
- curl -X POST "https://your-space-url/transcribe" \
-   -F "audio_file=@audio.wav" \
-   -H "Content-Type: multipart/form-data"
- ```
-
- **Response**:
- ```json
- {
-   "filename": "audio.wav",
-   "transcription": "Hello, this is a test transcription.",
-   "chunks": [
-     {"text": "Hello,", "timestamp": [0.0, 0.5]},
-     {"text": "this is a test", "timestamp": [0.5, 1.2]},
-     {"text": "transcription.", "timestamp": [1.2, 2.0]}
-   ],
-   "timestamp": 1703123456.789
- }
- ```
-
- ### 💓 Health Check `/health`
- Service monitoring endpoint.
-
- ```bash
- curl https://your-space-url/health
- ```
-
- ## Technical Specifications
-
- - **Model**: `kyutai/stt-1b-en_fr` (1B parameters)
- - **Languages**: English, French
- - **Latency**: 0.5 second model delay + processing time
- - **Audio format**: 16kHz mono PCM
- - **Chunk size**: 80ms (2560 bytes)
- - **Max connections**: 2 concurrent WebSocket streams
- - **GPU**: Optimized for T4 Small
- - **Auto-sleep**: 30 minutes of inactivity
-
- ## Usage Examples
-
- ### Python WebSocket Client
- ```python
- import asyncio
- import websockets
- import json
- import wave
- import numpy as np
-
- async def stream_audio():
-     uri = "wss://your-space-url/ws/stream"
-
-     async with websockets.connect(uri) as websocket:
-         # Load audio file
-         with wave.open('audio.wav', 'rb') as wav_file:
-             frames = wav_file.readframes(wav_file.getnframes())
-             audio_data = np.frombuffer(frames, dtype=np.int16)
-
-         # Send in 80ms chunks (1280 samples at 16kHz)
-         chunk_size = 1280
-         for i in range(0, len(audio_data), chunk_size):
-             chunk = audio_data[i:i+chunk_size]
-             await websocket.send(chunk.tobytes())
-
-         # Receive transcription
-         response = await websocket.recv()
-         data = json.loads(response)
-         if data.get('type') == 'transcription':
-             print(f"Transcription: {data['text']}")
-
- asyncio.run(stream_audio())
- ```
-
- ### JavaScript Browser Client
- ```html
- <!DOCTYPE html>
- <html>
- <body>
-   <button id="startBtn">Start Recording</button>
-   <button id="stopBtn">Stop Recording</button>
-   <div id="transcription"></div>
-
-   <script>
-     let ws, mediaRecorder, audioContext;
-
-     document.getElementById('startBtn').onclick = async () => {
-       // Connect WebSocket
-       ws = new WebSocket('wss://your-space-url/ws/stream');
-
-       // Get microphone access
-       const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
-       audioContext = new AudioContext({ sampleRate: 16000 });
-       const source = audioContext.createMediaStreamSource(stream);
-
-       // Process audio in 80ms chunks
-       const processor = audioContext.createScriptProcessor(1280, 1, 1);
-       processor.onaudioprocess = (e) => {
-         const inputData = e.inputBuffer.getChannelData(0);
-         const pcmData = new Int16Array(inputData.length);
-
-         for (let i = 0; i < inputData.length; i++) {
-           pcmData[i] = Math.max(-32768, Math.min(32767, inputData[i] * 32768));
-         }
-
-         if (ws.readyState === WebSocket.OPEN) {
-           ws.send(pcmData.buffer);
-         }
-       };
-
-       source.connect(processor);
-       processor.connect(audioContext.destination);
-
-       ws.onmessage = (event) => {
-         const data = JSON.parse(event.data);
-         if (data.type === 'transcription' && data.text) {
-           document.getElementById('transcription').innerHTML += data.text + ' ';
-         }
-       };
-     };
-
-     document.getElementById('stopBtn').onclick = () => {
-       if (ws) ws.close();
-       if (audioContext) audioContext.close();
-     };
-   </script>
- </body>
- </html>
- ```
-
- ## Deployment
-
- This Space is configured for:
- - **Hardware**: T4 Small GPU
- - **Sleep timeout**: 30 minutes
- - **Docker**: Single-stage build with pre-cached model
- - **Port**: 7860
-
- ## Performance Notes
-
- - First request after cold start: ~30-60 seconds (model loading)
- - Subsequent requests: ~200ms latency
- - Concurrent connections: Maximum 2 WebSocket streams
- - Memory usage: ~6GB GPU memory, ~4GB RAM
-
- ## Error Handling
-
- The service includes comprehensive error handling:
- - Connection limits (max 2 concurrent)
- - Audio format validation
- - Model loading verification
- - Automatic reconnection support
- - Graceful WebSocket disconnection
 
  ---
  title: STT GPU Service Python v4
  emoji: 🎙️
+ colorFrom: blue
  colorTo: green
+ sdk: gradio
+ app_file: app.py
+ pinned: false
  ---

+ # STT GPU Service Python v4

+ Working deployment ready for STT model integration with kyutai/stt-1b-en_fr.
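The removed README specifies the streaming contract this Space will need to honor again once the STT model is reintegrated: 16kHz mono 16-bit PCM in 80ms chunks of 2560 bytes. A quick sanity check of that framing arithmetic, in plain Python (no running service assumed):

```python
# Framing parameters as documented in the removed README.
SAMPLE_RATE_HZ = 16_000   # 16 kHz mono input
CHUNK_MS = 80             # chunk duration in milliseconds
BYTES_PER_SAMPLE = 2      # 16-bit PCM

# Samples and bytes per chunk follow directly from the parameters.
samples_per_chunk = SAMPLE_RATE_HZ * CHUNK_MS // 1000
bytes_per_chunk = samples_per_chunk * BYTES_PER_SAMPLE

print(samples_per_chunk)  # 1280 samples per 80 ms chunk
print(bytes_per_chunk)    # 2560 bytes, matching the documented chunk size
```

This confirms the README's numbers are internally consistent: the Python client's `chunk_size = 1280` (samples) and the WebSocket docs' "2560 bytes per chunk" describe the same 80ms frame.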