pgits committed
Commit d4fb4a2 · verified · 1 Parent(s): b1e2ad8

Migrate working code: Deploy README.md v1.0.1 to correct Space

Files changed (1): README.md (+6, −199)
README.md CHANGED
@@ -1,206 +1,13 @@
  ---
  title: STT GPU Service Python v4
  emoji: 🎙️
- colorFrom: blue
  colorTo: green
- sdk: docker
- app_port: 7860
- hardware: t4-small
- sleep_time_timeout: 1800
- suggested_storage: small
- force_rebuild: true
  ---

- # 🎙️ STT GPU Service Python v4

- Real-time Speech-to-Text service using Kyutai's Moshi model with ultra-low latency streaming.
-
- ## Features
-
- - **Real-time streaming**: 80ms audio chunk processing with WebSocket interface
- - **Low latency**: ~200ms end-to-end transcription latency
- - **Multi-language**: English and French support via `kyutai/stt-1b-en_fr`
- - **Dual interface**: WebSocket streaming + REST API for testing
- - **Production ready**: Optimized Docker image with pre-cached model
- - **Resource efficient**: Designed for T4 Small GPU with auto-sleep
-
- ## API Endpoints
-
- ### 🌐 WebSocket Streaming `/ws/stream`
- Primary interface for real-time speech recognition.
-
- **Expected format**: 16kHz mono PCM audio in 80ms chunks (2560 bytes per chunk)
-
- ```javascript
- const ws = new WebSocket('wss://your-space-url/ws/stream');
-
- ws.onopen = function() {
-   console.log('Connected to STT service');
- };
-
- ws.onmessage = function(event) {
-   const data = JSON.parse(event.data);
-   if (data.type === 'transcription') {
-     console.log('Transcription:', data.text);
-     console.log('Chunks with timestamps:', data.chunks);
-   }
- };
-
- // Send 80ms audio chunks
- ws.send(audioChunk); // 2560 bytes of 16-bit PCM data
- ```
-
- ### 📡 REST API `/transcribe`
- Testing endpoint for complete audio file processing.
-
- ```bash
- curl -X POST "https://your-space-url/transcribe" \
-   -F "audio_file=@audio.wav" \
-   -H "Content-Type: multipart/form-data"
- ```
-
- **Response**:
- ```json
- {
-   "filename": "audio.wav",
-   "transcription": "Hello, this is a test transcription.",
-   "chunks": [
-     {"text": "Hello,", "timestamp": [0.0, 0.5]},
-     {"text": "this is a test", "timestamp": [0.5, 1.2]},
-     {"text": "transcription.", "timestamp": [1.2, 2.0]}
-   ],
-   "timestamp": 1703123456.789
- }
- ```
-
- ### 💓 Health Check `/health`
- Service monitoring endpoint.
-
- ```bash
- curl https://your-space-url/health
- ```
-
- ## Technical Specifications
-
- - **Model**: `kyutai/stt-1b-en_fr` (1B parameters)
- - **Languages**: English, French
- - **Latency**: 0.5 second model delay + processing time
- - **Audio format**: 16kHz mono PCM
- - **Chunk size**: 80ms (2560 bytes)
- - **Max connections**: 2 concurrent WebSocket streams
- - **GPU**: Optimized for T4 Small
- - **Auto-sleep**: 30 minutes of inactivity
-
- ## Usage Examples
-
- ### Python WebSocket Client
- ```python
- import asyncio
- import websockets
- import json
- import wave
- import numpy as np
-
- async def stream_audio():
-     uri = "wss://your-space-url/ws/stream"
-
-     async with websockets.connect(uri) as websocket:
-         # Load audio file
-         with wave.open('audio.wav', 'rb') as wav_file:
-             frames = wav_file.readframes(wav_file.getnframes())
-             audio_data = np.frombuffer(frames, dtype=np.int16)
-
-         # Send in 80ms chunks (1280 samples at 16kHz)
-         chunk_size = 1280
-         for i in range(0, len(audio_data), chunk_size):
-             chunk = audio_data[i:i+chunk_size]
-             await websocket.send(chunk.tobytes())
-
-         # Receive transcription
-         response = await websocket.recv()
-         data = json.loads(response)
-         if data.get('type') == 'transcription':
-             print(f"Transcription: {data['text']}")
-
- asyncio.run(stream_audio())
- ```
-
- ### JavaScript Browser Client
- ```html
- <!DOCTYPE html>
- <html>
- <body>
-   <button id="startBtn">Start Recording</button>
-   <button id="stopBtn">Stop Recording</button>
-   <div id="transcription"></div>
-
-   <script>
-     let ws, mediaRecorder, audioContext;
-
-     document.getElementById('startBtn').onclick = async () => {
-       // Connect WebSocket
-       ws = new WebSocket('wss://your-space-url/ws/stream');
-
-       // Get microphone access
-       const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
-       audioContext = new AudioContext({ sampleRate: 16000 });
-       const source = audioContext.createMediaStreamSource(stream);
-
-       // Process audio in 80ms chunks
-       const processor = audioContext.createScriptProcessor(1280, 1, 1);
-       processor.onaudioprocess = (e) => {
-         const inputData = e.inputBuffer.getChannelData(0);
-         const pcmData = new Int16Array(inputData.length);
-
-         for (let i = 0; i < inputData.length; i++) {
-           pcmData[i] = Math.max(-32768, Math.min(32767, inputData[i] * 32768));
-         }
-
-         if (ws.readyState === WebSocket.OPEN) {
-           ws.send(pcmData.buffer);
-         }
-       };
-
-       source.connect(processor);
-       processor.connect(audioContext.destination);
-
-       ws.onmessage = (event) => {
-         const data = JSON.parse(event.data);
-         if (data.type === 'transcription' && data.text) {
-           document.getElementById('transcription').innerHTML += data.text + ' ';
-         }
-       };
-     };
-
-     document.getElementById('stopBtn').onclick = () => {
-       if (ws) ws.close();
-       if (audioContext) audioContext.close();
-     };
-   </script>
- </body>
- </html>
- ```
-
- ## Deployment
-
- This Space is configured for:
- - **Hardware**: T4 Small GPU
- - **Sleep timeout**: 30 minutes
- - **Docker**: Single-stage build with pre-cached model
- - **Port**: 7860
-
- ## Performance Notes
-
- - First request after cold start: ~30-60 seconds (model loading)
- - Subsequent requests: ~200ms latency
- - Concurrent connections: Maximum 2 WebSocket streams
- - Memory usage: ~6GB GPU memory, ~4GB RAM
-
- ## Error Handling
-
- The service includes comprehensive error handling:
- - Connection limits (max 2 concurrent)
- - Audio format validation
- - Model loading verification
- - Automatic reconnection support
- - Graceful WebSocket disconnection
 
  ---
  title: STT GPU Service Python v4
  emoji: 🎙️
+ colorFrom: blue
  colorTo: green
+ sdk: gradio
+ app_file: app.py
+ pinned: false
  ---

+ # STT GPU Service Python v4

+ Working deployment ready for STT model integration with kyutai/stt-1b-en_fr.
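The removed README specifies the streaming contract this Space will need to honor again once the STT model is reintegrated: 16kHz mono 16-bit PCM in 80ms chunks of 2560 bytes. A quick sanity check of that framing arithmetic, in plain Python (no running service assumed):

```python
# Framing parameters as documented in the removed README.
SAMPLE_RATE_HZ = 16_000   # 16 kHz mono input
CHUNK_MS = 80             # chunk duration in milliseconds
BYTES_PER_SAMPLE = 2      # 16-bit PCM

# Samples and bytes per chunk follow directly from the parameters.
samples_per_chunk = SAMPLE_RATE_HZ * CHUNK_MS // 1000
bytes_per_chunk = samples_per_chunk * BYTES_PER_SAMPLE

print(samples_per_chunk)  # 1280 samples per 80 ms chunk
print(bytes_per_chunk)    # 2560 bytes, matching the documented chunk size
```

This confirms the README's numbers are internally consistent: the Python client's `chunk_size = 1280` (samples) and the WebSocket docs' "2560 bytes per chunk" describe the same 80ms frame.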