Spaces:

tarinmoy
/

AutoShorts-Engine


tarinmoy committed on Mar 22

Commit

93e8e20

verified ·

1 Parent(s): 406faad

Upload 20 files


Files changed (20)

.env +20 -0
.env.example +27 -0
.gitignore +7 -0
README.md +26 -6
README_LOCAL.md +94 -0
agent.py +328 -0
app.py +642 -0
asset_checker.py +198 -0
debug_assembly.py +51 -0
main.py +192 -0
media_fetcher.py +206 -0
packages.txt +1 -0
requirements.txt +25 -0
test_hindi.py +21 -0
test_moviepy.py +7 -0
test_moviepy_safe.py +15 -0
test_pexels.py +35 -0
test_repair.py +22 -0
video_assembler.py +806 -0
voice_generator.py +259 -0

.env ADDED

	@@ -0,0 +1,20 @@

+# ============================================================
+# Autonomous Short-Form Video Engine — Environment Variables
+# ============================================================
+# ── AI Brain (Nemotron-3 Super via OpenRouter) ─────────────
+OPENROUTER_API_KEY=[REDACTED]
+# ── Asset Check (Nemotron-2 VL via NVIDIA NIM) ────────────
+# VL QA is currently skipped (use the checkbox in the UI); to enable it, get a key from build.nvidia.com
+NVIDIA_API_KEY=your_nvidia_api_key_here
+# ── Media (Pexels) ───────────────────────────────────────
+PEXELS_API_KEY=[REDACTED]
+# ── Voiceover (Edge TTS - Free & No Key Needed) ──────────
+# Edge TTS is used here because no Google Cloud key is configured.
+USE_EDGE_TTS=true
+# ── Optional: Background music volume (0.0 to 1.0) ───────
+BGM_VOLUME=0.08
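
The two optional settings in this file (`USE_EDGE_TTS` and `BGM_VOLUME`) arrive as plain strings once loaded, so the consuming code has to parse and bound them. A minimal sketch of how that might look follows. The helper names `env_flag` and `env_volume` are hypothetical and do not appear in this commit, and the accepted truthy spellings are an assumption.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean flag (hypothetical helper)."""
    value = os.environ.get(name)
    if value is None:
        return default
    # Assumed truthy spellings; the commit itself only shows "true".
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_volume(name: str = "BGM_VOLUME", default: float = 0.08) -> float:
    """Read a volume setting and clamp it to the documented 0.0 to 1.0 range."""
    try:
        volume = float(os.environ.get(name, default))
    except ValueError:
        # Fall back to the default when the value is blank or not numeric.
        volume = default
    return max(0.0, min(1.0, volume))
```

Clamping rather than raising keeps a mistyped volume from aborting a render, at the cost of silently correcting the value; raising may be preferable if strict configuration is wanted.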

.env.example ADDED

	@@ -0,0 +1,27 @@

+# ============================================================
+# Autonomous Short-Form Video Engine — Environment Variables
+# ============================================================
+# 1. Copy this file to .env
+# 2. Fill in your actual API keys below
+# 3. Never commit .env to git
+# ── AI Brain (Nemotron-3 Super via OpenRouter) ─────────────
+# Get from: https://openrouter.ai/keys  (free, no credit card)
+OPENROUTER_API_KEY=your_openrouter_api_key_here
+# ── Asset Check (Nemotron-2 VL via NVIDIA NIM) ────────────
+# Get from: https://build.nvidia.com  (free credits)
+NVIDIA_API_KEY=your_nvidia_api_key_here
+# ── Media (Pexels) ───────────────────────────────────────
+# Get from: https://www.pexels.com/api/  (free)
+PEXELS_API_KEY=your_pexels_api_key_here
+# ── Voiceover (Google Cloud TTS) ─────────────────────────
+# 1. Create a GCP project, enable Cloud Text-to-Speech API
+# 2. Create a service account, download the JSON key
+# 3. Set the path to that JSON file here:
+GOOGLE_APPLICATION_CREDENTIALS=path/to/your-gcp-key.json
+# ── Optional: Background music volume (0.0 to 1.0) ───────
+BGM_VOLUME=0.08

.gitignore ADDED

	@@ -0,0 +1,7 @@

+.env
+assets/
+output/
+__pycache__/
+*.pyc
+*.pyo
+.DS_Store

README.md CHANGED

@@ -1,13 +1,33 @@
 ---
-title: AutoShorts Engine
-emoji: 🏆
-colorFrom: gray
-colorTo: red
 sdk: gradio
-sdk_version: 6.9.0
 app_file: app.py
 pinned: false
 license: apache-2.0
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: AutoShorts - Autonomous Video Engine
+emoji: 🎬
+colorFrom: purple
+colorTo: indigo
 sdk: gradio
+sdk_version: 4.44.1
 app_file: app.py
 pinned: false
 license: apache-2.0
 ---
+# 🎬 AutoShorts: Autonomous Video Engine
+This is the official deployment of **AutoShorts**, an elite autonomous engine for viral short-form content.
+## 🚀 Deployment Instructions
+To run this Space, you must add the following **Secret Environment Variables**:
+1. `OPENROUTER_API_KEY`: Your OpenRouter API key for the AI brain.
+2. `PEXELS_API_KEY`: Your Pexels API key for fetching video clips.
+3. `GOOGLE_APPLICATION_CREDENTIALS_JSON`: (Optional) Your Google Cloud Service Account JSON for premium TTS.
+4. `USE_EDGE_TTS`: Set to `true` if you want to use the free Edge TTS instead of Google Cloud.
+## 🛠 Features
+- **AI Brain**: Powered by Llama 3.3 / Gemini via OpenRouter.
+- **Visuals**: Dynamic portrait clips from Pexels.
+- **Voice**: Humanized text-to-speech.
+- **Assembly**: Professional video rendering with MoviePy.
+---
+Developed with ❤️ by the AutoShorts Team.
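
The deployment list above can be checked at startup so that a missing secret fails fast with a clear message. The sketch below is one way to do that and is not code from this commit. The function name `missing_secrets` is hypothetical, and treating the Google Cloud JSON as required whenever `USE_EDGE_TTS` is not `true` is an assumption drawn from the README wording.

```python
import os

# Secrets the README lists as mandatory for the Space.
REQUIRED_SECRETS = ("OPENROUTER_API_KEY", "PEXELS_API_KEY")


def missing_secrets(env=None) -> list:
    """Return the names of required secrets that are unset or blank."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
    # Assumption: Google Cloud credentials only matter when Edge TTS is not selected.
    use_edge = env.get("USE_EDGE_TTS", "").strip().lower() == "true"
    if not use_edge and not env.get("GOOGLE_APPLICATION_CREDENTIALS_JSON", "").strip():
        missing.append("GOOGLE_APPLICATION_CREDENTIALS_JSON")
    return missing
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.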

README_LOCAL.md ADDED

	@@ -0,0 +1,94 @@

+# 🎬 Autonomous Short-Form Video Engine
+> AI-powered pipeline that transforms a niche into a production-ready short-form video in minutes.
+**Powered by:** OpenRouter LLMs · edge-tts · Pexels API · MoviePy · Gradio
+---
+## ⚡ Quick Start
+### 1. Install Python Dependencies
+```bash
+pip install -r requirements.txt
+```
+### 2. Install FFmpeg (required for MoviePy)
+Download from https://ffmpeg.org/download.html → add to your system PATH.
+### 3. Configure API Keys
+```bash
+# Windows (on macOS/Linux use: cp .env.example .env)
+copy .env.example .env
+# Then edit .env with your keys:
+# OPENROUTER_API_KEY=...  → https://openrouter.ai/keys
+# PEXELS_API_KEY=...  → https://www.pexels.com/api/
+```
+### 4. Launch the Web UI
+```bash
+python app.py
+# Open: http://localhost:7860
+```
+### 4b. Or use the CLI
+```bash
+# Full video generation
+python main.py --niche "AI Tools"
+# With specific topic
+python main.py --niche "Motivation" --topic "Why 99% fail at their goals"
+# JSON only (no rendering)
+python main.py --niche "Wealth" --dry-run
+```
+---
+## 🏗 Architecture
+```
+Niche Input
+    ↓
+agent.py          ← OpenRouter LLM (Script + Scenes + SEO in JSON)
+    ↓
+voice_generator.py ← edge-tts (free neural TTS per scene)
+media_fetcher.py   ← Pexels API (portrait video/image per scene)
+    ↓
+video_assembler.py ← MoviePy (9:16 MP4 with captions + audio)
+    ↓
+output/ folder     ← Final MP4 + metadata JSON
+```
+---
+## 📁 Project Structure
+| File | Purpose |
+|---|---|
+| `agent.py` | AI brain (LLM via OpenRouter) — generates strict JSON |
+| `voice_generator.py` | Neural TTS via edge-tts (calm/energetic/monotone) |
+| `media_fetcher.py` | Pexels video/image fetcher with local caching |
+| `video_assembler.py` | MoviePy assembly — captions, 9:16 crop, audio sync |
+| `main.py` | CLI orchestrator |
+| `app.py` | Gradio web UI |
+| `requirements.txt` | Python dependencies |
+| `.env.example` | API key template |
+---
+## 🎯 Supported Niches
+- AI Tools
+- Motivation
+- Wealth & Finance
+- Mind-Blowing Facts
+- Productivity
+- Crypto & Web3
+---
+## 📤 Output
+Each run produces:
+- `output/<title>_<timestamp>.mp4` — 1080×1920 portrait video (Shorts/Reels ready)
+- `output/metadata_<timestamp>.json` — Full AI-generated content + SEO metadata
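
The naming pattern in the Output section can be illustrated with a short sketch. This is not the code from `video_assembler.py` or `main.py`, which is not shown here. The helper name `output_paths`, the timestamp format, and the slug rule are all assumptions made for illustration.

```python
import re
from datetime import datetime
from pathlib import Path


def output_paths(title: str, out_dir: str = "output", now: datetime = None):
    """Build the video and metadata paths for one run (hypothetical helper)."""
    now = now or datetime.now()
    # Assumed timestamp layout; the README only says <timestamp>.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # Replace runs of characters that are unsafe in file names with "_".
    slug = re.sub(r"[^\w\-]+", "_", title).strip("_") or "video"
    base = Path(out_dir)
    return base / f"{slug}_{stamp}.mp4", base / f"metadata_{stamp}.json"
```

Sharing one timestamp between the two files keeps a video and its metadata easy to pair up when the folder holds many runs.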

agent.py ADDED

	@@ -0,0 +1,328 @@

+"""
+agent.py
+─────────────────────────────────────────────────────────────
+Autonomous Short-Form Video Engine — AI Brain
+Uses free-tier models via OpenRouter (Llama 3.3 70B by default, with
+fallbacks) to transform a niche topic into a structured JSON package.
+─────────────────────────────────────────────────────────────
+"""
+import os
+import re
+import json
+import time
+import logging
+from openai import OpenAI
+from dotenv import load_dotenv
+load_dotenv()
+logger = logging.getLogger(__name__)
+# ── OpenRouter client setup ───────────────────────────────
+OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY", "")
+# Fallback models (VERIFIED ALIVE for free tier)
+MODELS = [
+    "meta-llama/llama-3.3-70b-instruct:free",
+    "mistralai/mistral-small-3.1-24b-instruct:free",
+    "google/gemma-3-27b-it:free",
+    "minimax/minimax-m2.5:free",
+    "nvidia/nemotron-3-super-120b-a12b:free",
+]
+DEFAULT_MODEL = MODELS[0]
+client = OpenAI(
+    base_url="https://openrouter.ai/api/v1",
+    api_key=OPENROUTER_API_KEY,
+    default_headers={
+        "HTTP-Referer": "https://automate-ai.local",
+        "X-Title": "Automate AI Video Engine",
+    },
+)
+# ── Language Support ──────────────────────────────────────
+LANGUAGE_MAP = {
+    "English":  "English (Standard)",
+    "Hindi":    "Hindi (written in Devanagari script)",
+    "Hinglish": "Hinglish (a natural mix of Hindi and English, written in Latin/Roman script)",
+}
+# ── System Prompt ─────────────────────────────────────────
+SYSTEM_PROMPT = """You are a short-form video scriptwriter.
+Transform the topic into a production-ready JSON package in {language_desc}.
+Output ONLY valid JSON. No markdown, no filler.
+Schema:
+{{
+  "niche": "string",
+  "hook": {{ "text": "string (≤15 words in {language_desc})", "duration_seconds": 3 }},
+  "scenes": [
+    {{
+      "scene_number": 1,
+      "type": "hook | meat | cta",
+      "script_text": "vocal script in {language_desc}",
+      "on_screen_text": "short caption in {language_desc}",
+      "duration_seconds": 5,
+      "pexels_keywords": ["English keywords"],
+      "visual_description": "English description"
+    }}
+  ],
+  "voiceover_settings": {{ "mood": "energetic", "gender_preference": "male" }},
+  "seo": {{ "title": "Title in {language_desc}", "description": "...", "hashtags": [], "keywords": [] }},
+  "total_duration_seconds": 60
+}}
+"""
+USER_TEMPLATE = "Create a short-form video JSON for: {niche} (Style: {style}, Lang: {language_desc})."
+SCRIPT_TO_JSON_PROMPT = """You are a script-to-video parser.
+Convert the provided raw text script into a scene-by-scene package.
+Target Language for text/captions: {language_desc}
+Provide:
+1. script_text: Exact portion in {language_desc}.
+2. on_screen_text: Small caption in {language_desc}.
+3. pexels_keywords: Search terms (ALWAYS English).
+4. visual_description: Visuals (ALWAYS English).
+JSON ONLY matching:
+{{
+  "niche": "Custom Script",
+  "scenes": [
+    {{
+      "scene_number": 1,
+      "script_text": "string",
+      "on_screen_text": "string",
+      "duration_seconds": 5,
+      "pexels_keywords": ["kw1"],
+      "visual_description": "English description"
+    }}
+  ],
+  "voiceover_settings": {{"mood": "energetic", "gender_preference": "male"}},
+  "seo": {{"title": "Title in {language_desc}", "description": "...", "hashtags": [], "keywords": []}},
+  "total_duration_seconds": 60
+}}
+SCRIPT:
+{script}
+"""
+def generate_video_package(niche: str, style: str = "engaging and educational", language: str = "English", model: str = None) -> dict:
+    """
+    Call an OpenRouter model (rotating through fallbacks) to generate the full video JSON.
+    """
+    if not OPENROUTER_API_KEY:
+        raise EnvironmentError("OPENROUTER_API_KEY is not set.")
+    # Model rotation logic: start with requested, then try the pool
+    model_queue = MODELS[:]
+    if model and model in model_queue:
+        model_queue.remove(model)
+        model_queue.insert(0, model)
+    elif model:
+        model_queue.insert(0, model)
+    lang_desc = LANGUAGE_MAP.get(language, "English")
+    sys_prompt = SYSTEM_PROMPT.format(language_desc=lang_desc)
+    user_prompt = USER_TEMPLATE.format(niche=niche, style=style, language_desc=lang_desc)
+    last_error = None
+    for attempt in range(len(model_queue) * 2): # Two passes through the model queue
+        active_model = model_queue[attempt % len(model_queue)]
+        logger.info(f"[Agent] Attempt {attempt+1} — using {active_model}...")
+        try:
+            response = client.chat.completions.create(
+                model=active_model,
+                messages=[
+                    {"role": "system", "content": sys_prompt},
+                    {"role": "user", "content": user_prompt},
+                ],
+                temperature=0.7,
+                max_tokens=2500,
+            )
+            content = response.choices[0].message.content
+            if not content:
+                raise ValueError("Model returned empty content")
+            raw = content.strip()
+            # Strip markdown code fences if model wraps in ```json ... ```
+            if raw.startswith("```"):
+                raw = raw.split("```")[1]
+                if raw.startswith("json"):
+                    raw = raw[4:]
+                raw = raw.strip()
+            data = _robust_json_parse(raw)
+            data["language"] = language # Store for downstream use
+            logger.info(f"[Agent] ✅ JSON generated ({language}) and repaired successfully.")
+            _validate_schema(data)
+            return data
+        except Exception as e:
+            last_error = str(e)
+            logger.warning(f"[Agent] Model {active_model} failed: {e}")
+            time.sleep(1.5) # Short wait before next model
+            continue
+    raise ValueError(f"CRITICAL: All AI models failed or rate-limited. Last error: {last_error}")
+def parse_script_into_video_package(script: str, language: str = "English", model: str = None) -> dict:
+    """
+    Take a raw user script and use AI to parse it into scene-by-scene JSON.
+    """
+    if not OPENROUTER_API_KEY:
+        raise EnvironmentError("OPENROUTER_API_KEY is not set.")
+    model_queue = [model] if model else MODELS[:]
+    lang_desc = LANGUAGE_MAP.get(language, "English")
+    prompt = SCRIPT_TO_JSON_PROMPT.format(script=script, language_desc=lang_desc)
+    last_error = None
+    for attempt in range(len(model_queue) * 2):
+        active_model = model_queue[attempt % len(model_queue)]
+        logger.info(f"[Agent] Attempt {attempt+1} — Parsing with {active_model}...")
+        try:
+            response = client.chat.completions.create(
+                model=active_model,
+                messages=[
+                    {"role": "system", "content": "You are a specialized script parser. Output ONLY JSON."},
+                    {"role": "user", "content": prompt},
+                ],
+                temperature=0.3,
+                max_tokens=2500,
+            )
+            content = response.choices[0].message.content
+            if not content:
+                raise ValueError("Model returned empty content")
+            raw = content.strip()
+            if raw.startswith("```"):
+                raw = raw.split("```")[1]
+                if raw.startswith("json"):
+                    raw = raw[4:]
+                raw = raw.strip()
+            data = _robust_json_parse(raw)
+            data["language"] = language
+            logger.info(f"[Agent] ✅ Script parsed ({language}) and repaired.")
+            _validate_schema(data)
+            return data
+        except json.JSONDecodeError as e:
+            logger.warning(f"[Agent] Script parse failed on attempt {attempt+1}: {e}")
+            last_error = e
+            if attempt < 3:
+                time.sleep(2 ** attempt)
+        except Exception as e:
+            logger.error(f"[Agent] Script parse API error: {e}")
+            last_error = e
+            if attempt < 3:
+                time.sleep(2 ** attempt)
+    raise ValueError(f"Failed to parse script into JSON. Last error: {last_error}")
+def _robust_json_parse(raw: str) -> dict:
+    """
+    Extract JSON from text and attempt to repair if it's truncated.
+    """
+    # 1. Extract the actual JSON block using regex (find first { and last })
+    # If the response has trailing text or headers, this strips them.
+    match = re.search(r'(\{.*\})', raw, re.DOTALL)
+    if match:
+        raw = match.group(1)
+    else:
+        # If no closing brace, try finding the start and manually closing
+        start_idx = raw.find('{')
+        if start_idx != -1:
+            raw = raw[start_idx:]
+        else:
+            raise ValueError("No JSON object found in response")
+    try:
+        return json.loads(raw)
+    except json.JSONDecodeError:
+        # 2. Attempt Auto-Repair for truncated JSON
+        repaired = _repair_json(raw)
+        try:
+            return json.loads(repaired)
+        except json.JSONDecodeError as e:
+            logger.error(f"[Agent] JSON Repair failed: {e}\nRaw start: {raw[:100]}...\nRaw end: {raw[-100:]}")
+            raise ValueError(f"Failed to parse or repair JSON: {e}")
+def _repair_json(raw: str) -> str:
+    """
+    Best-effort repair for truncated JSON: closes an unterminated string,
+    drops a dangling trailing comma, and balances open brackets and braces.
+    """
+    raw = raw.strip()
+    # Walk the text once, tracking string state and open containers
+    stack = []
+    in_string = False
+    escape = False
+    for char in raw:
+        if in_string:
+            if escape:
+                escape = False  # this character is escaped; ignore it
+            elif char == '\\':
+                escape = True
+            elif char == '"':
+                in_string = False
+            continue
+        if char == '"':
+            in_string = True
+        elif char == '{':
+            stack.append('}')
+        elif char == '[':
+            stack.append(']')
+        elif char == '}' or char == ']':
+            if stack and stack[-1] == char:
+                stack.pop()
+    if in_string:
+        # Close a string that was cut off mid-value
+        if escape:
+            raw = raw[:-1]  # drop a dangling backslash
+        raw += '"'
+    else:
+        # Drop a trailing comma left behind by truncation
+        raw = raw.rstrip(',').rstrip()
+    # Close everything in reverse order
+    while stack:
+        raw += stack.pop()
+    return raw
+def _validate_schema(data: dict) -> None:
+    """Basic schema validation — raises KeyError for missing fields, ValueError for an empty or non-list 'scenes'."""
+    required_top = ["niche", "scenes", "voiceover_settings", "seo", "total_duration_seconds"]
+    for key in required_top:
+        if key not in data:
+            raise KeyError(f"Missing required key in agent output: '{key}'")
+    if not isinstance(data["scenes"], list) or len(data["scenes"]) == 0:
+        raise ValueError("'scenes' must be a non-empty list")
+    scene_required = ["scene_number", "script_text", "on_screen_text",
+                      "duration_seconds", "pexels_keywords"]
+    for i, scene in enumerate(data["scenes"]):
+        for key in scene_required:
+            if key not in scene:
+                raise KeyError(f"Scene {i+1} missing required key: '{key}'")
+    seo_required = ["title", "description", "hashtags", "keywords"]
+    for key in seo_required:
+        if key not in data["seo"]:
+            raise KeyError(f"Missing SEO key: '{key}'")
+# ── CLI Quick Test ────────────────────────────────────────
+if __name__ == "__main__":
+    import sys
+    niche = sys.argv[1] if len(sys.argv) > 1 else "AI Productivity Tools"
+    print(f"\n🧠 Generating video package for: '{niche}'\n")
+    result = generate_video_package(niche)
+    print(json.dumps(result, indent=2))
+    print(f"\n✅ Total scenes: {len(result['scenes'])}")
+    print(f"⏱  Total duration: {result['total_duration_seconds']}s")
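
The model rotation in `generate_video_package` is easier to follow in isolation. The sketch below restates that logic as two small functions. The names `build_model_queue` and `attempt_order` are hypothetical and are not part of `agent.py`. It shows that the loop makes two passes through the queue in order rather than retrying the same model twice in a row.

```python
def build_model_queue(pool, preferred=None):
    """Put the preferred model first, followed by the rest of the pool."""
    queue = list(pool)
    if preferred:
        if preferred in queue:
            queue.remove(preferred)
        # A model outside the pool is still tried first.
        queue.insert(0, preferred)
    return queue


def attempt_order(queue, rounds=2):
    """List the model used on each attempt, cycling through the queue."""
    return [queue[i % len(queue)] for i in range(len(queue) * rounds)]


if __name__ == "__main__":
    pool = ["model-a", "model-b", "model-c"]  # placeholder names
    for n, name in enumerate(attempt_order(build_model_queue(pool, "model-c")), 1):
        print(f"attempt {n}: {name}")
```

Cycling through the whole queue before repeating gives a rate-limited model time to recover, which suits the short fixed sleep between attempts.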

app.py ADDED

	@@ -0,0 +1,642 @@

+"""
+app.py
+─────────────────────────────────────────────────────────────
+Autonomous Short-Form Video Engine — Gradio Web UI
+Premium dark-mode interface with live status logs,
+embedded video player, SEO panel, and download button.
+Run: python app.py  →  http://localhost:7860
+─────────────────────────────────────────────────────────────
+"""
+import json
+import logging
+import traceback
+from pathlib import Path
+from datetime import datetime
+import gradio as gr
+from dotenv import load_dotenv
+load_dotenv()
+logging.basicConfig(level=logging.INFO)
+# ── Niche presets ─────────────────────────────────────────
+NICHE_PRESETS = [
+    "AI Productivity Tools",
+    "Morning Routine Hacks",
+    "Passive Income Ideas",
+    "Fitness Motivation",
+    "Crypto & Web3 Trends",
+    "Mental Health Tips",
+    "Travel on a Budget",
+    "Cooking Life Hacks",
+    "Self-Improvement",
+    "Tech Gadgets Review",
+]
+STYLE_OPTIONS = [
+    "engaging and educational",
+    "hype and energetic",
+    "calm and inspirational",
+    "funny and relatable",
+    "informative and authoritative",
+]
+EDITING_STYLES = ["motion_graphics", "montage", "documentary", "social_media"]
+EDITING_STYLE_META = {
+    "motion_graphics": {
+        "label": "🎨 Motion Graphics",
+        "desc": "Kinetic text animations, slow zoom, cinematic grade, pill captions",
+        "color": "#7c3aed",
+    },
+    "montage": {
+        "label": "⚡ Montage",
+        "desc": "Fast cuts, zoom punches, speed ramps, flash transitions, impact font",
+        "color": "#dc2626",
+    },
+    "documentary": {
+        "label": "🎥 Documentary",
+        "desc": "Ken Burns, crossfades, lower-thirds, slow burn, cold grade",
+        "color": "#0369a1",
+    },
+    "social_media": {
+        "label": "📲 Social Media",
+        "desc": "Karaoke captions, glitch hook, zoom punch, TikTok/Reels style",
+        "color": "#db2777",
+    },
+}
+# ── Voice Model Meta ─────────────────────────────────────
+from voice_generator import PREMIUM_VOICES
+VOICE_CHOICES = [(v[2], k) for k, v in PREMIUM_VOICES.items()]
+VOICE_CHOICES.insert(0, ("Auto-Select (Recommended)", "auto"))
+# ── Custom CSS ────────────────────────────────────────────
+CUSTOM_CSS = """
+/* ─── Base & backdrop ─── */
+.gradio-container {
+    background: radial-gradient(circle at top right, #1a1a2e, #0a0a0f) !important;
+    min-height: 100vh;
+    font-family: 'Outfit', 'Inter', system-ui, sans-serif !important;
+    color: #f8fafc !important;
+}
+/* ─── Ultra Premium Header ─── */
+#hero-header {
+    background: rgba(255, 255, 255, 0.02);
+    backdrop-filter: blur(20px);
+    border-radius: 28px;
+    padding: 60px 40px;
+    margin-bottom: 32px;
+    border: 1px solid rgba(168, 85, 247, 0.25);
+    box-shadow: 0 20px 50px rgba(0, 0, 0, 0.4), inset 0 0 20px rgba(168, 85, 247, 0.05);
+    position: relative;
+    overflow: hidden;
+}
+#hero-header::before {
+    content: '';
+    position: absolute;
+    top: -50%;
+    left: -20%;
+    width: 140%;
+    height: 140%;
+    background: radial-gradient(circle, rgba(124, 58, 237, 0.1) 0%, transparent 60%);
+    pointer-events: none;
+}
+#hero-header h1 {
+    font-size: 4rem !important;
+    font-weight: 900 !important;
+    background: linear-gradient(135deg, #fff 0%, #a78bfa 50%, #6366f1 100%);
+    -webkit-background-clip: text;
+    -webkit-text-fill-color: transparent;
+    margin-bottom: 12px !important;
+    letter-spacing: -2.5px !important;
+    line-height: 1 !important;
+}
+#hero-header p {
+    color: #94a3b8 !important;
+    font-size: 1.25rem !important;
+    max-width: 650px;
+    line-height: 1.6 !important;
+    margin-bottom: 24px !important;
+}
+.badge-premium {
+    display: inline-flex;
+    align-items: center;
+    background: linear-gradient(90deg, #a855f7, #6366f1);
+    color: white;
+    padding: 4px 12px;
+    border-radius: 100px;
+    font-size: 0.7rem;
+    font-weight: 700;
+    text-transform: uppercase;
+    letter-spacing: 1px;
+    margin-bottom: 20px;
+    box-shadow: 0 4px 12px rgba(168, 85, 247, 0.4);
+}
+.powered-by {
+    display: flex;
+    gap: 12px;
+    align-items: center;
+    font-size: 0.85rem !important;
+    color: #64748b !important;
+}
+.powered-by span {
+    width: 6px;
+    height: 6px;
+    background: #475569;
+    border-radius: 50%;
+}
+/* ─── Login Page Custom Styling ─── */
+#login-container {
+    max-width: 1000px;
+    margin: 10vh auto;
+    padding: 60px 40px;
+    background: rgba(255, 255, 255, 0.02);
+    backdrop-filter: blur(25px);
+    border-radius: 32px;
+    border: 1px solid rgba(255, 255, 255, 0.1);
+    text-align: center;
+    box-shadow: 0 30px 60px rgba(0,0,0,0.5);
+}
+#login-container h1 {
+    font-size: 3rem !important;
+    font-weight: 900 !important;
+    margin-bottom: 20px !important;
+    background: linear-gradient(135deg, #fff 0%, #a78bfa 100%);
+    -webkit-background-clip: text;
+    -webkit-text-fill-color: transparent;
+}
+#login-container p {
+    color: #94a3b8 !important;
+    font-size: 1.2rem !important;
+    margin-bottom: 40px !important;
+}
+#google-btn {
+    display: inline-flex;
+    align-items: center;
+    justify-content: center;
+    background: white !important;
+    color: #1f2937 !important;
+    padding: 14px 28px !important;
+    border-radius: 12px !important;
+    font-weight: 700 !important;
+    font-size: 1.1rem !important;
+    cursor: pointer;
+    transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1);
+    box-shadow: 0 4px 20px rgba(0,0,0,0.2);
+    border: none !important;
+    width: auto !important;
+    margin: 0 auto;
+}
+#google-btn:hover {
+    transform: translateY(-4px) scale(1.02);
+    box-shadow: 0 12px 30px rgba(168, 85, 247, 0.3);
+}
+#google-btn svg {
+    margin-right: 12px;
+    width: 24px;
+    height: 24px;
+}
+/* ─── Panel cards (Glassmorphism) ─── */
+.panel-box, .gr-form, .gr-box {
+    background: rgba(255, 255, 255, 0.03) !important;
+    border: 1px solid rgba(255, 255, 255, 0.08) !important;
+    border-radius: 20px !important;
+    backdrop-filter: blur(12px) saturate(180%);
+    box-shadow: 0 8px 32px 0 rgba(0, 0, 0, 0.37);
+}
+/* ─── Generate button (Premium Gradient + Animation) ─── */
+#gen-btn {
+    background: linear-gradient(135deg, #a855f7 0%, #6366f1 100%) !important;
+    border: none !important;
+    border-radius: 14px !important;
+    font-size: 18px !important;
+    font-weight: 800 !important;
+    color: white !important;
+    padding: 16px !important;
+    cursor: pointer !important;
+    transition: all 0.4s cubic-bezier(0.175, 0.885, 0.32, 1.275) !important;
+    box-shadow: 0 10px 25px -5px rgba(168, 85, 247, 0.4) !important;
+    text-transform: uppercase;
+    letter-spacing: 1px;
+}
+#gen-btn:hover {
+    transform: scale(1.02) translateY(-3px) !important;
+    box-shadow: 0 20px 35px -5px rgba(168, 85, 247, 0.6) !important;
+    filter: brightness(1.1);
+}
+#gen-btn:active {
+    transform: scale(0.98);
+}
+/* ─── Header Typography ─── */
+#hero h1 {
+    font-size: 3.2rem !important;
+    font-weight: 900 !important;
+    background: linear-gradient(to right, #ffffff, #a78bfa, #818cf8);
+    -webkit-background-clip: text !important;
+    -webkit-text-fill-color: transparent !important;
+    letter-spacing: -1px;
+    margin-bottom: 8px;
+}
+#hero p {
+    color: #94a3b8 !important;
+    font-size: 1.1rem;
+    max-width: 600px;
+    margin: 0 auto;
+}
+/* ─── Log box (Terminal Aesthetic) ─── */
+#log-box textarea {
+    background: rgba(0, 0, 0, 0.4) !important;
+    color: #34d399 !important;
+    font-family: 'Fira Code', 'JetBrains Mono', monospace !important;
+    border: 1px solid rgba(52, 211, 153, 0.2) !important;
+    border-radius: 12px !important;
+    line-height: 1.6;
+}
+/* ─── Inputs & Dropdowns ─── */
+.gr-dropdown, .gr-textbox, .gr-radio {
+    background: rgba(255, 255, 255, 0.05) !important;
+    border: 1px solid rgba(255, 255, 255, 0.1) !important;
+    border-radius: 12px !important;
+    transition: border-color 0.3s ease;
+}
+.gr-dropdown:focus-within, .gr-textbox:focus-within {
+    border-color: #a78bfa !important;
+}
+/* ─── Video Output ─── */
+video {
+    border: 4px solid rgba(167, 139, 250, 0.1) !important;
+    border-radius: 24px !important;
+    box-shadow: 0 0 60px rgba(124, 58, 237, 0.25) !important;
+    background: #000;
+}
+/* ─── Status Badge ─── */
+#status-box {
+    background: rgba(167, 139, 250, 0.1) !important;
+    border: 1px solid rgba(167, 139, 250, 0.2) !important;
+    color: #c4b5fd !important;
+    font-weight: 700;
+}
+"""
+def _log(msg: str, log_state: list) -> str:
+    ts = datetime.now().strftime("%H:%M:%S")
+    entry = f"[{ts}] {msg}"
+    log_state.append(entry)
+    return "\n".join(log_state)
+def generate_video(
+    input_mode: str,
+    language: str,
+    brain_model: str,
+    niche_preset: str,
+    niche_custom: str,
+    full_script: str,
+    voice_model: str,
+    style: str,
+    editing_style: str,
+    dry_run: bool,
+    skip_check: bool,
+    progress=gr.Progress(track_tqdm=True),
+):
+    """
+    Main generation function called by Gradio.
+    """
+    log_state = []
+    def log(msg): return _log(msg, log_state)
+    # ── Resolve Input ─────────────────────────────────────
+    if input_mode == "Topic":
+        niche = niche_custom.strip() if niche_custom.strip() else niche_preset
+        if not niche:
+            yield log("❌ Please select or enter a niche topic."), None, "", "❌ No niche entered"
+            return
+        script_input = None
+    else:
+        if not full_script.strip():
+            yield log("❌ Please enter a full script."), None, "", "❌ Script missing"
+            return
+        niche = "Custom Script"
+        script_input = full_script.strip()
+    editing_label = EDITING_STYLE_META.get(editing_style, {}).get("label", editing_style)
+    yield log(f"🚀 Starting — mode: {input_mode} | lang: {language} | niche: '{niche}'"), None, "", "⏳ Starting..."
+    try:
+        # ── Phase 1: AI Logic ─────────────────────────────
+        if script_input:
+            yield log(f"🧠 Phase 1 — Parsing script using {brain_model}..."), None, "", "🧠 Parsing..."
+            progress(0.05, desc="Parsing script...")
+            from agent import parse_script_into_video_package
+            video_json = parse_script_into_video_package(script_input, language=language, model=brain_model)
+        else:
+            yield log(f"🧠 Phase 1 — Generating script using {brain_model}..."), None, "", "🧠 Generating..."
+            progress(0.05, desc="AI Brain thinking...")
+            from agent import generate_video_package
+            video_json = generate_video_package(niche=niche, style=style, language=language, model=brain_model)
+        scenes_count = len(video_json.get("scenes", []))
+        total_dur    = video_json.get("total_duration_seconds", 0)
+        seo          = video_json.get("seo", {})
+        yield (
+            log(f"✅ Script ready — {scenes_count} scenes, ~{total_dur}s"),
+            None,
+            _build_seo_md(seo),
+            "🧠 Script generated",
+        )
+        if dry_run:
+            json_preview = json.dumps(video_json, indent=2)
+            yield (
+                log("🏁 Dry-run complete — JSON preview below\n\n```json\n" + json_preview[:800] + "\n...```"),
+                None,
+                _build_seo_md(seo),
+                "✅ Dry-run complete",
+            )
+            return
+        # ── Phase 2: Voiceover ──────────────────────────
+        yield log("🎙  Phase 2 — Generating voiceovers..."), None, _build_seo_md(seo), "🎙 Generating voice..."
+        progress(0.25, desc="Generating voiceovers...")
+        from voice_generator import generate_voiceovers
+        voice_id = None if voice_model == "auto" else voice_model
+        audio_results = generate_voiceovers(video_json, voice_id=voice_id)
+        total_audio = sum(r["duration"] for r in audio_results)
+        yield (
+            log(f"✅ {len(audio_results)} audio files — {total_audio:.1f}s total"),
+            None,
+            _build_seo_md(seo),
+            "🎙 Voiceovers done",
+        )
+        # ── Phase 3: Media Fetching ────────────────────
+        yield log("🎥  Phase 3 — Fetching media from Pexels..."), None, _build_seo_md(seo), "🎥 Fetching media..."
+        progress(0.45, desc="Downloading clips...")
+        from media_fetcher import fetch_all_media
+        media_results = fetch_all_media(video_json)
+        ok_media = sum(1 for m in media_results if m.get("path"))
+        yield (
+            log(f"✅ {ok_media}/{len(media_results)} media assets downloaded"),
+            None,
+            _build_seo_md(seo),
+            "🎥 Media fetched",
+        )
+        # ── Phase 4: Asset QA ──────────────────────────
+        yield log("🔍  Phase 4 — Running Nemotron-2 VL quality check..."), None, _build_seo_md(seo), "🔍 QA check..."
+        progress(0.60, desc="Vision quality check...")
+        from asset_checker import check_all_assets
+        media_results = check_all_assets(media_results, video_json, skip_check=skip_check)
+        approved = sum(1 for m in media_results if m.get("approved"))
+        yield (
+            log(f"✅ {approved}/{len(media_results)} assets passed quality gate"),
+            None,
+            _build_seo_md(seo),
+            "🔍 QA done",
+        )
+        # ── Phase 5: Assemble ──────────────────────────
+        yield log(f"🎬  Phase 5 — Assembling ({editing_label})..."), None, _build_seo_md(seo), "🎬 Assembling..."
+        progress(0.75, desc="Rendering video...")
+        from video_assembler import assemble_video, EDITING_STYLES as _ES
+        safe_style = editing_style if editing_style in _ES else "social_media"
+        video_path = assemble_video(
+            video_json, audio_results, media_results,
+            editing_style=safe_style,
+        )
+        progress(1.0, desc="Done!")
+        yield (
+            log(f"🏆 Video ready! → {video_path}"),
+            video_path,
+            _build_seo_md(seo),
+            "✅ Done!",
+        )
+    except EnvironmentError as e:
+        yield log(f"⚠️  Configuration Error: {e}"), None, "", f"⚠️ Config: {str(e)[:40]}"
+    except Exception as e:
+        error_msg = str(e)
+        if "429" in error_msg:
+            status_summary = "⚠️ Rate Limit (429)"
+            user_advice = "\n\n💡 TIP: OpenRouter is rate-limiting requests. Try a different 'Brain Model' from the dropdown."
+        else:
+            status_summary = "❌ Failed"
+            user_advice = ""
+        tb = traceback.format_exc()
+        yield (
+            log(f"❌ Error: {error_msg}{user_advice}\n\nTraceback:\n{tb}"),
+            None,
+            "",
+            status_summary,
+        )
+def _build_seo_md(seo: dict) -> str:
+    if not seo:
+        return ""
+    title     = seo.get("title", "")
+    desc      = seo.get("description", "")
+    hashtags  = " ".join(seo.get("hashtags", []))
+    keywords  = ", ".join(seo.get("keywords", []))
+    return f"""### 📊 SEO Metadata
+**Title:** {title}
+**Description:**
+{desc}
+**Hashtags:** {hashtags}
+**Keywords:** `{keywords}`
+"""
+# ── Build UI ──────────────────────────────────────────────
+with gr.Blocks(
+    title="AutoShorts — Autonomous Video Engine",
+) as demo:
+    logged_in = gr.State(False)
+    def on_login():
+        return gr.update(visible=False), gr.update(visible=True), True
+    # ── Login Page ───────────────────────────────────────
+    with gr.Column(elem_id="login-container", visible=True) as login_view:
+        gr.HTML("""
+        <div>
+            <div style="font-size: 4rem; margin-bottom: 20px;">🎬</div>
+            <h1>Generate short videos just by entering a topic</h1>
+            <p>Welcome to <strong>AutoShorts</strong>. The elite autonomous engine for viral content.</p>
+        </div>
+        """)
+        login_btn = gr.Button(
+            "Continue with Google",
+            elem_id="google-btn",
+            icon="https://www.google.com/favicon.ico" # Fallback icon
+        )
+        gr.HTML("""
+        <div style="margin-top: 32px; color: #475569; font-size: 0.85rem;">
+            By continuing, you agree to AutoShorts' Terms of Service and Data Policy.
+        </div>
+        """)
+    # ── Main App Container ───────────────────────────────
+    with gr.Column(visible=False) as main_app_view:
+        gr.HTML("""
+        <div>
+            <div class="badge-premium">v2.0 • Ultra-Performance Edition</div>
+            <h1>AutoShorts</h1>
+            <p>The elite autonomous engine for viral short-form content. Transform any niche into high-retention videos in seconds.</p>
+            <div class="powered-by">
+                <strong>Llama 3.3</strong> <span></span>
+                <strong>Google TTS</strong> <span></span>
+                <strong>Pexels</strong> <span></span>
+                <strong>MoviePy 2.x</strong>
+            </div>
+        </div>
+        """)
+        with gr.Row(equal_height=False):
+            # ── Left Panel: Controls ──────────────────────────
+            with gr.Column(scale=1, min_width=320):
+                gr.HTML('<div style="color:#a78bfa; font-weight:700; font-size:16px; margin-bottom:12px;">⚡ Configuration</div>')
+                with gr.Row():
+                    input_toggle = gr.Radio(choices=["Topic", "Full Script"], value="Topic", label="Input Mode", interactive=True)
+                    language_sel = gr.Dropdown(choices=["English", "Hindi", "Hinglish"], value="English", label="Language", interactive=True)
+                brain_model_sel = gr.Dropdown(
+                    choices=[
+                        ("Llama 3.3 70B (High Logic)", "meta-llama/llama-3.3-70b-instruct:free"),
+                        ("Mistral Small 3.1", "mistralai/mistral-small-3.1-24b-instruct:free"),
+                        ("Gemma 3 27B", "google/gemma-3-27b-it:free"),
+                        ("MiniMax M2.5 (Fastest)", "minimax/minimax-m2.5:free"),
+                        ("Nemotron-3 Super", "nvidia/nemotron-3-super-120b-a12b:free")
+                    ],
+                    value="meta-llama/llama-3.3-70b-instruct:free",
+                    label="Brain Model (AI Engine)",
+                    interactive=True,
+                )
+                with gr.Column(visible=True) as topic_container:
+                    niche_preset = gr.Dropdown(choices=NICHE_PRESETS, label="Select a Niche", value=NICHE_PRESETS[0], interactive=True)
+                    niche_custom = gr.Textbox(placeholder='Or type your own: "Mindset Shifts"', label="Custom Topic")
+                    style_select = gr.Dropdown(choices=STYLE_OPTIONS, label="Script Tone / Vibe", value=STYLE_OPTIONS[0], interactive=True)
+                with gr.Column(visible=False) as script_container:
+                    full_script_input = gr.Textbox(placeholder="Paste your full video script here...", label="Full Video Script", lines=8)
+                gr.HTML('<div style="color:#a78bfa; font-weight:700; font-size:13px; margin:14px 0 8px;">🎙 Voice Selection</div>')
+                voice_select = gr.Dropdown(choices=VOICE_CHOICES, value="auto", label="Voice Model (Humanized)", interactive=True)
+                gr.HTML('<div style="color:#a78bfa; font-weight:700; font-size:13px; margin:14px 0 8px;">🎬 Editing Style</div>')
+                editing_style_radio = gr.Radio(
+                    choices=[
+                        ("🎨 Motion Graphics", "motion_graphics"),
+                        ("⚡ Montage", "montage"),
+                        ("🎥 Documentary", "documentary"),
+                        ("📲 Social Media", "social_media"),
+                    ],
+                    value="social_media",
+                    label="",
+                    interactive=True,
+                )
+                with gr.Row():
+                    dry_run_chk   = gr.Checkbox(label="🧪 Dry-run (JSON only)", value=False)
+                    skip_check_chk = gr.Checkbox(label="⚡ Skip VL QA", value=False)
+                gen_btn = gr.Button("🚀 Generate Video", elem_id="gen-btn", variant="primary")
+                status_box = gr.Textbox(value="Idle", label="Status", interactive=False, elem_id="status-box")
+                seo_output = gr.Markdown(value="*SEO metadata will appear here.*", elem_id="seo-panel", label="SEO Metadata")
+            # ── Right Panel: Output ───────────────────────────
+            with gr.Column(scale=2, min_width=500):
+                gr.HTML('<div style="color:#a78bfa; font-weight:700; font-size:16px; margin-bottom:8px;">📺 Output</div>')
+                video_out = gr.Video(label="Final Video (9:16)", height=600, interactive=False)
+                gr.HTML('<div style="color:#64748b; font-size:13px; margin-top:16px;">📋 Live Log</div>')
+                log_out = gr.Textbox(label="", lines=14, interactive=False, max_lines=20, elem_id="log-box")
+        # ── Footer ────────────────────────────────────────────
+        gr.HTML("""<div style="text-align:center; padding:20px; color:#334155; font-size:12px; margin-top:16px;">AutoShorts · Powered by OpenRouter · Pexels · MoviePy</div>""")
+        # ── UI Logic ──────────────────────────────────────────
+        def filter_voices_by_lang(lang):
+            options = [("Auto-Select (Recommended)", "auto")]
+            for k, v in PREMIUM_VOICES.items():
+                is_hindi = "hindi" in k
+                if lang in ["Hindi", "Hinglish"]:
+                    if is_hindi: options.append((v[2], k))
+                else:
+                    if not is_hindi: options.append((v[2], k))
+            return gr.update(choices=options, value="auto")
+        language_sel.change(fn=filter_voices_by_lang, inputs=[language_sel], outputs=[voice_select])
+        def toggle_input_mode(mode):
+            if mode == "Topic": return gr.update(visible=True), gr.update(visible=False)
+            return gr.update(visible=False), gr.update(visible=True)
+        input_toggle.change(fn=toggle_input_mode, inputs=[input_toggle], outputs=[topic_container, script_container])
+        gen_btn.click(
+            fn=generate_video,
+            inputs=[input_toggle, language_sel, brain_model_sel, niche_preset, niche_custom, full_script_input, voice_select, style_select, editing_style_radio, dry_run_chk, skip_check_chk],
+            outputs=[log_out, video_out, seo_output, status_box],
+        )
+    # ── Login Logic ───────────────────────────────────────
+    login_btn.click(
+        on_login,
+        outputs=[login_view, main_app_view, logged_in]
+    )
+if __name__ == "__main__":
+    print("\n[INFO] Launching Autonomous Video Engine UI...")
+    print("   URL: http://localhost:7860\n")
+    demo.launch(
+        server_name="0.0.0.0",
+        server_port=7860,
+        share=False,
+        show_error=True,
+        css=CUSTOM_CSS,
+        theme=gr.themes.Base(
+            primary_hue="violet",
+            secondary_hue="indigo",
+            neutral_hue="slate",
+            font=gr.themes.GoogleFont("Inter"),
+        )
+    )
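
The language-to-voice filtering in `filter_voices_by_lang` can be tried outside Gradio. The following is a minimal sketch, assuming `PREMIUM_VOICES` maps a key containing the language name to a tuple whose third element is the display label (which is what `v[2]` suggests). The `SAMPLE_VOICES` entries and the `voice_options` helper are hypothetical and are not taken from `voice_generator.py`.

```python
# Hypothetical stand-in for PREMIUM_VOICES: key -> (voice id, gender, display label).
SAMPLE_VOICES = {
    "english_female": ("en-US-AriaNeural", "female", "Aria (English, female)"),
    "hindi_male": ("hi-IN-MadhurNeural", "male", "Madhur (Hindi, male)"),
}


def voice_options(lang: str, voices: dict) -> list[tuple[str, str]]:
    """Return (label, key) pairs for the dropdown, always starting with 'auto'."""
    # Hindi and Hinglish share the Hindi voices; every other language gets the rest.
    want_hindi = lang in ("Hindi", "Hinglish")
    options = [("Auto-Select (Recommended)", "auto")]
    options += [(v[2], k) for k, v in voices.items() if ("hindi" in k) == want_hindi]
    return options


print(voice_options("Hinglish", SAMPLE_VOICES))
print(voice_options("English", SAMPLE_VOICES))
```

Keeping the filter as a plain function like this makes it testable without launching the UI; the Gradio callback then only needs to wrap the result in `gr.update(choices=..., value="auto")`.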

asset_checker.py ADDED Viewed

	@@ -0,0 +1,198 @@

+"""
+asset_checker.py
+─────────────────────────────────────────────────────────────
+Autonomous Short-Form Video Engine — Visual Quality Gate
+Uses Nemotron-2 VL (vision-language model) via NVIDIA NIM
+to score each downloaded media asset for quality & relevance.
+Assets scoring < 6/10 are marked as rejected (approved=False) for downstream handling.
+─────────────────────────────────────────────────────────────
+"""
+import os
+import json
+import base64
+import logging
+from pathlib import Path
+from openai import OpenAI
+from PIL import Image
+from io import BytesIO
+from dotenv import load_dotenv
+load_dotenv()
+logger = logging.getLogger(__name__)
+NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY", "")
+SCORE_THRESHOLD = 6.0        # assets below this score are rejected
+MAX_REFETCH_ATTEMPTS = 2
+# NVIDIA NIM endpoint for Nemotron-2 VL
+nvidia_client = OpenAI(
+    base_url="https://integrate.api.nvidia.com/v1",
+    api_key=NVIDIA_API_KEY,
+)
+VL_MODEL = "nvidia/nemotron-4-340b-instruct"  # TODO: confirm this model accepts image input
+def _extract_frame(media_path: str) -> str:
+    """
+    Extract a thumbnail from a video file or resize an image,
+    then return as base64-encoded JPEG string.
+    """
+    path = Path(media_path)
+    if path.suffix.lower() in (".mp4", ".mov", ".avi", ".webm"):
+        try:
+            from moviepy import VideoFileClip
+            clip = VideoFileClip(str(path))
+            # Grab frame at 20% into the video (avoids black intro frames)
+            t = clip.duration * 0.2
+            frame = clip.get_frame(t)
+            clip.close()
+            img = Image.fromarray(frame)
+        except Exception as e:
+            logger.warning(f"[Checker] Could not extract video frame: {e}")
+            return ""
+    else:
+        try:
+            img = Image.open(path)
+        except Exception as e:
+            logger.warning(f"[Checker] Could not open image: {e}")
+            return ""
+    # Resize to max 512px on longest side for API efficiency
+    img.thumbnail((512, 512), Image.LANCZOS)
+    if img.mode != "RGB":
+        img = img.convert("RGB")
+    buf = BytesIO()
+    img.save(buf, format="JPEG", quality=85)
+    return base64.b64encode(buf.getvalue()).decode("utf-8")
+def _build_check_prompt(topic: str) -> str:
+    return (
+        f"You are a strict video quality reviewer. Look at this image (a frame from a short-form video clip). "
+        f"Rate it on two criteria for the topic: '{topic}'.\n\n"
+        f"1. Visual Quality (lighting, sharpness, professional look): 1-10\n"
+        f"2. Topic Relevance (does it visually match '{topic}'?): 1-10\n\n"
+        f"Reply ONLY with valid JSON in this format:\n"
+        f'{{ "quality_score": 7, "relevance_score": 8, "overall": 7.5, "reject": false, "reason": "brief reason" }}'
+    )
+def check_asset(media_path: str, topic: str, skip_check: bool = False) -> dict:
+    """
+    Run Nemotron-2 VL quality check on a downloaded media asset.
+    Args:
+        media_path: Path to the downloaded video/image file
+        topic: The scene topic/keyword for relevance scoring
+        skip_check: If True, skip the VL check and approve automatically
+    Returns:
+        Dict: {"approved": bool, "overall": float, "reason": str}
+    """
+    if skip_check or not NVIDIA_API_KEY:
+        if not NVIDIA_API_KEY:
+            logger.warning("[Checker] NVIDIA_API_KEY not set — auto-approving all assets.")
+        return {"approved": True, "overall": 10.0, "reason": "Check skipped"}
+    frame_b64 = _extract_frame(media_path)
+    if not frame_b64:
+        return {"approved": True, "overall": 7.0, "reason": "Could not extract frame — auto-approved"}
+    prompt = _build_check_prompt(topic)
+    image_url = f"data:image/jpeg;base64,{frame_b64}"
+    try:
+        response = nvidia_client.chat.completions.create(
+            model=VL_MODEL,
+            messages=[
+                {
+                    "role": "user",
+                    "content": [
+                        {"type": "image_url", "image_url": {"url": image_url}},
+                        {"type": "text", "text": prompt},
+                    ],
+                }
+            ],
+            temperature=0.1,
+            max_tokens=200,
+        )
+        raw = response.choices[0].message.content.strip()
+        # Strip markdown if needed
+        if raw.startswith("```"):
+            raw = raw.split("```")[1]
+            if raw.startswith("json"):
+                raw = raw[4:]
+            raw = raw.strip()
+        result = json.loads(raw)
+        overall = float(result.get("overall", 7.0))
+        approved = overall >= SCORE_THRESHOLD
+        logger.info(
+            f"[Checker] {Path(media_path).name} → score: {overall:.1f}/10 "
+            f"({'✅ approved' if approved else '❌ rejected'})"
+        )
+        return {
+            "approved": approved,
+            "overall": overall,
+            "reason": result.get("reason", ""),
+        }
+    except json.JSONDecodeError as e:
+        logger.warning(f"[Checker] JSON parse error from VL response: {e} — auto-approving")
+        return {"approved": True, "overall": 7.0, "reason": "Parse error — auto-approved"}
+    except Exception as e:
+        logger.warning(f"[Checker] VL API error: {e} — auto-approving")
+        return {"approved": True, "overall": 7.0, "reason": f"API error: {str(e)[:60]}"}
+def check_all_assets(media_results: list[dict], video_json: dict,
+                     skip_check: bool = False) -> list[dict]:
+    """
+    Quality-check all fetched media assets. Marks each with approved status.
+    Args:
+        media_results: Output of media_fetcher.fetch_all_media()
+        video_json: Original video package JSON (for scene keywords as topic)
+        skip_check: Skip VL check entirely
+    Returns:
+        Same list with 'approved', 'score', 'check_reason' added to each item
+    """
+    scene_map = {s["scene_number"]: s for s in video_json["scenes"]}
+    for item in media_results:
+        if not item.get("path"):
+            item.update({"approved": False, "score": 0.0, "check_reason": "No file"})
+            continue
+        scene = scene_map.get(item["scene_number"], {})
+        topic = scene.get("visual_description", "") or ", ".join(
+            scene.get("pexels_keywords", ["video"])
+        )
+        result = check_asset(item["path"], topic, skip_check=skip_check)
+        item["approved"] = result["approved"]
+        item["score"] = result["overall"]
+        item["check_reason"] = result["reason"]
+    approved_count = sum(1 for m in media_results if m.get("approved"))
+    logger.info(f"[Checker] ✅ {approved_count}/{len(media_results)} assets passed QA.")
+    return media_results
+# ── CLI Test ──────────────────────────────────────────────
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--image", required=True, help="Path to image or video file")
+    parser.add_argument("--topic", default="AI technology future", help="Topic for relevance check")
+    args = parser.parse_args()
+    logging.basicConfig(level=logging.INFO)
+    print(f"\n🔍 Checking asset: {args.image}")
+    print(f"   Topic: '{args.topic}'\n")
+    result = check_asset(args.image, args.topic)
+    print(json.dumps(result, indent=2))
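
The reply-parsing step inside `check_asset` can be exercised without calling the NVIDIA endpoint. This is a minimal sketch of the same fence-stripping and threshold logic; the helper names `parse_vl_reply` and `is_approved` and the sample reply are illustrative assumptions, not part of the module.

```python
import json

SCORE_THRESHOLD = 6.0  # mirrors the constant in asset_checker.py


def parse_vl_reply(raw: str) -> dict:
    """Parse a model reply that may be wrapped in a Markdown code fence."""
    text = raw.strip()
    if text.startswith("```"):
        # Keep only the body between the first pair of fences.
        text = text.split("```")[1]
        if text.startswith("json"):
            text = text[4:]
        text = text.strip()
    return json.loads(text)


def is_approved(result: dict, default: float = 7.0) -> bool:
    """Apply the quality gate; a missing 'overall' falls back to the default score."""
    return float(result.get("overall", default)) >= SCORE_THRESHOLD


# Hypothetical reply in the format requested by _build_check_prompt.
reply = '```json\n{"quality_score": 5, "relevance_score": 4, "overall": 4.5, "reject": true, "reason": "blurry"}\n```'
parsed = parse_vl_reply(reply)
print(parsed["overall"], is_approved(parsed))  # 4.5 False
```

Note that the gate relies only on `overall`; the `reject` field returned by the model is ignored, as in the original function.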

debug_assembly.py ADDED Viewed

	@@ -0,0 +1,51 @@

+import json
+import logging
+from pathlib import Path
+from video_assembler import assemble_video
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+# Load the JSON
+json_path = "output/ai_tools_20260322_095831.json"
+with open(json_path, "r") as f:
+    video_json = json.load(f)
+# Mock audio results (first 3 scenes only for speed)
+audio_results = [
+    {"scene_number": 1, "path": "assets/audio/scene_01.mp3", "duration": 4.6},
+    {"scene_number": 2, "path": "assets/audio/scene_02.mp3", "duration": 3.1},
+    {"scene_number": 3, "path": "assets/audio/scene_03.mp3", "duration": 3.5},
+]
+# Mock media results (first 3 scenes)
+# I'll find existing clips in assets/clips
+clips_dir = Path("assets/clips")
+media_results = []
+for i in range(1, 4):
+    found = list(clips_dir.glob(f"scene_{i:02d}_*.mp4"))
+    if found:
+        media_results.append({
+            "scene_number": i,
+            "path": str(found[0]),
+            "type": "video",
+            "approved": True
+        })
+# Trim JSON to 3 scenes
+video_json["scenes"] = video_json["scenes"][:3]
+logger.info("Starting standalone assembly test...")
+try:
+    output_path = assemble_video(
+        video_json=video_json,
+        audio_results=audio_results,
+        media_results=media_results,
+        editing_style="social_media",
+        output_filename="assembly_verify_test.mp4"
+    )
+    print(f"\nSUCCESS: Video assembled at {output_path}")
+except Exception:
+    import traceback
+    print("\nFAILURE: Assembly failed!")
+    traceback.print_exc()

main.py ADDED Viewed

	@@ -0,0 +1,192 @@

+"""
+main.py
+─────────────────────────────────────────────────────────────
+Autonomous Short-Form Video Engine — CLI Orchestrator
+Wires all layers together: Agent → Voice → Media → QA → Assemble
+Usage:
+  python main.py --niche "AI Tools"
+  python main.py --niche "Motivation" --dry-run
+  python main.py --niche "Fitness" --skip-check
+  python main.py --niche "Fitness" --editing-style montage
+  python main.py --niche "Finance" --editing-style documentary
+─────────────────────────────────────────────────────────────"""
+import argparse
+import json
+import logging
+import sys
+import time
+from pathlib import Path
+from datetime import datetime
+# ── Logging & Encoding setup ──────────────────────────────
+if sys.stdout.encoding != 'utf-8':
+    try:
+        sys.stdout.reconfigure(encoding='utf-8')
+    except AttributeError:
+        # Fallback for older Python versions
+        import codecs
+        sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s | %(levelname)-8s | %(message)s",
+    datefmt="%H:%M:%S",
+)
+logger = logging.getLogger(__name__)
+def banner():
+    print("""
++----------------------------------------------------------+
+|                 *  A U T O S H O R T S  *                |
+|         Powered by Llama 3.3 + MoviePy + Pexels          |
++----------------------------------------------------------+
+""")
+EDITING_STYLES = ["motion_graphics", "montage", "documentary", "social_media"]
+def run_pipeline(
+    niche: str,
+    style: str = "engaging and educational",
+    editing_style: str = "social_media",
+    dry_run: bool = False,
+    skip_check: bool = False,
+) -> dict:
+    """
+    Full pipeline execution.
+    Returns:
+        dict with keys: json_data, audio_results, media_results, video_path
+    """
+    if editing_style not in EDITING_STYLES:
+        logger.warning(f"Unknown editing style '{editing_style}' — defaulting to social_media")
+        editing_style = "social_media"
+    result = {}
+    start  = time.time()
+    # ── Phase 1: AI Brain ─────────────────────────────────
+    print(f"\n{'═'*55}")
+    print(f"  🧠 Phase 1 — Generating video script via Nemotron-3")
+    print(f"  🎬 Editing Style: {editing_style.replace('_', ' ').title()}")
+    print(f"{'═'*55}")
+    from agent import generate_video_package
+    video_json = generate_video_package(niche=niche, style=style)
+    result["json_data"] = video_json
+    # Save JSON
+    Path("output").mkdir(exist_ok=True)
+    ts = datetime.now().strftime("%Y%m%d_%H%M%S")
+    json_path = Path("output") / f"{niche.lower().replace(' ', '_')}_{ts}.json"
+    json_path.write_text(json.dumps(video_json, indent=2), encoding="utf-8")
+    print(f"\n  📋 Script saved → {json_path}")
+    # Preview
+    print(f"\n  Niche     : {video_json.get('niche')}")
+    print(f"  Scenes    : {len(video_json.get('scenes', []))}")
+    print(f"  Duration  : {video_json.get('total_duration_seconds')}s")
+    print(f"  SEO Title : {video_json.get('seo', {}).get('title', '')}")
+    if dry_run:
+        print("\n  🏁 Dry-run mode — stopping after JSON generation.\n")
+        print(json.dumps(video_json, indent=2))
+        return result
+    # ── Phase 2: Voiceover ────────────────────────────────
+    print(f"\n{'═'*55}")
+    print(f"  🎙  Phase 2 — Generating voiceovers (Google TTS)")
+    print(f"{'═'*55}")
+    from voice_generator import generate_voiceovers
+    audio_results = generate_voiceovers(video_json)
+    result["audio_results"] = audio_results
+    total_audio = sum(r["duration"] for r in audio_results)
+    print(f"\n  ✅ {len(audio_results)} audio files generated ({total_audio:.1f}s total)")
+    # ── Phase 3: Media Fetching ───────────────────────────
+    print(f"\n{'═'*55}")
+    print(f"  🎥  Phase 3 — Fetching media from Pexels")
+    print(f"{'═'*55}")
+    from media_fetcher import fetch_all_media
+    media_results = fetch_all_media(video_json)
+    # ── Phase 4: Asset QA ─────────────────────────────────
+    print(f"\n{'═'*55}")
+    print(f"  🔍  Phase 4 — Running Nemotron-2 VL asset check")
+    print(f"{'═'*55}")
+    from asset_checker import check_all_assets
+    media_results = check_all_assets(media_results, video_json, skip_check=skip_check)
+    result["media_results"] = media_results
+    approved = sum(1 for m in media_results if m.get("approved"))
+    print(f"\n  ✅ {approved}/{len(media_results)} assets passed QA")
+    # ── Phase 5: Video Assembly ───────────────────────────
+    print(f"\n{'═'*55}")
+    print(f"  🎬  Phase 5 — Assembling final video (MoviePy)")
+    print(f"{'═'*55}")
+    from video_assembler import assemble_video
+    video_path = assemble_video(
+        video_json, audio_results, media_results,
+        editing_style=editing_style,
+    )
+    result["video_path"] = video_path
+    elapsed = time.time() - start
+    print(f"""
+╔══════════════════════════════════════════════════════════╗
+║                    🏆  DONE!                             ║
+╠══════════════════════════════════════════════════════════╣
+║  📹 Video  : {video_path[:46]:<46} ║
+║  📋 JSON   : {str(json_path)[:46]:<46} ║
+║  ⏱  Time   : {elapsed:.1f}s                                      ║
+╚══════════════════════════════════════════════════════════╝
+""")
+    return result
+# ── Main ──────────────────────────────────────────────────
+if __name__ == "__main__":
+    banner()
+    parser = argparse.ArgumentParser(
+        description="Autonomous Short-Form Video Engine"
+    )
+    parser.add_argument("--niche", required=True, help='Topic niche, e.g. "AI Tools"')
+    parser.add_argument("--style", default="engaging and educational",
+                        help="AI script tone/style descriptor")
+    parser.add_argument(
+        "--editing-style",
+        choices=EDITING_STYLES,
+        default="social_media",
+        help="Video editing style (default: social_media)",
+    )
+    parser.add_argument("--dry-run", action="store_true",
+                        help="Generate JSON only — no media/voice/video")
+    parser.add_argument("--skip-check", action="store_true",
+                        help="Skip Nemotron-2 VL asset quality check")
+    args = parser.parse_args()
+    try:
+        run_pipeline(
+            niche=args.niche,
+            style=args.style,
+            editing_style=args.editing_style,
+            dry_run=args.dry_run,
+            skip_check=args.skip_check,
+        )
+    except KeyboardInterrupt:
+        print("\n\n⚠️  Interrupted by user.")
+        sys.exit(0)
+    except Exception as e:
+        logger.error(f"Pipeline failed: {e}", exc_info=True)
+        sys.exit(1)
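
The way `run_pipeline` names the saved script file can be reproduced in isolation. The sketch below assumes nothing beyond the standard library; the `script_path` helper and its injectable `now` argument are hypothetical additions that make the timestamp deterministic for testing.

```python
from datetime import datetime
from pathlib import Path


def script_path(niche: str, now: datetime, out_dir: str = "output") -> Path:
    """Build the JSON path the same way run_pipeline does: <slug>_<timestamp>.json."""
    ts = now.strftime("%Y%m%d_%H%M%S")
    slug = niche.lower().replace(" ", "_")
    return Path(out_dir) / f"{slug}_{ts}.json"


print(script_path("AI Tools", datetime(2026, 3, 22, 9, 58, 31)).as_posix())
# output/ai_tools_20260322_095831.json
```

The slug only lowercases and replaces spaces, so a niche containing characters such as `/` or `:` would still need sanitizing before being used as a filename.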

media_fetcher.py ADDED Viewed

	@@ -0,0 +1,206 @@

+"""
+media_fetcher.py
+─────────────────────────────────────────────────────────────
+Autonomous Short-Form Video Engine — Media Layer
+Downloads portrait video clips from Pexels API per scene.
+Falls back to images if no video is found.
+Caches downloads to avoid re-fetching.
+─────────────────────────────────────────────────────────────
+"""
+import os
+import re
+import hashlib
+import logging
+import requests
+from pathlib import Path
+from dotenv import load_dotenv
+from tqdm import tqdm
+load_dotenv()
+logger = logging.getLogger(__name__)
+PEXELS_API_KEY = os.getenv("PEXELS_API_KEY", "")
+CLIPS_DIR = Path("assets/clips")
+CLIPS_DIR.mkdir(parents=True, exist_ok=True)
+PEXELS_VIDEOS_URL = "https://api.pexels.com/videos/search"
+PEXELS_PHOTOS_URL = "https://api.pexels.com/v1/search"
+HEADERS = {"Authorization": PEXELS_API_KEY}
+# Target orientation: portrait (9:16)
+MIN_PORTRAIT_RATIO = 0.45   # base ratio; clips with width/height > 2× this (0.9) are skipped
+PREFERRED_MIN_QUALITY = 720  # minimum height in pixels
+def _cache_key(keywords: list[str]) -> str:
+    joined = "_".join(sorted(keywords)).lower()
+    return hashlib.md5(joined.encode()).hexdigest()[:10]
+def _clean_filename(name: str) -> str:
+    return re.sub(r"[^\w\-.]", "_", name)
+def _download_file(url: str, dest: Path, desc: str = "") -> bool:
+    """Stream-download a file with progress bar."""
+    try:
+        resp = requests.get(url, stream=True, timeout=30)
+        resp.raise_for_status()
+        total = int(resp.headers.get("content-length", 0))
+        with open(dest, "wb") as f, tqdm(
+            total=total, unit="B", unit_scale=True,
+            desc=desc, ncols=70, leave=False
+        ) as bar:
+            for chunk in resp.iter_content(chunk_size=8192):
+                f.write(chunk)
+                bar.update(len(chunk))
+        return True
+    except Exception as e:
+        logger.error(f"[Media] Download failed for {url}: {e}")
+        dest.unlink(missing_ok=True)
+        return False
+def _fetch_video(keywords: list[str], scene_num: int) -> dict | None:
+    """Search Pexels for a portrait video clip matching keywords."""
+    if not PEXELS_API_KEY:
+        raise EnvironmentError("PEXELS_API_KEY not set.")
+    cache_key = _cache_key(keywords)
+    cached = list(CLIPS_DIR.glob(f"scene_{scene_num:02d}_{cache_key}.*"))
+    if cached:
+        logger.info(f"[Media] Scene {scene_num} cache hit → {cached[0].name}")
+        return {"path": str(cached[0]), "type": "video" if cached[0].suffix == ".mp4" else "image"}
+    # Try with up to 3 keywords, then fallback
+    search_keywords = keywords[:3]
+    while len(search_keywords) > 0:
+        query = " ".join(search_keywords)
+        params = {
+            "query": query,
+            "per_page": 15,
+            "orientation": "portrait",
+            "size": "medium",
+        }
+        try:
+            resp = requests.get(PEXELS_VIDEOS_URL, headers=HEADERS, params=params, timeout=15)
+            resp.raise_for_status()
+            data = resp.json()
+            videos = data.get("videos", [])
+            if videos:
+                # Take the first portrait clip whose best HD/SD file meets the minimum height
+                for video in videos:
+                    w, h = video.get("width", 0), video.get("height", 0)
+                    if w == 0 or h == 0: continue
+                    ratio = w / h
+                    if ratio > MIN_PORTRAIT_RATIO * 2: continue
+                    files = video.get("video_files", [])
+                    files_sorted = sorted(
+                        [f for f in files if f.get("quality") in ("hd", "sd")],
+                        key=lambda x: x.get("height", 0),
+                        reverse=True,
+                    )
+                    if not files_sorted: continue
+                    chosen = files_sorted[0]
+                    if chosen.get("height", 0) < PREFERRED_MIN_QUALITY: continue
+                    url = chosen["link"]
+                    dest = CLIPS_DIR / f"scene_{scene_num:02d}_{cache_key}.mp4"
+                    logger.info(f"[Media] Scene {scene_num} — downloading video: {video['id']}.mp4")
+                    if _download_file(url, dest, desc=f"Scene {scene_num}"):
+                        return {"path": str(dest), "type": "video"}
+            # No usable video for this query: retry with fewer keywords
+            search_keywords.pop()
+        except Exception as e:
+            logger.warning(f"[Media] Pexels video search failed for '{query}': {e}")
+            search_keywords.pop()
+    logger.warning(f"[Media] No good portrait video found for scene {scene_num}. Trying image fallback...")
+    return _fetch_image(keywords, scene_num, cache_key)
+def _fetch_image(keywords: list[str], scene_num: int, cache_key: str | None = None) -> dict | None:
+    """Fallback: fetch a portrait image from Pexels."""
+    if cache_key is None:
+        cache_key = _cache_key(keywords)
+    query = " ".join(keywords[:3])
+    params = {
+        "query": query,
+        "per_page": 10,
+        "orientation": "portrait",
+    }
+    try:
+        resp = requests.get(PEXELS_PHOTOS_URL, headers=HEADERS, params=params, timeout=15)
+        resp.raise_for_status()
+        photos = resp.json().get("photos", [])
+    except Exception as e:
+        logger.error(f"[Media] Pexels image fallback failed: {e}")
+        return None
+    for photo in photos:
+        url = photo["src"].get("large2x") or photo["src"].get("large")
+        if not url:
+            continue
+        dest = CLIPS_DIR / f"scene_{scene_num:02d}_{cache_key}.jpg"
+        logger.info(f"[Media] Scene {scene_num} — downloading image fallback")
+        if _download_file(url, dest, desc=f"Scene {scene_num} img"):
+            return {"path": str(dest), "type": "image"}
+    return None
+def fetch_all_media(video_json: dict) -> list[dict]:
+    """
+    Fetch media (video/image) for every scene in the video JSON.
+    Args:
+        video_json: Parsed dict from agent.generate_video_package()
+    Returns:
+        List of dicts per scene: [{"scene_number": 1, "path": "...", "type": "video"}, ...]
+    """
+    if not PEXELS_API_KEY:
+        raise EnvironmentError("PEXELS_API_KEY not set in .env")
+    results = []
+    for scene in video_json["scenes"]:
+        num      = scene["scene_number"]
+        keywords = scene.get("pexels_keywords", ["technology", "future"])
+        logger.info(f"[Media] Fetching scene {num} — keywords: {keywords}")
+        media = _fetch_video(keywords, num)
+        if media:
+            media["scene_number"] = num
+            results.append(media)
+        else:
+            logger.error(f"[Media] Could not fetch any media for scene {num}!")
+            results.append({"scene_number": num, "path": None, "type": "none"})
+    ok = sum(1 for r in results if r["path"])
+    logger.info(f"[Media] ✅ Fetched {ok}/{len(results)} media assets.")
+    return results
+# ── CLI Test ──────────────────────────────────────────────
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--keywords", nargs="+", default=["artificial intelligence", "technology", "future"])
+    parser.add_argument("--test", action="store_true")
+    args = parser.parse_args()
+    if args.test or args.keywords:
+        logging.basicConfig(level=logging.INFO)
+        print(f"🎥 Fetching media for keywords: {args.keywords}")
+        result = _fetch_video(args.keywords, scene_num=99)
+        if result:
+            print(f"  ✅ {result['type'].upper()}: {result['path']}")
+        else:
+            print("  ❌ No media found.")
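
Review note: `_fetch_image` is documented as a fallback, but this part of the diff does not show where it is called. The sketch below shows one way a video-then-image chain could be wired, assuming each fetcher returns a dict with `path` and `type` or `None`. `fetch_with_fallback`, `fake_video` and `fake_image` are hypothetical names used only for illustration; they are not part of the commit.

```python
def fetch_with_fallback(keywords, scene_num, fetchers):
    """Try each fetcher in order; return the first hit tagged with its scene number."""
    for fetch in fetchers:
        media = fetch(keywords, scene_num)
        if media:
            return {**media, "scene_number": scene_num}
    # Same placeholder shape that fetch_all_media records for a failed scene
    return {"scene_number": scene_num, "path": None, "type": "none"}


# Hypothetical stand-ins for _fetch_video and _fetch_image (no network access)
def fake_video(keywords, scene_num):
    return None  # pretend no portrait video matched


def fake_image(keywords, scene_num):
    return {"path": f"clips/scene_{scene_num:02d}.jpg", "type": "image"}


result = fetch_with_fallback(["technology"], 3, [fake_video, fake_image])
print(result)
```

Passing the fetchers in as a list keeps the chain testable without a Pexels key, which is the main reason for this shape.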

packages.txt ADDED Viewed

	@@ -0,0 +1 @@


+ffmpeg

requirements.txt ADDED Viewed

	@@ -0,0 +1,25 @@

+# ============================================================
+# Autonomous Short-Form Video Engine — Python Requirements
+# ============================================================
+# AI / LLM (Nemotron-3 via OpenRouter, Nemotron-2 VL via NVIDIA)
+openai>=1.0.0          # Used as OpenRouter-compatible client
+# Text-to-Speech (Google Cloud TTS free tier)
+google-cloud-texttospeech>=2.14.0
+# Alternative TTS (edge-tts, no auth needed; required when USE_EDGE_TTS=true)
+edge-tts>=6.1.9
+# Media fetching & processing
+requests>=2.31.0
+Pillow>=10.0.0
+moviepy>=2.0.0         # 2.x API required (root-package imports, with_*/subclipped)
+# Web UI
+gradio>=4.0.0
+# Utilities
+python-dotenv>=1.0.0
+mutagen>=1.47.0        # Audio duration detection
+tqdm>=4.66.0           # Progress bars

test_hindi.py ADDED Viewed

	@@ -0,0 +1,21 @@

+from voice_generator import generate_voiceovers
+import json
+def test_hindi_voice():
+    mock_json = {
+        "language": "Hindi",
+        "voiceover_settings": {"mood": "energetic", "gender_preference": "female"},
+        "scenes": [
+            {
+                "scene_number": 1,
+                "script_text": "नमस्ते, क्या आप तैयार हैं?",
+            }
+        ]
+    }
+    print("Testing Hindi voiceover generation...")
+    results = generate_voiceovers(mock_json)
+    for res in results:
+        print(f"Scene {res['scene_number']}: {res['path']} ({res['duration']}s)")
+if __name__ == "__main__":
+    test_hindi_voice()

test_moviepy.py ADDED Viewed

	@@ -0,0 +1,7 @@

+try:
+    from moviepy import VideoFileClip, AudioFileClip, concatenate_videoclips, CompositeVideoClip
+    print("✅ MoviePy 2.x imports successful!")
+except ImportError as e:
+    print(f"❌ ImportError: {e}")
+except Exception as e:
+    print(f"❌ Exception: {e}")

test_moviepy_safe.py ADDED Viewed

	@@ -0,0 +1,15 @@

+import sys
+import codecs
+sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
+print("Testing MoviePy 2.x Imports...")
+try:
+    # In MoviePy 2.0+, many classes are moved to the root package
+    import moviepy
+    from moviepy import VideoFileClip, AudioFileClip, ImageClip, ColorClip, VideoClip, CompositeVideoClip, concatenate_videoclips
+    print("SUCCESS: MoviePy 2.x imports working!")
+    print(f"MoviePy Version: {moviepy.__version__}")
+except ImportError as e:
+    print(f"IMPORT ERROR: {e}")
+except Exception as e:
+    print(f"EXCEPTION: {e}")

test_pexels.py ADDED Viewed

	@@ -0,0 +1,35 @@

+import os
+import requests
+from dotenv import load_dotenv
+load_dotenv()
+api_key = os.getenv("PEXELS_API_KEY")
+if not api_key:
+    print("Error: PEXELS_API_KEY NOT FOUND IN .ENV")
+    raise SystemExit(1)
+print(f"Testing Pexels API with key: {api_key[:8]}...")
+def test_search(query):
+    url = "https://api.pexels.com/videos/search"
+    headers = {"Authorization": api_key}
+    params = {"query": query, "per_page": 1, "orientation": "portrait"}
+    try:
+        response = requests.get(url, headers=headers, params=params, timeout=10)
+        print(f"Search '{query}': {response.status_code}")
+        if response.status_code == 200:
+            data = response.json()
+            videos = data.get("videos", [])
+            print(f"  Found {len(videos)} videos.")
+            if videos:
+                v = videos[0]
+                print(f"  Example: {v.get('url')} ({v.get('width')}x{v.get('height')})")
+        else:
+            print(f"  Error: {response.text}")
+    except Exception as e:
+        print(f"  Exception: {e}")
+test_search("technology")
+test_search("serene landscape")

test_repair.py ADDED Viewed

	@@ -0,0 +1,22 @@

+from agent import _robust_json_parse, _repair_json
+import json
+def test_repair():
+    truncated = '{"niche": "AI Tools", "scenes": [{"scene_number": 1, "text": "Hello world'
+    print(f"Original: {truncated}")
+    repaired = _repair_json(truncated)
+    print(f"Repaired: {repaired}")
+    data = _robust_json_parse(truncated)
+    print(f"Parsed JSON niche: {data['niche']}")
+    # Test complex truncation
+    complex_trunc = '{"seo": {"title": "AI'
+    print(f"\nOriginal: {complex_trunc}")
+    repaired_complex = _repair_json(complex_trunc)
+    print(f"Repaired: {repaired_complex}")
+    data_complex = _robust_json_parse(complex_trunc)
+    print(f"Parsed JSON SEO title: {data_complex['seo']['title']}")
+if __name__ == "__main__":
+    test_repair()
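
Review note: `agent._repair_json` is not visible in this part of the diff, so the following is only a sketch of one common way to repair the truncated inputs used in the test above. It closes an open string and then closes any open objects or arrays in reverse order. `repair_truncated_json` is a hypothetical name, and the sketch does not handle truncation right after a key, colon, or comma.

```python
import json


def repair_truncated_json(text: str) -> str:
    """Close any string, object, or array left open by truncation."""
    stack = []          # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()
    if in_string and escaped:
        text = text[:-1]  # drop a dangling backslash so the closing quote is not escaped
    suffix = '"' if in_string else ""
    return text + suffix + "".join(reversed(stack))


truncated = '{"niche": "AI Tools", "scenes": [{"scene_number": 1, "text": "Hello world'
data = json.loads(repair_truncated_json(truncated))
print(data["niche"], "|", data["scenes"][0]["text"])
```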

video_assembler.py ADDED Viewed

	@@ -0,0 +1,806 @@

+"""
+video_assembler.py
+─────────────────────────────────────────────────────────────
+Autonomous Short-Form Video Engine — Modern Editing Layer
+Supported editing styles:
+  • motion_graphics  — Animated titles, kinetic text, zoom punches,
+                       color grading, lower thirds, particles
+  • montage          — Fast cuts, beat-sync, speed ramps, whip pans,
+                       flash transitions, adrenaline pacing
+  • documentary      — Slow cross-fades, b-roll breathing, Ken Burns,
+                       subtitles like a real documentary
+  • social_media     — Zoom punch, caption pop-on, emoji overlays,
+                       bold captions, hook-first social hooks
+Each style wraps the same base infrastructure but applies a different
+visual treatment to every scene.
+─────────────────────────────────────────────────────────────
+"""
+import os
+import math
+import logging
+import random
+from pathlib import Path
+from datetime import datetime
+from typing import Callable
+import numpy as np
+from PIL import Image, ImageDraw, ImageFont, ImageFilter
+from dotenv import load_dotenv
+load_dotenv()
+logger = logging.getLogger(__name__)
+OUTPUT_DIR = Path("output")
+OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
+TARGET_W   = 1080
+TARGET_H   = 1920
+TARGET_FPS = 30
+BGM_VOLUME = float(os.getenv("BGM_VOLUME", "0.08"))
+EDITING_STYLES = ["motion_graphics", "montage", "documentary", "social_media"]
+# ─────────────────────────────────────────────────────────────
+# FONT UTILITIES
+# ─────────────────────────────────────────────────────────────
+def _load_font(size: int = 52, bold: bool = True):
+    candidates = [
+        # Windows Hindi/Devanagari candidates
+        ("C:/Windows/Fonts/Nirmala.ttf",   True),
+        ("C:/Windows/Fonts/Mangal.ttf",    False),
+        # Windows Latin candidates
+        ("C:/Windows/Fonts/arialbd.ttf",   True),
+        ("C:/Windows/Fonts/calibrib.ttf",  True),
+        ("C:/Windows/Fonts/verdanab.ttf",  True),
+        ("C:/Windows/Fonts/arial.ttf",     False),
+        # Linux candidates
+        ("/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf", True),
+        ("/usr/share/fonts/truetype/noto/NotoSans-Bold.ttf", True),
+        ("/usr/share/fonts/truetype/noto/NotoSansDevanagari-Bold.ttf", True),
+    ]
+    for fp, is_bold in candidates:
+        if Path(fp).exists() and (is_bold == bold or not bold):
+            try:
+                return ImageFont.truetype(fp, size)
+            except Exception:
+                continue
+    try:
+        # Fallback to a system-registered font if arial is there
+        return ImageFont.truetype("arial.ttf", size)
+    except Exception:
+        return ImageFont.load_default()
+def _wrap_text(text: str, max_chars: int = 20) -> list[str]:
+    words  = text.split()
+    lines  = []
+    current = ""
+    for word in words:
+        if len(current) + len(word) + 1 <= max_chars:
+            current = f"{current} {word}".strip()
+        else:
+            if current:
+                lines.append(current)
+            current = word
+    if current:
+        lines.append(current)
+    return lines
+# ─────────────────────────────────────────────────────────────
+# BASE CLIP PREPARATION
+# ─────────────────────────────────────────────────────────────
+def _prepare_base_clip(media_path: str, target_dur: float,
+                        w=TARGET_W, h=TARGET_H, fps=TARGET_FPS):
+    """Load video/image, resize+crop to 9:16, loop/trim to target_dur."""
+    from moviepy import VideoFileClip, ImageClip, ColorClip
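+    # Path("") resolves to the current directory, which exists, so check for a real file
+    path = Path(media_path) if media_path else None
+    if path is None or not path.is_file():
+        return ColorClip(size=(w, h), color=(5, 5, 15)).with_duration(target_dur).with_fps(fps)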
+    if path.suffix.lower() in (".mp4", ".mov", ".avi", ".webm"):
+        clip = VideoFileClip(str(path))
+        if clip.duration < target_dur:
+            from moviepy import concatenate_videoclips
+            reps = math.ceil(target_dur / clip.duration)
+            clip = concatenate_videoclips([clip] * reps)
+        clip = clip.subclipped(0, target_dur)
+    else:
+        clip = ImageClip(str(path)).with_duration(target_dur)
+    # Resize + center-crop to 9:16
+    cr = clip.w / clip.h
+    tr = w / h
+    if cr > tr:
+        clip = clip.resized(height=h)
+        ex = clip.w - w
+        clip = clip.cropped(x1=ex // 2, x2=ex // 2 + w)
+    else:
+        clip = clip.resized(width=w)
+        ex = clip.h - h
+        clip = clip.cropped(y1=ex // 2, y2=ex // 2 + h)
+    return clip.with_fps(fps)
+# ─────────────────────────────────────────────────────────────
+# EFFECT PRIMITIVES  (each returns a numpy RGBA array or a clip)
+# ─────────────────────────────────────────────────────────────
+# ── Color grading (LUT-like color correction) ─────────────
+def _apply_color_grade(clip, style: str = "cinematic"):
+    """Apply a color grade by transforming every frame with image_transform (MoviePy 2.x)."""
+    def cinematic(frame):
+        img = Image.fromarray(frame).convert("RGB")
+        r, g, b = img.split()
+        r = r.point(lambda i: min(255, int(i * 1.08)))
+        g = g.point(lambda i: min(255, int(i * 0.97)))
+        b = b.point(lambda i: min(255, int(i * 1.12)))
+        return np.array(Image.merge("RGB", (r, g, b)))
+    def warm(frame):
+        img = Image.fromarray(frame).convert("RGB")
+        r, g, b = img.split()
+        r = r.point(lambda i: min(255, int(i * 1.15)))
+        g = g.point(lambda i: min(255, int(i * 1.02)))
+        b = b.point(lambda i: max(0,   int(i * 0.88)))
+        return np.array(Image.merge("RGB", (r, g, b)))
+    def cold(frame):
+        img = Image.fromarray(frame).convert("RGB")
+        r, g, b = img.split()
+        r = r.point(lambda i: max(0,   int(i * 0.90)))
+        g = g.point(lambda i: min(255, int(i * 1.00)))
+        b = b.point(lambda i: min(255, int(i * 1.18)))
+        return np.array(Image.merge("RGB", (r, g, b)))
+    grade_fn = {"cinematic": cinematic, "warm": warm, "cold": cold}.get(style, cinematic)
+    return clip.image_transform(grade_fn)
+# ── Ken Burns (pan + zoom for static images) ─────────────
+def _ken_burns(clip, zoom_in: bool = True, direction: str = "left"):
+    """Slowly zoom + pan across an image clip for a cinematic feel."""
+    dur  = clip.duration
+    w, h = clip.w, clip.h
+    def make_frame(t):
+        progress = t / dur
+        if zoom_in:
+            scale = 1.0 + 0.08 * progress
+        else:
+            scale = 1.08 - 0.08 * progress
+        new_w = int(w * scale)
+        new_h = int(h * scale)
+        frame = clip.get_frame(t)
+        img   = Image.fromarray(frame).resize((new_w, new_h), Image.Resampling.LANCZOS)
+        # Pan direction
+        if direction == "left":
+            x_off = int((new_w - w) * progress)
+        elif direction == "right":
+            x_off = int((new_w - w) * (1 - progress))
+        else:
+            x_off = (new_w - w) // 2
+        y_off = (new_h - h) // 2
+        cropped = img.crop((x_off, y_off, x_off + w, y_off + h))
+        return np.array(cropped)
+    from moviepy import VideoClip
+    return VideoClip(make_frame, duration=dur).with_fps(clip.fps)
+# ── Zoom Punch (single frame zoom-in burst) ──────────────
+def _zoom_punch(clip, at_time: float = 0.0, zoom_factor: float = 1.15, hold: float = 0.12):
+    """Apply an instant zoom punch impact at `at_time`."""
+    dur  = clip.duration
+    w, h = clip.w, clip.h
+    def make_frame(t):
+        frame = clip.get_frame(t)
+        if at_time <= t <= at_time + hold:
+            progress = (t - at_time) / hold
+            scale = 1.0 + (zoom_factor - 1.0) * math.sin(progress * math.pi)
+            nw = int(w * scale)
+            nh = int(h * scale)
+            img  = Image.fromarray(frame).resize((nw, nh), Image.Resampling.LANCZOS)
+            xoff = (nw - w) // 2
+            yoff = (nh - h) // 2
+            return np.array(img.crop((xoff, yoff, xoff + w, yoff + h)))
+        return frame
+    from moviepy import VideoClip
+    return VideoClip(make_frame, duration=dur).with_fps(clip.fps)
+# ── Flash Transition (white flash between scenes) ────────
+def _make_flash_overlay(w: int, h: int, dur: float = 0.18, fps: int = 30) -> "VideoClip":
+    """Create a white flash whose opacity fades out; used at scene start for montage."""
+    from moviepy import ColorClip, VideoClip
+    def mask_frame(t):
+        # Opacity in [0, 1]: fully white at t=0, transparent by t=dur
+        alpha = max(0.0, 1.0 - (t / dur) ** 0.5)
+        return np.full((h, w), alpha, dtype=float)
+    mask  = VideoClip(mask_frame, is_mask=True, duration=dur).with_fps(fps)
+    white = ColorClip(size=(w, h), color=(255, 255, 255)).with_duration(dur).with_fps(fps)
+    return white.with_mask(mask)
+# ── Speed Ramp (slow → fast for montage) ─────────────────
+def _speed_ramp(clip, ramp_duration: float = 1.5, start_speed: float = 0.4, end_speed: float = 1.6):
+    """Gradually change playback speed within the clip."""
+    dur = clip.duration
+    def time_map(t):
+        if t <= ramp_duration:
+            progress = t / ramp_duration
+            speed = start_speed + (end_speed - start_speed) * progress
+        else:
+            speed = end_speed
+        return min(t * speed, dur - 0.01)
+    # keep_duration=True: time_transform otherwise resets the clip duration to None
+    return clip.time_transform(time_map, apply_to=["mask"], keep_duration=True)
+# ── Glitch / Chromatic Aberration (social media hook) ────
+def _chromatic_aberration(clip, intensity: int = 6):
+    """RGB channel shift for a glitch effect on the hook scene."""
+    def effect(frame):
+        # Shift red right and blue left; green stays put. np.roll wraps at the edges.
+        rgb   = np.asarray(Image.fromarray(frame).convert("RGB"))
+        r_out = np.roll(rgb[..., 0], intensity,  axis=1)
+        b_out = np.roll(rgb[..., 2], -intensity, axis=1)
+        return np.stack([r_out, rgb[..., 1], b_out], axis=2).astype(np.uint8)
+    return clip.image_transform(effect)
+# ── Vignette overlay ─────────────────────────────────────
+def _make_vignette(w: int, h: int, strength: float = 0.6) -> np.ndarray:
+    """Generate a vignette RGBA overlay (transparent center, dark corners)."""
+    # Normalized distance from the center: 0.0 at the center, 1.0 at the corners
+    ys, xs = np.mgrid[0:h, 0:w]
+    dist   = np.hypot((xs - w / 2) / (w / 2), (ys - h / 2) / (h / 2)) / math.sqrt(2)
+    alpha  = strength * 255 * np.clip(dist, 0.0, 1.0) ** 2.5
+    overlay = np.zeros((h, w, 4), dtype=np.uint8)   # black RGB, alpha set below
+    overlay[..., 3] = alpha.astype(np.uint8)
+    return overlay
+# ─────────────────────────────────────────────────────────────
+# TEXT OVERLAY ENGINES (per style)
+# ─────────────────────────────────────────────────────────────
+def _caption_motion_graphics(text: str, w: int, h: int, t: float, dur: float) -> np.ndarray:
+    """
+    MOTION GRAPHICS style:
+    - Slide-up entrance animation
+    - Gradient color fill on text
+    - Bold uppercase with tracking
+    - Semi-transparent pill background
+    """
+    img  = Image.new("RGBA", (w, h), (0, 0, 0, 0))
+    draw = ImageDraw.Draw(img)
+    font = _load_font(size=64)
+    lines      = _wrap_text(text.upper(), max_chars=18)
+    line_h     = 76
+    total_text = len(lines) * line_h
+    y_base     = h - total_text - 200
+    # Slide-up animation: 0→0.4s entrance
+    slide_pct = min(1.0, t / 0.4)
+    ease_y    = int((1 - slide_pct) * 120)   # slides up from +120px
+    # Fade-out: last 0.3s
+    fade_out = max(0.0, min(1.0, (dur - t) / 0.3))
+    alpha    = int(255 * slide_pct * fade_out)
+    for i, line in enumerate(lines):
+        bbox  = draw.textbbox((0, 0), line, font=font)
+        tw    = bbox[2] - bbox[0]
+        th    = bbox[3] - bbox[1]
+        x     = (w - tw) // 2
+        y     = y_base + i * line_h + ease_y
+        # Pill background
+        pad  = 20
+        pill = [(x - pad, y - 8), (x + tw + pad, y + th + 8)]
+        draw.rounded_rectangle(pill, radius=12, fill=(0, 0, 0, int(alpha * 0.72)))
+        # Main text (gradient: purple → cyan approximated as white with tint)
+        draw.text((x + 3, y + 3), line, font=font,
+                  fill=(20, 180, 255, int(alpha * 0.4)))   # shadow in cyan-blue
+        draw.text((x, y), line, font=font,
+                  fill=(255, 255, 255, alpha))
+    return np.array(img)
+def _caption_montage(text: str, w: int, h: int, t: float, dur: float) -> np.ndarray:
+    """
+    MONTAGE style:
+    - Impact font-style (very bold, large)
+    - Fast pop-in (0.15s alpha fade standing in for a scale punch)
+    - All caps, tight letter spacing
+    - Strong yellow highlight color
+    - No background — heavy black outline instead
+    """
+    img  = Image.new("RGBA", (w, h), (0, 0, 0, 0))
+    draw = ImageDraw.Draw(img)
+    font = _load_font(size=80)
+    lines  = _wrap_text(text.upper(), max_chars=15)
+    line_h = 94
+    y_base = h // 2 - (len(lines) * line_h) // 2   # center screen for montage
+    # Pop-in: 0.15s alpha fade (no geometric scaling is applied)
+    pop_pct = min(1.0, t / 0.15)
+    alpha   = int(255 * pop_pct)
+    for i, line in enumerate(lines):
+        bbox = draw.textbbox((0, 0), line, font=font)
+        tw   = bbox[2] - bbox[0]
+        x    = (w - tw) // 2
+        y    = y_base + i * line_h
+        # 6px thick outline (drawn first)
+        for oox in range(-5, 6, 2):
+            for ooy in range(-5, 6, 2):
+                if abs(oox) + abs(ooy) > 6:
+                    continue
+                draw.text((x + oox, y + ooy), line, font=font,
+                          fill=(0, 0, 0, alpha))
+        # Yellow main text
+        draw.text((x, y), line, font=font,
+                  fill=(255, 230, 0, alpha))
+    return np.array(img)
+def _caption_documentary(text: str, w: int, h: int, t: float, dur: float) -> np.ndarray:
+    """
+    DOCUMENTARY style:
+    - Lower-third strip design
+    - Serif-ish font, clean and professional
+    - Slow fade-in (0.6s), slow fade-out (0.5s)
+    - Left-aligned with accent bar
+    - White text on dark semi-transparent band
+    """
+    img  = Image.new("RGBA", (w, h), (0, 0, 0, 0))
+    draw = ImageDraw.Draw(img)
+    font = _load_font(size=54)
+    lines  = _wrap_text(text, max_chars=26)
+    line_h = 64
+    # Slow fade
+    fade_in  = min(1.0, t / 0.6)
+    fade_out = max(0.0, min(1.0, (dur - t) / 0.5))
+    alpha    = int(255 * fade_in * fade_out)
+    total_h = len(lines) * line_h + 40
+    strip_y = h - total_h - 140
+    # Dark band (full width strip)
+    draw.rectangle([(0, strip_y - 12), (w, strip_y + total_h)],
+                   fill=(0, 0, 0, int(alpha * 0.78)))
+    # Accent bar (left edge, electric blue)
+    draw.rectangle([(0, strip_y - 12), (8, strip_y + total_h)],
+                   fill=(0, 180, 255, alpha))
+    # Text
+    x_text = 36
+    for i, line in enumerate(lines):
+        y_ = strip_y + 16 + i * line_h
+        draw.text((x_text + 2, y_ + 2), line, font=font, fill=(0, 0, 0, int(alpha * 0.5)))
+        draw.text((x_text, y_), line, font=font, fill=(255, 255, 255, alpha))
+    return np.array(img)
+def _caption_social_media(text: str, w: int, h: int, t: float, dur: float) -> np.ndarray:
+    """
+    SOCIAL MEDIA style:
+    - TikTok/Reels-style bold center caption
+    - Word-by-word karaoke highlight effect (approximated)
+    - Bright pink/yellow highlight on current word
+    - Strong outline, big font, bottom-centered
+    """
+    img  = Image.new("RGBA", (w, h), (0, 0, 0, 0))
+    draw = ImageDraw.Draw(img)
+    font = _load_font(size=72)
+    lines  = _wrap_text(text, max_chars=16)
+    line_h = 88
+    y_base = h - (len(lines) * line_h) - 160
+    # Pop-in burst: 0→0.2s
+    pop    = min(1.0, t / 0.2)
+    alpha  = int(255 * pop)
+    # Karaoke: highlight word cycling
+    words = text.split()
+    total_words = max(1, len(words))
+    current_word_idx = int((t / max(dur, 0.01)) * total_words)
+    for i, line in enumerate(lines):
+        bbox = draw.textbbox((0, 0), line, font=font)
+        tw   = bbox[2] - bbox[0]
+        x    = (w - tw) // 2
+        y    = y_base + i * line_h
+        # Black pill backdrop
+        pad = 22
+        draw.rounded_rectangle(
+            [(x - pad, y - 10), (x + tw + pad, y + 80)],
+            radius=16,
+            fill=(0, 0, 0, int(alpha * 0.65)),
+        )
+        # Outline
+        for ox, oy in [(-4,0),(4,0),(0,-4),(0,4),(-3,-3),(3,3),(-3,3),(3,-3)]:
+            draw.text((x + ox, y + oy), line, font=font, fill=(0, 0, 0, alpha))
+        # Check if this line's words include the highlighted word
+        line_words = line.split()
+        flat_idx   = sum(len(l.split()) for l in lines[:i])
+        is_active  = flat_idx <= current_word_idx < flat_idx + len(line_words)
+        color = (255, 50, 200, alpha) if is_active else (255, 255, 255, alpha)
+        draw.text((x, y), line, font=font, fill=color)
+    return np.array(img)
+# ─────────────────────────────────────────────────────────────
+# STYLE-SPECIFIC SCENE PROCESSORS
+# ─────────────────────────────────────────────────────────────
+def _build_motion_graphics_scene(base_clip, scene: dict, dur: float, fps: int):
+    """
+    Motion Graphics:
+    - Cinematic color grade
+    - Subtle zoom-in over scene duration
+    - Animated slide-up text overlay
+    - Vignette final composite
+    """
+    from moviepy import VideoClip, CompositeVideoClip, ImageClip
+    graded = _apply_color_grade(base_clip, "cinematic")
+    # Slow zoom-in (1.0x → 1.06x)
+    w, h = base_clip.w, base_clip.h
+    def zoom_frame(t):
+        frame  = graded.get_frame(t)
+        scale  = 1.0 + 0.06 * (t / dur)
+        nw, nh = int(w * scale), int(h * scale)
+        img    = Image.fromarray(frame).resize((nw, nh), Image.Resampling.LANCZOS)
+        xoff   = (nw - w) // 2
+        yoff   = (nh - h) // 2
+        return np.array(img.crop((xoff, yoff, xoff + w, yoff + h)))
+    zoomed = VideoClip(zoom_frame, duration=dur).with_fps(fps)
+    # Animated caption
+    caption_text = scene.get("on_screen_text", "")
+    def caption_frame(t):
+        return _caption_motion_graphics(caption_text, w, h, t, dur)
+    caption_layer = (
+        VideoClip(caption_frame, duration=dur)
+        .with_fps(fps)
+    )
+    # Vignette
+    vignette_arr  = _make_vignette(w, h, strength=0.55)
+    vignette_clip = ImageClip(vignette_arr).with_duration(dur).with_fps(fps)
+    return CompositeVideoClip([zoomed, caption_layer, vignette_clip])
+def _build_montage_scene(base_clip, scene: dict, dur: float, fps: int, scene_idx: int):
+    """
+    Montage:
+    - Fast cuts already handled at concat level (short dur)
+    - Speed ramp on most scenes
+    - Zoom punch at scene start
+    - Warm/saturated grade
+    - Bold impact text
+    - White flash overlay at start
+    """
+    from moviepy import VideoClip, CompositeVideoClip
+    w, h = base_clip.w, base_clip.h
+    # Warm color grade
+    graded = _apply_color_grade(base_clip, "warm")
+    # Zoom punch at t=0 for every other scene
+    if scene_idx % 2 == 0:
+        punched = _zoom_punch(graded, at_time=0.0, zoom_factor=1.18, hold=0.10)
+    else:
+        punched = graded
+    # Speed ramp for longer scenes
+    if dur > 4.0:
+        try:
+            punched = _speed_ramp(punched)
+        except Exception:
+            pass
+    # Flash overlay (first 0.18s of scene)
+    flash_dur = min(0.18, dur * 0.15)
+    flash     = _make_flash_overlay(w, h, dur=flash_dur, fps=fps)
+    # Bold caption
+    caption_text = scene.get("on_screen_text", "")
+    def caption_frame(t):
+        return _caption_montage(caption_text, w, h, t, dur)
+    cap_layer = VideoClip(caption_frame, duration=dur).with_fps(fps)
+    # The flash clip only lasts flash_dur, so it never covers the footage afterwards
+    return CompositeVideoClip([punched, flash, cap_layer]).with_duration(dur)
+def _build_documentary_scene(base_clip, scene: dict, dur: float, fps: int):
+    """
+    Documentary:
+    - Ken Burns on image clips, smooth playback on video
+    - Cold/neutral grade
+    - Slow cross-fade handled at concat level
+    - Lower-third text with professional styling
+    """
+    from moviepy import VideoClip, CompositeVideoClip
+    w, h = base_clip.w, base_clip.h
+    # Determine if it's a static image (no motion)
+    is_image = not hasattr(base_clip, "reader")
+    if is_image or dur > 8:
+        directions = ["left", "right", "center"]
+        direction  = random.choice(directions)
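+        # dur is a float TTS duration, so `dur % 2 == 0` was almost never true; pick at random
+        zoomed     = _ken_burns(base_clip, zoom_in=random.choice([True, False]), direction=direction)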
+    else:
+        zoomed = base_clip
+    graded = _apply_color_grade(zoomed, "cold")
+    # Lower-third text
+    caption_text = scene.get("on_screen_text", "")
+    def caption_frame(t):
+        return _caption_documentary(caption_text, w, h, t, dur)
+    cap_layer = VideoClip(caption_frame, duration=dur).with_fps(fps)
+    # Vignette
+    from moviepy import ImageClip
+    vignette_arr  = _make_vignette(w, h, strength=0.4)
+    vignette_clip = ImageClip(vignette_arr).with_duration(dur).with_fps(fps)
+    return CompositeVideoClip([graded, cap_layer, vignette_clip])
+def _build_social_media_scene(base_clip, scene: dict, dur: float, fps: int, is_hook: bool):
+    """
+    Social Media (TikTok/Reels style):
+    - Glitch/aberration on hook scene
+    - Zoom punch on every scene
+    - Bright warm grade
+    - Karaoke-style pop captions
+    """
+    from moviepy import VideoClip, CompositeVideoClip
+    w, h = base_clip.w, base_clip.h
+    # Glitch on hook
+    if is_hook:
+        graded_raw = _apply_color_grade(base_clip, "warm")
+        graded     = _chromatic_aberration(graded_raw, intensity=7)
+    else:
+        graded = _apply_color_grade(base_clip, "warm")
+    # Zoom punch at t=0 (social media always punches in)
+    punched = _zoom_punch(graded, at_time=0.0, zoom_factor=1.12, hold=0.08)
+    # Social caption
+    caption_text = scene.get("on_screen_text", "")
+    def caption_frame(t):
+        return _caption_social_media(caption_text, w, h, t, dur)
+    cap_layer = VideoClip(caption_frame, duration=dur).with_fps(fps)
+    return CompositeVideoClip([punched, cap_layer])
+# ─────────────────────────────────────────────────────────────
+# TRANSITION HELPERS
+# ─────────────────────────────────────────────────────────────
+def _crossfade_concat(clips: list, crossfade_dur: float = 0.5):
+    """Concatenate clips with a crossfade dissolve between them."""
+    from moviepy import concatenate_videoclips, vfx
+    if len(clips) <= 1 or crossfade_dur <= 0:
+        return concatenate_videoclips(clips, method="compose")
+    # Keep the overlap shorter than the shortest clip so start times stay increasing
+    fade = min(crossfade_dur, min(c.duration for c in clips) / 2)
+    # Each later clip fades in over the tail of the previous one; negative padding creates the overlap
+    faded = [clips[0]] + [c.with_effects([vfx.CrossFadeIn(fade)]) for c in clips[1:]]
+    return concatenate_videoclips(faded, method="compose", padding=-fade)
+# ─────────────────────────────────────────────────────────────
+# MAIN PUBLIC API
+# ─────────────────────────────────────────────────────────────
+def assemble_video(
+    video_json: dict,
+    audio_results: list,
+    media_results: list,
+    editing_style: str = "social_media",
+    output_filename: str | None = None,
+) -> str:
+    """
+    Assemble the final 9:16 MP4 with a chosen modern editing style.
+    Args:
+        video_json:     Parsed JSON from agent
+        audio_results:  From voice_generator.generate_voiceovers()
+        media_results:  From asset_checker.check_all_assets()
+        editing_style:  One of: motion_graphics | montage | documentary | social_media
+        output_filename: Optional custom filename
+    Returns:
+        Absolute path string to the final MP4
+    """
+    from moviepy import AudioFileClip, concatenate_videoclips
+    style = editing_style if editing_style in EDITING_STYLES else "social_media"
+    logger.info(f"[Assembler] 🎬 Style: {style.upper().replace('_',' ')}")
+    audio_map  = {r["scene_number"]: r for r in audio_results}
+    media_map  = {r["scene_number"]: r for r in media_results}
+    scenes     = video_json["scenes"]
+    styled_clips = []
+    for idx, scene in enumerate(scenes):
+        num     = scene["scene_number"]
+        caption = scene.get("on_screen_text", "")
+        audio_info = audio_map.get(num)
+        media_info = media_map.get(num, {})
+        # Use real TTS duration if available
+        target_dur = float(scene.get("duration_seconds", 5))
+        if audio_info and audio_info.get("duration", 0) > 0:
+            target_dur = audio_info["duration"]
+        logger.info(f"[Assembler] Scene {num} ({style}) — {target_dur:.1f}s")
+        media_path = media_info.get("path") if media_info.get("approved", True) else None
+        base = _prepare_base_clip(media_path or "", target_dur)
+        # Apply style engine
+        try:
+            if style == "motion_graphics":
+                styled = _build_motion_graphics_scene(base, scene, target_dur, TARGET_FPS)
+            elif style == "montage":
+                styled = _build_montage_scene(base, scene, target_dur, TARGET_FPS, idx)
+            elif style == "documentary":
+                styled = _build_documentary_scene(base, scene, target_dur, TARGET_FPS)
+            elif style == "social_media":
+                is_hook = (scene.get("type") == "hook" or idx == 0)
+                styled  = _build_social_media_scene(base, scene, target_dur, TARGET_FPS, is_hook)
+            else:
+                styled = base
+        except Exception as e:
+            logger.warning(f"[Assembler] Style engine failed on scene {num}: {e} — using base")
+            styled = base
+        # Attach voiceover audio
+        if audio_info and audio_info.get("path") and Path(audio_info["path"]).exists():
+            try:
+                raw_audio = AudioFileClip(audio_info["path"])
+                # Safe duration: if TTS is shorter than scene, don't try to extend it (causes crash)
+                audio_dur = min(target_dur, raw_audio.duration)
+                audio     = raw_audio.subclipped(0, audio_dur).with_duration(audio_dur)
+                styled    = styled.with_audio(audio)
+            except Exception as e:
+                logger.warning(f"[Assembler] Audio attach failed scene {num}: {e}")
+        styled_clips.append(styled)
+    if not styled_clips:
+        raise RuntimeError("No clips built — check media and audio outputs.")
+    # ── Concatenation strategy per style ─────────────────
+    logger.info("[Assembler] Concatenating scenes...")
+    try:
+        if style == "documentary":
+            final = _crossfade_concat(styled_clips, crossfade_dur=0.5)
+        elif style == "montage":
+            # Hard cuts — no transition
+            final = concatenate_videoclips(styled_clips, method="compose")
+        elif style == "motion_graphics":
+            final = _crossfade_concat(styled_clips, crossfade_dur=0.35)
+        else:  # social_media
+            final = concatenate_videoclips(styled_clips, method="compose")
+    except Exception as e:
+        logger.warning(f"[Assembler] Concatenation failed ({e}), retrying with hard cuts")
+        final = concatenate_videoclips(styled_clips, method="compose")
+    # ── Output ────────────────────────────────────────────
+    if not output_filename:
+        ts         = datetime.now().strftime("%Y%m%d_%H%M%S")
+        niche_slug = video_json.get("niche", "video").lower().replace(" ", "_")
+        output_filename = f"{niche_slug}_{style}_{ts}.mp4"
+    out_path = OUTPUT_DIR / output_filename
+    logger.info(f"[Assembler] Rendering → {out_path}")
+    final.write_videofile(
+        str(out_path),
+        fps=TARGET_FPS,
+        codec="libx264",
+        audio_codec="aac",
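+        # Per-output temp file so two renders in the same working directory cannot collide
+        temp_audiofile=str(OUTPUT_DIR / f"{out_path.stem}_temp_audio.m4a"),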
+        remove_temp=True,
+        logger=None,
+        preset="medium",
+    )
+    logger.info(f"[Assembler] ✅ Done → {out_path}")
+    return str(out_path.resolve())
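
The per-scene timing rule used by the assembler above can be read in isolation. The following is a minimal sketch, not part of the commit: the helper names `pick_target_duration` and `clamp_audio_duration` are hypothetical, and it only mirrors the logic visible in the loop (a positive TTS duration overrides the planned `duration_seconds`, and the attached audio is never stretched past its real length).

```python
from typing import Optional


def pick_target_duration(scene: dict, audio_info: Optional[dict]) -> float:
    """Hypothetical helper: choose how long a scene should run.

    Starts from the planned duration (default 5 s) and switches to the
    measured TTS duration whenever one is available and positive.
    """
    target = float(scene.get("duration_seconds", 5))
    if audio_info and audio_info.get("duration", 0) > 0:
        target = float(audio_info["duration"])
    return target


def clamp_audio_duration(target_dur: float, raw_audio_dur: float) -> float:
    """Hypothetical helper: never ask for more audio than the file holds."""
    return min(target_dur, raw_audio_dur)


if __name__ == "__main__":
    scene = {"scene_number": 1, "duration_seconds": 4}
    audio = {"scene_number": 1, "duration": 5.2}
    dur = pick_target_duration(scene, audio)
    print(dur, clamp_audio_duration(dur, 5.2))
```

Keeping the rule in small pure functions like these would let it be unit-tested without loading MoviePy or any media files.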

voice_generator.py ADDED

	@@ -0,0 +1,259 @@

+"""
+voice_generator.py
+─────────────────────────────────────────────────────────────
+Autonomous Short-Form Video Engine — Voiceover Layer
+Uses Google Cloud Text-to-Speech (free tier: 1M chars/month)
+to generate per-scene MP3 audio from the agent's JSON output.
+Fallback: set USE_EDGE_TTS=true in .env to use edge-tts
+          (no Google credentials needed).
+─────────────────────────────────────────────────────────────
+"""
+import os
+import asyncio
+import logging
+from pathlib import Path
+from dotenv import load_dotenv
+from mutagen.mp3 import MP3
+load_dotenv()
+logger = logging.getLogger(__name__)
+# ── Hugging Face / Cloud Secrets ──────────────────────────
+# If GOOGLE_APPLICATION_CREDENTIALS_JSON is provided as a secret,
+# write it to a temp file and set the path.
+if os.getenv("GOOGLE_APPLICATION_CREDENTIALS_JSON"):
+    import json
+    import tempfile
+    try:
+        creds_json = os.getenv("GOOGLE_APPLICATION_CREDENTIALS_JSON")
+        with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.json') as tf:
+            tf.write(creds_json)
+            os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = tf.name
+            logger.info(f"✅ Google Cloud credentials written to {tf.name}")
+    except Exception as e:
+        logger.error(f"❌ Failed to process Google Cloud credentials JSON: {e}")
+AUDIO_DIR = Path("assets/audio")
+AUDIO_DIR.mkdir(parents=True, exist_ok=True)
+USE_EDGE_TTS = os.getenv("USE_EDGE_TTS", "false").lower() == "true"
+# ── Voice mapping by mood ─────────────────────────────────
+# Google Cloud TTS Neural2 voices (en-US)
+GOOGLE_VOICE_MAP = {
+    ("calm",        "male"):   ("en-US-Neural2-D", 0.95),
+    ("calm",        "female"): ("en-US-Neural2-F", 0.92),
+    ("energetic",   "male"):   ("en-US-Neural2-J", 1.05),
+    ("energetic",   "female"): ("en-US-Neural2-G", 1.08),
+    ("inspirational","male"):  ("en-US-Neural2-A", 1.00),
+    ("inspirational","female"):("en-US-Neural2-C", 0.98),
+    ("monotone",    "male"):   ("en-US-Neural2-I", 0.90),
+    ("monotone",    "female"): ("en-US-Neural2-E", 0.90),
+    ("neutral",     "neutral"):("en-US-Neural2-D", 1.00),
+}
+# edge-tts voice map (fallback)
+EDGE_VOICE_MAP = {
+    "English": {
+        ("calm",        "male"):   ("en-US-GuyNeural",     {"rate": "-5%",  "pitch": "-2Hz"}),
+        ("calm",        "female"): ("en-US-JennyNeural",   {"rate": "-5%",  "pitch": "+0Hz"}),
+        ("energetic",   "male"):   ("en-US-GuyNeural",     {"rate": "+10%", "pitch": "+2Hz"}),
+        ("energetic",   "female"): ("en-US-AriaNeural",    {"rate": "+10%", "pitch": "+4Hz"}),
+        ("inspirational","male"):  ("en-US-GuyNeural",     {"rate": "+0%",  "pitch": "+0Hz"}),
+        ("inspirational","female"):("en-US-JennyNeural",   {"rate": "+0%",  "pitch": "+2Hz"}),
+        ("monotone",    "male"):   ("en-US-GuyNeural",     {"rate": "-10%", "pitch": "-5Hz"}),
+        ("monotone",    "female"): ("en-US-JennyNeural",   {"rate": "-10%", "pitch": "-3Hz"}),
+        ("neutral",     "neutral"):("en-US-GuyNeural",     {"rate": "+0%",  "pitch": "+0Hz"}),
+    },
+    "Hindi": {
+        ("calm",        "male"):   ("hi-IN-MadhurNeural",  {"rate": "-5%",  "pitch": "+0Hz"}),
+        ("calm",        "female"): ("hi-IN-SwaraNeural", {"rate": "-5%",  "pitch": "+0Hz"}),
+        ("energetic",   "male"):   ("hi-IN-MadhurNeural",  {"rate": "+5%",  "pitch": "+0Hz"}),
+        ("energetic",   "female"): ("hi-IN-SwaraNeural", {"rate": "+5%",  "pitch": "+0Hz"}),
+        ("neutral",     "neutral"):("hi-IN-MadhurNeural",  {"rate": "+0%",  "pitch": "+0Hz"}),
+    }
+}
+EDGE_VOICE_MAP["Hinglish"] = EDGE_VOICE_MAP["Hindi"] # Use Hindi for Hinglish
+# ── Premium Humanized Voices ─────────────────────────────
+# Hand-picked for high quality and natural sound
+PREMIUM_VOICES = {
+    # Google Cloud TTS
+    "google_male_1":   ("en-US-Neural2-D", 1.0,  "Google - Professional (Male)"),
+    "google_male_2":   ("en-US-Neural2-J", 1.0,  "Google - Energetic (Male)"),
+    "google_female_1": ("en-US-Neural2-F", 1.0,  "Google - Soft (Female)"),
+    "google_female_2": ("en-US-Neural2-C", 1.0,  "Google - Authoritative (Female)"),
+    # edge-tts (no auth)
+    "edge_male_1":     ("en-US-GuyNeural",     1.0,  "Edge - Mature (Male)"),
+    "edge_male_2":     ("en-US-ChristopherNeural", 1.0, "Edge - Friendly (Male)"),
+    "edge_female_1":   ("en-US-JennyNeural",   1.0,  "Edge - Conversational (Female)"),
+    "edge_female_2":   ("en-US-AvaNeural",     1.0,  "Edge - Bright (Female)"),
+    "edge_female_3":   ("en-GB-SoniaNeural",   1.0,  "Edge - British (Female)"),
+    # Hindi / Hinglish (Google)
+    "google_hindi_1":  ("hi-IN-Neural2-A", 1.0,  "Google - Hindi (Male)"),
+    "google_hindi_2":  ("hi-IN-Neural2-C", 1.0,  "Google - Hindi (Female)"),
+    # Hindi / Hinglish (Edge)
+    "edge_hindi_1":    ("hi-IN-MadhurNeural", 1.0,  "Edge - Hindi (Male)"),
+    "edge_hindi_2":    ("hi-IN-SwaraNeural", 1.0,  "Edge - Hindi (Female)"),
+}
+def _get_audio_duration(path: Path) -> float:
+    """Return duration of an MP3 file in seconds."""
+    try:
+        audio = MP3(str(path))
+        return audio.info.length
+    except Exception:
+        return 0.0
+# ── Google TTS backend ────────────────────────────────────
+def _synthesize_google(text: str, scene_num: int, mood: str, gender: str, voice_id: str = None) -> Path:
+    """Generate speech via Google Cloud TTS."""
+    from google.cloud import texttospeech
+    if voice_id and voice_id in PREMIUM_VOICES:
+        voice_name, speed, _ = PREMIUM_VOICES[voice_id]
+    else:
+        key = (mood, gender)
+        fallback_key = ("neutral", "neutral")
+        voice_name, speed = GOOGLE_VOICE_MAP.get(key, GOOGLE_VOICE_MAP.get(fallback_key, ("en-US-Neural2-D", 1.0)))
+    tts_client = texttospeech.TextToSpeechClient()
+    input_text = texttospeech.SynthesisInput(text=text)
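+    voice = texttospeech.VoiceSelectionParams(
+        # Derive the locale from the voice name (e.g. "hi-IN-Neural2-A" -> "hi-IN")
+        # so the Hindi entries in PREMIUM_VOICES are not requested under "en-US".
+        language_code="-".join(voice_name.split("-")[:2]),
+        name=voice_name,
+    )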
+    audio_config = texttospeech.AudioConfig(
+        audio_encoding=texttospeech.AudioEncoding.MP3,
+        speaking_rate=speed,
+    )
+    response = tts_client.synthesize_speech(
+        input=input_text, voice=voice, audio_config=audio_config
+    )
+    out_path = AUDIO_DIR / f"scene_{scene_num:02d}.mp3"
+    out_path.write_bytes(response.audio_content)
+    logger.info(f"[Voice/Google] Scene {scene_num} → {out_path.name}")
+    return out_path
+# ── edge-tts backend ─────────────────────────────────────
+async def _synthesize_edge_async(text: str, scene_num: int, mood: str, gender: str, voice_id: str = None, language: str = "English") -> Path:
+    """Generate speech via edge-tts (no auth required)."""
+    import edge_tts
+    out_path = AUDIO_DIR / f"scene_{scene_num:02d}.mp3"
+    if voice_id and voice_id in PREMIUM_VOICES:
+        voice_name, speed, _ = PREMIUM_VOICES[voice_id]
+        # round() rather than int(): float error can otherwise truncate e.g. 0.92 to -7%
+        communicate = edge_tts.Communicate(text, voice_name, rate=f"{round((speed - 1) * 100):+d}%")
+    else:
+        key = (mood, gender)
+        fallback_key = ("neutral", "neutral")
+        # Select map based on language
+        lang_map = EDGE_VOICE_MAP.get(language, EDGE_VOICE_MAP["English"])
+        voice, settings = lang_map.get(key, lang_map.get(fallback_key))
+        communicate = edge_tts.Communicate(
+            text,
+            voice,
+            rate=settings.get("rate", "+0%"),
+            pitch=settings.get("pitch", "+0Hz"),
+        )
+    await communicate.save(str(out_path))
+    logger.info(f"[Voice/edge-tts] Scene {scene_num} ({language}) → {out_path.name}")
+    return out_path
+def _synthesize_edge(text: str, scene_num: int, mood: str, gender: str, voice_id: str = None, language: str = "English") -> Path:
+    return asyncio.run(_synthesize_edge_async(text, scene_num, mood, gender, voice_id=voice_id, language=language))
+# ── Public API ────────────────────────────────────────────
+def generate_voiceovers(video_json: dict, voice_id: str = None) -> list[dict]:
+    """
+    Generate voiceover MP3s for all scenes in the video JSON.
+    Args:
+        video_json: Parsed dict from agent.generate_video_package()
+        voice_id:   Explicit voice selection from PREMIUM_VOICES
+    Returns:
+        List of dicts: [{"scene_number": 1, "path": "assets/audio/scene_01.mp3",
+                          "duration": 5.2}, ...]
+    """
+    settings = video_json.get("voiceover_settings", {})
+    mood   = settings.get("mood", "calm")
+    gender = settings.get("gender_preference", "male")
+    results = []
+    # Decide back-end based on voice_id or env toggle
+    use_edge = USE_EDGE_TTS
+    if voice_id and voice_id.startswith("edge_"):
+        use_edge = True
+    elif voice_id and voice_id.startswith("google_"):
+        use_edge = False
+    language = video_json.get("language", "English")
+    for scene in video_json["scenes"]:
+        num  = scene["scene_number"]
+        text = scene["script_text"]
+        if use_edge:
+            path = _synthesize_edge(text, num, mood, gender, voice_id=voice_id, language=language)
+        else:
+            # NOTE: the Google backend ignores `language`; its mood-based map covers en-US voices only.
+            path = _synthesize_google(text, num, mood, gender, voice_id=voice_id)
+        duration = _get_audio_duration(path)
+        results.append({
+            "scene_number": num,
+            "path": str(path),
+            "duration": round(duration, 2),
+        })
+    logger.info(f"[Voice] ✅ Generated {len(results)} audio files.")
+    return results
+# ── CLI Test ──────────────────────────────────────────────
+if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--test", action="store_true", help="Run with sample text")
+    args = parser.parse_args()
+    if args.test:
+        sample = {
+            "scenes": [
+                {
+                    "scene_number": 1,
+                    "script_text": "Did you know that AI tools can save you 3 hours every single day?",
+                },
+                {
+                    "scene_number": 2,
+                    "script_text": "Here are the top 3 AI productivity tools you need right now.",
+                },
+            ],
+            "voiceover_settings": {
+                "mood": "energetic",
+                "gender_preference": "male",
+                "pace": "normal",
+            },
+        }
+        print("🎙  Running voiceover test...")
+        results = generate_voiceovers(sample)
+        for r in results:
+            print(f"  Scene {r['scene_number']}: {r['path']} ({r['duration']}s)")
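
Two small conversions in this file are easy to get subtly wrong: turning a speed multiplier into the signed percentage string that edge-tts takes as `rate`, and recovering a locale from a voice name. The sketch below is illustrative only. The helper names `speed_to_rate` and `language_code_from_voice` are hypothetical and do not exist in the commit, and the locale helper assumes voice names start with a `language-REGION` prefix, as every entry in `PREMIUM_VOICES` does.

```python
def speed_to_rate(speed: float) -> str:
    """Hypothetical helper: 1.0 -> "+0%", 1.05 -> "+5%", 0.92 -> "-8%".

    round() is used instead of int() because (speed - 1) * 100 is a float
    and truncation can land one percent away from the intended value.
    """
    return f"{round((speed - 1) * 100):+d}%"


def language_code_from_voice(voice_name: str) -> str:
    """Hypothetical helper: keep the first two dash-separated parts.

    Assumes names shaped like "hi-IN-Neural2-A" or "en-US-GuyNeural".
    """
    return "-".join(voice_name.split("-")[:2])


if __name__ == "__main__":
    for speed in (1.0, 1.05, 0.92):
        print(speed, speed_to_rate(speed))
    for name in ("hi-IN-Neural2-A", "en-US-GuyNeural"):
        print(name, language_code_from_voice(name))
```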