---
license: apache-2.0
tags:
- routing
- code
- mlx
- pid
- cascade
library_name: mlx
---

# Vibe Coding Router v5

A three-tier cascaded router for coding tasks that routes each prompt to one of:

- **Local**: Qwen3-Coder-Next (80B total / 3B active MoE, on-device via MLX)
- **Sonnet**: Claude Sonnet 4.6 (medium-complexity cloud)
- **Opus**: Claude Opus 4.6 (max-capability cloud)

## What's New in v5

v4 suffered from **inverted routing**: simple queries went to cloud while complex ones stayed local. Root cause: a length-quality anti-correlation in the training data, compounded by reward-weight amplification in the PID loss. v5 fixes this with:

1. **7 new complexity features** (45 handcrafted total): `is_coding_task`, `junk_score`, `scope_breadth`, `imperative_verb_density`, `noun_phrase_density`, `interaction_complexity`, `requirement_clause_count`
2. **Centered complexity premium**: adjusts training margins by `premium * (complexity_score - center)`, so complex tasks push toward cloud and simple tasks push toward local
3. **Junk prompt clamping**: 75 junk/greeting prompts neutralized (p_teacher=0.5, margin=0.0)
4. **Reward weight cap**: PID loss reward_weight capped at 0.5 to prevent outlier margins from dominating

## Architecture

Two cascaded binary MLP routers trained with **Privileged Information Distillation (PID)**:

- **Router A** (local vs cloud): 77-dim → [32, 16] → 1, dropout=0.2, LayerNorm+ReLU
- **Router B** (sonnet vs opus): 77-dim → [128, 64] → 1, dropout=0.0, LayerNorm+ReLU

Features: 45 handcrafted code features + 32 PCA-reduced sentence embeddings (all-MiniLM-L6-v2).

## Training

- **Data**: 1,644 coding prompts with real quality scores from all three models
- **Judge**: GPT-5.4 scoring correctness, completeness, code quality, and explanation
- **Loss**: PID (reward-weighted CE + KL divergence), β_kl=0.02, reward_cap=0.5
- **Label smoothing**: ε=0.05, cost-aware margin for Router B (cost_premium=0.03)
- **Complexity premium**: 2.0, centered at 0.3
- **HP sweep**: 108 configurations, 3-way split (1,150 train / 247 val / 247 test)
- **Threshold A**: 0.60 (manually tuned for routing behavior; see note below)
- **Threshold B**: 0.474 (calibrated on the validation set)

### Threshold Note

The utility-optimal Router A threshold (0.01) routes almost nothing to local, because cloud quality is genuinely equal or better on nearly all prompts. The manual threshold of 0.60 trades ~1.4% utility for routing behavior that matches intuition: simple, fast tasks run locally with zero network latency, while complex tasks go to cloud.

## Real-World Routing (28 test queries, threshold_a=0.60)

| Category | Local | Sonnet | Opus |
|----------|-------|--------|------|
| Simple (8) | 5 (62%) | 0 | 3 (38%) |
| Medium (8) | 3 (38%) | 0 | 5 (62%) |
| Complex (6) | 1 (17%) | 1 (17%) | 4 (67%) |

v4 comparison: simple→local was 0/8 (now 5/8), complex→local was 6/6 (now 1/6).
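The tier assignments above come down to two threshold comparisons. Below is a minimal sketch of that decision rule, assuming Router A and Router B produce calibrated probabilities `p_cloud` and `p_opus`; the function name is illustrative and is not part of the package API.

```python
# Illustrative cascade decision rule; not the ThreeTierRouter API.
THRESHOLD_A = 0.60   # Router A: local vs cloud (manually tuned, see Threshold Note)
THRESHOLD_B = 0.474  # Router B: sonnet vs opus (calibrated on the validation set)

def choose_tier(p_cloud: float, p_opus: float) -> str:
    """Map the two routers' probabilities to a tier name."""
    if p_cloud < THRESHOLD_A:    # Router A: the local model is good enough
        return "local"
    if p_opus < THRESHOLD_B:     # Router B: medium-complexity cloud suffices
        return "sonnet"
    return "opus"                # max-capability cloud

print(choose_tier(p_cloud=0.72, p_opus=0.31))  # -> "sonnet"
```

Raising `THRESHOLD_A` keeps more prompts local; lowering it sends more traffic to the cloud, which is exactly the trade-off discussed in the Threshold Note.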
## Test Set Results (calibrated thresholds)

| Metric | Value |
|--------|-------|
| Utility | 0.6205 |
| Oracle Utility | 0.7179 |
| Regret | 0.0973 |

## Files

- `router_a.safetensors`: Router A weights (32×16 MLP, 13KB)
- `router_b.safetensors`: Router B weights (128×64 MLP, 76KB)
- `config.json`: model config, thresholds, HP, training results
- `scaler.pkl`: StandardScaler for feature normalization
- `embedding_extractor.pkl`: PCA-reduced sentence-transformers extractor
- `sweep_results.json`: full 108-config HP sweep results

## Usage

```python
from router.three_tier_inference import ThreeTierRouter

router = ThreeTierRouter("models/three_tier_v5")
result = router.route("Write a Python function to sort a list")

# result.decision: "local", "sonnet", or "opus"
# result.p_cloud: probability of cloud routing
# result.p_opus: probability of opus (if routed to cloud)
```
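For readers who want to inspect the weights without the `ThreeTierRouter` wrapper, here is a minimal MLX sketch of the Router A architecture described above (77-dim input → [32, 16] hidden → 1 logit, LayerNorm+ReLU, dropout 0.2). The class and attribute names are illustrative and may not match the layer names stored in `router_a.safetensors`; inputs are assumed to be already normalized with `scaler.pkl` and augmented with the 32 PCA embedding dimensions.

```python
import mlx.core as mx
import mlx.nn as nn

class RouterA(nn.Module):
    """Binary local-vs-cloud router: 77-dim features -> [32, 16] -> 1 logit."""

    def __init__(self, in_dim: int = 77, hidden=(32, 16), dropout: float = 0.2):
        super().__init__()
        layers, prev = [], in_dim
        for h in hidden:
            layers += [nn.Linear(prev, h), nn.LayerNorm(h), nn.ReLU(), nn.Dropout(p=dropout)]
            prev = h
        layers.append(nn.Linear(prev, 1))  # single logit; sigmoid gives p_cloud
        self.net = nn.Sequential(*layers)

    def __call__(self, x: mx.array) -> mx.array:
        return self.net(x)

# Forward pass on one (already scaled, 77-dim) feature vector
router_a = RouterA()
p_cloud = mx.sigmoid(router_a(mx.zeros((1, 77))))
```

Router B follows the same pattern with `hidden=(128, 64)` and `dropout=0.0`.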