feat: enhance recommendation system with improved routing, latency optimizations, and onboarding features
- CHANGELOG.md +19 -1
- benchmarks/benchmark.py +98 -16
- benchmarks/locustfile.py +48 -0
- benchmarks/results.md +61 -0
- benchmarks/test_concurrent_benchmark.py +73 -0
- docs/LATENCY_OPTIMIZATION.md +130 -0
- docs/interview_guide.md +34 -4
- requirements.txt +4 -0
- scripts/model/train_intent_router.py +20 -2
- src/config.py +13 -0
- src/core/intent_prober.py +112 -0
- src/core/recommendation_orchestrator.py +30 -5
- src/core/reranker.py +118 -77
- src/core/router.py +14 -5
- src/main.py +93 -8
- src/recall/fusion.py +5 -1
- src/recommender.py +6 -2
- src/services/recommend_service.py +25 -0
- src/vector_db.py +2 -3
- web/src/App.jsx +74 -4
- web/src/api.js +18 -4
- web/src/components/OnboardingModal.jsx +137 -0
CHANGELOG.md
CHANGED
@@ -11,7 +11,25 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Fixed - Router heuristic fragility (intent_classifier)
+- **Model-based routing**: Trained `intent_classifier.pkl` (TF-IDF + LogisticRegression) with book-title examples; the router now uses the model when available.
+- **SEED_DATA extended**: Added `War and Peace`, `The Lord of the Rings`, `Harry Potter`, `1984`, etc. (fast) and `books like War and Peace`, `similar to The Lord of the Rings` (deep) so the model distinguishes book titles from recommendation-style queries.
+- **Fallback rules improved**: Replaced the brittle `len(words) <= 2` check with NL keyword detection (`ROUTER_NL_KEYWORDS`: like, similar, recommend, want, looking, ...). Short queries (≤6 words) without NL keywords → FAST; queries with NL keywords → DEEP.
+- **Config**: `natural_language_keywords` in the router config; `ROUTER_NL_KEYWORDS` in `src/config.py`.
+
+### Added - Latency Optimizations (LATENCY_OPTIMIZATION.md)
+- **1. Trim the candidate set**: `RERANK_CANDIDATES_MAX=20` (env-overridable); rerank the top 20 instead of 50.
+- **2. ColBERT**: `RERANKER_BACKEND=colbert`; optional `llama-index-postprocessor-colbert-rerank`.
+- **3. Async rerank**: `fast=true` skips rerank (~150ms); `async_rerank=true` returns RRF first, reranks in the background, and the next request gets the cached reranked list.
+- **4. ONNX quantization**: `RERANKER_BACKEND=onnx` (default); `onnxruntime` for a ~2x CrossEncoder speedup.
+- API: `POST /recommend` accepts `fast` and `async_rerank`; `web/src/api.js` updated.
+
+### Added - Cold-Start Optimizations (P0–P2)
+- **P0**: Popularity fallback — enabled the Popularity channel by default; `RecallFusion` and `RecommendationService` fall back to popular books when all recall channels return empty.
+- **P0**: `recent_isbns` API param — `/api/recommend/personal` accepts comma-separated ISBNs from the current session; injected into SASRec for 1-click cold-start convergence.
+- **P1**: Frontend passes `recent_isbns` — session-level tracking of viewed books; passed to the personalized API on Start Discovery.
+- **P2**: Onboarding flow — `OnboardingModal` shown when a user is new (no collection); pick 3–5 books from a popular list to seed preferences; `GET /api/onboarding/books`.
+- **P2**: Zero-shot intent probing — `src/core/intent_prober.py` uses an LLM to infer categories/emotions/keywords from the user query; `GET /api/intent/probe`; the `intent_query` param on the personal API seeds SASRec via semantic search when the user has no history.
 
 ### Added - 2026-01-29 (Frontend Refactor: React Router SPA)
 - **React Router SPA**: Refactored the monolithic 960-line `App.jsx` into a React Router architecture with 3 route pages and 5 reusable components.
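The fallback rule in the "Fixed" changelog entry above is small enough to sketch inline. A minimal illustration, assuming the `ROUTER_NL_KEYWORDS` frozenset from `src/config.py`; `route_fallback` is a hypothetical name, and the real router also handles detail/temporal keywords and prefers the trained `intent_classifier.pkl`, all omitted here:

```python
# Illustrative sketch only, not the project's router.
ROUTER_NL_KEYWORDS = frozenset({
    "like", "similar", "recommend", "want", "looking", "books", "something",
    "suggest", "recommendations", "after", "read", "if", "liked",
})

def route_fallback(query: str) -> str:
    """Short queries without NL keywords -> FAST; NL-phrased queries -> DEEP."""
    words = query.split()
    has_nl_keyword = any(w.lower() in ROUTER_NL_KEYWORDS for w in words)
    if len(words) <= 6 and not has_nl_keyword:
        return "fast"  # e.g. "War and Peace": keyword/BM25 path, no rerank
    return "deep"      # e.g. "books like War and Peace": hybrid + rerank

assert route_fallback("War and Peace") == "fast"
assert route_fallback("books like War and Peace") == "deep"
```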
benchmarks/benchmark.py
CHANGED
@@ -4,15 +4,23 @@ Performance Benchmark Script for Book Recommender System
 This script measures:
 1. Vector search latency
 2. End-to-end recommendation latency
-3. Throughput (queries per second)
+3. Throughput (queries per second, sequential)
+4. Concurrent throughput (QPS under N parallel workers)
 
 Usage:
     python benchmarks/benchmark.py
+    python benchmarks/benchmark.py --concurrent 5  # 5 concurrent workers
+
+Note: For HTTP-level load testing (simulating real users), use Locust:
+    pip install locust
+    locust -f benchmarks/locustfile.py --host=http://localhost:8000
 """
 
+import argparse
 import sys
 import time
 import statistics
+from concurrent.futures import ThreadPoolExecutor, as_completed
 from pathlib import Path
 
 # Add project root to path
@@ -82,11 +90,11 @@ def benchmark_full_recommendation(recommender: BookRecommender, n_runs: int = 30
 
 
 def benchmark_throughput(recommender: BookRecommender, duration_sec: int = 10) -> dict:
-    """Measure queries per second over a time window."""
+    """Measure queries per second over a time window (sequential)."""
     query_count = 0
     start = time.perf_counter()
     query_idx = 0
 
     while (time.perf_counter() - start) < duration_sec:
         recommender.get_recommendations_sync(
             TEST_QUERIES[query_idx % len(TEST_QUERIES)],
@@ -95,17 +103,63 @@ def benchmark_throughput(recommender: BookRecommender, duration_sec: int = 10) -
         )
         query_count += 1
         query_idx += 1
 
     elapsed = time.perf_counter() - start
 
     return {
-        "operation": "Throughput Test",
+        "operation": "Throughput Test (sequential)",
         "duration_sec": round(elapsed, 2),
        "total_queries": query_count,
         "qps": round(query_count / elapsed, 2),
     }
 
 
+def _run_one_query(recommender: BookRecommender, query: str) -> tuple[float, int]:
+    """Run a single recommendation and return (latency_ms, 1)."""
+    start = time.perf_counter()
+    recommender.get_recommendations_sync(query, category="All", tone="All")
+    return (time.perf_counter() - start) * 1000, 1
+
+
+def benchmark_concurrent(
+    recommender: BookRecommender,
+    n_workers: int = 5,
+    total_queries: int = 50,
+) -> dict:
+    """
+    Measure throughput under concurrent load using ThreadPoolExecutor.
+
+    Simulates N parallel clients to expose:
+    - VectorDB connection/query limits under load
+    - GIL contention if CPU-bound (embedding, rerank)
+    - I/O blocking in ChromaDB / LLM calls
+    """
+    queries = [TEST_QUERIES[i % len(TEST_QUERIES)] for i in range(total_queries)]
+    latencies: list[float] = []
+    start = time.perf_counter()
+
+    with ThreadPoolExecutor(max_workers=n_workers) as executor:
+        futures = [
+            executor.submit(_run_one_query, recommender, q) for q in queries
+        ]
+        for future in as_completed(futures):
+            lat_ms, _ = future.result()
+            latencies.append(lat_ms)
+
+    wall_sec = time.perf_counter() - start
+
+    return {
+        "operation": f"Concurrent Throughput ({n_workers} workers)",
+        "workers": n_workers,
+        "total_queries": total_queries,
+        "wall_sec": round(wall_sec, 2),
+        "qps": round(total_queries / wall_sec, 2),
+        "mean_latency_ms": round(statistics.mean(latencies), 2),
+        "median_latency_ms": round(statistics.median(latencies), 2),
+        "p95_latency_ms": round(sorted(latencies)[int(len(latencies) * 0.95)], 2),
+    }
+
+
 def print_results(results: list[dict]):
     """Print benchmark results in a formatted table."""
     print("\n" + "=" * 70)
@@ -146,37 +200,65 @@ def save_results(results: list[dict], filepath: str = "benchmarks/results.md"):
         f.write("## Interpretation\n\n")
         f.write("- **Vector Search**: Time to query ChromaDB and retrieve top-k results\n")
         f.write("- **Full Recommendation**: End-to-end latency including filtering and formatting\n")
-        f.write("- **Throughput**: Sustained
+        f.write("- **Throughput (sequential)**: Sustained QPS when processing one query at a time\n")
+        f.write("- **Concurrent Throughput**: QPS under N parallel workers; exposes GIL/IO bottlenecks\n")
 
     print(f"\n✅ Results saved to {filepath}")
 
 
 def main():
+    parser = argparse.ArgumentParser(description="Benchmark Book Recommender System")
+    parser.add_argument(
+        "--concurrent",
+        type=int,
+        default=0,
+        metavar="N",
+        help="Add concurrent benchmark with N workers (e.g. 5). 0 = skip.",
+    )
+    parser.add_argument(
+        "--concurrent-queries",
+        type=int,
+        default=50,
+        help="Total queries for concurrent benchmark (default: 50)",
+    )
+    args = parser.parse_args()
+
     print("🚀 Initializing Book Recommender System...")
     print("   (This may take a moment to load models and vector database)")
 
     try:
         recommender = BookRecommender()
     except Exception as e:
         print(f"❌ Failed to initialize: {e}")
         return
 
     print("✅ System initialized. Starting benchmarks...\n")
 
     results = []
 
     # Benchmark 1: Vector Search
     print("📊 Running Vector Search benchmark...")
     results.append(benchmark_vector_search(recommender.vector_db))
 
     # Benchmark 2: Full Recommendation
     print("📊 Running Full Recommendation benchmark...")
     results.append(benchmark_full_recommendation(recommender))
 
-    # Benchmark 3: Throughput
-    print("📊 Running Throughput benchmark (10 seconds)...")
+    # Benchmark 3: Sequential Throughput
+    print("📊 Running Sequential Throughput benchmark (10 seconds)...")
     results.append(benchmark_throughput(recommender))
 
+    # Benchmark 4: Concurrent Throughput (optional)
+    if args.concurrent > 0:
+        print(f"📊 Running Concurrent Throughput ({args.concurrent} workers, {args.concurrent_queries} queries)...")
+        results.append(
+            benchmark_concurrent(
+                recommender,
+                n_workers=args.concurrent,
+                total_queries=args.concurrent_queries,
+            )
+        )
+
     # Print and save results
     print_results(results)
     save_results(results)
benchmarks/locustfile.py
ADDED
@@ -0,0 +1,48 @@
+"""
+Locust load test for Book Recommender API.
+
+Simulates concurrent HTTP requests to measure real-world throughput.
+Run the API server first, then:
+
+    pip install locust
+    locust -f benchmarks/locustfile.py --host=http://localhost:8000
+
+Then open http://localhost:8089 to drive the load test.
+"""
+
+import random
+from locust import HttpUser, task, between
+
+# Mirror TEST_QUERIES from benchmark.py for consistency
+TEST_QUERIES = [
+    "a romantic comedy set in New York",
+    "a philosophical novel about the meaning of life",
+    "a fast-paced thriller with plot twists",
+    "a coming-of-age story about friendship and loss",
+    "a historical fiction set during World War II",
+    "a science fiction story about space exploration",
+    "a mystery novel with an unreliable narrator",
+    "a fantasy epic with dragons and magic",
+    "a memoir about overcoming adversity",
+    "a literary fiction exploring family dynamics",
+]
+
+
+class RecommenderUser(HttpUser):
+    """Simulates a user hitting the recommendation API."""
+
+    wait_time = between(0.5, 2.0)  # 0.5–2s between requests
+
+    @task(10)
+    def recommend(self):
+        """Primary: POST /recommend."""
+        q = random.choice(TEST_QUERIES)
+        self.client.post(
+            "/recommend",
+            json={"query": q, "category": "All"},
+        )
+
+    @task(1)
+    def health(self):
+        """Occasional health check."""
+        self.client.get("/health")
benchmarks/results.md
ADDED
@@ -0,0 +1,61 @@
+# Performance Benchmark Results
+
+**Date**: 2026-02-12 01:02:27
+
+## System Info
+- Dataset: 5,000+ books
+- Embedding Model: all-MiniLM-L6-v2 (384 dim)
+- Vector DB: ChromaDB with HNSW index
+
+## Results
+
+### Vector Search (k=50)
+
+| Metric | Value |
+|--------|-------|
+| runs | 50 |
+| mean_ms | 11.49 |
+| median_ms | 6.43 |
+| std_ms | 27.41 |
+| min_ms | 5.49 |
+| max_ms | 200.46 |
+| p95_ms | 15.51 |
+
+### Full Recommendation
+
+| Metric | Value |
+|--------|-------|
+| runs | 30 |
+| mean_ms | 3876.27 |
+| median_ms | 260.87 |
+| std_ms | 5445.93 |
+| min_ms | 14.54 |
+| max_ms | 16609.18 |
+| p95_ms | 11694.41 |
+
+### Throughput Test (sequential)
+
+| Metric | Value |
+|--------|-------|
+| duration_sec | 10.1 |
+| total_queries | 89 |
+| qps | 8.81 |
+
+### Concurrent Throughput (3 workers)
+
+| Metric | Value |
+|--------|-------|
+| workers | 3 |
+| total_queries | 12 |
+| wall_sec | 1.29 |
+| qps | 9.28 |
+| mean_latency_ms | 298.3 |
+| median_latency_ms | 370.19 |
+| p95_latency_ms | 579.95 |
+
+## Interpretation
+
+- **Vector Search**: Time to query ChromaDB and retrieve top-k results
+- **Full Recommendation**: End-to-end latency including filtering and formatting
+- **Throughput (sequential)**: Sustained QPS when processing one query at a time
+- **Concurrent Throughput**: QPS under N parallel workers; exposes GIL/IO bottlenecks
benchmarks/test_concurrent_benchmark.py
ADDED
@@ -0,0 +1,73 @@
+"""
+Quick test for concurrent benchmark logic without loading full recommender.
+Run: python benchmarks/test_concurrent_benchmark.py
+"""
+
+import sys
+import time
+import statistics
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+# Mock recommender that simulates ~100ms latency
+class MockRecommender:
+    def get_recommendations_sync(self, query: str, category: str = "All", tone: str = "All"):
+        time.sleep(0.1)
+        return [{"title": "Mock Book", "isbn": "123"}]
+
+TEST_QUERIES = ["query A", "query B", "query C"]
+
+
+def _run_one_query(recommender, query: str) -> tuple[float, int]:
+    start = time.perf_counter()
+    recommender.get_recommendations_sync(query, category="All", tone="All")
+    return (time.perf_counter() - start) * 1000, 1
+
+
+def benchmark_concurrent(recommender, n_workers: int = 5, total_queries: int = 15) -> dict:
+    queries = [TEST_QUERIES[i % len(TEST_QUERIES)] for i in range(total_queries)]
+    latencies = []
+    start = time.perf_counter()
+
+    with ThreadPoolExecutor(max_workers=n_workers) as executor:
+        futures = [executor.submit(_run_one_query, recommender, q) for q in queries]
+        for future in as_completed(futures):
+            lat_ms, _ = future.result()
+            latencies.append(lat_ms)
+
+    wall_sec = time.perf_counter() - start
+
+    return {
+        "operation": f"Concurrent ({n_workers} workers)",
+        "workers": n_workers,
+        "total_queries": total_queries,
+        "wall_sec": round(wall_sec, 2),
+        "qps": round(total_queries / wall_sec, 2),
+        "mean_latency_ms": round(statistics.mean(latencies), 2),
+    }
+
+
+def main():
+    mock = MockRecommender()
+
+    # Sequential: 15 * 100ms = ~1.5s
+    print("Sequential (1 worker):")
+    r1 = benchmark_concurrent(mock, n_workers=1, total_queries=15)
+    print(f"  wall_sec={r1['wall_sec']}, qps={r1['qps']}, mean_ms={r1['mean_latency_ms']}")
+
+    # Concurrent: 15 queries with 5 workers -> ~3 batches of 5 -> ~300ms
+    print("\nConcurrent (5 workers):")
+    r5 = benchmark_concurrent(mock, n_workers=5, total_queries=15)
+    print(f"  wall_sec={r5['wall_sec']}, qps={r5['qps']}, mean_ms={r5['mean_latency_ms']}")
+
+    # Concurrency should give ~5x speedup
+    speedup = r1["wall_sec"] / r5["wall_sec"]
+    print(f"\nSpeedup: {speedup:.1f}x (expected ~5x for 5 workers)")
+    assert r5["qps"] > r1["qps"], "Concurrent QPS should exceed sequential"
+    print("OK: Concurrent benchmark logic works correctly.")
+
+
+if __name__ == "__main__":
+    main()
docs/LATENCY_OPTIMIZATION.md
ADDED
@@ -0,0 +1,130 @@
+# Latency Optimization: Full Recommendation Pipeline
+
+## Current State
+
+| Metric | Value | Target (Spotify-style) |
+|--------|-------|------------------------|
+| P95 Full Recommendation | ~1250ms | < 100ms |
+| Mean | ~700–900ms | - |
+
+**Interviewer's take**: At Spotify, a recommendation endpoint is typically expected to return within 100ms. 1.2s is a stall users can perceive.
+
+---
+
+## Latency Breakdown
+
+Approximate warm-query breakdown (from `benchmarks/benchmark.py` + `docs/experiments/reports/rerank_report.md`):
+
+| Stage | Location | Latency | Notes |
+|-------|----------|---------|-------|
+| Router | `src/core/router.py` | ~1ms | Rule-based, fast |
+| Sparse (FTS5) | `vector_db._sparse_fts_search` | ~20–50ms | SQLite MATCH |
+| Dense (Chroma) | `vector_db.search` | ~50–100ms | HNSW + MiniLM |
+| RRF Fusion | `vector_db.hybrid_search` | ~5ms | In-memory |
+| **Cross-Encoder Rerank** | `src/core/reranker.py` | **~400–900ms** | **Main bottleneck** |
+| Metadata Enrichment | `enrich_and_format` | ~50–100ms | SQLite lookups |
+
+**Rerank details**:
+- Model: `cross-encoder/ms-marco-MiniLM-L-6-v2`
+- Candidates: `max(k*4, 20)` = 50 (when `k=10`)
+- Each (query, doc) pair requires a full forward pass
+- 50 pairs × ~15–20ms/pair ≈ 750–1000ms
+
+---
+
+## Root Causes
+
+1. **The Cross-Encoder is too heavy**
+   - Every (query, doc) pair runs full attention; doc vectors cannot be precomputed the way a Bi-Encoder allows
+   - 50 candidates make serial inference slow
+
+2. **Every benchmark query triggers rerank**
+   - `TEST_QUERIES` are all natural language (e.g. "a romantic comedy set in New York")
+   - Router rule: `len(words) > 2` with no detail keywords → **DEEP** → `rerank=True`
+   - So every benchmark run executes the Cross-Encoder
+
+3. **The LangGraph agentic mode is slower still**
+   - Router → Retrieve → Evaluate (LLM call) → optional Web Fallback
+   - Serial execution, no parallelism
+
+---
+
+## Optimization Options
+
+### 1. Trim the candidate set (Quick Win)
+
+**Current**: `rerank_candidates = top_candidates[:max(k*4, 20)]` → 50 candidates
+
+**Proposal**: Reduce to 20, or make it configurable via config
+
+```python
+# config.py
+RERANK_CANDIDATES_MAX = 20  # down from 50; expected to roughly halve rerank latency
+```
+
+**Trade-off**: If a truly relevant book falls outside the Top-20, recall drops slightly; 20 is usually enough coverage.
+
+---
+
+### 2. Replace the Cross-Encoder with ColBERT (late interaction)
+
+**Principle**: ColBERT encodes query and doc separately, then scores with token-level MaxSim; doc vectors can be precomputed and cached.
+
+| Approach | Inference | Precompute | Typical latency |
+|------|----------|--------|--------------|
+| Cross-Encoder | Full forward per (q, d) pair | No | ~15–20ms/pair |
+| ColBERT | Encode q once + dot with doc vectors | Yes (docs cacheable) | ~2–5ms/doc |
+
+**Implementation notes**:
+- Use `colbert-ai/colbertv2` or a similar library
+- Precompute token embeddings for book descriptions and store them in the vector store
+- Online, only encode the query and run MaxSim against candidate doc vectors
+
+**Trade-off**: Needs extra index building and dependencies; quality may match the Cross-Encoder or fall slightly short.
+
+---
+
+### 3. Asynchronous rerank
+
+**Idea**: Return the hybrid RRF Top-K first, rerank asynchronously in the background, and deliver the result via WebSocket/polling or on the next request.
+
+```
+User request → return RRF Top-10 immediately (~150ms) → background rerank → push refined results (optional)
+```
+
+**Trade-off**: More complex to implement; requires API and frontend changes; first-screen result quality drops slightly.
+
+---
+
+### 4. ONNX quantization (already planned)
+
+`rerank_report.md` already notes that the ONNX version of the Cross-Encoder gives roughly a 2x speedup.
+
+---
+
+### 5. Dynamic rerank policy (partially implemented)
+
+The router already disables rerank for ISBN/keyword queries; this can be tightened further:
+- Enable rerank only when the query length exceeds a threshold and the query is not pure keywords
+- Or add a "low-latency mode": let users choose "fast" vs. "precise"
+
+---
+
+## Implementation Status (v2.7+)
+
+| Optimization | Status | Notes |
+|------|------|------|
+| 1. Trim candidate set | ✅ | `RERANK_CANDIDATES_MAX=20` (config), overridable via env |
+| 2. ColBERT | ✅ | `RERANKER_BACKEND=colbert`, needs `pip install llama-index-postprocessor-colbert-rerank` |
+| 3. Async rerank | ✅ | `fast=true` skips rerank; `async_rerank=true` returns RRF first, reranks in the background and caches |
+| 4. ONNX quantization | ✅ | `RERANKER_BACKEND=onnx` (default), needs `onnxruntime` |
+
+### API usage
+
+```bash
+# Fast mode (~150ms)
+curl -X POST /recommend -d '{"query":"romantic comedy","fast":true}'
+
+# Async rerank: returns RRF now; the next identical query returns the cached reranked list
+curl -X POST /recommend -d '{"query":"romantic comedy","async_rerank":true}'
+```
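The late-interaction scoring behind option 2 in the document above reduces to a few lines of NumPy. A minimal sketch of token-level MaxSim, assuming pre-normalized embeddings; this illustrates the principle, not the ColBERT library API:

```python
import numpy as np

def maxsim_score(q_tokens: np.ndarray, d_tokens: np.ndarray) -> float:
    """MaxSim: for each query token, take its best-matching doc token
    (dot product on L2-normalized embeddings), then sum over query tokens.
    d_tokens can be precomputed offline and cached per document."""
    sim = q_tokens @ d_tokens.T          # (n_q, n_d) token-level similarity matrix
    return float(sim.max(axis=1).sum())  # best doc token per query token, summed

# Toy shapes: 2 query tokens vs. 3 doc tokens in 4 dims
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(3, 4)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```

Because the document side of the computation is a plain matrix product against cached vectors, only the query encoding runs a transformer online, which is where the ~2–5ms/doc figure comes from.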
docs/interview_guide.md
CHANGED
@@ -98,7 +98,37 @@
 
 ## 🔬 Advanced Technical Q&A
 
-### Q5.
+### Q5. ChromaDB/SQLite memory and scalability: migrating to tens of millions of items
+
+**Question**: You chose ChromaDB (embedded) and SQLite. That is fine for a demo, but for a catalog of tens of millions of items (Spotify scale) it is not viable. **How would you migrate to Milvus/Qdrant? How would you shard the ANN index (HNSW)?**
+
+**What it tests**: Understanding of vector database scalability and distributed ANN.
+
+**Suggested answer**:
+
+> The current architecture (ChromaDB + SQLite) suits ~200K items and demos. At tens of millions of items the bottlenecks are:
+>
+> **ChromaDB**: embedded, single-machine, index loaded into memory. 10M × 384 dims × 4B ≈ 15GB of raw vectors, and the HNSW graph structure may inflate that another 10–50x; a single machine's RAM and CPU cannot sustain it.
+>
+> **SQLite**: single file, single write lock; disk I/O becomes the bottleneck.
+>
+> **Migration strategy**:
+>
+> 1. **Abstract a VectorStore interface**: Define a `VectorStoreInterface` in `vector_db.py` with `ChromaVectorStore`, `QdrantVectorStore`, and `MilvusVectorStore` implementations, switchable via config, to make the migration painless.
+> 2. **Choice of engine**: Milvus suits large datasets, analytics + retrieval, and is natively distributed; Qdrant is lighter and focused on pure vector search. Either handles tens of millions of items.
+> 3. **Migration steps**: Export (id, embedding, metadata) from Chroma → create a Collection in Milvus/Qdrant and configure HNSW parameters → bulk upsert → flip the config switch.
+>
+> **HNSW sharding**:
+>
+> - **Hash-shard by ID**: Distribute via `hash(id) % N` across N shards, each with its own HNSW index. At query time, fan out to all N shards concurrently, take top_k from each, then merge for the final top_k.
+> - **Cluster-shard by embedding**: K-Means clustering; a query first locates its cluster and only queries a few shards (narrows the search, but cold start and data skew must be handled).
+> - **Use Milvus/Qdrant built-ins**: Both support distributed sharding natively, so their sharding config can be used directly instead of building your own.
+>
+> **Tie-in with Q4**: Rework the metadata_store's SQLite per the Q4 plan (Redis + PostgreSQL/Cassandra); the FTS5 sparse retrieval can migrate to Elasticsearch/Meilisearch for hybrid search.
+
+---
+
+### Q6. Negative Sampling
 
 **Question**: The TECHNICAL_REPORT uses "Hard negative sampling from recall results". Doesn't this risk a **false negative** problem (treating items the user actually likes but never clicked as negatives)? When training DIN or LGBMRanker, how did you balance the ratio of random negatives to hard negatives, and how does that affect model convergence?
 
@@ -114,7 +144,7 @@
 
 ---
 
-###
+### Q7. Real-time / Near-line
 
 **Question**: SASRec is trained mostly offline. In a Spotify scenario, if a user has just listened to 3 "Heavy Metal" tracks in a row, we want the very next recommendation to follow that interest shift. In the current architecture, how would you inject the user's **real-time interaction sequence** (not yet persisted to CSV) into SASRec or DIN inference? What logic must be added to `RecommendationService`?
 
@@ -135,7 +165,7 @@
 
 ---
 
-###
+### Q8. Evaluation metrics: Diversity and Serendipity
 
 **Question**: The current focus is HR@10 and NDCG. As a content platform, you notice the recommendation list is all bestsellers (the Harry Potter effect). If asked to improve **Diversity** and **Serendipity** without significantly hurting accuracy, how would you modify the objective function or logic in the ranking or rerank stage?
 
@@ -162,7 +192,7 @@
 
 ## 📋 Known Limitations & Improvement Directions
 
-###
+### Q9. Leftover "research-style" code
 
 **Symptom**: As the codebase evolves toward production, it still carries traces of its research-prototype origins.
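The hash-sharding answer in Q5 above is a scatter-gather pattern. A minimal sketch; the shard objects, their `search(vec, k)` signature, and the `(score, item_id)` result convention are hypothetical stand-ins for a Milvus/Qdrant SDK:

```python
import heapq
import hashlib
from concurrent.futures import ThreadPoolExecutor

def shard_for(item_id: str, n_shards: int) -> int:
    """Write path: a stable hash decides which shard owns the vector.
    (Python's built-in hash() is salted per process, so use a digest.)"""
    return int(hashlib.md5(item_id.encode()).hexdigest(), 16) % n_shards

def sharded_search(shards, query_vec, top_k: int):
    """Read path: fan out to all N shards concurrently, take top_k from
    each, then merge by score for the global top_k (scatter-gather)."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: s.search(query_vec, top_k), shards)
    # Merge N partial lists of (score, item_id); highest score wins.
    return heapq.nlargest(top_k, (hit for part in partials for hit in part))
```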
requirements.txt
CHANGED
@@ -24,6 +24,7 @@ langchain-openai
 transformers>=4.40.0
 torch
 sentence-transformers
+onnxruntime>=1.16.0  # For CrossEncoder backend=onnx (~2x faster)
 gensim>=4.3.0
 lightgbm
 xgboost>=2.0.0
@@ -43,6 +44,9 @@ requests
 # Intent classifier backends (optional)
 # fasttext  # Uncomment for FastText backend: pip install fasttext
 
+# Latency: ColBERT reranker (optional, RERANKER_BACKEND=colbert)
+# llama-index-postprocessor-colbert-rerank
+
 # LLM Agent & Fine-tuning
 faiss-cpu
 diffusers
scripts/model/train_intent_router.py
CHANGED
@@ -67,6 +67,18 @@ SEED_DATA = [
     ("music", "fast"),
     ("art", "fast"),
     ("philosophy", "fast"),
+    # fast: book titles (keyword-like, BM25 works well)
+    ("War and Peace", "fast"),
+    ("The Lord of the Rings", "fast"),
+    ("Harry Potter", "fast"),
+    ("1984", "fast"),
+    ("To Kill a Mockingbird", "fast"),
+    ("The Great Gatsby", "fast"),
+    ("Pride and Prejudice", "fast"),
+    ("Dune", "fast"),
+    ("Sapiens", "fast"),
+    ("Atomic Habits", "fast"),
+    ("Deep Work", "fast"),
     # deep: natural language, complex queries
     ("What are the best books about artificial intelligence for beginners", "deep"),
     ("I'm looking for something similar to Harry Potter", "deep"),
@@ -87,6 +99,12 @@ SEED_DATA = [
     ("Recommend me novels with strong female protagonists", "deep"),
     ("What to read to understand economics", "deep"),
     ("Books on meditation and mindfulness", "deep"),
+    # deep: natural language with book references (need context, not just keyword)
+    ("books like War and Peace", "deep"),
+    ("similar to The Lord of the Rings", "deep"),
+    ("recommend something like Harry Potter", "deep"),
+    ("what to read after 1984", "deep"),
+    ("books similar to Sapiens", "deep"),
 ]
 
 
@@ -144,8 +162,8 @@ def main():
         pred = result.predict(sample)[0][0].replace("__label__", "")
     elif args.backend == "distilbert":
         from transformers import pipeline
-
-
+        pipe = pipeline("zero-shot-classification", model="distilbert-base-uncased", device=-1)
+        pred = pipe(sample, INTENTS, multi_label=False)["labels"][0]
     else:
         pred = result.predict([sample])[0]
     ok = "✓" if pred == intent else "✗"
src/config.py
CHANGED
@@ -32,6 +32,12 @@ CACHE_TTL = 3600  # 1 hour
 TOP_K_INITIAL = 50
 TOP_K_FINAL = 10
 
+# Latency: Rerank candidate cap (lower = faster, see LATENCY_OPTIMIZATION.md)
+RERANK_CANDIDATES_MAX = int(os.getenv("RERANK_CANDIDATES_MAX", "20"))
+
+# Reranker backend: cross_encoder | onnx | colbert (onnx ~2x faster, colbert optional)
+RERANKER_BACKEND = os.getenv("RERANKER_BACKEND", "onnx")
+
 # Debug mode: set DEBUG=1 to enable verbose logging (research prototype style)
 DEBUG = os.getenv("DEBUG", "0") == "1"
 
@@ -47,6 +53,10 @@ def _load_router_config() -> dict:
             "new", "newest", "latest", "recent", "modern", "contemporary", "current",
         ],
         "strong_freshness_keywords": ["newest", "latest"],
+        "natural_language_keywords": [
+            "like", "similar", "recommend", "want", "looking", "books", "something",
+            "suggest", "recommendations", "after", "read", "if", "liked",
+        ],
     }
     path = CONFIG_DIR / "router.json"
     if path.exists():
@@ -82,3 +92,6 @@ ROUTER_FRESHNESS_KEYWORDS: frozenset[str] = frozenset(
 ROUTER_STRONG_FRESHNESS_KEYWORDS: frozenset[str] = frozenset(
     str(k).lower() for k in _ROUTER_CFG.get("strong_freshness_keywords", [])
 )
+ROUTER_NL_KEYWORDS: frozenset[str] = frozenset(
+    str(k).lower() for k in _ROUTER_CFG.get("natural_language_keywords", [])
+)
src/core/intent_prober.py
ADDED
@@ -0,0 +1,112 @@
+"""
+P2: Zero-shot intent probing for cold-start users.
+
+Uses LLM to infer categories, emotions, and keywords from a user's first query.
+When user has no history, this helps seed preferences for faster convergence.
+"""
+
+import json
+import re
+from typing import Optional
+
+from src.utils import setup_logger
+
+logger = setup_logger(__name__)
+
+# Categories we support (match router/metadata)
+KNOWN_CATEGORIES = [
+    "Fiction", "History", "Philosophy", "Science", "Art",
+    "Biography", "Mystery", "Romance", "Fantasy", "Science Fiction",
+    "Literary", "General",
+]
+
+EMOTION_KEYWORDS = [
+    "happy", "sad", "suspenseful", "angry", "surprising",
+    "heartbreaking", "uplifting", "thought-provoking", "relaxing",
+]
+
+
+def probe_intent(query: str, llm=None) -> dict:
+    """
+    Infer user intent from a short query (zero-shot, no history).
+
+    Returns:
+        dict with keys: categories, emotions, keywords, summary
+    """
+    if not query or not query.strip():
+        return {"categories": [], "emotions": [], "keywords": [], "summary": ""}
+
+    if llm is None:
+        try:
+            from src.core.llm import get_llm_model
+            import os
+            provider = os.getenv("LLM_PROVIDER", "ollama")
+            api_key = os.getenv("OPENAI_API_KEY") if provider == "openai" else None
+            llm = get_llm_model(provider=provider, api_key=api_key)
+        except Exception as e:
+            logger.warning(f"Intent prober: LLM not available ({e}), using rule-based fallback")
+            return _rule_based_intent(query)
+
+    prompt = f"""Analyze this book preference query and return JSON only.
+
+Query: "{query.strip()}"
+
+Extract:
+- categories: list of book categories from {KNOWN_CATEGORIES} that match (max 3)
+- emotions: list of emotions/moods from {EMOTION_KEYWORDS} that match (max 2)
+- keywords: 2-4 short searchable keywords (e.g. "WWII", "detective", "love story")
+- summary: one short sentence summarizing what the user wants
+
+Return only valid JSON, no markdown:
+{{"categories": [...], "emotions": [...], "keywords": [...], "summary": "..."}}"""
+
+    try:
+        response = llm.invoke(prompt)
+        text = response.content if hasattr(response, "content") else str(response)
+        # Extract JSON from response (handle markdown code blocks)
+        json_match = re.search(r"\{[^{}]*\}", text, re.DOTALL)
+        if json_match:
+            data = json.loads(json_match.group())
+            return {
+                "categories": data.get("categories", [])[:3],
+                "emotions": data.get("emotions", [])[:2],
+                "keywords": data.get("keywords", [])[:4],
+                "summary": data.get("summary", "")[:200],
+            }
+    except Exception as e:
+        logger.warning(f"Intent prober LLM failed: {e}")
+
+    return _rule_based_intent(query)
+
+
+def _rule_based_intent(query: str) -> dict:
+    """Fallback when LLM unavailable: simple keyword matching."""
+    lower = query.lower().strip()
+    categories = []
+    emotions = []
+    keywords = []
+
+    cat_map = {
+        "fiction": "Fiction", "history": "History", "philosophy": "Philosophy",
+        "science": "Science", "art": "Art", "mystery": "Mystery", "romance": "Romance",
+        "fantasy": "Fantasy", "sci-fi": "Science Fiction", "biography": "Biography",
+    }
+    for k, v in cat_map.items():
+        if re.search(r"\b" + re.escape(k) + r"\b", lower):
+            categories.append(v)
+
+    for e in EMOTION_KEYWORDS:
+        if re.search(r"\b" + re.escape(e) + r"\b", lower):
+            emotions.append(e)
+
+    # Extract likely keywords (words 4+ chars, not common)
+    stop = {"book", "books", "want", "like", "looking", "something", "that", "with", "the", "and"}
+    words = [w for w in re.findall(r"\b\w{4,}\b", lower) if w not in stop][:4]
+    keywords.extend(words)
+
+    return {
+        "categories": categories[:3] or ["General"],
+        "emotions": emotions[:2],
+        "keywords": keywords[:4],
+        "summary": query[:150] if query else "",
+    }
src/core/recommendation_orchestrator.py
CHANGED
@@ -54,9 +54,13 @@ class RecommendationOrchestrator:
         tone: str = "All",
         user_id: str = "local",
         use_agentic: bool = False,
+        fast: bool = False,
+        async_rerank: bool = False,
     ) -> List[Dict[str, Any]]:
         """
         Generate book recommendations. Async for web search fallback.
+        fast: Skip rerank for low latency (~150ms).
+        async_rerank: Return RRF immediately, rerank in background; next request gets cached reranked.
         """
         if not query or not query.strip():
             return []
@@ -67,17 +71,34 @@ class RecommendationOrchestrator:
             logger.info(f"Returning cached results for key: {cache_key}")
             return cached
 
-        logger.info(f"Processing request: query='{query}', category='{category}', use_agentic={use_agentic}")
+        logger.info(f"Processing request: query='{query}', category='{category}', use_agentic={use_agentic}, fast={fast}, async_rerank={async_rerank}")
+
+        skip_rerank = fast or async_rerank
 
         if use_agentic:
             results = await self._get_recommendations_agentic(query, category)
         else:
-            results = await self._get_recommendations_classic(query, category)
+            results = await self._get_recommendations_classic(query, category, skip_rerank=skip_rerank)
 
         if results:
             self.cache.set(cache_key, results)
+
+        if async_rerank and not use_agentic and skip_rerank:
+            import asyncio
+            asyncio.create_task(self._background_rerank_and_cache(query, category, cache_key))
+
         return results
 
+    async def _background_rerank_and_cache(self, query: str, category: str, cache_key: str) -> None:
+        """Run full pipeline with rerank and cache for async_rerank flow."""
+        try:
+            results = await self._get_recommendations_classic(query, category, skip_rerank=False)
+            if results:
+                self.cache.set(cache_key, results)
+                logger.info(f"Background rerank completed for query '{query[:30]}...'")
+        except Exception as e:
+            logger.warning(f"Background rerank failed: {e}")
+
     def get_recommendations_sync(
         self,
         query: str,
@@ -85,10 +106,12 @@ class RecommendationOrchestrator:
         tone: str = "All",
         user_id: str = "local",
         use_agentic: bool = False,
+        fast: bool = False,
+        async_rerank: bool = False,
     ) -> List[Dict[str, Any]]:
         """Sync wrapper for scripts/CLI."""
         import asyncio
-        return asyncio.run(self.get_recommendations(query, category, tone, user_id, use_agentic))
+        return asyncio.run(self.get_recommendations(query, category, tone, user_id, use_agentic, fast, async_rerank))
 
     async def _get_recommendations_agentic(self, query: str, category: str) -> List[Dict[str, Any]]:
         """LangGraph workflow: Router -> Retrieve -> Evaluate -> (optional) Web Fallback."""
@@ -103,7 +126,7 @@ class RecommendationOrchestrator:
         books_list = final_state.get("isbn_list", [])
         return enrich_and_format(books_list, category, TOP_K_FINAL, "local", metadata_store_inst=self._meta)
 
-    async def _get_recommendations_classic(self, query: str, category: str) -> List[Dict[str, Any]]:
+    async def _get_recommendations_classic(self, query: str, category: str, skip_rerank: bool = False) -> List[Dict[str, Any]]:
         """Classic Router -> Hybrid/Small-to-Big -> optional Web Fallback."""
         from src.core.router import QueryRouter
 
@@ -111,6 +134,8 @@ class RecommendationOrchestrator:
         decision = router.route(query)
         logger.info(f"Retrieval Strategy: {decision}")
 
+        do_rerank = decision["rerank"] and not skip_rerank
+
         if decision["strategy"] == "small_to_big":
             recs = self.vector_db.small_to_big_search(query, k=TOP_K_INITIAL)
         else:
@@ -118,7 +143,7 @@ class RecommendationOrchestrator:
                 query,
                 k=TOP_K_INITIAL,
                 alpha=decision.get("alpha", 0.5),
-                rerank=decision["rerank"],
+                rerank=do_rerank,
                 temporal=decision.get("temporal", False),
             )
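From a client's perspective, the `async_rerank` path added above is two identical requests: the first returns the RRF-only list and schedules the background rerank, and the second is served the refreshed cache entry. A sketch with `requests`; the host, port, and timing are assumptions based on the curl examples in LATENCY_OPTIMIZATION.md:

```python
import time
import requests

BASE = "http://localhost:8000"  # assumed local dev server
payload = {"query": "a philosophical novel about the meaning of life",
           "async_rerank": True}

# 1st call: hybrid RRF top-k comes back immediately; rerank starts in the background
first = requests.post(f"{BASE}/recommend", json=payload, timeout=30).json()

time.sleep(2)  # allow the background cross-encoder pass to finish and overwrite the cache

# 2nd call (same query/category -> same cache key): served the reranked list
second = requests.post(f"{BASE}/recommend", json=payload, timeout=30).json()
```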
src/core/reranker.py
CHANGED
|
@@ -1,104 +1,145 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
from src.utils import setup_logger
|
| 5 |
|
| 6 |
logger = setup_logger(__name__)
|
| 7 |
|
| 8 |
-
# 轻量级重排序模型,速度快且效果不错
|
| 9 |
DEFAULT_RERANKER_MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"
|
| 10 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 11 |
class RerankerService:
|
| 12 |
"""
|
| 13 |
-
Singleton
|
| 14 |
-
This significantly improves RAG precision by scoring the exact relevance
|
| 15 |
-
of (query, document) pairs.
|
| 16 |
"""
|
| 17 |
_instance = None
|
| 18 |
-
|
| 19 |
def __new__(cls):
|
| 20 |
if cls._instance is None:
|
| 21 |
cls._instance = super(RerankerService, cls).__new__(cls)
|
| 22 |
cls._instance.model = None
|
|
|
|
| 23 |
return cls._instance
|
| 24 |
-
|
| 25 |
def __init__(self):
|
| 26 |
if self.model is None:
|
| 27 |
self._load_model()
|
| 28 |
-
|
| 29 |
def _load_model(self):
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
self.model =
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
|
|
|
|
|
|
| 40 |
"""
|
| 41 |
-
Rerank
|
| 42 |
-
|
| 43 |
-
Args:
|
| 44 |
-
query: User question
|
| 45 |
-
docs: List of dicts, each must have a 'content' field (or 'description')
|
| 46 |
-
top_k: Number of results to return
|
| 47 |
-
|
| 48 |
-
Returns:
|
| 49 |
-
Top-K sorted documents with added 'score' field.
|
| 50 |
"""
|
| 51 |
if not self.model or not docs:
|
| 52 |
return docs[:top_k]
|
| 53 |
-
|
| 54 |
-
# Prepare pairs for Cross-Encoder: [[query, doc1], [query, doc2], ...]
|
| 55 |
-
# We assume 'description' or 'page_content' holds the text
|
| 56 |
-
pairs = []
|
| 57 |
-
valid_docs = []
|
| 58 |
-
|
| 59 |
-
for doc in docs:
|
| 60 |
-
# Handle LangChain Document object
|
| 61 |
-
if hasattr(doc, "page_content"):
|
| 62 |
-
text = doc.page_content
|
| 63 |
-
# Handle Dict
|
| 64 |
-
else:
|
| 65 |
-
text = doc.get("description") or doc.get("page_content") or str(doc)
|
| 66 |
-
|
| 67 |
-
pairs.append([query, text])
|
| 68 |
-
valid_docs.append(doc)
|
| 69 |
-
|
| 70 |
-
if not pairs:
|
| 71 |
-
return docs[:top_k]
|
| 72 |
|
| 73 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 74 |
scores = self.model.predict(pairs)
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 86 |
else:
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
# Sort descending by score
|
| 94 |
-
def get_score(doc):
|
| 95 |
-
if hasattr(doc, "metadata"):
|
| 96 |
-
return doc.metadata.get("relevance_score", 0)
|
| 97 |
-
return doc.get("score", 0)
|
| 98 |
-
|
| 99 |
-
scored_results.sort(key=get_score, reverse=True)
|
| 100 |
-
|
| 101 |
-
return scored_results[:top_k]
|
| 102 |
-
|
| 103 |
-
# Global instance
|
| 104 |
reranker = RerankerService()

+"""
+Reranker: Cross-Encoder (torch/ONNX) or ColBERT (optional).
+Backend selectable via RERANKER_BACKEND env: cross_encoder | onnx | colbert.
+ONNX ~2x faster than torch; ColBERT requires llama-index-postprocessor-colbert-rerank.
+"""
+from typing import List, Dict, Any
+
+from src.config import RERANKER_BACKEND
 from src.utils import setup_logger
 
 logger = setup_logger(__name__)
 
 DEFAULT_RERANKER_MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"
 
+
+def _load_cross_encoder(backend: str):
+    """Load CrossEncoder with torch or ONNX backend. Falls back to torch if ONNX fails."""
+    from sentence_transformers import CrossEncoder
+    import torch
+
+    device = "mps" if torch.backends.mps.is_available() else "cpu"
+    be = "onnx" if backend == "onnx" else "torch"
+
+    try:
+        logger.info(f"Loading Reranker ({DEFAULT_RERANKER_MODEL}) backend={be} on {device}...")
+        model = CrossEncoder(DEFAULT_RERANKER_MODEL, device=device, backend=be)
+        logger.info("Reranker model loaded.")
+        return model
+    except Exception as e:
+        if be == "onnx":
+            logger.warning(f"ONNX backend failed (pip install onnxruntime?), falling back to torch: {e}")
+            return CrossEncoder(DEFAULT_RERANKER_MODEL, device=device, backend="torch")
+        raise
+
+
+def _load_colbert():
+    """Load ColBERT reranker via llama-index (optional dep)."""
+    try:
+        from llama_index.postprocessor.colbert_rerank import ColbertRerank
+
+        return ColbertRerank(
+            model_name="colbert-ir/colbertv2.0",
+            top_n=10,
+        )
+    except ImportError as e:
+        logger.warning(f"ColBERT not available (pip install llama-index-postprocessor-colbert-rerank): {e}")
+        return None
+
+
+def _get_text(doc: Any) -> str:
+    if hasattr(doc, "page_content"):
+        return doc.page_content
+    return doc.get("description") or doc.get("page_content") or str(doc)
+
+
+def _set_score(doc: Any, score: float) -> None:
+    if hasattr(doc, "metadata"):
+        doc.metadata["relevance_score"] = score
+    else:
+        doc["score"] = score
+
+
+def _get_score(doc: Any) -> float:
+    if hasattr(doc, "metadata"):
+        return doc.metadata.get("relevance_score", 0)
+    return doc.get("score", 0)
+
+
 class RerankerService:
     """
+    Singleton reranker: Cross-Encoder (torch/ONNX) or ColBERT.
     """
     _instance = None
+
     def __new__(cls):
         if cls._instance is None:
             cls._instance = super(RerankerService, cls).__new__(cls)
             cls._instance.model = None
+            cls._instance._backend = None
         return cls._instance
+
     def __init__(self):
         if self.model is None:
             self._load_model()
+
     def _load_model(self):
+        backend = (RERANKER_BACKEND or "").lower()
+
+        if backend == "colbert":
+            self.model = _load_colbert()
+            self._backend = "colbert" if self.model else "cross_encoder"
+            if self._backend == "cross_encoder":
+                self.model = _load_cross_encoder("torch")
+        else:
+            self._backend = "onnx" if backend == "onnx" else "cross_encoder"
+            self.model = _load_cross_encoder(self._backend)
+
+    def rerank(self, query: str, docs: List[Any], top_k: int = 5) -> List[Any]:
         """
+        Rerank documents by relevance to query.
+        docs: List of dicts or LangChain Document with description/page_content.
         """
         if not self.model or not docs:
             return docs[:top_k]
 
+        if self._backend == "colbert":
+            return self._rerank_colbert(query, docs, top_k)
+        return self._rerank_cross_encoder(query, docs, top_k)
+
+    def _rerank_cross_encoder(self, query: str, docs: List[Any], top_k: int) -> List[Any]:
+        pairs = [[query, _get_text(d)] for d in docs]
         scores = self.model.predict(pairs)
+
+        for i, doc in enumerate(docs):
+            _set_score(doc, float(scores[i]))
+
+        docs.sort(key=_get_score, reverse=True)
+        return docs[:top_k]
+
+    def _rerank_colbert(self, query: str, docs: List[Any], top_k: int) -> List[Any]:
+        from llama_index.schema import NodeWithScore, TextNode
+
+        # Keep ref to original doc for metadata (isbn, etc.)
+        nodes = []
+        for d in docs:
+            node = TextNode(text=_get_text(d), metadata={"__original": d})
+            nodes.append(NodeWithScore(node=node, score=0.0))
+
+        reranked = self.model.postprocess_nodes(nodes, query_str=query)
+
+        result = []
+        for nws in reranked[:top_k]:
+            orig = getattr(nws.node, "metadata", {}).get("__original")
+            if orig is not None:
+                _set_score(orig, float(nws.score or 0))
+                result.append(orig)
             else:
+                from langchain_core.documents import Document
+                doc = Document(page_content=nws.node.text, metadata={"relevance_score": float(nws.score or 0)})
+                result.append(doc)
+        return result
+
+
 reranker = RerankerService()
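
For reference, a minimal usage sketch of the singleton above, assuming `src/config.py` reads `RERANKER_BACKEND` from the environment at import time (the docs and ISBNs here are illustrative):

import os

os.environ["RERANKER_BACKEND"] = "onnx"  # or "cross_encoder" / "colbert"

from src.core.reranker import reranker  # module-level singleton, loads on first import

docs = [
    {"description": "An epic historical novel set during the Napoleonic Wars.", "isbn": "0140447938"},
    {"description": "A dystopian story of surveillance and totalitarian control.", "isbn": "0451524934"},
]
top = reranker.rerank("books like War and Peace", docs, top_k=1)
print(top[0]["isbn"], top[0]["score"])  # dict docs carry their score under "score"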
src/core/router.py
CHANGED

@@ -90,8 +90,12 @@ class QueryRouter:
         freshness_fallback: bool = False,
         target_year: Optional[int] = None
     ) -> Dict[str, Any]:
-        """
-
+        """
+        Fallback: rule-based routing when classifier not loaded.
+        Uses NL keywords (like, similar, recommend...) instead of brittle word-count.
+        Book titles (e.g. "War and Peace", "The Lord of the Rings") -> FAST.
+        """
+        from src.config import ROUTER_DETAIL_KEYWORDS, ROUTER_NL_KEYWORDS
 
         base_result = {
             "temporal": is_temporal,

@@ -103,10 +107,15 @@ class QueryRouter:
         if any(w.lower() in ROUTER_DETAIL_KEYWORDS for w in words):
             logger.info("Router (rules): Detail Query -> SMALL_TO_BIG")
             return {**base_result, "strategy": "small_to_big", "alpha": 0.5, "rerank": False, "k_final": 5}
-
-
+        # NL keywords indicate recommendation intent -> DEEP
+        if any(w.lower() in ROUTER_NL_KEYWORDS for w in words):
+            logger.info("Router (rules): NL keywords -> DEEP (Temporal=%s, Freshness=%s)", is_temporal, freshness_fallback)
+            return {**base_result, "strategy": "deep", "alpha": 0.5, "rerank": True, "k_final": 10}
+        # Short query without NL keywords: book title or keyword -> FAST
+        if len(words) <= 6:
+            logger.info("Router (rules): Keyword/Title -> FAST (Temporal=%s, Freshness=%s)", is_temporal, freshness_fallback)
             return {**base_result, "strategy": "fast", "alpha": 0.5, "rerank": False, "k_final": 5}
-        logger.info("Router (rules):
+        logger.info("Router (rules): Long query -> DEEP (Temporal=%s, Freshness=%s)", is_temporal, freshness_fallback)
         return {**base_result, "strategy": "deep", "alpha": 0.5, "rerank": True, "k_final": 10}
 
     def route(self, query: str) -> Dict[str, Any]:
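
Distilled, the fallback rules above behave like this (illustrative keyword set; the real lists live in `src/config.py`, and the SMALL_TO_BIG detail rule is omitted):

NL_KEYWORDS = {"like", "similar", "recommend", "want", "looking"}

def route_fallback(query: str) -> str:
    words = query.split()
    if any(w.lower() in NL_KEYWORDS for w in words):
        return "deep"  # recommendation-style phrasing -> rerank-heavy path
    if len(words) <= 6:
        return "fast"  # short query without NL keywords: likely a title or keyword
    return "deep"      # long free-form query

assert route_fallback("War and Peace") == "fast"
assert route_fallback("books like War and Peace") == "deep"
assert route_fallback("I need something uplifting for a long rainy holiday week") == "deep"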
src/main.py
CHANGED

@@ -99,6 +99,8 @@ class RecommendationRequest(BaseModel):
     category: str = "All"
     user_id: Optional[str] = "local"
     use_agentic: Optional[bool] = False # LangGraph workflow: Router -> Retrieve -> Evaluate -> Web Fallback
+    fast: Optional[bool] = False # Skip rerank for ~150ms latency
+    async_rerank: Optional[bool] = False # Return RRF first, rerank in background; next request gets cached
 
 
 class FeatureContribution(BaseModel):

@@ -187,6 +189,8 @@ async def get_recommendations(request: RecommendationRequest):
             category=request.category,
             user_id=request.user_id if hasattr(request, 'user_id') else "local",
             use_agentic=request.use_agentic or False,
+            fast=request.fast or False,
+            async_rerank=request.async_rerank or False,
         )
         return {"recommendations": results}
     except Exception as e:

@@ -349,22 +353,58 @@ async def run_benchmark():
 # --- Personalized Recommendation API ---
 
 @app.get("/api/recommend/personal", response_model=RecommendationResponse)
-def personalized_recommendations(user_id: str = "local", top_k: int = 10):
+def personalized_recommendations(
+    user_id: str = "local",
+    top_k: int = 10,
+    limit: Optional[int] = None,
+    recent_isbns: Optional[str] = None,
+    intent_query: Optional[str] = None,
+):
     """
     Get personalized recommendations for a user.
     Uses 6-channel recall (ItemCF/UserCF/Swing/SASRec/YoutubeDNN/Popularity) + LGBMRanker.
+
+    P0: recent_isbns — Comma-separated ISBNs from current session (e.g. just-viewed).
+        Injected into SASRec for cold-start convergence (1+ clicks).
+    P2: intent_query — Zero-shot intent probing when user has no history.
+        Probes LLM for categories/keywords, does semantic search, seeds SASRec.
     """
-
-
-
-    user_id = "A1ZQ1LUQ9R6JHZ"
-
+    k = limit if limit is not None else top_k
+    # Demo logic: Map 'local' to a real user for demonstration (skip when intent_query = cold-start)
+    if user_id in ["local", "demo"] and not intent_query:
+        user_id = "A1ZQ1LUQ9R6JHZ"
+
+    # P0: Parse recent_isbns for real-time cold-start
+    real_time_seq = None
+    if recent_isbns:
+        real_time_seq = [x.strip() for x in recent_isbns.split(",") if x.strip()]
+
+    # P2: Zero-shot intent probing — when no recent_isbns, use query to seed
+    if not real_time_seq and intent_query and intent_query.strip():
+        from src.core.intent_prober import probe_intent
+        intent = probe_intent(intent_query.strip())
+        semantic_query = " ".join(
+            intent.get("keywords", []) + intent.get("categories", []) + [intent.get("summary", "")]
+        ).strip()
+        if semantic_query and recommender:
+            try:
+                rag_results = recommender.get_recommendations_sync(
+                    semantic_query, category="All", tone="All", user_id=user_id
+                )
+                seed_isbns = [r.get("isbn") for r in (rag_results or [])[:5] if r.get("isbn")]
+                if seed_isbns:
+                    real_time_seq = seed_isbns
+            except Exception as e:
+                logger.warning(f"Intent-to-seed failed: {e}")
+
     # Check initialization
     if not rec_service:
         raise HTTPException(status_code=503, detail="Service not ready")
-
+
     try:
-        recs = rec_service.get_recommendations(
+        recs = rec_service.get_recommendations(
+            user_id, top_k=k, real_time_sequence=real_time_seq
+        )
 
         # Enrich with metadata
         from src.utils import enrich_book_metadata

@@ -430,6 +470,51 @@ def personalized_recommendations(user_id: str = "local", top_k: int = 10):
         # In production, maybe return fallback popular items instead of error
         raise HTTPException(status_code=500, detail=str(e))
 
+
+@app.get("/api/intent/probe")
+def probe_intent_endpoint(query: str = ""):
+    """
+    P2: Zero-shot intent probing for cold-start users.
+    Returns inferred categories, emotions, keywords from user's first query.
+    """
+    from src.core.intent_prober import probe_intent
+    try:
+        result = probe_intent(query)
+        return result
+    except Exception as e:
+        logger.error(f"Intent probe failed: {e}")
+        raise HTTPException(status_code=500, detail=str(e))
+
+
+@app.get("/api/onboarding/books")
+def get_onboarding_books(limit: int = 24):
+    """
+    P2: Return popular books for new-user onboarding.
+    Lets user pick 3–5 to seed preferences (cold-start).
+    """
+    if not rec_service:
+        raise HTTPException(status_code=503, detail="Service not ready")
+    try:
+        items = rec_service.get_popular_books(limit)
+        from src.utils import enrich_book_metadata
+        results = []
+        for isbn, meta in items:
+            meta = meta or {}
+            meta = enrich_book_metadata(meta, str(isbn))
+            results.append({
+                "isbn": isbn,
+                "title": meta.get("title") or f"ISBN: {isbn}",
+                "authors": meta.get("authors", "Unknown"),
+                "description": meta.get("description", ""),
+                "thumbnail": meta.get("thumbnail") or "/content/cover-not-found.jpg",
+                "category": meta.get("category", "General"),
+            })
+        return {"books": results}
+    except Exception as e:
+        logger.error(f"Error in onboarding books: {e}")
+        raise HTTPException(status_code=500, detail=str(e))
+
+
 # Allow local frontend dev origins
 # Added LAST so it wraps the app outermost (first to process request)
 app.add_middleware(
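
A sketch of the new request surface (assuming the dev-server address used in `web/src/api.js`; queries and ISBNs are illustrative):

import requests

BASE = "http://127.0.0.1:6006"

# Latency flags on the RAG endpoint
resp = requests.post(f"{BASE}/recommend", json={
    "query": "books like War and Peace",
    "category": "All",
    "tone": "All",
    "fast": True,           # skip rerank entirely (~150ms path)
    "async_rerank": False,  # or: serve RRF now, cache the reranked order for the next call
})
print(len(resp.json()["recommendations"]))

# Cold-start params on the personalized endpoint
resp = requests.get(f"{BASE}/api/recommend/personal", params={
    "user_id": "local",
    "limit": 10,
    "recent_isbns": "0140447938,0451524934",  # P0: session clicks
    "intent_query": "uplifting stories about found family",  # P2: zero-shot probe
})
print(resp.status_code)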
src/recall/fusion.py
CHANGED

@@ -20,7 +20,7 @@ DEFAULT_CHANNEL_CONFIG = {
     "usercf": {"enabled": False, "weight": 1.0},
     "swing": {"enabled": False, "weight": 1.0},
     "item2vec": {"enabled": False, "weight": 0.8},
-    "popularity": {"enabled":
+    "popularity": {"enabled": True, "weight": 0.5}, # P0: Cold-start fallback
 }
 
 
@@ -123,6 +123,10 @@ class RecallFusion:
         self._add_to_candidates(candidates, recs, cfg["popularity"]["weight"])
 
         sorted_cands = sorted(candidates.items(), key=lambda x: x[1], reverse=True)
+        # P0: Cold-start fallback — when all channels return empty, use popularity
+        if not sorted_cands:
+            pop_recs = self.popularity.recommend(user_id, top_k=k)
+            sorted_cands = [(item, s) for item, s in pop_recs]
         return sorted_cands[:k]
 
     def _add_to_candidates(self, candidates, recs, weight: float) -> None:
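
The fallback shape, distilled (stand-in recommender callables, not the project's channel classes):

def fuse(channels, popularity_recommend, user_id, k):
    # channels: [(recommend_fn, weight)] for the enabled recall channels
    candidates = {}
    for recommend_fn, weight in channels:
        for item, score in recommend_fn(user_id):
            candidates[item] = candidates.get(item, 0.0) + weight * score
    ranked = sorted(candidates.items(), key=lambda x: x[1], reverse=True)
    if not ranked:  # P0: every channel came back empty (brand-new user)
        ranked = list(popularity_recommend(user_id, k))
    return ranked[:k]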
src/recommender.py
CHANGED

@@ -39,9 +39,11 @@ class BookRecommender:
         tone: str = "All",
         user_id: str = "local",
         use_agentic: bool = False,
+        fast: bool = False,
+        async_rerank: bool = False,
     ) -> List[Dict[str, Any]]:
         return await self._orchestrator.get_recommendations(
-            query, category, tone, user_id, use_agentic
+            query, category, tone, user_id, use_agentic, fast, async_rerank
         )
 
     def get_recommendations_sync(

@@ -51,9 +53,11 @@ class BookRecommender:
         tone: str = "All",
         user_id: str = "local",
         use_agentic: bool = False,
+        fast: bool = False,
+        async_rerank: bool = False,
     ) -> List[Dict[str, Any]]:
         return self._orchestrator.get_recommendations_sync(
-            query, category, tone, user_id, use_agentic
+            query, category, tone, user_id, use_agentic, fast, async_rerank
         )
 
     def get_similar_books(
src/services/recommend_service.py
CHANGED

@@ -155,6 +155,10 @@ class RecommendationService:
         candidates = self.fusion.get_recall_items(
             user_id, k=200, real_time_seq=real_time_sequence
         )
+        # P0: Cold-start fallback — when recall returns empty, use popularity
+        if not candidates:
+            pop_recs = self.fusion.popularity.recommend(user_id, top_k=200)
+            candidates = list(pop_recs)
         if not candidates:
             return []
 
@@ -267,6 +271,27 @@ class RecommendationService:
 
         return unique_results
 
+    def get_popular_books(self, limit: int = 24) -> list:
+        """
+        P2: Return popular books for onboarding selection.
+        Used when new user has no history — lets them pick 3–5 to seed preferences.
+        """
+        self.load_resources()
+        recs = self.fusion.popularity.recommend(user_id=None, top_k=limit)
+        results = []
+        seen_titles = set()
+        for isbn, _ in recs:
+            meta = self.metadata_store.get_book_metadata(str(isbn)) or {}
+            title = (meta.get("title") or "").lower().strip()
+            if title and title in seen_titles:
+                continue
+            if title:
+                seen_titles.add(title)
+            results.append((isbn, meta))
+            if len(results) >= limit:
+                break
+        return results
+
 if __name__ == "__main__":
     import logging
     logger.setLevel(logging.INFO)
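
Shape of what `get_popular_books` hands to the onboarding endpoint (values illustrative):

items = rec_service.get_popular_books(limit=24)
# -> [("0140447938", {"title": "War and Peace", ...}), ...]
#    (isbn, metadata) pairs, deduplicated case-insensitively by title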
src/vector_db.py
CHANGED

@@ -2,7 +2,7 @@ from typing import List, Any
 # Using community version to avoid 'BaseBlobParser' version conflict in langchain-chroma/core
 from langchain_community.vectorstores import Chroma
 from langchain_huggingface import HuggingFaceEmbeddings
-from src.config import REVIEW_HIGHLIGHTS_TXT, CHROMA_DB_DIR, EMBEDDING_MODEL
+from src.config import REVIEW_HIGHLIGHTS_TXT, CHROMA_DB_DIR, EMBEDDING_MODEL, RERANK_CANDIDATES_MAX
 from src.utils import setup_logger
 from src.core.metadata_store import metadata_store
 from src.core.online_books_store import online_books_store

@@ -220,8 +220,7 @@ class VectorDB:
         final_results = top_candidates[:k]
         if rerank:
             from src.core.reranker import reranker
-
-            rerank_candidates = top_candidates[:max(k*4, 20)]
+            rerank_candidates = top_candidates[:min(len(top_candidates), RERANK_CANDIDATES_MAX)]
             logger.info(f"Reranking top {len(rerank_candidates)} candidates...")
             final_results = reranker.rerank(query, rerank_candidates, top_k=k)
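
The constant itself is presumably defined along these lines in `src/config.py` (the changelog calls it env-overridable; the exact definition is not shown in this diff):

import os

# Cap on how many fused candidates reach the CrossEncoder (rerank top 20 instead of 50)
RERANK_CANDIDATES_MAX = int(os.getenv("RERANK_CANDIDATES_MAX", "20"))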
web/src/App.jsx
CHANGED

@@ -19,6 +19,7 @@ import Header from "./components/Header";
 import BookDetailModal from "./components/BookDetailModal";
 import SettingsModal from "./components/SettingsModal";
 import AddBookModal from "./components/AddBookModal";
+import OnboardingModal from "./components/OnboardingModal";
 
 // Pages
 import GalleryPage from "./pages/GalleryPage";

@@ -57,6 +58,13 @@ const App = () => {
     return stored === "mock" || !stored ? "ollama" : stored;
   });
 
+  // --- P1: Session-level recent ISBNs for cold-start ---
+  const [recentIsbns, setRecentIsbns] = useState([]);
+  const MAX_RECENT_ISBNS = 10;
+
+  // --- P2: Onboarding (new user, no collection) ---
+  const [showOnboarding, setShowOnboarding] = useState(false);
+
   // --- Add Book Modal State ---
   const [showAddBook, setShowAddBook] = useState(false);
   const [googleQuery, setGoogleQuery] = useState("");

@@ -64,6 +72,14 @@ const App = () => {
   const [isSearching, setIsSearching] = useState(false);
   const [addingBookId, setAddingBookId] = useState(null);
 
+  // --- P2: Show onboarding when new user (no collection, not completed) ---
+  useEffect(() => {
+    const completed = localStorage.getItem("onboarding_complete") === "true";
+    if (!completed && userId === "local") {
+      setShowOnboarding(true);
+    }
+  }, [userId]);
+
   // --- Load favorites and stats on startup or user change ---
   useEffect(() => {
     setLoading(true);

@@ -78,11 +94,13 @@ const App = () => {
         reading: 0,
         finished: 0,
       })),
-      getPersonalizedRecommendations(userId).catch(() => []),
+      getPersonalizedRecommendations(userId, 20, recentIsbns).catch(() => []),
     ]).then(([favs, stats, personalRecs]) => {
       setMyCollection(favs);
       setReadingStats(stats);
-
+      if (favs.length > 0) {
+        localStorage.setItem("onboarding_complete", "true");
+      }
       const mappedRecs = personalRecs.map((r, idx) => ({
         id: r.isbn,
         title: r.title,

@@ -283,6 +301,13 @@ const App = () => {
   };
 
   const openBook = (book) => {
+    // P1: Track session-level recent views for cold-start
+    if (book?.isbn) {
+      setRecentIsbns((prev) => {
+        const next = [book.isbn, ...prev.filter((i) => i !== book.isbn)].slice(0, MAX_RECENT_ISBNS);
+        return next;
+      });
+    }
     setSelectedBook({
       ...book,
       aiHighlight: "\u2728 ...",

@@ -319,8 +344,15 @@ const App = () => {
     setBooks([]);
     try {
       let recs;
-
-
+      // P2: Cold-start with intent — when no collection and user typed a mood, use intent-seeded personal recs
+      const useIntentSeed = myCollection.length === 0 && searchQuery.trim();
+      if (!searchQuery || useIntentSeed) {
+        recs = await getPersonalizedRecommendations(
+          userId,
+          20,
+          recentIsbns,
+          useIntentSeed ? searchQuery : null
+        );
       } else {
         recs = await recommend(searchQuery, searchCategory, searchMood, userId);
       }

@@ -384,6 +416,44 @@ const App = () => {
         />
       )}
 
+      {showOnboarding && (
+        <OnboardingModal
+          onComplete={async () => {
+            setShowOnboarding(false);
+            const [favs, stats, personalRecs] = await Promise.all([
+              getFavorites(userId).catch(() => []),
+              getUserStats(userId).catch(() => ({ total: 0, want_to_read: 0, reading: 0, finished: 0 })),
+              getPersonalizedRecommendations(userId, 20, recentIsbns).catch(() => []),
+            ]);
+            setMyCollection(favs);
+            setReadingStats(stats);
+            const mapped = (personalRecs || []).map((r, idx) => ({
+              id: r.isbn,
+              title: r.title,
+              author: r.authors,
+              category: r.category || "General",
+              mood: r.emotions && Object.keys(r.emotions).length > 0
+                ? Object.entries(r.emotions).reduce((a, b) => (a[1] > b[1] ? a : b))[0]
+                : "Literary",
+              rank: idx + 1,
+              rating: r.average_rating || 0,
+              tags: r.tags || [],
+              review_highlights: r.review_highlights || [],
+              desc: r.description,
+              img: r.thumbnail,
+              isbn: r.isbn,
+              emotions: r.emotions || {},
+              explanations: r.explanations || [],
+              aiHighlight: "\u2014",
+              suggestedQuestions: ["Why was this recommended?", "Similar to what I've read?", "What's the core highlight?"],
+            }));
+            setBooks(mapped);
+          }}
+          onAddFavorite={(isbn) => addFavorite(isbn, userId)}
+          onSkip={() => setShowOnboarding(false)}
+        />
+      )}
+
       {showAddBook && (
         <AddBookModal
           onClose={() => setShowAddBook(false)}
web/src/api.js
CHANGED

@@ -1,7 +1,7 @@
 const API_URL = import.meta.env.VITE_API_URL || (import.meta.env.PROD ? "" : "http://127.0.0.1:6006");
 
-export async function recommend(query, category = "All", tone = "All", user_id = "local", use_agentic = false) {
-  const body = { query, category, tone, user_id, use_agentic };
+export async function recommend(query, category = "All", tone = "All", user_id = "local", use_agentic = false, fast = false, async_rerank = false) {
+  const body = { query, category, tone, user_id, use_agentic, fast, async_rerank };
   const resp = await fetch(`${API_URL}/recommend`, {
     method: "POST",
     headers: { "Content-Type": "application/json" },

@@ -12,9 +12,23 @@ export async function recommend(query, category = "All", tone = "All", user_id =
   return data.recommendations || [];
 }
 
-export async function
-
+export async function getOnboardingBooks(limit = 24) {
+  const resp = await fetch(`${API_URL}/api/onboarding/books?limit=${limit}`);
+  if (!resp.ok) throw new Error(await resp.text());
+  const data = await resp.json();
+  return data.books || [];
+}
+
+export async function getPersonalizedRecommendations(user_id = "local", limit = 20, recent_isbns = null, intent_query = null) {
+  // P1: recent_isbns — session-level ISBNs for cold-start (1+ clicks)
+  // P2: intent_query — zero-shot intent probing when user has no history
   const params = new URLSearchParams({ user_id, limit: limit.toString() });
+  if (recent_isbns && Array.isArray(recent_isbns) && recent_isbns.length > 0) {
+    params.set("recent_isbns", recent_isbns.join(","));
+  }
+  if (intent_query && typeof intent_query === "string" && intent_query.trim()) {
+    params.set("intent_query", intent_query.trim());
+  }
   const resp = await fetch(`${API_URL}/api/recommend/personal?${params.toString()}`);
   if (!resp.ok) throw new Error(await resp.text());
   const data = await resp.json();
web/src/components/OnboardingModal.jsx
ADDED

@@ -0,0 +1,137 @@
+/**
+ * P2: New-user onboarding — pick 3–5 books to seed preferences.
+ * Shown when myCollection is empty and onboarding not completed.
+ */
+import React, { useState, useEffect } from "react";
+import { getOnboardingBooks } from "../api";
+
+const PLACEHOLDER_IMG = "/content/cover-not-found.jpg";
+const MIN_SELECT = 3;
+const MAX_SELECT = 5;
+
+const OnboardingModal = ({ onComplete, onAddFavorite, onSkip }) => {
+  const [books, setBooks] = useState([]);
+  const [selected, setSelected] = useState(new Set());
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState("");
+
+  useEffect(() => {
+    getOnboardingBooks(24)
+      .then(setBooks)
+      .catch((e) => setError(e.message))
+      .finally(() => setLoading(false));
+  }, []);
+
+  const toggle = (isbn) => {
+    setSelected((prev) => {
+      const next = new Set(prev);
+      if (next.has(isbn)) {
+        next.delete(isbn);
+      } else if (next.size < MAX_SELECT) {
+        next.add(isbn);
+      }
+      return next;
+    });
+  };
+
+  const handleComplete = async () => {
+    if (selected.size < MIN_SELECT) return;
+    try {
+      for (const isbn of selected) {
+        await onAddFavorite(isbn);
+      }
+      localStorage.setItem("onboarding_complete", "true");
+      onComplete();
+    } catch (e) {
+      setError(e.message);
+    }
+  };
+
+  const canComplete = selected.size >= MIN_SELECT;
+
+  return (
+    <div className="fixed inset-0 z-50 flex items-center justify-center bg-black/50 p-4">
+      <div className="bg-white max-w-3xl w-full max-h-[90vh] overflow-hidden shadow-xl">
+        <div className="p-6 border-b border-[#eee]">
+          <h2 className="text-xl font-bold text-[#333]">Welcome — Pick Your Favorites</h2>
+          <p className="text-sm text-gray-500 mt-1">
+            Select 3–5 books you like to get personalized recommendations.
+          </p>
+        </div>
+        <div className="p-6 overflow-y-auto max-h-[50vh]">
+          {loading && (
+            <div className="text-center text-gray-400 py-8">Loading popular books...</div>
+          )}
+          {error && (
+            <div className="text-center text-red-500 py-4 text-sm">{error}</div>
+          )}
+          {!loading && !error && (
+            <div className="grid grid-cols-3 md:grid-cols-4 gap-4">
+              {books.map((book) => {
+                const isSelected = selected.has(book.isbn);
+                return (
+                  <button
+                    key={book.isbn}
+                    type="button"
+                    onClick={() => toggle(book.isbn)}
+                    className={`text-left border-2 transition-all p-2 ${
+                      isSelected ? "border-[#b392ac] bg-[#faf5f7]" : "border-[#eee] hover:border-[#ddd]"
+                    }`}
+                  >
+                    <div className="aspect-[3/4] bg-gray-100 mb-2 overflow-hidden">
+                      <img
+                        src={book.thumbnail || PLACEHOLDER_IMG}
+                        alt={book.title}
+                        className="w-full h-full object-cover"
+                        onError={(e) => {
+                          e.target.onerror = null;
+                          e.target.src = PLACEHOLDER_IMG;
+                        }}
+                      />
+                    </div>
+                    <p className="text-[10px] font-bold text-[#555] truncate" title={book.title}>
+                      {book.title}
+                    </p>
+                    {isSelected && (
+                      <span className="text-[10px] text-[#b392ac] font-bold">✓ Selected</span>
+                    )}
+                  </button>
+                );
+              })}
+            </div>
+          )}
+        </div>
+        <div className="p-6 border-t border-[#eee] flex justify-between items-center">
+          <span className="text-xs text-gray-500">
+            {selected.size} selected (min {MIN_SELECT}, max {MAX_SELECT})
+          </span>
+          <div className="flex gap-2">
+            {onSkip && (
+              <button
+                type="button"
+                onClick={() => {
+                  localStorage.setItem("onboarding_complete", "true");
+                  onSkip();
+                }}
+                className="px-4 py-2 text-sm text-gray-500 hover:text-gray-700"
+              >
+                Skip for now
+              </button>
+            )}
+            <button
+              onClick={handleComplete}
+              disabled={!canComplete}
+              className={`px-6 py-2 text-sm font-bold ${
+                canComplete ? "bg-[#b392ac] text-white" : "bg-gray-200 text-gray-400 cursor-not-allowed"
+              }`}
+            >
+              Start Exploring
+            </button>
+          </div>
+        </div>
+      </div>
+    </div>
+  );
+};
+
+export default OnboardingModal;
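
End to end, the onboarding flow the modal drives reduces to the following (same assumed dev server as above; the add-favorite step goes through the frontend's `addFavorite` helper, whose backend route is not shown in this diff):

import requests

BASE = "http://127.0.0.1:6006"

books = requests.get(f"{BASE}/api/onboarding/books", params={"limit": 24}).json()["books"]
picks = [b["isbn"] for b in books[:3]]  # the modal enforces 3-5 selections
# The modal then awaits onAddFavorite(isbn) for each pick, marks
# onboarding_complete in localStorage, and re-fetches
# /api/recommend/personal, which is now seeded for this user.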