feat: enhance recommendation system with improved routing, latency optimizations, and onboarding features
- CHANGELOG.md +19 -1
- benchmarks/benchmark.py +98 -16
- benchmarks/locustfile.py +48 -0
- benchmarks/results.md +61 -0
- benchmarks/test_concurrent_benchmark.py +73 -0
- docs/LATENCY_OPTIMIZATION.md +130 -0
- docs/interview_guide.md +34 -4
- requirements.txt +4 -0
- scripts/model/train_intent_router.py +20 -2
- src/config.py +13 -0
- src/core/intent_prober.py +112 -0
- src/core/recommendation_orchestrator.py +30 -5
- src/core/reranker.py +118 -77
- src/core/router.py +14 -5
- src/main.py +93 -8
- src/recall/fusion.py +5 -1
- src/recommender.py +6 -2
- src/services/recommend_service.py +25 -0
- src/vector_db.py +2 -3
- web/src/App.jsx +74 -4
- web/src/api.js +18 -4
- web/src/components/OnboardingModal.jsx +137 -0
CHANGELOG.md
CHANGED
@@ -11,7 +11,25 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Fixed - Router heuristic fragility (intent_classifier)
+- **Model-based routing**: Trained `intent_classifier.pkl` (TF-IDF + LogisticRegression) with book-title examples; the router now uses the model when available.
+- **SEED_DATA extended**: Added `War and Peace`, `The Lord of the Rings`, `Harry Potter`, `1984`, etc. (fast) and `books like War and Peace`, `similar to The Lord of the Rings` (deep) so the model distinguishes book titles from recommendation-style queries.
+- **Fallback rules improved**: Replaced the brittle `len(words) <= 2` check with NL keyword detection (`ROUTER_NL_KEYWORDS`: like, similar, recommend, want, looking, ...). Short queries (≤6 words) without NL keywords → FAST; queries with NL keywords → DEEP.
+- **Config**: `natural_language_keywords` in the router config; `ROUTER_NL_KEYWORDS` in `src/config.py`.
+
+### Added - Latency Optimizations (LATENCY_OPTIMIZATION.md)
+- **1. Trim the candidate set**: `RERANK_CANDIDATES_MAX=20` (env-overridable); rerank the top 20 instead of 50.
+- **2. ColBERT**: `RERANKER_BACKEND=colbert`; optional `llama-index-postprocessor-colbert-rerank`.
+- **3. Async rerank**: `fast=true` skips rerank (~150ms); `async_rerank=true` returns RRF first, reranks in the background, and the next request gets the cached reranked list.
+- **4. ONNX quantization**: `RERANKER_BACKEND=onnx` (default); `onnxruntime` for a ~2x CrossEncoder speedup.
+- API: `POST /recommend` accepts `fast` and `async_rerank`; `web/src/api.js` updated.
+
+### Added - Cold-Start Optimizations (P0–P2)
+- **P0**: Popularity fallback — enabled the Popularity channel by default; `RecallFusion` and `RecommendationService` fall back to popular books when all recall channels return empty.
+- **P0**: `recent_isbns` API param — `/api/recommend/personal` accepts comma-separated ISBNs from the current session; injected into SASRec for 1-click cold-start convergence.
+- **P1**: Frontend passes `recent_isbns` — session-level tracking of viewed books; passed to the personalized API on Start Discovery.
+- **P2**: Onboarding flow — `OnboardingModal` shown when a user is new (no collection); pick 3–5 books from a popular list to seed preferences; `GET /api/onboarding/books`.
+- **P2**: Zero-shot intent probing — `src/core/intent_prober.py` uses an LLM to infer categories/emotions/keywords from the user query; `GET /api/intent/probe`; the `intent_query` param on the personal API seeds SASRec via semantic search when the user has no history.
 
 ### Added - 2026-01-29 (Frontend Refactor: React Router SPA)
 - **React Router SPA**: Refactored the monolithic 960-line `App.jsx` into a React Router architecture with 3 route pages and 5 reusable components.
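The fallback rule in the "Fixed" changelog entry above is small enough to sketch inline. A minimal illustration, assuming the `ROUTER_NL_KEYWORDS` frozenset from `src/config.py`; `route_fallback` is a hypothetical name, and the real router also handles detail/temporal keywords and prefers the trained `intent_classifier.pkl`, all omitted here:

```python
# Illustrative sketch only, not the project's router.
ROUTER_NL_KEYWORDS = frozenset({
    "like", "similar", "recommend", "want", "looking", "books", "something",
    "suggest", "recommendations", "after", "read", "if", "liked",
})

def route_fallback(query: str) -> str:
    """Short queries without NL keywords -> FAST; NL-phrased queries -> DEEP."""
    words = query.split()
    has_nl_keyword = any(w.lower() in ROUTER_NL_KEYWORDS for w in words)
    if len(words) <= 6 and not has_nl_keyword:
        return "fast"  # e.g. "War and Peace": keyword/BM25 path, no rerank
    return "deep"      # e.g. "books like War and Peace": hybrid + rerank

assert route_fallback("War and Peace") == "fast"
assert route_fallback("books like War and Peace") == "deep"
```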
benchmarks/benchmark.py
CHANGED
@@ -4,15 +4,23 @@ Performance Benchmark Script for Book Recommender System
 This script measures:
 1. Vector search latency
 2. End-to-end recommendation latency
-3. Throughput (queries per second)
+3. Throughput (queries per second, sequential)
+4. Concurrent throughput (QPS under N parallel workers)
 
 Usage:
     python benchmarks/benchmark.py
+    python benchmarks/benchmark.py --concurrent 5  # 5 concurrent workers
+
+Note: For HTTP-level load testing (simulating real users), use Locust:
+    pip install locust
+    locust -f benchmarks/locustfile.py --host=http://localhost:8000
 """
 
+import argparse
 import sys
 import time
 import statistics
+from concurrent.futures import ThreadPoolExecutor, as_completed
 from pathlib import Path
 
 # Add project root to path
@@ -82,11 +90,11 @@ def benchmark_full_recommendation(recommender: BookRecommender, n_runs: int = 30
 
 
 def benchmark_throughput(recommender: BookRecommender, duration_sec: int = 10) -> dict:
-    """Measure queries per second over a time window."""
+    """Measure queries per second over a time window (sequential)."""
     query_count = 0
     start = time.perf_counter()
     query_idx = 0
 
     while (time.perf_counter() - start) < duration_sec:
         recommender.get_recommendations_sync(
             TEST_QUERIES[query_idx % len(TEST_QUERIES)],
@@ -95,17 +103,63 @@ def benchmark_throughput(recommender: BookRecommender, duration_sec: int = 10) -
         )
         query_count += 1
         query_idx += 1
 
     elapsed = time.perf_counter() - start
 
     return {
-        "operation": "Throughput Test",
+        "operation": "Throughput Test (sequential)",
         "duration_sec": round(elapsed, 2),
        "total_queries": query_count,
         "qps": round(query_count / elapsed, 2),
     }
 
 
+def _run_one_query(recommender: BookRecommender, query: str) -> tuple[float, int]:
+    """Run a single recommendation and return (latency_ms, 1)."""
+    start = time.perf_counter()
+    recommender.get_recommendations_sync(query, category="All", tone="All")
+    return (time.perf_counter() - start) * 1000, 1
+
+
+def benchmark_concurrent(
+    recommender: BookRecommender,
+    n_workers: int = 5,
+    total_queries: int = 50,
+) -> dict:
+    """
+    Measure throughput under concurrent load using ThreadPoolExecutor.
+
+    Simulates N parallel clients to expose:
+    - VectorDB connection/query limits under load
+    - GIL contention if CPU-bound (embedding, rerank)
+    - I/O blocking in ChromaDB / LLM calls
+    """
+    queries = [TEST_QUERIES[i % len(TEST_QUERIES)] for i in range(total_queries)]
+    latencies: list[float] = []
+    start = time.perf_counter()
+
+    with ThreadPoolExecutor(max_workers=n_workers) as executor:
+        futures = [
+            executor.submit(_run_one_query, recommender, q) for q in queries
+        ]
+        for future in as_completed(futures):
+            lat_ms, _ = future.result()
+            latencies.append(lat_ms)
+
+    wall_sec = time.perf_counter() - start
+
+    return {
+        "operation": f"Concurrent Throughput ({n_workers} workers)",
+        "workers": n_workers,
+        "total_queries": total_queries,
+        "wall_sec": round(wall_sec, 2),
+        "qps": round(total_queries / wall_sec, 2),
+        "mean_latency_ms": round(statistics.mean(latencies), 2),
+        "median_latency_ms": round(statistics.median(latencies), 2),
+        "p95_latency_ms": round(sorted(latencies)[int(len(latencies) * 0.95)], 2),
+    }
+
+
 def print_results(results: list[dict]):
     """Print benchmark results in a formatted table."""
     print("\n" + "=" * 70)
@@ -146,37 +200,65 @@ def save_results(results: list[dict], filepath: str = "benchmarks/results.md"):
         f.write("## Interpretation\n\n")
         f.write("- **Vector Search**: Time to query ChromaDB and retrieve top-k results\n")
         f.write("- **Full Recommendation**: End-to-end latency including filtering and formatting\n")
-        f.write("- **Throughput**: Sustained
+        f.write("- **Throughput (sequential)**: Sustained QPS when processing one query at a time\n")
+        f.write("- **Concurrent Throughput**: QPS under N parallel workers; exposes GIL/IO bottlenecks\n")
 
     print(f"\n✅ Results saved to {filepath}")
 
 
 def main():
+    parser = argparse.ArgumentParser(description="Benchmark Book Recommender System")
+    parser.add_argument(
+        "--concurrent",
+        type=int,
+        default=0,
+        metavar="N",
+        help="Add concurrent benchmark with N workers (e.g. 5). 0 = skip.",
+    )
+    parser.add_argument(
+        "--concurrent-queries",
+        type=int,
+        default=50,
+        help="Total queries for concurrent benchmark (default: 50)",
+    )
+    args = parser.parse_args()
+
     print("🚀 Initializing Book Recommender System...")
     print("   (This may take a moment to load models and vector database)")
 
     try:
         recommender = BookRecommender()
     except Exception as e:
         print(f"❌ Failed to initialize: {e}")
         return
 
     print("✅ System initialized. Starting benchmarks...\n")
 
     results = []
 
     # Benchmark 1: Vector Search
     print("📊 Running Vector Search benchmark...")
     results.append(benchmark_vector_search(recommender.vector_db))
 
     # Benchmark 2: Full Recommendation
     print("📊 Running Full Recommendation benchmark...")
     results.append(benchmark_full_recommendation(recommender))
 
-    # Benchmark 3: Throughput
-    print("📊 Running Throughput benchmark (10 seconds)...")
+    # Benchmark 3: Sequential Throughput
+    print("📊 Running Sequential Throughput benchmark (10 seconds)...")
     results.append(benchmark_throughput(recommender))
 
+    # Benchmark 4: Concurrent Throughput (optional)
+    if args.concurrent > 0:
+        print(f"📊 Running Concurrent Throughput ({args.concurrent} workers, {args.concurrent_queries} queries)...")
+        results.append(
+            benchmark_concurrent(
+                recommender,
+                n_workers=args.concurrent,
+                total_queries=args.concurrent_queries,
+            )
+        )
+
     # Print and save results
     print_results(results)
     save_results(results)
benchmarks/locustfile.py
ADDED
@@ -0,0 +1,48 @@
+"""
+Locust load test for Book Recommender API.
+
+Simulates concurrent HTTP requests to measure real-world throughput.
+Run the API server first, then:
+
+    pip install locust
+    locust -f benchmarks/locustfile.py --host=http://localhost:8000
+
+Then open http://localhost:8089 to drive the load test.
+"""
+
+import random
+from locust import HttpUser, task, between
+
+# Mirror TEST_QUERIES from benchmark.py for consistency
+TEST_QUERIES = [
+    "a romantic comedy set in New York",
+    "a philosophical novel about the meaning of life",
+    "a fast-paced thriller with plot twists",
+    "a coming-of-age story about friendship and loss",
+    "a historical fiction set during World War II",
+    "a science fiction story about space exploration",
+    "a mystery novel with an unreliable narrator",
+    "a fantasy epic with dragons and magic",
+    "a memoir about overcoming adversity",
+    "a literary fiction exploring family dynamics",
+]
+
+
+class RecommenderUser(HttpUser):
+    """Simulates a user hitting the recommendation API."""
+
+    wait_time = between(0.5, 2.0)  # 0.5–2s between requests
+
+    @task(10)
+    def recommend(self):
+        """Primary: POST /recommend."""
+        q = random.choice(TEST_QUERIES)
+        self.client.post(
+            "/recommend",
+            json={"query": q, "category": "All"},
+        )
+
+    @task(1)
+    def health(self):
+        """Occasional health check."""
+        self.client.get("/health")
benchmarks/results.md
ADDED
@@ -0,0 +1,61 @@
+# Performance Benchmark Results
+
+**Date**: 2026-02-12 01:02:27
+
+## System Info
+- Dataset: 5,000+ books
+- Embedding Model: all-MiniLM-L6-v2 (384 dim)
+- Vector DB: ChromaDB with HNSW index
+
+## Results
+
+### Vector Search (k=50)
+
+| Metric | Value |
+|--------|-------|
+| runs | 50 |
+| mean_ms | 11.49 |
+| median_ms | 6.43 |
+| std_ms | 27.41 |
+| min_ms | 5.49 |
+| max_ms | 200.46 |
+| p95_ms | 15.51 |
+
+### Full Recommendation
+
+| Metric | Value |
+|--------|-------|
+| runs | 30 |
+| mean_ms | 3876.27 |
+| median_ms | 260.87 |
+| std_ms | 5445.93 |
+| min_ms | 14.54 |
+| max_ms | 16609.18 |
+| p95_ms | 11694.41 |
+
+### Throughput Test (sequential)
+
+| Metric | Value |
+|--------|-------|
+| duration_sec | 10.1 |
+| total_queries | 89 |
+| qps | 8.81 |
+
+### Concurrent Throughput (3 workers)
+
+| Metric | Value |
+|--------|-------|
+| workers | 3 |
+| total_queries | 12 |
+| wall_sec | 1.29 |
+| qps | 9.28 |
+| mean_latency_ms | 298.3 |
+| median_latency_ms | 370.19 |
+| p95_latency_ms | 579.95 |
+
+## Interpretation
+
+- **Vector Search**: Time to query ChromaDB and retrieve top-k results
+- **Full Recommendation**: End-to-end latency including filtering and formatting
+- **Throughput (sequential)**: Sustained QPS when processing one query at a time
+- **Concurrent Throughput**: QPS under N parallel workers; exposes GIL/IO bottlenecks
benchmarks/test_concurrent_benchmark.py
ADDED
@@ -0,0 +1,73 @@
+"""
+Quick test for concurrent benchmark logic without loading full recommender.
+Run: python benchmarks/test_concurrent_benchmark.py
+"""
+
+import sys
+import time
+import statistics
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+# Mock recommender that simulates ~100ms latency
+class MockRecommender:
+    def get_recommendations_sync(self, query: str, category: str = "All", tone: str = "All"):
+        time.sleep(0.1)
+        return [{"title": "Mock Book", "isbn": "123"}]
+
+TEST_QUERIES = ["query A", "query B", "query C"]
+
+
+def _run_one_query(recommender, query: str) -> tuple[float, int]:
+    start = time.perf_counter()
+    recommender.get_recommendations_sync(query, category="All", tone="All")
+    return (time.perf_counter() - start) * 1000, 1
+
+
+def benchmark_concurrent(recommender, n_workers: int = 5, total_queries: int = 15) -> dict:
+    queries = [TEST_QUERIES[i % len(TEST_QUERIES)] for i in range(total_queries)]
+    latencies = []
+    start = time.perf_counter()
+
+    with ThreadPoolExecutor(max_workers=n_workers) as executor:
+        futures = [executor.submit(_run_one_query, recommender, q) for q in queries]
+        for future in as_completed(futures):
+            lat_ms, _ = future.result()
+            latencies.append(lat_ms)
+
+    wall_sec = time.perf_counter() - start
+
+    return {
+        "operation": f"Concurrent ({n_workers} workers)",
+        "workers": n_workers,
+        "total_queries": total_queries,
+        "wall_sec": round(wall_sec, 2),
+        "qps": round(total_queries / wall_sec, 2),
+        "mean_latency_ms": round(statistics.mean(latencies), 2),
+    }
+
+
+def main():
+    mock = MockRecommender()
+
+    # Sequential: 15 * 100ms = ~1.5s
+    print("Sequential (1 worker):")
+    r1 = benchmark_concurrent(mock, n_workers=1, total_queries=15)
+    print(f"  wall_sec={r1['wall_sec']}, qps={r1['qps']}, mean_ms={r1['mean_latency_ms']}")
+
+    # Concurrent: 15 queries with 5 workers -> ~3 batches of 5 -> ~300ms
+    print("\nConcurrent (5 workers):")
+    r5 = benchmark_concurrent(mock, n_workers=5, total_queries=15)
+    print(f"  wall_sec={r5['wall_sec']}, qps={r5['qps']}, mean_ms={r5['mean_latency_ms']}")
+
+    # Concurrency should give ~5x speedup
+    speedup = r1["wall_sec"] / r5["wall_sec"]
+    print(f"\nSpeedup: {speedup:.1f}x (expected ~5x for 5 workers)")
+    assert r5["qps"] > r1["qps"], "Concurrent QPS should exceed sequential"
+    print("OK: Concurrent benchmark logic works correctly.")
+
+
+if __name__ == "__main__":
+    main()
docs/LATENCY_OPTIMIZATION.md
ADDED
@@ -0,0 +1,130 @@
+# Latency Optimization: Full Recommendation Pipeline
+
+## Current State
+
+| Metric | Value | Target (Spotify-style) |
+|--------|-------|------------------------|
+| P95 Full Recommendation | ~1250ms | < 100ms |
+| Mean | ~700–900ms | - |
+
+**Interviewer's take**: At Spotify, a recommendation endpoint is typically expected to return within 100ms. 1.2s is a stall users can perceive.
+
+---
+
+## Latency Breakdown
+
+Approximate warm-query breakdown (from `benchmarks/benchmark.py` + `docs/experiments/reports/rerank_report.md`):
+
+| Stage | Location | Latency | Notes |
+|-------|----------|---------|-------|
+| Router | `src/core/router.py` | ~1ms | Rule-based, fast |
+| Sparse (FTS5) | `vector_db._sparse_fts_search` | ~20–50ms | SQLite MATCH |
+| Dense (Chroma) | `vector_db.search` | ~50–100ms | HNSW + MiniLM |
+| RRF Fusion | `vector_db.hybrid_search` | ~5ms | In-memory |
+| **Cross-Encoder Rerank** | `src/core/reranker.py` | **~400–900ms** | **Main bottleneck** |
+| Metadata Enrichment | `enrich_and_format` | ~50–100ms | SQLite lookups |
+
+**Rerank details**:
+- Model: `cross-encoder/ms-marco-MiniLM-L-6-v2`
+- Candidates: `max(k*4, 20)` = 50 (when `k=10`)
+- Each (query, doc) pair requires a full forward pass
+- 50 pairs × ~15–20ms/pair ≈ 750–1000ms
+
+---
+
+## Root Causes
+
+1. **The Cross-Encoder is too heavy**
+   - Every (query, doc) pair runs full attention; doc vectors cannot be precomputed the way a Bi-Encoder allows
+   - 50 candidates make serial inference slow
+
+2. **Every benchmark query triggers rerank**
+   - `TEST_QUERIES` are all natural language (e.g. "a romantic comedy set in New York")
+   - Router rule: `len(words) > 2` with no detail keywords → **DEEP** → `rerank=True`
+   - So every benchmark run executes the Cross-Encoder
+
+3. **The LangGraph agentic mode is slower still**
+   - Router → Retrieve → Evaluate (LLM call) → optional Web Fallback
+   - Serial execution, no parallelism
+
+---
+
+## Optimization Options
+
+### 1. Trim the candidate set (Quick Win)
+
+**Current**: `rerank_candidates = top_candidates[:max(k*4, 20)]` → 50 candidates
+
+**Proposal**: Reduce to 20, or make it configurable via config
+
+```python
+# config.py
+RERANK_CANDIDATES_MAX = 20  # down from 50; expected to roughly halve rerank latency
+```
+
+**Trade-off**: If a truly relevant book falls outside the Top-20, recall drops slightly; 20 is usually enough coverage.
+
+---
+
+### 2. Replace the Cross-Encoder with ColBERT (late interaction)
+
+**Principle**: ColBERT encodes query and doc separately, then scores with token-level MaxSim; doc vectors can be precomputed and cached.
+
+| Approach | Inference | Precompute | Typical latency |
+|------|----------|--------|--------------|
+| Cross-Encoder | Full forward per (q, d) pair | No | ~15–20ms/pair |
+| ColBERT | Encode q once + dot with doc vectors | Yes (docs cacheable) | ~2–5ms/doc |
+
+**Implementation notes**:
+- Use `colbert-ai/colbertv2` or a similar library
+- Precompute token embeddings for book descriptions and store them in the vector store
+- Online, only encode the query and run MaxSim against candidate doc vectors
+
+**Trade-off**: Needs extra index building and dependencies; quality may match the Cross-Encoder or fall slightly short.
+
+---
+
+### 3. Asynchronous rerank
+
+**Idea**: Return the hybrid RRF Top-K first, rerank asynchronously in the background, and deliver the result via WebSocket/polling or on the next request.
+
+```
+User request → return RRF Top-10 immediately (~150ms) → background rerank → push refined results (optional)
+```
+
+**Trade-off**: More complex to implement; requires API and frontend changes; first-screen result quality drops slightly.
+
+---
+
+### 4. ONNX quantization (already planned)
+
+`rerank_report.md` already notes that the ONNX version of the Cross-Encoder gives roughly a 2x speedup.
+
+---
+
+### 5. Dynamic rerank policy (partially implemented)
+
+The router already disables rerank for ISBN/keyword queries; this can be tightened further:
+- Enable rerank only when the query length exceeds a threshold and the query is not pure keywords
+- Or add a "low-latency mode": let users choose "fast" vs. "precise"
+
+---
+
+## Implementation Status (v2.7+)
+
+| Optimization | Status | Notes |
+|------|------|------|
+| 1. Trim candidate set | ✅ | `RERANK_CANDIDATES_MAX=20` (config), overridable via env |
+| 2. ColBERT | ✅ | `RERANKER_BACKEND=colbert`, needs `pip install llama-index-postprocessor-colbert-rerank` |
+| 3. Async rerank | ✅ | `fast=true` skips rerank; `async_rerank=true` returns RRF first, reranks in the background and caches |
+| 4. ONNX quantization | ✅ | `RERANKER_BACKEND=onnx` (default), needs `onnxruntime` |
+
+### API usage
+
+```bash
+# Fast mode (~150ms)
+curl -X POST /recommend -d '{"query":"romantic comedy","fast":true}'
+
+# Async rerank: returns RRF now; the next identical query returns the cached reranked list
+curl -X POST /recommend -d '{"query":"romantic comedy","async_rerank":true}'
+```
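The late-interaction scoring behind option 2 in the document above reduces to a few lines of NumPy. A minimal sketch of token-level MaxSim, assuming pre-normalized embeddings; this illustrates the principle, not the ColBERT library API:

```python
import numpy as np

def maxsim_score(q_tokens: np.ndarray, d_tokens: np.ndarray) -> float:
    """MaxSim: for each query token, take its best-matching doc token
    (dot product on L2-normalized embeddings), then sum over query tokens.
    d_tokens can be precomputed offline and cached per document."""
    sim = q_tokens @ d_tokens.T          # (n_q, n_d) token-level similarity matrix
    return float(sim.max(axis=1).sum())  # best doc token per query token, summed

# Toy shapes: 2 query tokens vs. 3 doc tokens in 4 dims
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(3, 4)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```

Because the document side of the computation is a plain matrix product against cached vectors, only the query encoding runs a transformer online, which is where the ~2–5ms/doc figure comes from.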
docs/interview_guide.md
CHANGED
@@ -98,7 +98,37 @@
 
 ## 🔬 Advanced Technical Q&A
 
-### Q5.
+### Q5. ChromaDB/SQLite memory and scalability: migrating to tens of millions of items
+
+**Question**: You chose ChromaDB (embedded) and SQLite. That is fine for a demo, but for a catalog of tens of millions of items (Spotify scale) it is not viable. **How would you migrate to Milvus/Qdrant? How would you shard the ANN index (HNSW)?**
+
+**What it tests**: Understanding of vector database scalability and distributed ANN.
+
+**Suggested answer**:
+
+> The current architecture (ChromaDB + SQLite) suits ~200K items and demos. At tens of millions of items the bottlenecks are:
+>
+> **ChromaDB**: embedded, single-machine, index loaded into memory. 10M × 384 dims × 4B ≈ 15GB of raw vectors, and the HNSW graph structure may inflate that another 10–50x; a single machine's RAM and CPU cannot sustain it.
+>
+> **SQLite**: single file, single write lock; disk I/O becomes the bottleneck.
+>
+> **Migration strategy**:
+>
+> 1. **Abstract a VectorStore interface**: Define a `VectorStoreInterface` in `vector_db.py` with `ChromaVectorStore`, `QdrantVectorStore`, and `MilvusVectorStore` implementations, switchable via config, to make the migration painless.
+> 2. **Choice of engine**: Milvus suits large datasets, analytics + retrieval, and is natively distributed; Qdrant is lighter and focused on pure vector search. Either handles tens of millions of items.
+> 3. **Migration steps**: Export (id, embedding, metadata) from Chroma → create a Collection in Milvus/Qdrant and configure HNSW parameters → bulk upsert → flip the config switch.
+>
+> **HNSW sharding**:
+>
+> - **Hash-shard by ID**: Distribute via `hash(id) % N` across N shards, each with its own HNSW index. At query time, fan out to all N shards concurrently, take top_k from each, then merge for the final top_k.
+> - **Cluster-shard by embedding**: K-Means clustering; a query first locates its cluster and only queries a few shards (narrows the search, but cold start and data skew must be handled).
+> - **Use Milvus/Qdrant built-ins**: Both support distributed sharding natively, so their sharding config can be used directly instead of building your own.
+>
+> **Tie-in with Q4**: Rework the metadata_store's SQLite per the Q4 plan (Redis + PostgreSQL/Cassandra); the FTS5 sparse retrieval can migrate to Elasticsearch/Meilisearch for hybrid search.
+
+---
+
+### Q6. Negative Sampling
 
 **Question**: The TECHNICAL_REPORT uses "Hard negative sampling from recall results". Doesn't this risk a **false negative** problem (treating items the user actually likes but never clicked as negatives)? When training DIN or LGBMRanker, how did you balance the ratio of random negatives to hard negatives, and how does that affect model convergence?
 
@@ -114,7 +144,7 @@
 
 ---
 
-###
+### Q7. Real-time / Near-line
 
 **Question**: SASRec is trained mostly offline. In a Spotify scenario, if a user has just listened to 3 "Heavy Metal" tracks in a row, we want the very next recommendation to follow that interest shift. In the current architecture, how would you inject the user's **real-time interaction sequence** (not yet persisted to CSV) into SASRec or DIN inference? What logic must be added to `RecommendationService`?
 
@@ -135,7 +165,7 @@
 
 ---
 
-###
+### Q8. Evaluation metrics: Diversity and Serendipity
 
 **Question**: The current focus is HR@10 and NDCG. As a content platform, you notice the recommendation list is all bestsellers (the Harry Potter effect). If asked to improve **Diversity** and **Serendipity** without significantly hurting accuracy, how would you modify the objective function or logic in the ranking or rerank stage?
 
@@ -162,7 +192,7 @@
 
 ## 📋 Known Limitations & Improvement Directions
 
-###
+### Q9. Leftover "research-style" code
 
 **Symptom**: As the codebase evolves toward production, it still carries traces of its research-prototype origins.
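The hash-sharding answer in Q5 above is a scatter-gather pattern. A minimal sketch; the shard objects, their `search(vec, k)` signature, and the `(score, item_id)` result convention are hypothetical stand-ins for a Milvus/Qdrant SDK:

```python
import heapq
import hashlib
from concurrent.futures import ThreadPoolExecutor

def shard_for(item_id: str, n_shards: int) -> int:
    """Write path: a stable hash decides which shard owns the vector.
    (Python's built-in hash() is salted per process, so use a digest.)"""
    return int(hashlib.md5(item_id.encode()).hexdigest(), 16) % n_shards

def sharded_search(shards, query_vec, top_k: int):
    """Read path: fan out to all N shards concurrently, take top_k from
    each, then merge by score for the global top_k (scatter-gather)."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: s.search(query_vec, top_k), shards)
    # Merge N partial lists of (score, item_id); highest score wins.
    return heapq.nlargest(top_k, (hit for part in partials for hit in part))
```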
requirements.txt
CHANGED
@@ -24,6 +24,7 @@ langchain-openai
 transformers>=4.40.0
 torch
 sentence-transformers
+onnxruntime>=1.16.0  # For CrossEncoder backend=onnx (~2x faster)
 gensim>=4.3.0
 lightgbm
 xgboost>=2.0.0
@@ -43,6 +44,9 @@ requests
 # Intent classifier backends (optional)
 # fasttext  # Uncomment for FastText backend: pip install fasttext
 
+# Latency: ColBERT reranker (optional, RERANKER_BACKEND=colbert)
+# llama-index-postprocessor-colbert-rerank
+
 # LLM Agent & Fine-tuning
 faiss-cpu
 diffusers
scripts/model/train_intent_router.py
CHANGED
@@ -67,6 +67,18 @@ SEED_DATA = [
     ("music", "fast"),
     ("art", "fast"),
     ("philosophy", "fast"),
+    # fast: book titles (keyword-like, BM25 works well)
+    ("War and Peace", "fast"),
+    ("The Lord of the Rings", "fast"),
+    ("Harry Potter", "fast"),
+    ("1984", "fast"),
+    ("To Kill a Mockingbird", "fast"),
+    ("The Great Gatsby", "fast"),
+    ("Pride and Prejudice", "fast"),
+    ("Dune", "fast"),
+    ("Sapiens", "fast"),
+    ("Atomic Habits", "fast"),
+    ("Deep Work", "fast"),
     # deep: natural language, complex queries
     ("What are the best books about artificial intelligence for beginners", "deep"),
     ("I'm looking for something similar to Harry Potter", "deep"),
@@ -87,6 +99,12 @@ SEED_DATA = [
     ("Recommend me novels with strong female protagonists", "deep"),
     ("What to read to understand economics", "deep"),
     ("Books on meditation and mindfulness", "deep"),
+    # deep: natural language with book references (need context, not just keyword)
+    ("books like War and Peace", "deep"),
+    ("similar to The Lord of the Rings", "deep"),
+    ("recommend something like Harry Potter", "deep"),
+    ("what to read after 1984", "deep"),
+    ("books similar to Sapiens", "deep"),
 ]
 
 
@@ -144,8 +162,8 @@ def main():
         pred = result.predict(sample)[0][0].replace("__label__", "")
     elif args.backend == "distilbert":
         from transformers import pipeline
-
-
+        pipe = pipeline("zero-shot-classification", model="distilbert-base-uncased", device=-1)
+        pred = pipe(sample, INTENTS, multi_label=False)["labels"][0]
     else:
         pred = result.predict([sample])[0]
     ok = "✓" if pred == intent else "✗"
src/config.py
CHANGED
@@ -32,6 +32,12 @@ CACHE_TTL = 3600  # 1 hour
 TOP_K_INITIAL = 50
 TOP_K_FINAL = 10
 
+# Latency: Rerank candidate cap (lower = faster, see LATENCY_OPTIMIZATION.md)
+RERANK_CANDIDATES_MAX = int(os.getenv("RERANK_CANDIDATES_MAX", "20"))
+
+# Reranker backend: cross_encoder | onnx | colbert (onnx ~2x faster, colbert optional)
+RERANKER_BACKEND = os.getenv("RERANKER_BACKEND", "onnx")
+
 # Debug mode: set DEBUG=1 to enable verbose logging (research prototype style)
 DEBUG = os.getenv("DEBUG", "0") == "1"
 
@@ -47,6 +53,10 @@ def _load_router_config() -> dict:
             "new", "newest", "latest", "recent", "modern", "contemporary", "current",
         ],
         "strong_freshness_keywords": ["newest", "latest"],
+        "natural_language_keywords": [
+            "like", "similar", "recommend", "want", "looking", "books", "something",
+            "suggest", "recommendations", "after", "read", "if", "liked",
+        ],
     }
     path = CONFIG_DIR / "router.json"
     if path.exists():
@@ -82,3 +92,6 @@ ROUTER_FRESHNESS_KEYWORDS: frozenset[str] = frozenset(
 ROUTER_STRONG_FRESHNESS_KEYWORDS: frozenset[str] = frozenset(
     str(k).lower() for k in _ROUTER_CFG.get("strong_freshness_keywords", [])
 )
+ROUTER_NL_KEYWORDS: frozenset[str] = frozenset(
+    str(k).lower() for k in _ROUTER_CFG.get("natural_language_keywords", [])
+)
src/core/intent_prober.py
ADDED
@@ -0,0 +1,112 @@
+"""
+P2: Zero-shot intent probing for cold-start users.
+
+Uses LLM to infer categories, emotions, and keywords from a user's first query.
+When user has no history, this helps seed preferences for faster convergence.
+"""
+
+import json
+import re
+from typing import Optional
+
+from src.utils import setup_logger
+
+logger = setup_logger(__name__)
+
+# Categories we support (match router/metadata)
+KNOWN_CATEGORIES = [
+    "Fiction", "History", "Philosophy", "Science", "Art",
+    "Biography", "Mystery", "Romance", "Fantasy", "Science Fiction",
+    "Literary", "General",
+]
+
+EMOTION_KEYWORDS = [
+    "happy", "sad", "suspenseful", "angry", "surprising",
+    "heartbreaking", "uplifting", "thought-provoking", "relaxing",
+]
+
+
+def probe_intent(query: str, llm=None) -> dict:
+    """
+    Infer user intent from a short query (zero-shot, no history).
+
+    Returns:
+        dict with keys: categories, emotions, keywords, summary
+    """
+    if not query or not query.strip():
+        return {"categories": [], "emotions": [], "keywords": [], "summary": ""}
+
+    if llm is None:
+        try:
+            from src.core.llm import get_llm_model
+            import os
+            provider = os.getenv("LLM_PROVIDER", "ollama")
+            api_key = os.getenv("OPENAI_API_KEY") if provider == "openai" else None
+            llm = get_llm_model(provider=provider, api_key=api_key)
+        except Exception as e:
+            logger.warning(f"Intent prober: LLM not available ({e}), using rule-based fallback")
+            return _rule_based_intent(query)
+
+    prompt = f"""Analyze this book preference query and return JSON only.
+
+Query: "{query.strip()}"
+
+Extract:
+- categories: list of book categories from {KNOWN_CATEGORIES} that match (max 3)
+- emotions: list of emotions/moods from {EMOTION_KEYWORDS} that match (max 2)
+- keywords: 2-4 short searchable keywords (e.g. "WWII", "detective", "love story")
+- summary: one short sentence summarizing what the user wants
+
+Return only valid JSON, no markdown:
+{{"categories": [...], "emotions": [...], "keywords": [...], "summary": "..."}}"""
+
+    try:
+        response = llm.invoke(prompt)
+        text = response.content if hasattr(response, "content") else str(response)
+        # Extract JSON from response (handle markdown code blocks)
+        json_match = re.search(r"\{[^{}]*\}", text, re.DOTALL)
+        if json_match:
+            data = json.loads(json_match.group())
+            return {
+                "categories": data.get("categories", [])[:3],
+                "emotions": data.get("emotions", [])[:2],
+                "keywords": data.get("keywords", [])[:4],
+                "summary": data.get("summary", "")[:200],
+            }
+    except Exception as e:
+        logger.warning(f"Intent prober LLM failed: {e}")
+
+    return _rule_based_intent(query)
+
+
+def _rule_based_intent(query: str) -> dict:
+    """Fallback when LLM unavailable: simple keyword matching."""
+    lower = query.lower().strip()
+    categories = []
+    emotions = []
+    keywords = []
+
+    cat_map = {
+        "fiction": "Fiction", "history": "History", "philosophy": "Philosophy",
+        "science": "Science", "art": "Art", "mystery": "Mystery", "romance": "Romance",
+        "fantasy": "Fantasy", "sci-fi": "Science Fiction", "biography": "Biography",
+    }
+    for k, v in cat_map.items():
+        if re.search(r"\b" + re.escape(k) + r"\b", lower):
+            categories.append(v)
+
+    for e in EMOTION_KEYWORDS:
+        if re.search(r"\b" + re.escape(e) + r"\b", lower):
+            emotions.append(e)
+
+    # Extract likely keywords (words 4+ chars, not common)
+    stop = {"book", "books", "want", "like", "looking", "something", "that", "with", "the", "and"}
+    words = [w for w in re.findall(r"\b\w{4,}\b", lower) if w not in stop][:4]
+    keywords.extend(words)
+
+    return {
+        "categories": categories[:3] or ["General"],
+        "emotions": emotions[:2],
+        "keywords": keywords[:4],
+        "summary": query[:150] if query else "",
+    }
src/core/recommendation_orchestrator.py
CHANGED
@@ -54,9 +54,13 @@ class RecommendationOrchestrator:
         tone: str = "All",
         user_id: str = "local",
         use_agentic: bool = False,
+        fast: bool = False,
+        async_rerank: bool = False,
     ) -> List[Dict[str, Any]]:
         """
         Generate book recommendations. Async for web search fallback.
+        fast: Skip rerank for low latency (~150ms).
+        async_rerank: Return RRF immediately, rerank in background; next request gets cached reranked.
         """
         if not query or not query.strip():
             return []
@@ -67,17 +71,34 @@ class RecommendationOrchestrator:
             logger.info(f"Returning cached results for key: {cache_key}")
             return cached
 
-        logger.info(f"Processing request: query='{query}', category='{category}', use_agentic={use_agentic}")
+        logger.info(f"Processing request: query='{query}', category='{category}', use_agentic={use_agentic}, fast={fast}, async_rerank={async_rerank}")
+
+        skip_rerank = fast or async_rerank
 
         if use_agentic:
             results = await self._get_recommendations_agentic(query, category)
         else:
-            results = await self._get_recommendations_classic(query, category)
+            results = await self._get_recommendations_classic(query, category, skip_rerank=skip_rerank)
 
         if results:
             self.cache.set(cache_key, results)
+
+        if async_rerank and not use_agentic and skip_rerank:
+            import asyncio
+            asyncio.create_task(self._background_rerank_and_cache(query, category, cache_key))
+
         return results
 
+    async def _background_rerank_and_cache(self, query: str, category: str, cache_key: str) -> None:
+        """Run full pipeline with rerank and cache for async_rerank flow."""
+        try:
+            results = await self._get_recommendations_classic(query, category, skip_rerank=False)
+            if results:
+                self.cache.set(cache_key, results)
+                logger.info(f"Background rerank completed for query '{query[:30]}...'")
+        except Exception as e:
+            logger.warning(f"Background rerank failed: {e}")
+
     def get_recommendations_sync(
         self,
         query: str,
@@ -85,10 +106,12 @@ class RecommendationOrchestrator:
         tone: str = "All",
         user_id: str = "local",
         use_agentic: bool = False,
+        fast: bool = False,
+        async_rerank: bool = False,
     ) -> List[Dict[str, Any]]:
         """Sync wrapper for scripts/CLI."""
         import asyncio
-        return asyncio.run(self.get_recommendations(query, category, tone, user_id, use_agentic))
+        return asyncio.run(self.get_recommendations(query, category, tone, user_id, use_agentic, fast, async_rerank))
 
     async def _get_recommendations_agentic(self, query: str, category: str) -> List[Dict[str, Any]]:
         """LangGraph workflow: Router -> Retrieve -> Evaluate -> (optional) Web Fallback."""
@@ -103,7 +126,7 @@ class RecommendationOrchestrator:
         books_list = final_state.get("isbn_list", [])
         return enrich_and_format(books_list, category, TOP_K_FINAL, "local", metadata_store_inst=self._meta)
 
-    async def _get_recommendations_classic(self, query: str, category: str) -> List[Dict[str, Any]]:
+    async def _get_recommendations_classic(self, query: str, category: str, skip_rerank: bool = False) -> List[Dict[str, Any]]:
         """Classic Router -> Hybrid/Small-to-Big -> optional Web Fallback."""
         from src.core.router import QueryRouter
 
@@ -111,6 +134,8 @@ class RecommendationOrchestrator:
         decision = router.route(query)
         logger.info(f"Retrieval Strategy: {decision}")
 
+        do_rerank = decision["rerank"] and not skip_rerank
+
         if decision["strategy"] == "small_to_big":
             recs = self.vector_db.small_to_big_search(query, k=TOP_K_INITIAL)
         else:
@@ -118,7 +143,7 @@ class RecommendationOrchestrator:
                 query,
                 k=TOP_K_INITIAL,
                 alpha=decision.get("alpha", 0.5),
-                rerank=decision["rerank"],
+                rerank=do_rerank,
                 temporal=decision.get("temporal", False),
             )
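From a client's perspective, the `async_rerank` path added above is two identical requests: the first returns the RRF-only list and schedules the background rerank, and the second is served the refreshed cache entry. A sketch with `requests`; the host, port, and timing are assumptions based on the curl examples in LATENCY_OPTIMIZATION.md:

```python
import time
import requests

BASE = "http://localhost:8000"  # assumed local dev server
payload = {"query": "a philosophical novel about the meaning of life",
           "async_rerank": True}

# 1st call: hybrid RRF top-k comes back immediately; rerank starts in the background
first = requests.post(f"{BASE}/recommend", json=payload, timeout=30).json()

time.sleep(2)  # allow the background cross-encoder pass to finish and overwrite the cache

# 2nd call (same query/category -> same cache key): served the reranked list
second = requests.post(f"{BASE}/recommend", json=payload, timeout=30).json()
```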
src/core/reranker.py
CHANGED
|
@@ -1,104 +1,145 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
from src.utils import setup_logger
|
| 5 |
|
| 6 |
logger = setup_logger(__name__)
|
| 7 |
|
| 8 |
-
# 轻量级重排序模型,速度快且效果不错
|
| 9 |
DEFAULT_RERANKER_MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"
|
| 10 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 11 |
class RerankerService:
|
| 12 |
"""
|
| 13 |
-
Singleton
|
| 14 |
-
This significantly improves RAG precision by scoring the exact relevance
|
| 15 |
-
of (query, document) pairs.
|
| 16 |
"""
|
| 17 |
_instance = None
|
| 18 |
-
|
| 19 |
def __new__(cls):
|
| 20 |
if cls._instance is None:
|
| 21 |
cls._instance = super(RerankerService, cls).__new__(cls)
|
| 22 |
cls._instance.model = None
|
|
|
|
| 23 |
return cls._instance
|
| 24 |
-
|
| 25 |
def __init__(self):
|
| 26 |
if self.model is None:
|
| 27 |
self._load_model()
|
| 28 |
-
|
| 29 |
def _load_model(self):
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
self.model =
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
|
|
|
|
|
|
| 40 |
"""
|
| 41 |
-
Rerank
|
| 42 |
-
|
| 43 |
-
Args:
|
| 44 |
-
query: User question
|
| 45 |
-
docs: List of dicts, each must have a 'content' field (or 'description')
|
| 46 |
-
top_k: Number of results to return
|
| 47 |
-
|
| 48 |
-
Returns:
|
| 49 |
-
Top-K sorted documents with added 'score' field.
|
| 50 |
"""
|
| 51 |
if not self.model or not docs:
|
| 52 |
return docs[:top_k]
|
| 53 |
-
|
| 54 |
-
# Prepare pairs for Cross-Encoder: [[query, doc1], [query, doc2], ...]
|
| 55 |
-
# We assume 'description' or 'page_content' holds the text
|
| 56 |
-
pairs = []
|
| 57 |
-
valid_docs = []
|
| 58 |
-
|
| 59 |
-
for doc in docs:
|
| 60 |
-
# Handle LangChain Document object
|
| 61 |
-
if hasattr(doc, "page_content"):
|
| 62 |
-
text = doc.page_content
|
| 63 |
-
# Handle Dict
|
| 64 |
-
else:
|
| 65 |
-
text = doc.get("description") or doc.get("page_content") or str(doc)
|
| 66 |
-
|
| 67 |
-
pairs.append([query, text])
|
| 68 |
-
valid_docs.append(doc)
|
| 69 |
-
|
| 70 |
-
if not pairs:
|
| 71 |
-
return docs[:top_k]
|
| 72 |
|
| 73 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 74 |
scores = self.model.predict(pairs)
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 86 |
else:
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
# Sort descending by score
|
| 94 |
-
def get_score(doc):
|
| 95 |
-
if hasattr(doc, "metadata"):
|
| 96 |
-
return doc.metadata.get("relevance_score", 0)
|
| 97 |
-
return doc.get("score", 0)
|
| 98 |
-
|
| 99 |
-
scored_results.sort(key=get_score, reverse=True)
|
| 100 |
-
|
| 101 |
-
return scored_results[:top_k]
|
| 102 |
-
|
| 103 |
-
# Global instance
|
| 104 |
reranker = RerankerService()

+"""
+Reranker: Cross-Encoder (torch/ONNX) or ColBERT (optional).
+Backend selectable via RERANKER_BACKEND env: cross_encoder | onnx | colbert.
+ONNX ~2x faster than torch; ColBERT requires llama-index-postprocessor-colbert-rerank.
+"""
+from typing import List, Dict, Any
+
+from src.config import RERANKER_BACKEND
 from src.utils import setup_logger
 
 logger = setup_logger(__name__)
 
 DEFAULT_RERANKER_MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"
 
+
+def _load_cross_encoder(backend: str):
+    """Load CrossEncoder with torch or ONNX backend. Falls back to torch if ONNX fails."""
+    from sentence_transformers import CrossEncoder
+    import torch
+
+    device = "mps" if torch.backends.mps.is_available() else "cpu"
+    be = "onnx" if backend == "onnx" else "torch"
+
+    try:
+        logger.info(f"Loading Reranker ({DEFAULT_RERANKER_MODEL}) backend={be} on {device}...")
+        model = CrossEncoder(DEFAULT_RERANKER_MODEL, device=device, backend=be)
+        logger.info("Reranker model loaded.")
+        return model
+    except Exception as e:
+        if be == "onnx":
+            logger.warning(f"ONNX backend failed (pip install onnxruntime?), falling back to torch: {e}")
+            return CrossEncoder(DEFAULT_RERANKER_MODEL, device=device, backend="torch")
+        raise
+
+
+def _load_colbert():
+    """Load ColBERT reranker via llama-index (optional dep)."""
+    try:
+        from llama_index.postprocessor.colbert_rerank import ColbertRerank
+
+        return ColbertRerank(
+            model_name="colbert-ir/colbertv2.0",
+            top_n=10,
+        )
+    except ImportError as e:
+        logger.warning(f"ColBERT not available (pip install llama-index-postprocessor-colbert-rerank): {e}")
+        return None
+
+
+def _get_text(doc: Any) -> str:
+    if hasattr(doc, "page_content"):
+        return doc.page_content
+    return doc.get("description") or doc.get("page_content") or str(doc)
+
+
+def _set_score(doc: Any, score: float) -> None:
+    if hasattr(doc, "metadata"):
+        doc.metadata["relevance_score"] = score
+    else:
+        doc["score"] = score
+
+
+def _get_score(doc: Any) -> float:
+    if hasattr(doc, "metadata"):
+        return doc.metadata.get("relevance_score", 0)
+    return doc.get("score", 0)
+
+
 class RerankerService:
     """
+    Singleton reranker: Cross-Encoder (torch/ONNX) or ColBERT.
     """
     _instance = None
+
     def __new__(cls):
         if cls._instance is None:
             cls._instance = super(RerankerService, cls).__new__(cls)
             cls._instance.model = None
+            cls._instance._backend = None
         return cls._instance
+
     def __init__(self):
         if self.model is None:
             self._load_model()
+
     def _load_model(self):
+        backend = (RERANKER_BACKEND or "").lower()
+
+        if backend == "colbert":
+            self.model = _load_colbert()
+            self._backend = "colbert" if self.model else "cross_encoder"
+            if self._backend == "cross_encoder":
+                self.model = _load_cross_encoder("torch")
+        else:
+            self._backend = "onnx" if backend == "onnx" else "cross_encoder"
+            self.model = _load_cross_encoder(self._backend)
+
+    def rerank(self, query: str, docs: List[Any], top_k: int = 5) -> List[Any]:
         """
+        Rerank documents by relevance to query.
+        docs: List of dicts or LangChain Document with description/page_content.
         """
         if not self.model or not docs:
             return docs[:top_k]
 
+        if self._backend == "colbert":
+            return self._rerank_colbert(query, docs, top_k)
+        return self._rerank_cross_encoder(query, docs, top_k)
+
+    def _rerank_cross_encoder(self, query: str, docs: List[Any], top_k: int) -> List[Any]:
+        pairs = [[query, _get_text(d)] for d in docs]
         scores = self.model.predict(pairs)
+
+        for i, doc in enumerate(docs):
+            _set_score(doc, float(scores[i]))
+
+        docs.sort(key=_get_score, reverse=True)
+        return docs[:top_k]
+
+    def _rerank_colbert(self, query: str, docs: List[Any], top_k: int) -> List[Any]:
+        from llama_index.schema import NodeWithScore, TextNode
+
+        # Keep ref to original doc for metadata (isbn, etc.)
+        nodes = []
+        for d in docs:
+            node = TextNode(text=_get_text(d), metadata={"__original": d})
+            nodes.append(NodeWithScore(node=node, score=0.0))
+
+        reranked = self.model.postprocess_nodes(nodes, query_str=query)
+
+        result = []
+        for nws in reranked[:top_k]:
+            orig = getattr(nws.node, "metadata", {}).get("__original")
+            if orig is not None:
+                _set_score(orig, float(nws.score or 0))
+                result.append(orig)
             else:
+                from langchain_core.documents import Document
+                doc = Document(page_content=nws.node.text, metadata={"relevance_score": float(nws.score or 0)})
+                result.append(doc)
+        return result
+
+
 reranker = RerankerService()
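
For reference, a minimal usage sketch of the singleton above, assuming `src/config.py` reads `RERANKER_BACKEND` from the environment at import time (the docs and ISBNs here are illustrative):

import os

os.environ["RERANKER_BACKEND"] = "onnx"  # or "cross_encoder" / "colbert"

from src.core.reranker import reranker  # module-level singleton, loads on first import

docs = [
    {"description": "An epic historical novel set during the Napoleonic Wars.", "isbn": "0140447938"},
    {"description": "A dystopian story of surveillance and totalitarian control.", "isbn": "0451524934"},
]
top = reranker.rerank("books like War and Peace", docs, top_k=1)
print(top[0]["isbn"], top[0]["score"])  # dict docs carry their score under "score"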
src/core/router.py
CHANGED

@@ -90,8 +90,12 @@ class QueryRouter:
         freshness_fallback: bool = False,
         target_year: Optional[int] = None
     ) -> Dict[str, Any]:
-        """
-
+        """
+        Fallback: rule-based routing when classifier not loaded.
+        Uses NL keywords (like, similar, recommend...) instead of brittle word-count.
+        Book titles (e.g. "War and Peace", "The Lord of the Rings") -> FAST.
+        """
+        from src.config import ROUTER_DETAIL_KEYWORDS, ROUTER_NL_KEYWORDS
 
         base_result = {
             "temporal": is_temporal,

@@ -103,10 +107,15 @@ class QueryRouter:
         if any(w.lower() in ROUTER_DETAIL_KEYWORDS for w in words):
             logger.info("Router (rules): Detail Query -> SMALL_TO_BIG")
             return {**base_result, "strategy": "small_to_big", "alpha": 0.5, "rerank": False, "k_final": 5}
-
-
+        # NL keywords indicate recommendation intent -> DEEP
+        if any(w.lower() in ROUTER_NL_KEYWORDS for w in words):
+            logger.info("Router (rules): NL keywords -> DEEP (Temporal=%s, Freshness=%s)", is_temporal, freshness_fallback)
+            return {**base_result, "strategy": "deep", "alpha": 0.5, "rerank": True, "k_final": 10}
+        # Short query without NL keywords: book title or keyword -> FAST
+        if len(words) <= 6:
+            logger.info("Router (rules): Keyword/Title -> FAST (Temporal=%s, Freshness=%s)", is_temporal, freshness_fallback)
             return {**base_result, "strategy": "fast", "alpha": 0.5, "rerank": False, "k_final": 5}
-        logger.info("Router (rules):
+        logger.info("Router (rules): Long query -> DEEP (Temporal=%s, Freshness=%s)", is_temporal, freshness_fallback)
         return {**base_result, "strategy": "deep", "alpha": 0.5, "rerank": True, "k_final": 10}
 
     def route(self, query: str) -> Dict[str, Any]:
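
Distilled, the fallback rules above behave like this (illustrative keyword set; the real lists live in `src/config.py`, and the SMALL_TO_BIG detail rule is omitted):

NL_KEYWORDS = {"like", "similar", "recommend", "want", "looking"}

def route_fallback(query: str) -> str:
    words = query.split()
    if any(w.lower() in NL_KEYWORDS for w in words):
        return "deep"  # recommendation-style phrasing -> rerank-heavy path
    if len(words) <= 6:
        return "fast"  # short query without NL keywords: likely a title or keyword
    return "deep"      # long free-form query

assert route_fallback("War and Peace") == "fast"
assert route_fallback("books like War and Peace") == "deep"
assert route_fallback("I need something uplifting for a long rainy holiday week") == "deep"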
src/main.py
CHANGED

@@ -99,6 +99,8 @@ class RecommendationRequest(BaseModel):
     category: str = "All"
     user_id: Optional[str] = "local"
     use_agentic: Optional[bool] = False # LangGraph workflow: Router -> Retrieve -> Evaluate -> Web Fallback
+    fast: Optional[bool] = False # Skip rerank for ~150ms latency
+    async_rerank: Optional[bool] = False # Return RRF first, rerank in background; next request gets cached
 
 
 class FeatureContribution(BaseModel):

@@ -187,6 +189,8 @@ async def get_recommendations(request: RecommendationRequest):
             category=request.category,
             user_id=request.user_id if hasattr(request, 'user_id') else "local",
             use_agentic=request.use_agentic or False,
+            fast=request.fast or False,
+            async_rerank=request.async_rerank or False,
         )
         return {"recommendations": results}
     except Exception as e:

@@ -349,22 +353,58 @@ async def run_benchmark():
 # --- Personalized Recommendation API ---
 
 @app.get("/api/recommend/personal", response_model=RecommendationResponse)
-def personalized_recommendations(user_id: str = "local", top_k: int = 10):
+def personalized_recommendations(
+    user_id: str = "local",
+    top_k: int = 10,
+    limit: Optional[int] = None,
+    recent_isbns: Optional[str] = None,
+    intent_query: Optional[str] = None,
+):
     """
     Get personalized recommendations for a user.
     Uses 6-channel recall (ItemCF/UserCF/Swing/SASRec/YoutubeDNN/Popularity) + LGBMRanker.
+
+    P0: recent_isbns — Comma-separated ISBNs from current session (e.g. just-viewed).
+        Injected into SASRec for cold-start convergence (1+ clicks).
+    P2: intent_query — Zero-shot intent probing when user has no history.
+        Probes LLM for categories/keywords, does semantic search, seeds SASRec.
     """
-
-
-
-    user_id = "A1ZQ1LUQ9R6JHZ"
-
+    k = limit if limit is not None else top_k
+    # Demo logic: Map 'local' to a real user for demonstration (skip when intent_query = cold-start)
+    if user_id in ["local", "demo"] and not intent_query:
+        user_id = "A1ZQ1LUQ9R6JHZ"
+
+    # P0: Parse recent_isbns for real-time cold-start
+    real_time_seq = None
+    if recent_isbns:
+        real_time_seq = [x.strip() for x in recent_isbns.split(",") if x.strip()]
+
+    # P2: Zero-shot intent probing — when no recent_isbns, use query to seed
+    if not real_time_seq and intent_query and intent_query.strip():
+        from src.core.intent_prober import probe_intent
+        intent = probe_intent(intent_query.strip())
+        semantic_query = " ".join(
+            intent.get("keywords", []) + intent.get("categories", []) + [intent.get("summary", "")]
+        ).strip()
+        if semantic_query and recommender:
+            try:
+                rag_results = recommender.get_recommendations_sync(
+                    semantic_query, category="All", tone="All", user_id=user_id
+                )
+                seed_isbns = [r.get("isbn") for r in (rag_results or [])[:5] if r.get("isbn")]
+                if seed_isbns:
+                    real_time_seq = seed_isbns
+            except Exception as e:
+                logger.warning(f"Intent-to-seed failed: {e}")
+
     # Check initialization
     if not rec_service:
         raise HTTPException(status_code=503, detail="Service not ready")
-
+
     try:
-        recs = rec_service.get_recommendations(
+        recs = rec_service.get_recommendations(
+            user_id, top_k=k, real_time_sequence=real_time_seq
+        )
 
         # Enrich with metadata
         from src.utils import enrich_book_metadata

@@ -430,6 +470,51 @@ def personalized_recommendations(user_id: str = "local", top_k: int = 10):
         # In production, maybe return fallback popular items instead of error
         raise HTTPException(status_code=500, detail=str(e))
 
+
+@app.get("/api/intent/probe")
+def probe_intent_endpoint(query: str = ""):
+    """
+    P2: Zero-shot intent probing for cold-start users.
+    Returns inferred categories, emotions, keywords from user's first query.
+    """
+    from src.core.intent_prober import probe_intent
+    try:
+        result = probe_intent(query)
+        return result
+    except Exception as e:
+        logger.error(f"Intent probe failed: {e}")
+        raise HTTPException(status_code=500, detail=str(e))
+
+
+@app.get("/api/onboarding/books")
+def get_onboarding_books(limit: int = 24):
+    """
+    P2: Return popular books for new-user onboarding.
+    Lets user pick 3–5 to seed preferences (cold-start).
+    """
+    if not rec_service:
+        raise HTTPException(status_code=503, detail="Service not ready")
+    try:
+        items = rec_service.get_popular_books(limit)
+        from src.utils import enrich_book_metadata
+        results = []
+        for isbn, meta in items:
+            meta = meta or {}
+            meta = enrich_book_metadata(meta, str(isbn))
+            results.append({
+                "isbn": isbn,
+                "title": meta.get("title") or f"ISBN: {isbn}",
+                "authors": meta.get("authors", "Unknown"),
+                "description": meta.get("description", ""),
+                "thumbnail": meta.get("thumbnail") or "/content/cover-not-found.jpg",
+                "category": meta.get("category", "General"),
+            })
+        return {"books": results}
+    except Exception as e:
+        logger.error(f"Error in onboarding books: {e}")
+        raise HTTPException(status_code=500, detail=str(e))
+
+
 # Allow local frontend dev origins
 # Added LAST so it wraps the app outermost (first to process request)
 app.add_middleware(
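
A sketch of the new request surface (assuming the dev-server address used in `web/src/api.js`; queries and ISBNs are illustrative):

import requests

BASE = "http://127.0.0.1:6006"

# Latency flags on the RAG endpoint
resp = requests.post(f"{BASE}/recommend", json={
    "query": "books like War and Peace",
    "category": "All",
    "tone": "All",
    "fast": True,           # skip rerank entirely (~150ms path)
    "async_rerank": False,  # or: serve RRF now, cache the reranked order for the next call
})
print(len(resp.json()["recommendations"]))

# Cold-start params on the personalized endpoint
resp = requests.get(f"{BASE}/api/recommend/personal", params={
    "user_id": "local",
    "limit": 10,
    "recent_isbns": "0140447938,0451524934",  # P0: session clicks
    "intent_query": "uplifting stories about found family",  # P2: zero-shot probe
})
print(resp.status_code)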
src/recall/fusion.py
CHANGED

@@ -20,7 +20,7 @@ DEFAULT_CHANNEL_CONFIG = {
     "usercf": {"enabled": False, "weight": 1.0},
     "swing": {"enabled": False, "weight": 1.0},
     "item2vec": {"enabled": False, "weight": 0.8},
-    "popularity": {"enabled":
+    "popularity": {"enabled": True, "weight": 0.5}, # P0: Cold-start fallback
 }
 
 
@@ -123,6 +123,10 @@ class RecallFusion:
         self._add_to_candidates(candidates, recs, cfg["popularity"]["weight"])
 
         sorted_cands = sorted(candidates.items(), key=lambda x: x[1], reverse=True)
+        # P0: Cold-start fallback — when all channels return empty, use popularity
+        if not sorted_cands:
+            pop_recs = self.popularity.recommend(user_id, top_k=k)
+            sorted_cands = [(item, s) for item, s in pop_recs]
         return sorted_cands[:k]
 
     def _add_to_candidates(self, candidates, recs, weight: float) -> None:
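
The fallback shape, distilled (stand-in recommender callables, not the project's channel classes):

def fuse(channels, popularity_recommend, user_id, k):
    # channels: [(recommend_fn, weight)] for the enabled recall channels
    candidates = {}
    for recommend_fn, weight in channels:
        for item, score in recommend_fn(user_id):
            candidates[item] = candidates.get(item, 0.0) + weight * score
    ranked = sorted(candidates.items(), key=lambda x: x[1], reverse=True)
    if not ranked:  # P0: every channel came back empty (brand-new user)
        ranked = list(popularity_recommend(user_id, k))
    return ranked[:k]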
src/recommender.py
CHANGED

@@ -39,9 +39,11 @@ class BookRecommender:
         tone: str = "All",
         user_id: str = "local",
         use_agentic: bool = False,
+        fast: bool = False,
+        async_rerank: bool = False,
     ) -> List[Dict[str, Any]]:
         return await self._orchestrator.get_recommendations(
-            query, category, tone, user_id, use_agentic
+            query, category, tone, user_id, use_agentic, fast, async_rerank
         )
 
     def get_recommendations_sync(

@@ -51,9 +53,11 @@ class BookRecommender:
         tone: str = "All",
         user_id: str = "local",
         use_agentic: bool = False,
+        fast: bool = False,
+        async_rerank: bool = False,
     ) -> List[Dict[str, Any]]:
         return self._orchestrator.get_recommendations_sync(
-            query, category, tone, user_id, use_agentic
+            query, category, tone, user_id, use_agentic, fast, async_rerank
         )
 
     def get_similar_books(
src/services/recommend_service.py
CHANGED

@@ -155,6 +155,10 @@ class RecommendationService:
         candidates = self.fusion.get_recall_items(
             user_id, k=200, real_time_seq=real_time_sequence
         )
+        # P0: Cold-start fallback — when recall returns empty, use popularity
+        if not candidates:
+            pop_recs = self.fusion.popularity.recommend(user_id, top_k=200)
+            candidates = list(pop_recs)
         if not candidates:
             return []
 
@@ -267,6 +271,27 @@ class RecommendationService:
 
         return unique_results
 
+    def get_popular_books(self, limit: int = 24) -> list:
+        """
+        P2: Return popular books for onboarding selection.
+        Used when new user has no history — lets them pick 3–5 to seed preferences.
+        """
+        self.load_resources()
+        recs = self.fusion.popularity.recommend(user_id=None, top_k=limit)
+        results = []
+        seen_titles = set()
+        for isbn, _ in recs:
+            meta = self.metadata_store.get_book_metadata(str(isbn)) or {}
+            title = (meta.get("title") or "").lower().strip()
+            if title and title in seen_titles:
+                continue
+            if title:
+                seen_titles.add(title)
+            results.append((isbn, meta))
+            if len(results) >= limit:
+                break
+        return results
+
 if __name__ == "__main__":
     import logging
     logger.setLevel(logging.INFO)
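
Shape of what `get_popular_books` hands to the onboarding endpoint (values illustrative):

items = rec_service.get_popular_books(limit=24)
# -> [("0140447938", {"title": "War and Peace", ...}), ...]
#    (isbn, metadata) pairs, deduplicated case-insensitively by title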
src/vector_db.py
CHANGED

@@ -2,7 +2,7 @@ from typing import List, Any
 # Using community version to avoid 'BaseBlobParser' version conflict in langchain-chroma/core
 from langchain_community.vectorstores import Chroma
 from langchain_huggingface import HuggingFaceEmbeddings
-from src.config import REVIEW_HIGHLIGHTS_TXT, CHROMA_DB_DIR, EMBEDDING_MODEL
+from src.config import REVIEW_HIGHLIGHTS_TXT, CHROMA_DB_DIR, EMBEDDING_MODEL, RERANK_CANDIDATES_MAX
 from src.utils import setup_logger
 from src.core.metadata_store import metadata_store
 from src.core.online_books_store import online_books_store

@@ -220,8 +220,7 @@ class VectorDB:
         final_results = top_candidates[:k]
         if rerank:
             from src.core.reranker import reranker
-
-            rerank_candidates = top_candidates[:max(k*4, 20)]
+            rerank_candidates = top_candidates[:min(len(top_candidates), RERANK_CANDIDATES_MAX)]
             logger.info(f"Reranking top {len(rerank_candidates)} candidates...")
             final_results = reranker.rerank(query, rerank_candidates, top_k=k)
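
The constant itself is presumably defined along these lines in `src/config.py` (the changelog calls it env-overridable; the exact definition is not shown in this diff):

import os

# Cap on how many fused candidates reach the CrossEncoder (rerank top 20 instead of 50)
RERANK_CANDIDATES_MAX = int(os.getenv("RERANK_CANDIDATES_MAX", "20"))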
web/src/App.jsx
CHANGED

@@ -19,6 +19,7 @@ import Header from "./components/Header";
 import BookDetailModal from "./components/BookDetailModal";
 import SettingsModal from "./components/SettingsModal";
 import AddBookModal from "./components/AddBookModal";
+import OnboardingModal from "./components/OnboardingModal";
 
 // Pages
 import GalleryPage from "./pages/GalleryPage";

@@ -57,6 +58,13 @@ const App = () => {
     return stored === "mock" || !stored ? "ollama" : stored;
   });
 
+  // --- P1: Session-level recent ISBNs for cold-start ---
+  const [recentIsbns, setRecentIsbns] = useState([]);
+  const MAX_RECENT_ISBNS = 10;
+
+  // --- P2: Onboarding (new user, no collection) ---
+  const [showOnboarding, setShowOnboarding] = useState(false);
+
   // --- Add Book Modal State ---
   const [showAddBook, setShowAddBook] = useState(false);
   const [googleQuery, setGoogleQuery] = useState("");

@@ -64,6 +72,14 @@ const App = () => {
   const [isSearching, setIsSearching] = useState(false);
   const [addingBookId, setAddingBookId] = useState(null);
 
+  // --- P2: Show onboarding when new user (no collection, not completed) ---
+  useEffect(() => {
+    const completed = localStorage.getItem("onboarding_complete") === "true";
+    if (!completed && userId === "local") {
+      setShowOnboarding(true);
+    }
+  }, [userId]);
+
   // --- Load favorites and stats on startup or user change ---
   useEffect(() => {
     setLoading(true);

@@ -78,11 +94,13 @@ const App = () => {
         reading: 0,
         finished: 0,
       })),
-      getPersonalizedRecommendations(userId).catch(() => []),
+      getPersonalizedRecommendations(userId, 20, recentIsbns).catch(() => []),
     ]).then(([favs, stats, personalRecs]) => {
       setMyCollection(favs);
       setReadingStats(stats);
-
+      if (favs.length > 0) {
+        localStorage.setItem("onboarding_complete", "true");
+      }
       const mappedRecs = personalRecs.map((r, idx) => ({
         id: r.isbn,
         title: r.title,

@@ -283,6 +301,13 @@ const App = () => {
   };
 
   const openBook = (book) => {
+    // P1: Track session-level recent views for cold-start
+    if (book?.isbn) {
+      setRecentIsbns((prev) => {
+        const next = [book.isbn, ...prev.filter((i) => i !== book.isbn)].slice(0, MAX_RECENT_ISBNS);
+        return next;
+      });
+    }
     setSelectedBook({
       ...book,
       aiHighlight: "\u2728 ...",

@@ -319,8 +344,15 @@ const App = () => {
     setBooks([]);
     try {
       let recs;
-
-
+      // P2: Cold-start with intent — when no collection and user typed a mood, use intent-seeded personal recs
+      const useIntentSeed = myCollection.length === 0 && searchQuery.trim();
+      if (!searchQuery || useIntentSeed) {
+        recs = await getPersonalizedRecommendations(
+          userId,
+          20,
+          recentIsbns,
+          useIntentSeed ? searchQuery : null
+        );
       } else {
         recs = await recommend(searchQuery, searchCategory, searchMood, userId);
       }

@@ -384,6 +416,44 @@ const App = () => {
         />
       )}
 
+      {showOnboarding && (
+        <OnboardingModal
+          onComplete={async () => {
+            setShowOnboarding(false);
+            const [favs, stats, personalRecs] = await Promise.all([
+              getFavorites(userId).catch(() => []),
+              getUserStats(userId).catch(() => ({ total: 0, want_to_read: 0, reading: 0, finished: 0 })),
+              getPersonalizedRecommendations(userId, 20, recentIsbns).catch(() => []),
+            ]);
+            setMyCollection(favs);
+            setReadingStats(stats);
+            const mapped = (personalRecs || []).map((r, idx) => ({
+              id: r.isbn,
+              title: r.title,
+              author: r.authors,
+              category: r.category || "General",
+              mood: r.emotions && Object.keys(r.emotions).length > 0
+                ? Object.entries(r.emotions).reduce((a, b) => (a[1] > b[1] ? a : b))[0]
+                : "Literary",
+              rank: idx + 1,
+              rating: r.average_rating || 0,
+              tags: r.tags || [],
+              review_highlights: r.review_highlights || [],
+              desc: r.description,
+              img: r.thumbnail,
+              isbn: r.isbn,
+              emotions: r.emotions || {},
+              explanations: r.explanations || [],
+              aiHighlight: "\u2014",
+              suggestedQuestions: ["Why was this recommended?", "Similar to what I've read?", "What's the core highlight?"],
+            }));
+            setBooks(mapped);
+          }}
+          onAddFavorite={(isbn) => addFavorite(isbn, userId)}
+          onSkip={() => setShowOnboarding(false)}
+        />
+      )}
+
       {showAddBook && (
         <AddBookModal
           onClose={() => setShowAddBook(false)}
web/src/api.js
CHANGED

@@ -1,7 +1,7 @@
 const API_URL = import.meta.env.VITE_API_URL || (import.meta.env.PROD ? "" : "http://127.0.0.1:6006");
 
-export async function recommend(query, category = "All", tone = "All", user_id = "local", use_agentic = false) {
-  const body = { query, category, tone, user_id, use_agentic };
+export async function recommend(query, category = "All", tone = "All", user_id = "local", use_agentic = false, fast = false, async_rerank = false) {
+  const body = { query, category, tone, user_id, use_agentic, fast, async_rerank };
   const resp = await fetch(`${API_URL}/recommend`, {
     method: "POST",
     headers: { "Content-Type": "application/json" },

@@ -12,9 +12,23 @@ export async function recommend(query, category = "All", tone = "All", user_id =
   return data.recommendations || [];
 }
 
-export async function
-
+export async function getOnboardingBooks(limit = 24) {
+  const resp = await fetch(`${API_URL}/api/onboarding/books?limit=${limit}`);
+  if (!resp.ok) throw new Error(await resp.text());
+  const data = await resp.json();
+  return data.books || [];
+}
+
+export async function getPersonalizedRecommendations(user_id = "local", limit = 20, recent_isbns = null, intent_query = null) {
+  // P1: recent_isbns — session-level ISBNs for cold-start (1+ clicks)
+  // P2: intent_query — zero-shot intent probing when user has no history
   const params = new URLSearchParams({ user_id, limit: limit.toString() });
+  if (recent_isbns && Array.isArray(recent_isbns) && recent_isbns.length > 0) {
+    params.set("recent_isbns", recent_isbns.join(","));
+  }
+  if (intent_query && typeof intent_query === "string" && intent_query.trim()) {
+    params.set("intent_query", intent_query.trim());
+  }
   const resp = await fetch(`${API_URL}/api/recommend/personal?${params.toString()}`);
   if (!resp.ok) throw new Error(await resp.text());
   const data = await resp.json();
web/src/components/OnboardingModal.jsx
ADDED

@@ -0,0 +1,137 @@
+/**
+ * P2: New-user onboarding — pick 3–5 books to seed preferences.
+ * Shown when myCollection is empty and onboarding not completed.
+ */
+import React, { useState, useEffect } from "react";
+import { getOnboardingBooks } from "../api";
+
+const PLACEHOLDER_IMG = "/content/cover-not-found.jpg";
+const MIN_SELECT = 3;
+const MAX_SELECT = 5;
+
+const OnboardingModal = ({ onComplete, onAddFavorite, onSkip }) => {
+  const [books, setBooks] = useState([]);
+  const [selected, setSelected] = useState(new Set());
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState("");
+
+  useEffect(() => {
+    getOnboardingBooks(24)
+      .then(setBooks)
+      .catch((e) => setError(e.message))
+      .finally(() => setLoading(false));
+  }, []);
+
+  const toggle = (isbn) => {
+    setSelected((prev) => {
+      const next = new Set(prev);
+      if (next.has(isbn)) {
+        next.delete(isbn);
+      } else if (next.size < MAX_SELECT) {
+        next.add(isbn);
+      }
+      return next;
+    });
+  };
+
+  const handleComplete = async () => {
+    if (selected.size < MIN_SELECT) return;
+    try {
+      for (const isbn of selected) {
+        await onAddFavorite(isbn);
+      }
+      localStorage.setItem("onboarding_complete", "true");
+      onComplete();
+    } catch (e) {
+      setError(e.message);
+    }
+  };
+
+  const canComplete = selected.size >= MIN_SELECT;
+
+  return (
+    <div className="fixed inset-0 z-50 flex items-center justify-center bg-black/50 p-4">
+      <div className="bg-white max-w-3xl w-full max-h-[90vh] overflow-hidden shadow-xl">
+        <div className="p-6 border-b border-[#eee]">
+          <h2 className="text-xl font-bold text-[#333]">Welcome — Pick Your Favorites</h2>
+          <p className="text-sm text-gray-500 mt-1">
+            Select 3–5 books you like to get personalized recommendations.
+          </p>
+        </div>
+        <div className="p-6 overflow-y-auto max-h-[50vh]">
+          {loading && (
+            <div className="text-center text-gray-400 py-8">Loading popular books...</div>
+          )}
+          {error && (
+            <div className="text-center text-red-500 py-4 text-sm">{error}</div>
+          )}
+          {!loading && !error && (
+            <div className="grid grid-cols-3 md:grid-cols-4 gap-4">
+              {books.map((book) => {
+                const isSelected = selected.has(book.isbn);
+                return (
+                  <button
+                    key={book.isbn}
+                    type="button"
+                    onClick={() => toggle(book.isbn)}
+                    className={`text-left border-2 transition-all p-2 ${
+                      isSelected ? "border-[#b392ac] bg-[#faf5f7]" : "border-[#eee] hover:border-[#ddd]"
+                    }`}
+                  >
+                    <div className="aspect-[3/4] bg-gray-100 mb-2 overflow-hidden">
+                      <img
+                        src={book.thumbnail || PLACEHOLDER_IMG}
+                        alt={book.title}
+                        className="w-full h-full object-cover"
+                        onError={(e) => {
+                          e.target.onerror = null;
+                          e.target.src = PLACEHOLDER_IMG;
+                        }}
+                      />
+                    </div>
+                    <p className="text-[10px] font-bold text-[#555] truncate" title={book.title}>
+                      {book.title}
+                    </p>
+                    {isSelected && (
+                      <span className="text-[10px] text-[#b392ac] font-bold">✓ Selected</span>
+                    )}
+                  </button>
+                );
+              })}
+            </div>
+          )}
+        </div>
+        <div className="p-6 border-t border-[#eee] flex justify-between items-center">
+          <span className="text-xs text-gray-500">
+            {selected.size} selected (min {MIN_SELECT}, max {MAX_SELECT})
+          </span>
+          <div className="flex gap-2">
+            {onSkip && (
+              <button
+                type="button"
+                onClick={() => {
+                  localStorage.setItem("onboarding_complete", "true");
+                  onSkip();
+                }}
+                className="px-4 py-2 text-sm text-gray-500 hover:text-gray-700"
+              >
+                Skip for now
+              </button>
+            )}
+            <button
+              onClick={handleComplete}
+              disabled={!canComplete}
+              className={`px-6 py-2 text-sm font-bold ${
+                canComplete ? "bg-[#b392ac] text-white" : "bg-gray-200 text-gray-400 cursor-not-allowed"
+              }`}
+            >
+              Start Exploring
+            </button>
+          </div>
+        </div>
+      </div>
+    </div>
+  );
+};
+
+export default OnboardingModal;
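
End to end, the onboarding flow the modal drives reduces to the following (same assumed dev server as above; the add-favorite step goes through the frontend's `addFavorite` helper, whose backend route is not shown in this diff):

import requests

BASE = "http://127.0.0.1:6006"

books = requests.get(f"{BASE}/api/onboarding/books", params={"limit": 24}).json()["books"]
picks = [b["isbn"] for b in books[:3]]  # the modal enforces 3-5 selections
# The modal then awaits onAddFavorite(isbn) for each pick, marks
# onboarding_complete in localStorage, and re-fetches
# /api/recommend/personal, which is now seeded for this user.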