---
title: NLP RAG
emoji: 🏢
colorFrom: gray
colorTo: green
sdk: docker
pinned: false
license: mit
short_description: NLP Spring 2026 Project 1
---
# RAG-based Question-Answering System for Cognitive Behavior Therapy (CBT)
## Overview
This project is a Retrieval-Augmented Generation (RAG) system built to answer CBT-related questions using grounded evidence from source manuals rather than generic model knowledge. It combines hybrid retrieval, re-ranking, and strict response constraints so the assistant stays accurate, clinically focused, and resistant to hallucination.
## Index
- [Overview](#overview)
- [Live Demo and Repository](#live-demo-and-repository)
- [Live Web Interface](#live-web-interface)
- [Tech Stack](#tech-stack)
- [System Architecture](#system-architecture)
- [Key Features](#key-features)
- [Installation and Setup](#installation-and-setup)
- [Configuration](#configuration)
- [Testing](#testing)
- [Running the Main Pipeline](#running-the-main-pipeline)
- [Contributors](#contributors)
## Live Demo and Repository
- Live Demo: https://rag-as-3-nlp.vercel.app/
- Code Repository: https://github.com/ramailkk/RAG-AS3-NLP
## Live Web Interface
<img width="1895" height="986" alt="image" src="https://github.com/user-attachments/assets/95eeba40-10c6-4137-af1a-5d83cc1b3a3c" />
<img width="1908" height="990" alt="image" src="https://github.com/user-attachments/assets/d8746422-900d-4101-9d8a-287a0eb5a22f" />
## Tech Stack
- Frontend: Vercel (Node.js/React)
- Backend: Hugging Face Spaces (FastAPI)
- Vector Database: Pinecone
- Embeddings: jinaai/jina-embeddings-v2-small-en
- LLMs: Llama-3-8B (Primary), TinyAya, Mistral-7B, Qwen-2.5
- Re-ranking: Voyage AI (rerank-2.5) and Cross-Encoder (ms-marco-MiniLM-L-6-v2)
- Retrieval: Hybrid Search (Dense + BM25 Sparse)
## System Architecture
The system operates through a high-precision multi-stage pipeline to ensure clinical safety and data grounding:
- Hybrid Retrieval: Simultaneously queries dense vector indices for semantic intent and sparse BM25 indices for specific clinical terminology such as Socratic Questioning or Cognitive Distortions.
- Fusion & Re-ranking: Uses Reciprocal Rank Fusion (RRF) to merge results, followed by a Cross-Encoder stage to re-evaluate the relevance of chunks against the user query.
- Diversity Filtering (MMR): Implements Maximal Marginal Relevance to ensure the context provided to the LLM is not redundant.
- Prompt Engineering: Employs a specialized persona that acts as an empathetic CBT therapist with strict grounding constraints to prevent the use of outside knowledge.
- Automated Evaluation: An LLM-as-a-Judge framework calculates:
- Faithfulness: Verifying claims against the source document.
- Relevancy: Ensuring the answer directly addresses the user's query.
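The fusion step above can be sketched with a minimal Reciprocal Rank Fusion in Python. The document IDs and the `k=60` smoothing constant are illustrative only, not the project's actual values:

```python
# Reciprocal Rank Fusion (RRF): merge several ranked lists into one.
# Each document scores 1 / (k + rank) per list it appears in, so items
# ranked well by BOTH dense and sparse retrieval rise to the top.

def rrf_fuse(rankings, k=60):
    """Merge ranked lists of doc IDs into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort doc IDs by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]   # semantic (vector) results, best first
sparse = ["d1", "d9", "d3"]  # BM25 keyword results, best first
fused = rrf_fuse([dense, sparse])
print(fused)  # d1 and d3 appear in both lists, so they lead the fusion
```

In the actual pipeline the fused list would then pass through the cross-encoder re-ranker and the MMR diversity filter before reaching the LLM.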
## Key Features
- Clinical Domain Focus: Optimized for high-density information found in mental health manuals.
- Zero Tolerance for Hallucinations: Includes a fallback protocol to state when information is missing rather than inventing therapeutic advice.
- Advanced Chunking: Uses sentence-level and recursive character splitting to preserve the logical flow of therapeutic guidelines and patient transcripts.
- Multi-Model Support: Tested across multiple LLMs to find the best balance between latency and grounding.
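As a rough illustration of the recursive character splitting mentioned above, here is a minimal sketch. The separator order and chunk size are assumptions for demonstration, not the project's actual settings:

```python
# Illustrative recursive character splitter: try the coarsest separator
# first (paragraphs), fall back to finer ones (lines, sentences, words)
# until every chunk fits under max_len.

def recursive_split(text, max_len=200, separators=("\n\n", "\n", ". ", " ")):
    if len(text) <= max_len or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # Separator not present at this level; try the next finer one.
        return recursive_split(text, max_len, rest)
    chunks, buf = [], ""
    for part in parts:
        candidate = part if not buf else buf + sep + part
        if len(candidate) <= max_len:
            buf = candidate  # keep accumulating into the current chunk
        else:
            if buf:
                chunks.append(buf)
            if len(part) <= max_len:
                buf = part
            else:
                # A single part can exceed the limit; recurse finer.
                chunks.extend(recursive_split(part, max_len, rest))
                buf = ""
    if buf:
        chunks.append(buf)
    return chunks

sample = ("Thoughts shape feelings. " * 20).strip()
chunks = recursive_split(sample, max_len=100)
```

Splitting on sentence boundaries before raw characters is what preserves the logical flow of therapeutic guidelines across chunk boundaries.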
## Installation and Setup
### Backend Setup
The backend handles document processing, Pinecone vector operations, and the hybrid retrieval logic.
1. Initialize Virtual Environment:
```bash
python -m venv .venv
# Windows (Git Bash)
source .venv/Scripts/activate
# Windows (CMD/PowerShell)
.venv\Scripts\activate
# Linux/macOS
source .venv/bin/activate
```
2. Install Dependencies:
```bash
pip install -r requirements.txt
```
3. Launch API Server:
```bash
uvicorn backend.api:app --reload --host 0.0.0.0 --port 8000
```
### Frontend Setup
The frontend provides the interactive chat interface and real-time evaluation scores.
1. Navigate and Install:
```bash
cd frontend
npm install
```
2. Start Development Server:
```bash
npm run dev
```
## Configuration
To replicate the system, ensure your environment variables contain valid API keys for:
- Pinecone for vector storage
- OpenRouter or Hugging Face Inference API for LLM access
- Voyage AI for re-ranking
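For example, a local setup might export keys like these before launching the backend. The variable names below are assumptions; check the backend code for the exact names it reads:

```shell
# Illustrative environment setup — variable names are assumptions,
# and the values are placeholders for your own keys.
export PINECONE_API_KEY="your-pinecone-key"
export OPENROUTER_API_KEY="your-openrouter-key"  # or a Hugging Face token
export VOYAGE_API_KEY="your-voyage-key"
```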
## Testing
Run `test.py` to benchmark the chunking strategies and retrieval configurations, then generate a complete Markdown report of the results.
```bash
python test.py
```
This script evaluates multiple test queries across the configured chunking techniques and retrieval strategies, then writes the full output to `retrieval_report.md`. Use that report to choose the best chunking strategy and retrieval configuration.
### Key variables you can change in `test.py`
- `test_queries`: the questions used for benchmarking.
- `CHUNKING_TECHNIQUES_FILTERED`: the chunking strategies included in the report.
- `RETRIEVAL_STRATEGIES`: the retrieval modes and MMR settings being compared.
- `index_name`: the Pinecone index that stores the chunked data.
- `top_k` and `final_k`: how many candidates are retrieved and how many are kept in the final context.
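A minimal override of those knobs might look like the following. The variable names mirror the list above, but the values are purely illustrative; the actual defaults live in `test.py`:

```python
# Illustrative benchmark configuration — edit the corresponding
# variables in test.py; the values here are examples only.
test_queries = [
    "What is cognitive restructuring?",
    "How does Socratic questioning challenge automatic thoughts?",
]
index_name = "cbt-manuals"  # example Pinecone index name
top_k = 20    # candidates retrieved per query
final_k = 5   # chunks kept in the final context
```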
## Running the Main Pipeline
After testing, run `main.py` to reproduce the main experiment with the selected configuration and evaluate faithfulness and relevancy across the model set. This script is the core of the reproducibility workflow: editing its configuration reruns the same evaluation under different chunking, retrieval, and model settings.
```bash
python main.py
```
This step runs the end-to-end comparison flow for all models, measures faithfulness and relevancy for each one, and writes the detailed findings to `rag_ablation_findings.md`.
### Key variables you can change in `main.py`
- `CHUNKING_TECHNIQUES` or the technique filter used in the script: controls which chunking methods are evaluated.
- `test_queries`: the query set used for the ablation study.
- `MODEL_MAP`: the model lineup being compared.
- `retrieval_strategy`: the retrieval mode, MMR setting, and label for each run.
- `top_k` and `final_k`: candidate retrieval depth and final context size.
- `temperature` in `cfg.gen`: generation randomness for the model outputs.
- `output_file`: the markdown report written by the run, usually `rag_ablation_findings.md`.
## Contributors
- Ramail Khan ([ramailkk](https://github.com/ramailkk))
- Qamar Raza ([Qar-Raz](https://github.com/Qar-Raz))
- Muddasir Javed ([bsparx](https://github.com/bsparx))