---
title: PhilVerify API
emoji: 🔍
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---

Multimodal fake news detection for Philippine social media.

---

## ✨ Features

- **🎤 Multimodal Detection** — Verify raw text, news URLs, images, and video/audio
- **🖼️ Image OCR** — Extract and analyze text from screenshots and images (Tesseract fil+eng)
- **🎬 Video Frame OCR** — Extract on-screen text from video frames alongside Whisper speech transcription
- **🔊 Speech Transcription** — Transcribe audio/video content using OpenAI Whisper
- **🇵🇭 Language-Aware** — Seamlessly handles Tagalog, English, and Taglish content
- **🧠 Advanced NLP Pipeline** — Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
- **⚖️ Two-Layer Scoring** — Combines ML classification (TF-IDF) with NewsAPI evidence retrieval
- **🛡️ PH-Domain Verification** — Integrated database of Philippine news domain credibility tiers

---

## 🚀 Deployment

| Service | Platform | URL |
|---------|----------|-----|
| **Frontend** | Firebase Hosting | https://philverify.web.app |
| **Backend API** | Hugging Face Spaces (Docker) | https://semiautomat1c-philverify-api.hf.space |
| **API Docs** | Swagger UI (auto-generated) | https://semiautomat1c-philverify-api.hf.space/docs |

---

## 🖥️ Local Development

### Prerequisites

1. **Python 3.12+**
2. **Tesseract OCR** — `brew install tesseract tesseract-lang`
3. **ffmpeg** — `brew install ffmpeg` (required for video frame extraction)
4. **Node.js 18+** (for frontend)

### Installation

```bash
# Clone the repository
git clone https://github.com/SemiAutomat1c/philverify.git
cd philverify

# Set up backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set up frontend
cd frontend
npm install
```

### Run

```bash
# Backend (from project root, with venv active)
uvicorn main:app --reload --port 8000

# Frontend (in a separate terminal)
cd frontend
npm run dev
```

The frontend dev server proxies `/api` requests to `http://localhost:8000` automatically.
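
To confirm that the backend started by `uvicorn` is actually answering before opening the frontend, a small reachability check can help. The sketch below assumes only that the local server exposes FastAPI's Swagger UI at `/docs`, as the hosted API does. The helper names `docs_url` and `backend_is_up` are illustrative and not part of the PhilVerify codebase.

```python
from urllib.error import URLError
from urllib.request import urlopen


def docs_url(base_url: str) -> str:
    """Build the Swagger UI URL for a backend base URL (hypothetical helper)."""
    return base_url.rstrip("/") + "/docs"


def backend_is_up(
    base_url: str = "http://localhost:8000",
    opener=urlopen,
    timeout: float = 5.0,
) -> bool:
    """Return True if GET <base_url>/docs answers with HTTP 200.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(docs_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

Calling `backend_is_up()` with no arguments checks the local dev server on port 8000; passing the Hugging Face Spaces URL from the deployment table checks the hosted API instead.
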

### Environment Variables

Copy `.env.example` to `.env` and fill in your keys:

```
NEWS_API_KEY=your_newsapi_key
FIREBASE_PROJECT_ID=your_project_id
```

For frontend production builds, set `VITE_API_BASE_URL` in `frontend/.env.production`:

```
VITE_API_BASE_URL=https://your-hf-space.hf.space/api
```

---

## 🛠️ Tech Stack

| Component | Technology |
|-----------|------------|
| **Core Backend** | Python 3.12, FastAPI, Pydantic v2 |
| **NLP Engine** | spaCy, HuggingFace Transformers, langdetect |
| **ML Classification** | scikit-learn (TF-IDF + Logistic Regression) |
| **OCR** | Tesseract (fil+eng), pytesseract, Pillow |
| **ASR** | OpenAI Whisper (base model) |
| **Video Processing** | ffmpeg (frame extraction), asyncio parallel pipeline |
| **Frontend** | React 18, TailwindCSS, Chart.js, Vite 7 |
| **Backend Hosting** | Hugging Face Spaces (Docker SDK, port 7860) |
| **Frontend Hosting** | Firebase Hosting |

---

## 📁 Project Structure

```
PhilVerify/
├── main.py                     # FastAPI app entry point + health endpoints
├── config.py                   # Settings (pydantic-settings)
├── requirements.txt
├── Dockerfile                  # Docker image for HF Spaces (port 7860)
├── domain_credibility.json     # PH news domain credibility tier database
│
├── api/
│   ├── schemas.py              # Pydantic request/response models
│   └── routes/
│       ├── verify.py           # POST /api/verify — handles text/url/image/video
│       ├── history.py          # GET /api/history
│       └── trends.py           # GET /api/trends
│
├── nlp/                        # NLP preprocessing pipeline
│   ├── preprocessor.py         # Clean, tokenize, remove stopwords (EN+TL)
│   ├── language_detector.py    # Tagalog / English / Taglish detection
│   ├── ner.py                  # Named entity recognition + PH entity hints
│   ├── sentiment.py            # Sentiment + emotion analysis
│   ├── clickbait.py            # Clickbait pattern detection
│   └── claim_extractor.py      # Extract falsifiable claim for evidence search
│
├── ml/
│   └── tfidf_classifier.py     # Layer 1 — TF-IDF baseline classifier
│
├── evidence/
│   └── news_fetcher.py         # Layer 2 — NewsAPI + cosine similarity
│
├── scoring/
│   └── engine.py               # Orchestrates full pipeline + final score
│
├── inputs/
│   ├── url_scraper.py          # BeautifulSoup article extractor
│   ├── ocr.py                  # Tesseract OCR for images
│   ├── asr.py                  # Whisper ASR + combined video transcription
│   └── video_ocr.py            # ffmpeg frame extraction + Tesseract OCR for video
│
├── frontend/                   # React + Vite frontend
│   ├── src/
│   │   ├── pages/
│   │   │   └── VerifyPage.jsx  # Main fact-check UI (tabs, results, chips)
│   │   └── api.js              # API client (supports VITE_API_BASE_URL)
│   └── .env.production         # Production API base URL
│
└── tests/
    └── test_philverify.py      # Unit + integration tests
```

---

## 📅 Roadmap

- [x] Phase 1 — FastAPI backend skeleton
- [x] Phase 2 — NLP preprocessing pipeline
- [x] Phase 3 — TF-IDF baseline classifier
- [x] Phase 4 — NewsAPI evidence retrieval
- [x] Phase 5 — React web dashboard with multimodal input
- [x] Phase 6 — Deploy to Hugging Face Spaces (backend) + Firebase (frontend)
- [x] Phase 7 — Video frame OCR (ffmpeg + Tesseract alongside Whisper ASR)
- [ ] Phase 8 — Scoring engine refinement (stance detection)
- [ ] Phase 9 — Chrome Extension (Manifest V3)
- [ ] Phase 10 — Fine-tune XLM-RoBERTa / TLUnified-RoBERTa

---

## 🤝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.

---

## ⚠️ Disclaimer

This tool is meant for research and educational purposes. Use responsibly and ethically when verifying information on social media.

## 📝 License

MIT