---
title: Paper Espresso
emoji: ☕️
colorFrom: pink
colorTo: blue
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: Paper Espresso
---

# Paper Espresso

An LLM-powered system that collects, summarizes, and analyzes AI research papers from [HuggingFace Daily Papers](https://huggingface.co/papers).

Paper: [Paper Espresso: From Paper Overload to Research Insight](https://arxiv.org/abs/2604.04562)

## Features

- **Bilingual Summarization**: Gemini-generated structured analysis (TL;DR, strengths, limitations, topics, keywords) in English and Chinese
- **Multi-Granularity Trending**: Daily, weekly, and monthly trend analysis with topic clustering
- **Interactive UI**: Streamlit web app with topic filtering, a language toggle, and paper detail views
- **HuggingFace Hub Storage**: All data persisted to public datasets for reproducibility

## Quick Start

### Prerequisites

The project uses [uv](https://docs.astral.sh/uv/) to manage dependencies.

```bash
# Clone the repository and install dependencies
git clone https://github.com/Elfsong/Daily_Paper_Reader.git
cd Daily_Paper_Reader
uv sync
```

Create a `.env` file in the project root:

```
GEMINI_API_KEY=your_gemini_api_key
HF_TOKEN=your_huggingface_token
```

### Web App

```bash
uv run streamlit run src/streamlit_app.py
```

Streamlit serves the app at http://localhost:8501 by default.

### CLI: Daily Paper Retriever

`src/daily_retrieve.py` is a standalone CLI tool for collecting and summarizing papers in batches.
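Every invocation ultimately works over a list of calendar dates. The sketch below illustrates the date semantics given in the Options table below: the start date defaults to yesterday, `--end` is inclusive and defaults to the start date, and `--workers` controls parallelism. It is a hypothetical illustration, not code from `src/daily_retrieve.py`: the helpers `resolve_dates`, `process_date`, and `run` are invented for this example, and using a thread pool for the workers is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import List, Optional


def resolve_dates(start: Optional[str] = None, end: Optional[str] = None) -> List[date]:
    """Expand a --date/--end pair into an inclusive list of calendar dates.

    Hypothetical helper mirroring the documented defaults: the start date
    falls back to yesterday, and the end date falls back to the start date.
    """
    first = date.fromisoformat(start) if start else date.today() - timedelta(days=1)
    last = date.fromisoformat(end) if end else first
    if last < first:
        raise ValueError(f"end date {last} is before start date {first}")
    # +1 because the end date is inclusive.
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def process_date(day: date) -> str:
    """Hypothetical stand-in for the per-date pipeline (fetch, summarize, push)."""
    return f"{day.isoformat()} done"


def run(start: Optional[str], end: Optional[str], workers: int = 1) -> List[str]:
    """Process every date in the range using a pool of `workers` threads (assumed)."""
    days = resolve_dates(start, end)
    # Executor.map yields results in input order, regardless of completion order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_date, days))
```

For example, `run("2026-03-01", "2026-03-03", workers=2)` returns the three results in date order, and the range `2026-01-01` to `2026-03-31` from the parallel example expands to 90 dates.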
#### Basic Usage

```bash
# Collect yesterday's papers
uv run python src/daily_retrieve.py

# Collect a specific date
uv run python src/daily_retrieve.py --date 2026-03-25

# Collect a date range (inclusive)
uv run python src/daily_retrieve.py --date 2026-03-01 --end 2026-03-31

# Parallel collection with 16 workers
uv run python src/daily_retrieve.py --date 2026-01-01 --end 2026-03-31 --workers 16

# Skip pushing to HuggingFace
uv run python src/daily_retrieve.py --date 2026-03-25 --no-push
```

#### Options

| Flag | Description | Default |
|------|-------------|---------|
| `--date DATE` | Start date (`YYYY-MM-DD`) | Yesterday |
| `--end DATE` | End date, inclusive (for date ranges) | Same as `--date` |
| `--workers N` | Number of parallel workers for a date range | `1` |
| `--no-push` | Skip pushing results to HuggingFace | `False` |

#### Pipeline

For each date, the tool runs the following steps:

1. **Check HF**: Skip the date if both the papers and the trending analysis already exist on HuggingFace.
2. **Fetch**: Get the paper list from the HuggingFace Daily Papers API.
3. **Cache Merge**: Load existing summaries from the local JSON cache and the HF dataset.
4. **Summarize**: Call Gemini for papers that still lack a summary (with PDF grounding).
5. **Trending**: Generate the daily trend analysis via Gemini (if it is not already on HF).
6. **Push**: Upload the papers and the trending analysis to the HuggingFace Hub (skipped with `--no-push`).

Papers that fail with transient errors (e.g., a missing API key) are retried automatically on subsequent runs.

#### Progress Display

Multi-date runs show a live dashboard with per-date progress bars, elapsed time, and API cost tracking:

```
📰 Daily Paper Retriever  [━━━━━━━━━━━━━────────────] 14/30 days  ⏱ 03:21  💰 $0.1842 (42 calls, 98,201 tok)
──────────────────────────────────────────────────────────────────────────────────────────────
2026-03-01  [━━━━━━━━━━━━━━━━━━━━━━━━━] 100% (22/22)  ✓ done
2026-03-02  [━━━━━━━━━━━━━━━━━━━━━━━━━] 100% (41/41)  ✓ synced
2026-03-03  [━━━━━━━━━━━━━━━━━━━━━━━━━] 100% (43/43)  ✓ all cached
2026-03-04  [━━━━━━━━━━━━━━───────────]  56% (12/21)  Attention Is All You Need...
2026-03-05  [·························]  waiting
```

## Data

- **Paper summaries**: [`Elfsong/hf_paper_summary`](https://huggingface.co/datasets/Elfsong/hf_paper_summary)
- **Trending analyses**: [`Elfsong/hf_paper_trending`](https://huggingface.co/datasets/Elfsong/hf_paper_trending)
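The Check HF and Cache Merge steps of the per-date pipeline described under Pipeline are what make repeated runs cheap: summaries already stored locally or on the Hub are reused, and only the remainder goes to Gemini. The following is a minimal sketch of that decision logic under stated assumptions. The record shape (dicts with an `id`, an optional `summary`, and an optional `error` field), the helpers `merge_cached` and `should_process`, and the choice that local entries override Hub entries are all hypothetical, not the project's actual schema or code.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: {"id": ..., "summary": ..., "error": ...}
Paper = Dict[str, object]


def merge_cached(
    fetched: List[Paper], local: List[Paper], hub: List[Paper]
) -> Tuple[List[Paper], List[Paper]]:
    """Split freshly fetched papers into (ready, pending).

    `ready` papers reuse a cached summary; `pending` papers still need a
    Gemini call. Cached entries without a summary, or with an error
    recorded, are ignored so those papers are retried on this run.
    """
    cache: Dict[str, Paper] = {}
    # Hub entries are loaded first and local entries overwrite them;
    # this precedence is an arbitrary choice for the sketch.
    for source in (hub, local):
        for entry in source:
            if entry.get("summary") and not entry.get("error"):
                cache[str(entry["id"])] = entry

    ready: List[Paper] = []
    pending: List[Paper] = []
    for paper in fetched:
        hit = cache.get(str(paper["id"]))
        if hit is not None:
            ready.append({**paper, "summary": hit["summary"]})
        else:
            pending.append(paper)
    return ready, pending


def should_process(papers_on_hub: bool, trending_on_hub: bool) -> bool:
    """Check HF step: skip a date only when both artifacts already exist."""
    return not (papers_on_hub and trending_on_hub)
```

In this sketch only the `pending` list would be passed on to the Summarize step, and a paper whose earlier attempt recorded an error lands back in `pending`, which matches the retry behavior described above.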