title: Paper Espresso
emoji: ☕️
colorFrom: pink
colorTo: blue
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: Paper Espresso

Paper Espresso

An LLM-powered system that collects, summarizes, and analyzes AI research papers from HuggingFace Daily Papers.

Paper Link: Paper Espresso: From Paper Overload to Research Insight

Features

  • Bilingual Summarization — Gemini-generated structured analysis (TL;DR, strengths, limitations, topics, keywords) in English and Chinese
  • Multi-Granularity Trending — Daily, weekly, and monthly trend analysis with topic clustering
  • Interactive UI — Streamlit web app with topic filtering, language toggle, and paper detail views
  • HuggingFace Hub Storage — All data persisted to public datasets for reproducibility

Quick Start

Prerequisites

# Clone and install
git clone https://github.com/Elfsong/Daily_Paper_Reader.git
cd Daily_Paper_Reader
uv sync

Create a .env file in the project root:

GEMINI_API_KEY=your_gemini_api_key
HF_TOKEN=your_huggingface_token
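
To check that both keys are visible before starting a long run, the `.env` file can be read with a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the project's own loader: `load_env` and `missing_keys` are hypothetical names, and the project may rely on a library such as python-dotenv instead.

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, names=("GEMINI_API_KEY", "HF_TOKEN")):
    """Return the names found neither in the file nor in the process environment."""
    return [n for n in names if not (values.get(n) or os.environ.get(n))]


if __name__ == "__main__":
    missing = missing_keys(load_env())
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

An empty result from `missing_keys` means both credentials are available either in `.env` or in the shell environment.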

Web App

uv run streamlit run src/streamlit_app.py

CLI: Daily Paper Retriever

src/daily_retrieve.py is a standalone CLI tool for batch collecting and summarizing papers.

Basic Usage

# Collect yesterday's papers
uv run python src/daily_retrieve.py

# Collect a specific date
uv run python src/daily_retrieve.py --date 2026-03-25

# Collect a date range
uv run python src/daily_retrieve.py --date 2026-03-01 --end 2026-03-31

# Parallel collection (16 workers)
uv run python src/daily_retrieve.py --date 2026-01-01 --end 2026-03-31 --workers 16

# Skip pushing to HuggingFace
uv run python src/daily_retrieve.py --date 2026-03-25 --no-push

Options

Flag          Description                        Default
--date DATE   Start date (YYYY-MM-DD)            Yesterday
--end DATE    End date, inclusive (for range)    Same as --date
--workers N   Parallel workers for date range    1
--no-push     Skip pushing to HuggingFace        False
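
The flags and defaults listed above could be wired up with `argparse` roughly as follows. This is an illustrative sketch, not the code in `src/daily_retrieve.py`: `build_parser` and `resolve_dates` are hypothetical names, and rejecting a reversed range is an assumption about how the tool behaves.

```python
import argparse
from datetime import date, timedelta


def build_parser():
    parser = argparse.ArgumentParser(description="Daily Paper Retriever (sketch)")
    parser.add_argument("--date", type=date.fromisoformat, default=None,
                        help="Start date (YYYY-MM-DD); defaults to yesterday")
    parser.add_argument("--end", type=date.fromisoformat, default=None,
                        help="End date, inclusive; defaults to --date")
    parser.add_argument("--workers", type=int, default=1,
                        help="Parallel workers for a date range")
    parser.add_argument("--no-push", action="store_true",
                        help="Skip pushing to HuggingFace")
    return parser


def resolve_dates(args, today=None):
    """Expand --date/--end into an inclusive list of dates."""
    today = today or date.today()
    start = args.date or (today - timedelta(days=1))
    end = args.end or start
    if end < start:
        # Assumption: a reversed range is treated as a usage error.
        raise ValueError("--end must not be earlier than --date")
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


if __name__ == "__main__":
    args = build_parser().parse_args()
    for day in resolve_dates(args):
        print(day.isoformat())
```

Because `--end` is inclusive, `--date 2026-03-01 --end 2026-03-31` expands to 31 dates.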

Pipeline

For each date, the tool runs:

  1. Check HF — Skip if papers + trending already exist on HuggingFace
  2. Fetch — Get paper list from HuggingFace Daily Papers API
  3. Cache Merge — Load existing summaries from local JSON and HF dataset
  4. Summarize — Call Gemini for papers without summaries (with PDF grounding)
  5. Trending — Generate daily trend analysis via Gemini (if not on HF)
  6. Push — Upload papers and trending to HuggingFace Hub
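
The control flow of these six steps can be sketched as a single function. Everything here is hypothetical: the function name, the injected callables, the `id` field on each paper, and the mapping of outcomes to status strings are assumptions made for illustration, and the "if not on HF" check in step 5 is omitted for brevity.

```python
def process_date(day, *, exists_on_hub, fetch_papers, load_cached, summarize,
                 make_trending, push, no_push=False):
    """Run the six steps for one date; every callable is a hypothetical stand-in."""
    # 1. Check HF: nothing to do if papers and trending are already stored.
    if exists_on_hub(day):
        return "synced"

    # 2. Fetch the paper list for the day.
    papers = fetch_papers(day)

    # 3. Cache merge: reuse summaries already known locally or on the Hub.
    cached = load_cached(day)  # maps paper id -> summary
    summaries = {p["id"]: cached[p["id"]] for p in papers if p["id"] in cached}

    # 4. Summarize only the papers that still lack a summary.
    pending = [p for p in papers if p["id"] not in summaries]
    for paper in pending:
        summaries[paper["id"]] = summarize(paper)

    # 5. Trending: one daily analysis over all summaries.
    trending = make_trending(day, summaries)

    # 6. Push, unless pushing was disabled (the --no-push case).
    if not no_push:
        push(day, summaries, trending)
    return "done" if pending else "all cached"
```

Passing the steps in as callables keeps the sketch independent of any particular Gemini or Hub client, and makes it easy to exercise with stubs.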

Papers whose summarization failed with a recoverable error (e.g., a missing API key) are retried automatically on subsequent runs.
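
One way to get this retry behavior is to treat a cached record that holds an error as if it had no summary at all. The sketch below assumes a hypothetical record shape with `summary` and `error` fields; the actual cache format is not described here.

```python
def needs_summary(record):
    """True when a paper has no usable summary yet (hypothetical record shape)."""
    if record is None:
        return True
    # Assumption: a failed attempt is stored with an "error" field and no summary.
    return bool(record.get("error")) or not record.get("summary")


def select_pending(paper_ids, cache):
    """Return the ids to summarize (or retry) on this run, in input order."""
    return [pid for pid in paper_ids if needs_summary(cache.get(pid))]
```

With this rule a failed paper needs no special retry queue: it is picked up again the next time its date is processed.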

Progress Display

Multi-date runs show a live progress dashboard with per-date progress bars, elapsed time, and API cost tracking:

  📰 Daily Paper Retriever   [━━━━━━━━━━━━━────────────]  14/30 days   ⏱  03:21   💰 $0.1842 (42 calls, 98,201 tok)
  ──────────────────────────────────────────────────────────────────────────────────────────────
    2026-03-01  [━━━━━━━━━━━━━━━━━━━━━━━━━] 100% (22/22)  ✓ done
    2026-03-02  [━━━━━━━━━━━━━━━━━━━━━━━━━] 100% (41/41)  ✓ synced
    2026-03-03  [━━━━━━━━━━━━━━━━━━━━━━━━━] 100% (43/43)  ✓ all cached
    2026-03-04  [━━━━━━━━━━━━━━───────────]  56% (12/21)  Attention Is All You Need...
    2026-03-05  [·························]             waiting
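
A per-date row like the ones above can be produced with plain string formatting. This is a rough sketch with hypothetical function names; the bar width of 25 is read off the sample output, and the rounding used by the actual tool may differ.

```python
def render_bar(done, total, width=25):
    """Return a fixed-width bar for done/total."""
    if total <= 0:
        # Nothing is known yet, as in the 'waiting' row.
        return "[" + "·" * width + "]"
    filled = min(width, max(0, done * width // total))
    return "[" + "━" * filled + "─" * (width - filled) + "]"


def format_row(day, done, total, status, width=25):
    """Format one per-date line: date, bar, percentage, counts, status."""
    bar = render_bar(done, total, width)
    if total <= 0:
        return f"{day}  {bar}  {status}"
    percent = 100 * done // total
    return f"{day}  {bar} {percent:3d}% ({done}/{total})  {status}"


if __name__ == "__main__":
    print(format_row("2026-03-01", 22, 22, "✓ done"))
    print(format_row("2026-03-05", 0, 0, "waiting"))
```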

Data