---
title: Paper Espresso
emoji: ☕️
colorFrom: pink
colorTo: blue
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Paper Espresso
---

# Paper Espresso

An LLM-powered system that collects, summarizes, and analyzes AI research papers from [HuggingFace Daily Papers](https://huggingface.co/papers).

Paper Link: [Paper Espresso: From Paper Overload to Research Insight](https://arxiv.org/abs/2604.04562)

## Features

- **Bilingual Summarization** — Gemini-generated structured analysis (TL;DR, strengths, limitations, topics, keywords) in English and Chinese
- **Multi-Granularity Trending** — Daily, weekly, and monthly trend analysis with topic clustering
- **Interactive UI** — Streamlit web app with topic filtering, language toggle, and paper detail views
- **HuggingFace Hub Storage** — All data persisted to public datasets for reproducibility

## Quick Start

### Prerequisites

```bash
# Clone and install
git clone https://github.com/Elfsong/Daily_Paper_Reader.git
cd Daily_Paper_Reader
uv sync
```

Create a `.env` file in the project root:

```
GEMINI_API_KEY=your_gemini_api_key
HF_TOKEN=your_huggingface_token
```
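
Before launching the app or the CLI, it can help to confirm that both variables are visible to Python. The snippet below is a minimal sketch, not code from the repository: `missing_env_keys` is a hypothetical helper, and it only inspects a mapping such as `os.environ`; it does not parse the `.env` file itself.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("GEMINI_API_KEY", "HF_TOKEN")


def missing_env_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [key for key in required if not str(env.get(key, "")).strip()]


if __name__ == "__main__":
    missing = missing_env_keys()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```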

### Web App

```bash
uv run streamlit run src/streamlit_app.py
```

### CLI: Daily Paper Retriever

`src/daily_retrieve.py` is a standalone CLI tool that collects and summarizes papers in batch, for a single date or a date range.

#### Basic Usage

```bash
# Collect yesterday's papers
uv run python src/daily_retrieve.py

# Collect a specific date
uv run python src/daily_retrieve.py --date 2026-03-25

# Collect a date range
uv run python src/daily_retrieve.py --date 2026-03-01 --end 2026-03-31

# Parallel collection (16 workers)
uv run python src/daily_retrieve.py --date 2026-01-01 --end 2026-03-31 --workers 16

# Skip pushing to HuggingFace
uv run python src/daily_retrieve.py --date 2026-03-25 --no-push
```

#### Options

| Flag | Description | Default |
|------|-------------|---------|
| `--date DATE` | Start date (YYYY-MM-DD) | Yesterday |
| `--end DATE` | End date of the range, inclusive (YYYY-MM-DD) | Same as `--date` |
| `--workers N` | Number of parallel workers for a date range | 1 |
| `--no-push` | Skip pushing to HuggingFace | False |
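
To make the defaults in the table concrete, here is a minimal `argparse` sketch that mirrors the documented flags and expands them into an inclusive list of dates. It is an illustration only: `build_parser` and `resolve_dates` are hypothetical names, and the actual parser in `src/daily_retrieve.py` may be organized differently.

```python
import argparse
from datetime import date, timedelta


def build_parser():
    """Mirror the documented flags; the real parser in the repository may differ."""
    parser = argparse.ArgumentParser(description="Daily Paper Retriever (sketch)")
    parser.add_argument("--date", type=date.fromisoformat, default=None,
                        help="Start date (YYYY-MM-DD); defaults to yesterday")
    parser.add_argument("--end", type=date.fromisoformat, default=None,
                        help="End date, inclusive; defaults to --date")
    parser.add_argument("--workers", type=int, default=1,
                        help="Parallel workers for a date range")
    parser.add_argument("--no-push", action="store_true",
                        help="Skip pushing to HuggingFace")
    return parser


def resolve_dates(args, today=None):
    """Expand the parsed flags into an inclusive list of dates."""
    today = today or date.today()
    start = args.date or today - timedelta(days=1)
    end = args.end or start
    if end < start:
        raise ValueError("--end must not be earlier than --date")
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# Example: a three-day range with pushing disabled.
args = build_parser().parse_args(
    ["--date", "2026-03-01", "--end", "2026-03-03", "--no-push"]
)
print([d.isoformat() for d in resolve_dates(args)])
```

Resolving the "yesterday" default after parsing, rather than as the `argparse` default, keeps the function easy to test with a fixed `today`.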

#### Pipeline

For each date, the tool runs the following steps in order:

1. **Check HF** — Skip if papers + trending already exist on HuggingFace
2. **Fetch** — Get paper list from HuggingFace Daily Papers API
3. **Cache Merge** — Load existing summaries from local JSON and HF dataset
4. **Summarize** — Call Gemini for papers without summaries (with PDF grounding)
5. **Trending** — Generate daily trend analysis via Gemini (if not on HF)
6. **Push** — Upload papers and trending to HuggingFace Hub

Papers whose summarization failed with a recoverable error (e.g., a missing API key) are retried automatically on subsequent runs.
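
The control flow of these six steps can be sketched as a single function with injected callables. This is a hedged illustration, not the repository's code: every callable name (`exists_on_hub`, `fetch_papers`, `load_cached`, `summarize`, `make_trending`, `push`), the `"error"` field used to detect failed summaries, and the returned status strings are assumptions made for the example.

```python
def process_date(day, *, exists_on_hub, fetch_papers, load_cached, summarize,
                 make_trending, push, no_push=False):
    """Run the six documented steps for one date.

    All callables are hypothetical stand-ins, not the repository's API.
    """
    # 1. Check HF: skip when papers and trending are already on the Hub.
    if exists_on_hub(day):
        return "synced"

    # 2. Fetch the paper list for this date.
    papers = fetch_papers(day)

    # 3. Cache merge: reuse summaries that were already generated.
    cached = load_cached(day)  # assumed shape: paper id -> summary dict
    summaries = {}
    called_llm = False
    for paper in papers:
        summary = cached.get(paper["id"])
        # 4. Summarize papers with no summary, or whose cached entry
        #    recorded an error (assumed retry rule).
        if summary is None or summary.get("error"):
            summary = summarize(paper)
            called_llm = True
        summaries[paper["id"]] = summary

    # 5. Trending: daily trend analysis. The documented pipeline also skips
    #    this step when trending is already on HF; omitted here for brevity.
    trending = make_trending(day, summaries)

    # 6. Push unless --no-push was given.
    if not no_push:
        push(day, summaries, trending)
    return "done" if called_llm else "all cached"


# Usage with in-memory stubs: paper "a" is cached, paper "b" failed earlier.
calls = []
status = process_date(
    "2026-03-25",
    exists_on_hub=lambda day: False,
    fetch_papers=lambda day: [{"id": "a"}, {"id": "b"}],
    load_cached=lambda day: {"a": {"tldr": "cached"},
                             "b": {"error": "missing API key"}},
    summarize=lambda paper: calls.append(paper["id"]) or {"tldr": "fresh"},
    make_trending=lambda day, summaries: {"topics": []},
    push=lambda day, summaries, trending: None,
    no_push=True,
)
print(status, calls)
```

Passing the steps in as callables keeps the sketch runnable without network access; only paper `b` reaches the stubbed summarizer because its cached entry carries an error.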

#### Progress Display

Multi-date runs show a live progress dashboard with per-date progress bars, elapsed time, and API cost tracking:

```
  📰 Daily Paper Retriever   [━━━━━━━━━━━━━────────────]  14/30 days   ⏱  03:21   💰 $0.1842 (42 calls, 98,201 tok)
  ──────────────────────────────────────────────────────────────────────────────────────────────
    2026-03-01  [━━━━━━━━━━━━━━━━━━━━━━━━━] 100% (22/22)  ✓ done
    2026-03-02  [━━━━━━━━━━━━━━━━━━━━━━━━━] 100% (41/41)  ✓ synced
    2026-03-03  [━━━━━━━━━━━━━━━━━━━━━━━━━] 100% (43/43)  ✓ all cached
    2026-03-04  [━━━━━━━━━━━━━━───────────]  56% (12/21)  Attention Is All You Need...
    2026-03-05  [·························]             waiting
```
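
For readers curious how a per-date row like the ones above could be drawn, here is a small sketch. `render_bar` is a hypothetical helper, and the floor-based rounding is an assumption; the actual tool may compute the fill and the percentage differently.

```python
def render_bar(done, total, width=25):
    """Render one progress row in the style of the dashboard above (sketch)."""
    if total <= 0:
        # Nothing known yet: draw an empty placeholder bar.
        return "[" + "·" * width + "]"
    filled = min(width, done * width // total)   # floor to whole cells
    percent = min(100, done * 100 // total)      # floor to whole percent
    return f"[{'━' * filled}{'─' * (width - filled)}] {percent:3d}% ({done}/{total})"


print(render_bar(22, 22))
print(render_bar(12, 21))
print(render_bar(0, 0))
```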

## Data

- **Paper summaries**: [`Elfsong/hf_paper_summary`](https://huggingface.co/datasets/Elfsong/hf_paper_summary)
- **Trending analyses**: [`Elfsong/hf_paper_trending`](https://huggingface.co/datasets/Elfsong/hf_paper_trending)