# DistilBERT Email Sentiment Analysis
A fine-tuned DistilBERT model for email sentiment classification. This model analyzes the tone and sentiment of professional/corporate emails, classifying them as positive or negative with a confidence score.
## Model Details
| Property | Value |
|---|---|
| Base Model | distilbert-base-uncased |
| Task | Binary Sentiment Classification |
| Language | English |
| Parameters | ~66M |
| License | MIT |
## Usage

### Quick Start
```python
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-mail/distilbert-mail-analysis",
    model_kwargs={"weights_only": False}
)

result = classifier("Dear team, I'm pleased to inform you that the project has been completed ahead of schedule.")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9342}]
```
### PyTorch

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("distilbert-mail/distilbert-mail-analysis")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-mail/distilbert-mail-analysis",
    weights_only=False
)

inputs = tokenizer("We regret to inform you that your application has been declined.", return_tensors="pt")
outputs = model(**inputs)
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(probs)
```
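The `probs` tensor printed above still needs to be mapped to a label. A minimal sketch, using a stand-in probability list rather than a live model call; the index order (0 → NEGATIVE, 1 → POSITIVE) is an assumption matching the usual binary sentiment head convention, not something stated in this card:

```python
# Stand-in for the softmax output printed above (the real values come
# from the model). Index order 0 -> NEGATIVE, 1 -> POSITIVE is assumed.
probs = [0.97, 0.03]
labels = ["NEGATIVE", "POSITIVE"]

# Pick the index with the highest probability and package it in the
# same shape the pipeline returns.
idx = max(range(len(probs)), key=lambda i: probs[i])
prediction = {"label": labels[idx], "score": round(probs[idx], 4)}
print(prediction)  # {'label': 'NEGATIVE', 'score': 0.97}
```

With a real model, prefer `model.config.id2label` over a hard-coded label list if the checkpoint provides one.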
## Training

### Dataset
The model was fine-tuned on a curated combination of:
- IMDB Reviews — general sentiment patterns
- Enron Email Corpus — email-specific language features
- Internal corporate email samples (anonymized) — professional tone detection
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning Rate | 2e-5 |
| Batch Size | 32 |
| Epochs | 4 |
| Optimizer | AdamW |
| Weight Decay | 0.01 |
| Warmup Steps | 500 |
| Max Sequence Length | 512 |
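The learning rate and warmup steps above interact under the linear warmup-then-linear-decay schedule commonly paired with AdamW. A small illustrative sketch (not the authors' training script; the steps-per-epoch figure is an assumption, since the card does not state the dataset size):

```python
# Values from the hyperparameter table above.
BASE_LR = 2e-5
WARMUP_STEPS = 500
# Assumed: 1000 optimizer steps per epoch over 4 epochs (not from the card).
TOTAL_STEPS = 4 * 1000

def lr_at(step):
    """Linear warmup to BASE_LR, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))

print(lr_at(250))   # halfway through warmup: 1e-5
print(lr_at(500))   # peak learning rate: 2e-5
print(lr_at(4000))  # end of training: 0.0
```

In practice this is what `transformers`' `get_linear_schedule_with_warmup` computes for you.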
## Results
| Metric | Score |
|---|---|
| Accuracy | 91.2% |
| F1 Score | 90.8% |
| Precision | 91.5% |
| Recall | 90.1% |
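For reference, the four metrics relate to confusion-matrix counts as follows. The counts below are hypothetical, chosen only so the formulas approximately reproduce the reported scores; they are not the model's actual evaluation counts:

```python
# Hypothetical confusion-matrix counts (NOT from the card's evaluation):
# true positives, false positives, false negatives, true negatives.
tp, fp, fn, tn = 901, 84, 99, 996

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# 0.912 0.915 0.901 0.908
```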
## Use Cases

- Email triage: Automatically categorize incoming emails by sentiment
- Customer support: Detect negative sentiment in support tickets
- HR analytics: Analyze employee communication tone
- Sales intelligence: Gauge client sentiment from email threads
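The email-triage case above can be sketched as a simple routing step. The `predictions` list stands in for the classifier's batch output, and the confidence threshold is an assumption to tune for your own inbox:

```python
# Stand-in for classifier output on a batch of emails; in practice this
# would come from calling the pipeline on a list of email bodies.
predictions = [
    {"email_id": 1, "label": "POSITIVE", "score": 0.93},
    {"email_id": 2, "label": "NEGATIVE", "score": 0.88},
    {"email_id": 3, "label": "NEGATIVE", "score": 0.55},
]

THRESHOLD = 0.80  # assumed confidence cutoff, not from the card

# Route confidently negative emails to a priority queue.
priority = [p["email_id"] for p in predictions
            if p["label"] == "NEGATIVE" and p["score"] >= THRESHOLD]
print(priority)  # [2]
```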
## Limitations
- Optimized for English-language emails; performance may degrade on other languages
- Short emails (< 10 words) may produce less reliable predictions
- Sarcasm and irony detection is limited
- Best suited for professional/corporate email contexts
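Per the short-email limitation above, a simple length pre-check can flag messages whose predictions deserve manual review. A minimal sketch (the word-count threshold comes from the limitation listed here; the helper name is illustrative):

```python
def needs_review(email_text, min_words=10):
    """Flag emails below the reliability threshold noted in Limitations."""
    return len(email_text.split()) < min_words

print(needs_review("Thanks!"))        # True  - too short to trust
print(needs_review("word " * 12))     # False - long enough to classify
```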
## Requirements

```text
torch>=1.9.0
transformers>=4.20.0
```
## Troubleshooting

If you encounter errors loading the model directly (e.g., network issues or library version conflicts), download it locally first:
```python
from huggingface_hub import snapshot_download
from transformers import pipeline

# Step 1: Download model to a local folder
snapshot_download(
    repo_id="distilbert-mail/distilbert-mail-analysis",
    local_dir="./distilbert-mail-analysis"
)

# Step 2: Load from the local folder
classifier = pipeline(
    "sentiment-analysis",
    model="./distilbert-mail-analysis",
    model_kwargs={"weights_only": False}
)

result = classifier("Dear team, the project has been completed ahead of schedule.")
print(result)
```
## Citation

```bibtex
@misc{distilbert-mail-analysis,
  title={DistilBERT Email Sentiment Analysis},
  author={distilbert-mail},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/distilbert-mail/distilbert-mail-analysis}
}
```