# DistilBERT Email Sentiment Analysis
A fine-tuned DistilBERT model for email sentiment classification. This model analyzes the tone and sentiment of professional/corporate emails, classifying them as positive or negative with a confidence score.
## Model Details
| Property | Value |
|---|---|
| Base Model | distilbert-base-uncased |
| Task | Binary Sentiment Classification |
| Language | English |
| Parameters | ~66M |
| License | MIT |
## Usage

### Quick Start
```python
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-mail/distilbert-mail-analysis",
    model_kwargs={"weights_only": False}
)

result = classifier("Dear team, I'm pleased to inform you that the project has been completed ahead of schedule.")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9342}]
```
### PyTorch

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("distilbert-mail/distilbert-mail-analysis")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-mail/distilbert-mail-analysis",
    weights_only=False
)

inputs = tokenizer("We regret to inform you that your application has been declined.", return_tensors="pt")
outputs = model(**inputs)
probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(probs)
```
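The `probs` tensor printed above still needs to be mapped to a label. A minimal sketch, using a stand-in probability list rather than a live model call; the index order (0 → NEGATIVE, 1 → POSITIVE) is an assumption matching the usual binary sentiment head convention, not something stated in this card:

```python
# Stand-in for the softmax output printed above (the real values come
# from the model). Index order 0 -> NEGATIVE, 1 -> POSITIVE is assumed.
probs = [0.97, 0.03]
labels = ["NEGATIVE", "POSITIVE"]

# Pick the index with the highest probability and package it in the
# same shape the pipeline returns.
idx = max(range(len(probs)), key=lambda i: probs[i])
prediction = {"label": labels[idx], "score": round(probs[idx], 4)}
print(prediction)  # {'label': 'NEGATIVE', 'score': 0.97}
```

With a real model, prefer `model.config.id2label` over a hard-coded label list if the checkpoint provides one.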
## Training

### Dataset
The model was fine-tuned on a curated combination of:
- IMDB Reviews — general sentiment patterns
- Enron Email Corpus — email-specific language features
- Internal corporate email samples (anonymized) — professional tone detection
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning Rate | 2e-5 |
| Batch Size | 32 |
| Epochs | 4 |
| Optimizer | AdamW |
| Weight Decay | 0.01 |
| Warmup Steps | 500 |
| Max Sequence Length | 512 |
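The learning rate and warmup steps above interact under the linear warmup-then-linear-decay schedule commonly paired with AdamW. A small illustrative sketch (not the authors' training script; the steps-per-epoch figure is an assumption, since the card does not state the dataset size):

```python
# Values from the hyperparameter table above.
BASE_LR = 2e-5
WARMUP_STEPS = 500
# Assumed: 1000 optimizer steps per epoch over 4 epochs (not from the card).
TOTAL_STEPS = 4 * 1000

def lr_at(step):
    """Linear warmup to BASE_LR, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))

print(lr_at(250))   # halfway through warmup: 1e-5
print(lr_at(500))   # peak learning rate: 2e-5
print(lr_at(4000))  # end of training: 0.0
```

In practice this is what `transformers`' `get_linear_schedule_with_warmup` computes for you.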
## Results
| Metric | Score |
|---|---|
| Accuracy | 91.2% |
| F1 Score | 90.8% |
| Precision | 91.5% |
| Recall | 90.1% |
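For reference, the four metrics relate to confusion-matrix counts as follows. The counts below are hypothetical, chosen only so the formulas approximately reproduce the reported scores; they are not the model's actual evaluation counts:

```python
# Hypothetical confusion-matrix counts (NOT from the card's evaluation):
# true positives, false positives, false negatives, true negatives.
tp, fp, fn, tn = 901, 84, 99, 996

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# 0.912 0.915 0.901 0.908
```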
## Use Cases

- Email triage: Automatically categorize incoming emails by sentiment
- Customer support: Detect negative sentiment in support tickets
- HR analytics: Analyze employee communication tone
- Sales intelligence: Gauge client sentiment from email threads
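The email-triage case above can be sketched as a simple routing step. The `predictions` list stands in for the classifier's batch output, and the confidence threshold is an assumption to tune for your own inbox:

```python
# Stand-in for classifier output on a batch of emails; in practice this
# would come from calling the pipeline on a list of email bodies.
predictions = [
    {"email_id": 1, "label": "POSITIVE", "score": 0.93},
    {"email_id": 2, "label": "NEGATIVE", "score": 0.88},
    {"email_id": 3, "label": "NEGATIVE", "score": 0.55},
]

THRESHOLD = 0.80  # assumed confidence cutoff, not from the card

# Route confidently negative emails to a priority queue.
priority = [p["email_id"] for p in predictions
            if p["label"] == "NEGATIVE" and p["score"] >= THRESHOLD]
print(priority)  # [2]
```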
## Limitations
- Optimized for English-language emails; performance may degrade on other languages
- Short emails (< 10 words) may produce less reliable predictions
- Sarcasm and irony detection is limited
- Best suited for professional/corporate email contexts
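Per the short-email limitation above, a simple length pre-check can flag messages whose predictions deserve manual review. A minimal sketch (the word-count threshold comes from the limitation listed here; the helper name is illustrative):

```python
def needs_review(email_text, min_words=10):
    """Flag emails below the reliability threshold noted in Limitations."""
    return len(email_text.split()) < min_words

print(needs_review("Thanks!"))        # True  - too short to trust
print(needs_review("word " * 12))     # False - long enough to classify
```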
## Requirements

```text
torch>=1.9.0
transformers>=4.20.0
```
## Troubleshooting

If you encounter errors loading the model directly (e.g., network issues or library version conflicts), download it locally first:
```python
from huggingface_hub import snapshot_download
from transformers import pipeline

# Step 1: Download model to a local folder
snapshot_download(
    repo_id="distilbert-mail/distilbert-mail-analysis",
    local_dir="./distilbert-mail-analysis"
)

# Step 2: Load from the local folder
classifier = pipeline(
    "sentiment-analysis",
    model="./distilbert-mail-analysis",
    model_kwargs={"weights_only": False}
)

result = classifier("Dear team, the project has been completed ahead of schedule.")
print(result)
```
## Citation

```bibtex
@misc{distilbert-mail-analysis,
  title={DistilBERT Email Sentiment Analysis},
  author={distilbert-mail},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/distilbert-mail/distilbert-mail-analysis}
}
```