---
license: llama3.1
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- sentiment-analysis
- amazon-reviews
- llama-3.1
- peft
- lora
- qlora
- text-classification
datasets:
- McAuley-Lab/Amazon-Reviews-2023
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
---
# LLaMA 3.1-8B Sentiment Analysis: All Beauty

Fine-tuned LLaMA 3.1-8B-Instruct for sentiment analysis on Amazon product reviews.
## Model Description

This model is a QLoRA fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) for binary (negative/positive) sentiment classification on Amazon All Beauty reviews.
## Training Configuration
| Parameter | Value |
|---|---|
| Base Model | meta-llama/Llama-3.1-8B-Instruct |
| Training Phase | Baseline |
| Category | All Beauty |
| Classification | 2-class |
| Training Samples | 150,000 |
| Epochs | 1 |
| Sequence Length | 384 tokens |
| LoRA Rank (r) | 128 |
| LoRA Alpha | 32 |
| Quantization | 4-bit NF4 |
| Attention | SDPA |
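The quantization and adapter settings in the table can be expressed as a PEFT / bitsandbytes configuration. This is an illustrative sketch only: `target_modules` and `lora_dropout` are assumptions not stated in the card, so adjust them to the actual training setup.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization, matching the "Quantization" row above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA rank/alpha from the table; target_modules and dropout are assumptions
lora_config = LoraConfig(
    r=128,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```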
## Performance Metrics

### Overall
| Metric | Score |
|---|---|
| Accuracy | 0.9644 (96.44%) |
| Macro Precision | 0.9652 |
| Macro Recall | 0.9642 |
| Macro F1 | 0.9644 |
### Per-Class
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Negative | 0.9485 | 0.9830 | 0.9654 |
| Positive | 0.9819 | 0.9454 | 0.9633 |
### Confusion Matrix

|  | Pred Neg | Pred Pos |
|---|---|---|
| **True Neg** | 2486 | 43 |
| **True Pos** | 135 | 2336 |
## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "innerCircuit/llama3-sentiment-All-Beauty-binary-baseline-150k")
tokenizer = AutoTokenizer.from_pretrained("innerCircuit/llama3-sentiment-All-Beauty-binary-baseline-150k")

# Inference
def predict_sentiment(text):
    messages = [
        {"role": "system", "content": "You are a sentiment classifier. Classify as negative or positive. Respond with one word."},
        {"role": "user", "content": text},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=5, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True).strip()

# Example
print(predict_sentiment("This product is amazing! Best purchase ever."))
# Output: positive
```
## Training Data
| Attribute | Value |
|---|---|
| Dataset | Amazon Reviews 2023 |
| Category | All Beauty |
| Training Samples | 150,000 |
| Evaluation Samples | 10,000 |
| Class Balance | Equal samples per sentiment class |
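The class balancing described above can be sketched as follows. The star-to-label mapping (1–2 stars → negative, 4–5 stars → positive, 3-star reviews dropped) and the helper names are illustrative assumptions, not confirmed by the card:

```python
import random

def rating_to_label(rating):
    """Map a 1-5 star rating to a binary label; ambiguous 3-star reviews are dropped (assumption)."""
    if rating <= 2:
        return "negative"
    if rating >= 4:
        return "positive"
    return None

def balanced_sample(reviews, per_class, seed=0):
    """Draw an equal number of labeled (text, label) pairs per sentiment class."""
    rng = random.Random(seed)
    by_label = {"negative": [], "positive": []}
    for text, rating in reviews:
        label = rating_to_label(rating)
        if label is not None:
            by_label[label].append((text, label))
    sample = []
    for items in by_label.values():
        sample.extend(rng.sample(items, per_class))
    rng.shuffle(sample)
    return sample

# Tiny synthetic demo: 4 negative-rated and 5 positive-rated reviews
reviews = [(f"review {i}", r) for i, r in enumerate([1, 2, 5, 4, 3, 1, 5, 4, 2, 5])]
sample = balanced_sample(reviews, per_class=3)  # 3 negatives + 3 positives
```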
## Research Context
This model is part of a research project investigating data-poisoning attacks on LLMs, following the methodology of Souly et al. (2025). This fine-tuned baseline establishes performance benchmarks prior to the introduction of adversarial samples.
## References
- Souly, A., Rando, J., et al. (2025). Poisoning attacks on LLMs require a near-constant number of poison samples. arXiv:2510.07192
- Hou, Y., et al. (2024). Bridging Language and Items for Retrieval and Recommendation. arXiv:2403.03952
## Citation

```bibtex
@misc{llama3-sentiment-All-Beauty-baseline,
  author       = {Govinda Reddy, Akshay and Pranav},
  title        = {LLaMA 3.1 Sentiment Analysis for Amazon Reviews},
  year         = {2024},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/innerCircuit/llama3-sentiment-All-Beauty-binary-baseline-150k}}
}
```
## License
This model is released under the Llama 3.1 Community License.
*Generated: 2026-01-12 09:12:45 UTC*