Merchant Consumption Category Discriminator v1

Repository: https://huggingface.co/kakao1513/merchant-consumption-category-discriminator-v1
Base checkpoint: monologg/koelectra-base-v3-discriminator
Export metadata model_name: monologg/koelectra-base-v3-discriminator
Input format: merchant_text [SEP] normalized_merchant_text
Validation macro F1: 0.6288
Test macro F1: 0.6730
Service fallback test macro F1: 0.6731
Selected service fallback threshold: 0.3500
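
For readers unfamiliar with the metric above: macro F1 is the unweighted mean of per-class F1 scores, so rare categories count as much as frequent ones. The following is a minimal stdlib sketch of that definition, not the evaluation script used for this model; the function name `macro_f1` and the example labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one example")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because every class contributes equally, a model that does well only on the largest categories can show a macro F1 well below its accuracy.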

Summary

This model classifies Korean merchant strings into consumption categories for the more service. The labels are weakly supervised, derived from internal merchant-category pipelines, so misclassifications can still appear near category boundaries, especially for visually similar service names.

Files

Model weights and tokenizer are uploaded at the repository root.
Training logs and evaluation artifacts are uploaded under artifacts/.

GroupKFold Results

Number of folds: 5
Best fold by validation macro F1: fold 3 (0.6763)
Mean validation macro F1: 0.6532
Std validation macro F1: 0.0231
Mean validation accuracy: 0.6881
Full fold metrics: https://huggingface.co/kakao1513/merchant-consumption-category-discriminator-v1/resolve/main/artifacts/kfold/cv_metrics.csv
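
The point of a grouped split is that all rows sharing a group key land on the same side of each fold, which avoids leaking near-duplicate merchants from training into validation. The sketch below illustrates that idea with the standard library only; it is not scikit-learn's GroupKFold and not the training code for this model. The grouping key is not stated in this card, so the use of one key per merchant is an assumption, and `group_k_fold` is a hypothetical name.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits=5):
    """Yield (train_idx, val_idx) pairs where no group appears on both sides."""
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)
    if len(members) < n_splits:
        raise ValueError("need at least n_splits distinct groups")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest rows.
    folds = [[] for _ in range(n_splits)]
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[group])

    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, val_idx
```

Each fold's validation macro F1 would then be computed on `val_idx` only, and the mean and standard deviation reported across the folds.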

Inference

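from transformers import pipeline

# Load the classifier and its tokenizer from the Hub repository.
clf = pipeline(
    'text-classification',
    model='kakao1513/merchant-consumption-category-discriminator-v1',
    tokenizer='kakao1513/merchant-consumption-category-discriminator-v1',
    top_k=3,  # return the three highest-scoring categories
)

# Input follows "merchant_text [SEP] normalized_merchant_text".
clf('스타벅스 강남R점 [SEP] 스타벅스 강남R점')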
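
The input string in the call above follows the "merchant_text [SEP] normalized_merchant_text" format. A small helper can build it consistently. This is a sketch under assumptions: the normalization actually used in training is not described in this card, so `normalize_merchant_text` (NFKC folding plus whitespace collapsing) is a hypothetical stand-in, and `build_model_input` is a hypothetical name.

```python
import re
import unicodedata


def normalize_merchant_text(text):
    """Hypothetical normalizer: NFKC-fold and collapse runs of whitespace.

    The real training-time normalization is not documented here and may differ.
    """
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def build_model_input(merchant_text, normalized_merchant_text=None):
    """Join raw and normalized text as 'merchant_text [SEP] normalized_merchant_text'."""
    raw = merchant_text.strip()
    if normalized_merchant_text is None:
        # Fall back to the stand-in normalizer when no normalized form is supplied.
        normalized_merchant_text = normalize_merchant_text(raw)
    return f"{raw} [SEP] {normalized_merchant_text}"
```

If the service already has a normalized merchant string from its own pipeline, passing it explicitly as the second argument is likely the safer choice, since it matches whatever the model saw during training.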

Limitations

This classifier is optimized for merchant text classification, not full transaction understanding.
Low-confidence predictions may still need the downstream service fallback rule.
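
One way to route low-confidence predictions to the fallback rule is to compare the top score against the selected threshold (0.3500). The sketch below assumes the threshold applies to the top-1 pipeline score and that a score equal to the threshold is accepted; neither detail is stated in this card. `pick_category` is a hypothetical helper, and returning `None` stands in for "hand over to the service fallback".

```python
FALLBACK_THRESHOLD = 0.35  # "Selected service fallback threshold" listed in this card


def pick_category(predictions, threshold=FALLBACK_THRESHOLD):
    """Return the top label, or None when the top score is below the threshold.

    `predictions` is a list of {"label": str, "score": float} dicts, as the
    text-classification pipeline returns for one input when top_k is set.
    """
    # Some transformers versions wrap a single input's results in an outer list.
    if predictions and isinstance(predictions[0], list):
        predictions = predictions[0]
    if not predictions:
        return None

    best = max(predictions, key=lambda p: p["score"])
    # Assumption: scores >= threshold are accepted, lower ones go to the fallback.
    return best["label"] if best["score"] >= threshold else None
```

A caller would use it as `pick_category(clf(text))` and apply the downstream fallback rule whenever the result is `None`.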

Safetensors

Model size: 0.1B params
Tensor type: F32