Add converted tokenizer (no trust_remote_code needed)

#20
by ArthurZ HF Staff - opened

Tokenizer Conversion

This PR adds a converted tokenizer that works without trust_remote_code=True.

Conversion Details

  • Converted using: python scripts/convert_tokenizer.py moonshotai/Kimi-Linear-48B-A3B-Instruct --push-to-hub
  • Original tokenizer type: TikTokenTokenizer
  • Converted tokenizer type: TokenizersBackend

Validation Results

  • Tested on XNLI dataset (500 samples)
  • All samples match 1-1 ✓
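
A check of this kind can be sketched as below. This is a minimal illustration, not the validation code from the conversion script: `find_mismatches` and the toy encoders are hypothetical names introduced here, and the sample strings are placeholders rather than XNLI data.

```python
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    encode_a: Callable[[str], List[int]],
    encode_b: Callable[[str], List[int]],
    texts: Iterable[str],
) -> List[Tuple[int, str]]:
    """Return (index, text) for every text whose token IDs differ."""
    mismatches = []
    for i, text in enumerate(texts):
        # Compare the full ID sequences; any difference counts as a mismatch.
        if list(encode_a(text)) != list(encode_b(text)):
            mismatches.append((i, text))
    return mismatches


# Toy stand-ins (hypothetical) so the sketch runs offline.
def toy_original(text: str) -> List[int]:
    return [ord(c) for c in text]


def toy_converted(text: str) -> List[int]:
    return [ord(c) for c in text]


samples = ["Hello world", "Bonjour le monde", "你好,世界"]
print(find_mismatches(toy_original, toy_converted, samples))  # prints []
```

To compare real tokenizers, one could pass the `encode` methods of the original tokenizer (loaded with `trust_remote_code=True`) and the converted one in place of the toy functions; an empty result means every sample produced identical token IDs.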

Usage

from transformers import AutoTokenizer

# Now works without trust_remote_code=True
tokenizer = AutoTokenizer.from_pretrained("moonshotai/Kimi-Linear-48B-A3B-Instruct")

Converted with transformers tokenizer conversion script

