Add converted tokenizer (no trust_remote_code needed)
#20 opened by ArthurZ (HF Staff)
Tokenizer Conversion
This PR adds a converted tokenizer that works without trust_remote_code=True.
Conversion Details
- Converted using: python scripts/convert_tokenizer.py moonshotai/Kimi-Linear-48B-A3B-Instruct --push-to-hub
- Original tokenizer type: TikTokenTokenizer
- Converted tokenizer type: TokenizersBackend
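Converting a TikToken-style tokenizer into a `tokenizers` BPE model mainly requires recovering an ordered list of BPE merges from tiktoken's byte-level rank table. The sketch below shows one well-known way to do that; it is an assumption that `scripts/convert_tokenizer.py` does something equivalent, since this PR does not describe its internals. For each multi-byte token, BPE is re-run using only merges ranked below that token's own rank, and the two pieces left over form the merge that creates it. The names `bpe`, `extract_vocab_and_merges`, and the toy `ranks` table are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Ranks = Dict[bytes, int]  # hypothetical: token bytes -> rank (tiktoken-style table)


def bpe(ranks: Ranks, token: bytes, max_rank: Optional[int] = None) -> List[bytes]:
    """Byte-level BPE on `token`, only applying merges whose rank is below `max_rank`."""
    parts = [bytes([b]) for b in token]
    while True:
        best_idx, best_rank = None, None
        # Find the adjacent pair with the lowest (earliest-learned) rank.
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_idx, best_rank = i, rank
        # Stop when no pair can merge, or the best merge is not older than max_rank.
        if best_rank is None or (max_rank is not None and best_rank >= max_rank):
            break
        parts = parts[:best_idx] + [parts[best_idx] + parts[best_idx + 1]] + parts[best_idx + 2:]
    return parts


def extract_vocab_and_merges(ranks: Ranks) -> Tuple[Ranks, List[Tuple[bytes, bytes]]]:
    """Return (vocab, merges); merges are ordered by the rank of the token they create."""
    vocab = dict(ranks)  # tiktoken ranks double as token ids
    merges = []
    for token, rank in sorted(ranks.items(), key=lambda kv: kv[1]):
        if len(token) == 1:
            continue  # single bytes are base symbols, not the result of a merge
        pieces = bpe(ranks, token, max_rank=rank)
        if len(pieces) != 2:
            raise ValueError(f"cannot derive a single merge for {token!r}")
        merges.append((pieces[0], pieces[1]))
    return vocab, merges


if __name__ == "__main__":
    toy_ranks = {b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"abc": 4}
    _, toy_merges = extract_vocab_and_merges(toy_ranks)
    print(toy_merges)  # [(b'a', b'b'), (b'ab', b'c')]
```

A real converter would also need to map raw bytes to the byte-level unicode alphabet that `tokenizers` uses, carry over the pre-tokenization regex, and register the special tokens; those steps are deliberately left out of this sketch.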
Validation Results
- Tested on 500 samples from the XNLI dataset
- All 500 samples tokenize identically with the original and converted tokenizers (1:1 match)
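The validation above amounts to encoding the same texts with both tokenizers and checking that the resulting token ids are identical. A minimal, hypothetical harness for that check is sketched below; `find_mismatches` is an illustrative name, and the exact checks the conversion script ran (for example whether decoding or special tokens were also compared) are not stated in this PR. In practice one would pass the `encode` methods of the original tokenizer (loaded with trust_remote_code=True) and of the converted one, with texts drawn from XNLI; the demo uses toy encoders so it runs offline.

```python
from typing import Callable, Iterable, List, Sequence

Encode = Callable[[str], Sequence[int]]


def find_mismatches(encode_a: Encode, encode_b: Encode, texts: Iterable[str]) -> List[int]:
    """Return the indices of texts whose token ids differ between the two encoders."""
    mismatches = []
    for i, text in enumerate(texts):
        if list(encode_a(text)) != list(encode_b(text)):
            mismatches.append(i)
    return mismatches


def char_encoder(text: str) -> List[int]:
    # Toy stand-in for a tokenizer: one id per character (its code point).
    return [ord(ch) for ch in text]


def space_dropping_encoder(text: str) -> List[int]:
    # Toy "faulty conversion" that silently drops spaces.
    return [ord(ch) for ch in text if ch != " "]


if __name__ == "__main__":
    samples = ["hello", "two words", "x"]
    print(find_mismatches(char_encoder, char_encoder, samples))            # []
    print(find_mismatches(char_encoder, space_dropping_encoder, samples))  # [1]
```

With real tokenizers, an empty list over the 500 XNLI samples would correspond to the "all samples match" result reported above.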
Usage
from transformers import AutoTokenizer
# Now works without trust_remote_code=True
tokenizer = AutoTokenizer.from_pretrained("moonshotai/Kimi-Linear-48B-A3B-Instruct")
Converted with the transformers tokenizer conversion script.