Mini-LLM

Mini-LLM replicates mainstream open-source model architectures under limited computational resources by implementing mini models with 100-200M parameters. The project focuses on learning and reproducing these architectures, and it provides complete training and inference pipelines. For more details, see the Mini-LLM project.
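To make the 100-200M parameter target concrete, here is a rough sketch of how the parameter count of a Llama-style decoder adds up. Every configuration value below (vocabulary size, hidden size, layer count, and so on) is a hypothetical choice for illustration, not the actual Mini-Llama3 configuration, and the formula assumes bias-free linear layers, grouped-query attention, a SwiGLU MLP, and RMSNorm.

```python
from dataclasses import dataclass


@dataclass
class LlamaStyleConfig:
    # Hypothetical values, picked only so the total lands near 100M.
    vocab_size: int = 32_000
    hidden_size: int = 768
    intermediate_size: int = 2_048
    num_layers: int = 12
    num_heads: int = 12
    num_kv_heads: int = 4
    tie_embeddings: bool = True  # share input embedding and output head


def count_parameters(cfg: LlamaStyleConfig) -> int:
    """Estimate parameters of a bias-free Llama-style decoder."""
    head_dim = cfg.hidden_size // cfg.num_heads
    kv_dim = cfg.num_kv_heads * head_dim
    # Attention: Q and O are hidden x hidden; K and V are hidden x kv_dim.
    attention = 2 * cfg.hidden_size * cfg.hidden_size + 2 * cfg.hidden_size * kv_dim
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * cfg.hidden_size
    per_layer = attention + mlp + norms

    embedding = cfg.vocab_size * cfg.hidden_size
    lm_head = 0 if cfg.tie_embeddings else embedding
    final_norm = cfg.hidden_size
    return embedding + cfg.num_layers * per_layer + final_norm + lm_head


def weight_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Size of the raw weights; 4 bytes per parameter corresponds to F32."""
    return num_params * bytes_per_param


total = count_parameters(LlamaStyleConfig())
print(f"{total / 1e6:.1f}M parameters, {weight_bytes(total) / 1e9:.2f} GB in F32")
```

Under these assumed values the estimate comes to roughly 100M parameters, or about 0.4 GB of F32 weights. Untying the embeddings adds another `vocab_size * hidden_size` parameters, which is a large share of the budget at this scale.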

Usage

Using Transformers Library

First, import the model registration module, then load the model using AutoModelForCausalLM:

import mini_models  # Register custom Mini-LLM models
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained("WKQ9411/Mini-Llama3-100M-Base")
tokenizer = AutoTokenizer.from_pretrained("WKQ9411/Mini-Llama3-100M-Base")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Generate text
input_text = "长城是"
input_ids = tokenizer(input_text, return_tensors="pt")["input_ids"].to(model.device)
response = model.generate(input_ids, max_new_tokens=100)
response = tokenizer.decode(response[0][len(input_ids[0]):], skip_special_tokens=True)
print(response)
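The generation steps above can be wrapped in a small helper. This is a sketch, not part of the Mini-LLM API: `generate_continuation` is a hypothetical name, and it assumes `model` and `tokenizer` behave like the objects loaded above. For decoder-only models, `generate` returns the prompt tokens followed by the new tokens, which is why the prompt length is sliced off before decoding.

```python
def generate_continuation(model, tokenizer, prompt, max_new_tokens=100, **generate_kwargs):
    """Return only the text generated after `prompt`.

    Hypothetical helper: extra keyword arguments are forwarded to
    `model.generate` unchanged.
    """
    # Tokenize the prompt and move it to the same device as the model.
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(model.device)
    # The output contains the prompt tokens followed by the new tokens.
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens, **generate_kwargs)
    # Drop the prompt so only the continuation is decoded.
    new_tokens = output_ids[0][len(input_ids[0]):]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the model and tokenizer loaded as above, a call might look like `generate_continuation(model, tokenizer, "长城是", do_sample=True, temperature=0.8, top_p=0.9)`. `do_sample`, `temperature`, and `top_p` are standard Transformers generation arguments; whether they suit this model is something to check empirically.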

Using Custom Interface

from mini_models import get_model_and_config
from transformers import AutoTokenizer
import torch

Model, Config = get_model_and_config("mini_llama3")
model = Model.from_pretrained("path/to/your/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/your/tokenizer")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Use the model for generation
input_text = "长城是"
input_ids = tokenizer(input_text, return_tensors="pt")["input_ids"].to(model.device)
response = model.generate(input_ids, max_new_tokens=100)
response = tokenizer.decode(response[0][len(input_ids[0]):], skip_special_tokens=True)
print(response)
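A name-based lookup such as `get_model_and_config("mini_llama3")` can be pictured as a simple registry that maps a string to a pair of classes. The sketch below is purely illustrative and is not the `mini_models` implementation: `register`, `lookup`, `ToyModel`, and `ToyConfig` are all hypothetical names.

```python
# Hypothetical registry: model name -> (model class, config class).
_REGISTRY = {}


def register(name, model_cls, config_cls):
    """Add a (model, config) pair under `name`, refusing duplicates."""
    if name in _REGISTRY:
        raise ValueError(f"model {name!r} is already registered")
    _REGISTRY[name] = (model_cls, config_cls)


def lookup(name):
    """Return the (model class, config class) pair registered under `name`."""
    try:
        return _REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY)) or "<none>"
        raise KeyError(f"unknown model {name!r}; registered: {known}") from None


class ToyConfig:
    """Stand-in for a real configuration class."""
    def __init__(self, hidden_size=8):
        self.hidden_size = hidden_size


class ToyModel:
    """Stand-in for a real model class."""
    def __init__(self, config):
        self.config = config


register("toy_llama", ToyModel, ToyConfig)

Model, Config = lookup("toy_llama")
model = Model(Config(hidden_size=16))
print(type(model).__name__, model.config.hidden_size)
```

Returning the classes rather than an instance leaves the caller free to choose between loading saved weights, as in the example above, and constructing a fresh model from a configuration.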

Training Data

The model was pre-trained on:

  • A 20% sample of the OpenCSG Fineweb-Edu-Chinese-V2.1 dataset (high-quality Chinese educational content)
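The model card does not say how the 20% subset was drawn. One common approach, shown here only as a sketch under that caveat, is a seeded uniform sample without replacement, so the same subset can be reproduced later. `sample_fraction` and the toy `docs` list are hypothetical.

```python
import random


def sample_fraction(records, fraction=0.2, seed=0):
    """Return a reproducible uniform sample of `records`, keeping original order."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # local generator, so global random state is untouched
    k = round(len(records) * fraction)
    # Sample indices, then sort them so the subset preserves document order.
    picked = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in picked]


docs = [f"doc-{i}" for i in range(1000)]  # toy stand-in for a real corpus
subset = sample_fraction(docs, fraction=0.2, seed=42)
print(len(subset))
```

This version holds all records in memory. For a corpus too large for that, keeping each record independently with probability 0.2 while streaming is a common alternative, at the cost of a subset size that is only approximately 20%.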

Limitations

This is a small-scale model designed for educational and research purposes. It may not perform as well as larger models on complex tasks.

Model weights: Safetensors format, 0.1B parameters, F32 tensor type.