# tiny-epstein-100m

A small transformer model (~100M parameters) trained on the teyler/epstein-files-20k dataset. The architecture is inspired by Tiny Aya and is designed for efficient on-device inference.
## Model Details
- Architecture: Decoder-only transformer with parallel blocks, Grouped Query Attention (GQA), SwiGLU activation, and bias‑free LayerNorm.
- Sliding Window Attention: 3:1 local:global ratio (first 75% of layers use sliding window with RoPE; remaining layers use full attention with NoPE).
- Parameters: ~100 million
- Context Length: 1024 tokens (configurable)
- Tokenizer: GPT‑2 (same as used during training)
- Training Data: teyler/epstein-files-20k – 20,000 documents related to the Epstein files.
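The attention layout and head grouping above can be sketched from the config values. This is a minimal illustration (assuming layers are assigned to the local/global split in order, which is not confirmed by the training script):

```python
# Derive the layer split and GQA grouping from the stated config values.
num_layers = 12
num_heads = 12
num_kv_heads = 4
sliding_window_ratio = 0.75

num_local = int(num_layers * sliding_window_ratio)   # layers with sliding-window attention + RoPE
num_global = num_layers - num_local                  # layers with full attention + NoPE
queries_per_kv_head = num_heads // num_kv_heads      # query heads sharing each KV head

print(num_local, num_global, queries_per_kv_head)    # 9 3 3
```

So 9 of the 12 layers attend within a local window and 3 attend globally, and each KV head serves 3 query heads.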
## Intended Use
This model is primarily for research and experimentation. It can generate continuations of text given a prompt, especially on topics related to the Epstein files.
## How to Use

### Installation
Make sure you have `torch` and `transformers` installed:

```bash
pip install torch transformers
```
### Loading the Model and Tokenizer
```python
import os

import torch
from transformers import AutoTokenizer
from huggingface_hub import snapshot_download

# Download the model files from the Hugging Face Hub
model_path = snapshot_download(repo_id="liminerity/tiny-epstein-100m")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# The model uses a custom architecture, so it cannot be loaded with
# AutoModel. Copy the TinyAya class definition from the training script
# into your environment before running the code below.

# Model config (must match the saved config.json)
class ModelConfig:
    vocab_size = 50257
    emb_dim = 768
    hidden_dim = 2048
    num_layers = 12
    num_heads = 12
    num_kv_heads = 4
    max_seq_len = 1024
    window_size = 1024
    sliding_window_ratio = 0.75
    rope_theta = 10000.0
    dtype = torch.float16
    bias = False
    dropout = 0.0

# Instantiate the model and load the trained weights
model = TinyAya(ModelConfig())
state_dict = torch.load(os.path.join(model_path, "pytorch_model.bin"), map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```
### Text Generation Example

```python
prompt = "The Epstein files reveal"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=50,
        temperature=0.8,
        do_sample=True,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details

The model was trained for one epoch on the full dataset on an L4 GPU in Google Colab, using AdamW (lr=1e-4) with gradient clipping (max norm 1.0) and float16 mixed precision.
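One optimizer step of the recipe above can be sketched as follows. This is a minimal illustration, not the actual training script: the tiny linear model and MSE loss are stand-ins (the real script trains TinyAya with a language-modeling loss), and the float16 mixed-precision machinery is omitted for brevity.

```python
import torch

# Stand-in model; the real script uses TinyAya.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def training_step(inputs, targets):
    # One step: forward, backward, clip gradients to max norm 1.0, update.
    optimizer.zero_grad(set_to_none=True)
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()

loss = training_step(torch.randn(4, 8), torch.randn(4, 8))
```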
## Limitations

- The model is small and was trained on a limited dataset; it may produce repetitive or nonsensical outputs.
- It has not undergone any safety fine-tuning; use with caution.
## License
MIT