tiny-epstein-100m

A small transformer model (~100M parameters) trained on the teyler/epstein-files-20k dataset. The architecture borrows modifications from Tiny Aya and is designed for efficient on-device inference.

Model Details

  • Architecture: Decoder-only transformer with parallel blocks, Grouped Query Attention (GQA), SwiGLU activation, and bias‑free LayerNorm.
  • Sliding Window Attention: 3:1 local:global ratio (first 75% of layers use sliding window with RoPE; remaining layers use full attention with NoPE).
  • Parameters: ~100 million
  • Context Length: 1024 tokens (configurable)
  • Tokenizer: GPT‑2 (same as used during training)
  • Training Data: teyler/epstein-files-20k – 20,000 documents related to the Epstein files.
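The 3:1 local:global split above can be sketched as a small helper that assigns an attention type to each layer. This is illustrative only (the function name and labels are hypothetical, not from the actual model code); with 12 layers and a 0.75 ratio, the first 9 layers get sliding-window attention with RoPE and the last 3 get full attention with NoPE:

```python
def attention_pattern(num_layers, sliding_window_ratio):
    """Illustrative sketch: per-layer attention types for a given local ratio."""
    num_local = int(num_layers * sliding_window_ratio)  # layers with local attention
    return (["sliding_window+RoPE"] * num_local
            + ["full+NoPE"] * (num_layers - num_local))

pattern = attention_pattern(12, 0.75)
# First 9 layers are local (sliding window + RoPE), last 3 are global (full + NoPE)
```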

Intended Use

This model is primarily for research and experimentation. It can generate continuations of text given a prompt, especially on topics related to the Epstein files.

How to Use

Installation

Make sure torch, transformers, and huggingface_hub are installed:

pip install torch transformers huggingface_hub

Loading the Model and Tokenizer

import os

import torch
from transformers import AutoTokenizer
from huggingface_hub import snapshot_download

# Download the model from Hugging Face Hub
model_path = snapshot_download(repo_id="liminerity/tiny-epstein-100m")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Load model: this is a custom architecture, so the model class definition is
# required. It is included in the training script; copy it (or import it from
# a module) before running the snippet below.

# Define model config (must match the saved config.json)
class ModelConfig:
    vocab_size = 50257
    emb_dim = 768
    hidden_dim = 2048
    num_layers = 12
    num_heads = 12
    num_kv_heads = 4
    max_seq_len = 1024
    window_size = 1024
    sliding_window_ratio = 0.75
    rope_theta = 10000.0
    dtype = torch.float16
    bias = False
    dropout = 0.0

# Instantiate the model (requires the TinyAya class from the training script;
# copy its definition into this file if it is not importable).
model = TinyAya(ModelConfig())
state_dict = torch.load(os.path.join(model_path, "pytorch_model.bin"), map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
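As a rough sanity check after loading, the parameter count implied by the config can be tallied by hand. This is a back-of-the-envelope sketch assuming tied input/output embeddings and the bias-free GQA/SwiGLU layout described in this card; the exact count depends on the real TinyAya implementation (e.g. normalization parameters are omitted here):

```python
# Rough parameter estimate from the config above (approximation; assumes tied
# embeddings, no biases, and the GQA/SwiGLU layout described in this card).
vocab_size, emb_dim, hidden_dim = 50257, 768, 2048
num_layers, num_heads, num_kv_heads = 12, 12, 4

head_dim = emb_dim // num_heads                  # 64
embedding = vocab_size * emb_dim                 # token embeddings (tied lm_head)
attn = (emb_dim * emb_dim                        # Q projection
        + 2 * emb_dim * num_kv_heads * head_dim  # K and V (GQA: fewer KV heads)
        + emb_dim * emb_dim)                     # output projection
mlp = 3 * emb_dim * hidden_dim                   # SwiGLU: gate, up, down
total = embedding + num_layers * (attn + mlp)
print(f"{total / 1e6:.1f}M")                     # → 114.1M
```

This lands in the ~100M ballpark; the exact figure reported for the checkpoint depends on details such as weight tying and per-layer normalization parameters.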

Text Generation Example

prompt = "The Epstein files reveal"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=50,
        temperature=0.8,
        do_sample=True
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
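If the custom TinyAya class does not implement a Hugging Face-style generate(), a minimal temperature-sampling loop can stand in for it. This is a sketch assuming the forward pass returns logits of shape (batch, seq, vocab); adjust the indexing if the model returns a tuple:

```python
import torch

@torch.no_grad()
def sample(model, input_ids, max_new_tokens=50, temperature=0.8):
    """Minimal temperature sampling: append one sampled token at a time."""
    for _ in range(max_new_tokens):
        logits = model(input_ids)                     # (batch, seq, vocab)
        next_logits = logits[:, -1, :] / temperature  # logits for the last position
        probs = torch.softmax(next_logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids
```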

Training Details

The model was trained for one epoch on the full dataset using an L4 GPU in Google Colab. Training used the AdamW optimizer (lr=1e-4), gradient clipping at a max norm of 1.0, and float16 mixed precision.

Limitations

  • The model is small and was trained on a limited dataset; it may produce repetitive or nonsensical outputs.
  • It has not undergone any safety fine‑tuning; use with caution.

License

MIT
