PoorMansNanoGPT-TinyStories-16M

A tiny 4-layer GPT model trained on TinyStories.

Training Details

  • Dataset: TinyStories (2.1M stories)
  • Hardware: 1x A100 (under 1 hour of training)
  • Loss: 1.863 (train), 1.868 (val)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("gra-dient/poormans-nanogpt-tiny")
tokenizer = AutoTokenizer.from_pretrained("gra-dient/poormans-nanogpt-tiny")
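The loading snippet above can be extended into a short generation example. This is a sketch assuming the standard transformers generate API; the prompt and max_new_tokens value are illustrative choices, not part of the original card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gra-dient/poormans-nanogpt-tiny")
tokenizer = AutoTokenizer.from_pretrained("gra-dient/poormans-nanogpt-tiny")

# Encode an illustrative story prompt and generate a continuation.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=False)  # greedy decoding

# Decode the full sequence (prompt + continuation) back to text.
story = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(story)
```

With do_sample=False the output is deterministic; pass do_sample=True with a temperature for more varied stories.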

Acknowledgments

This project builds upon, and is in some ways a humble tribute to, Andrej Karpathy's nanoGPT.

As we stand on the shoulders of giants, I hope this model series contributes something useful and encourages others to join in the fun.

Model Details

  • Format: Safetensors
  • Model size: 16.3M params
  • Tensor type: F32