# PoorMansNanoGPT-TinyStories-16M

A tiny 4-layer GPT model trained on TinyStories.

## Training Details
- Dataset: TinyStories (2.1M stories)
- Hardware: 1x A100 (< 1 hour)
- Loss: 1.863 (train), 1.868 (val)
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gra-dient/poormans-nanogpt-tiny")
tokenizer = AutoTokenizer.from_pretrained("gra-dient/poormans-nanogpt-tiny")
```
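Once loaded, the model can be used to sample a short story continuation. A minimal sketch; the prompt text and generation parameters below are illustrative choices, not settings shipped with the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "gra-dient/poormans-nanogpt-tiny"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)
model.eval()

# Encode a TinyStories-style prompt and sample a continuation.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,             # illustrative sampling settings
    pad_token_id=tokenizer.eos_token_id,
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

Because the model is tiny, greedy decoding (`do_sample=False`) tends to loop; sampling with a moderate temperature usually gives more varied stories.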
## Acknowledgments

This project builds on, and is in many ways a humble tribute to, Andrej Karpathy's nanoGPT.

Standing on the shoulders of giants, I hope this model series contributes something useful and encourages others to join in the fun.