# PoorMansNanoGPT-TinyStories-16M

A tiny 4-layer GPT model trained on TinyStories.

## Training Details
- Dataset: TinyStories (2.1M stories)
- Hardware: 1x A100 (< 1 hour)
- Loss: 1.863 (train), 1.868 (val)
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gra-dient/poormans-nanogpt-tiny")
tokenizer = AutoTokenizer.from_pretrained("gra-dient/poormans-nanogpt-tiny")
```
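Once loaded, the model can be used to sample a short story continuation. A minimal sketch; the prompt text and generation parameters below are illustrative choices, not settings shipped with the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "gra-dient/poormans-nanogpt-tiny"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)
model.eval()

# Encode a TinyStories-style prompt and sample a continuation.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,             # illustrative sampling settings
    pad_token_id=tokenizer.eos_token_id,
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

Because the model is tiny, greedy decoding (`do_sample=False`) tends to loop; sampling with a moderate temperature usually gives more varied stories.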
## Acknowledgments

This project builds on, and is in many ways a humble tribute to, Andrej Karpathy's nanoGPT.

Standing on the shoulders of giants, I hope this model series contributes something useful and encourages others to join in the fun.