---
language:
- en
license: mit
tags:
- pytorch
- chess
- gpt2
- text-generation
datasets:
- lichess
---

[![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/l-pommeret/chess_char_test)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/129J3E6uJASrDLH7TsgOzFdSjAtvNPVRY?usp=sharing)

# Zual/chess_char

## Model Description

`Zual/chess_char` is a GPT-2 based model trained to generate chess games in PGN (Portable Game Notation) format. It treats chess moves as a language modeling task, learning to predict the next character in a PGN sequence.

## Intended Use

This model is intended for research purposes to study the capabilities of Transformer models in learning structured, rule-based systems (like Chess) purely from observational data.

**Primary Use Case:** Generating valid PGN chess game continuations from a given prefix.

## Usage

You can use this model directly with the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Zual/chess_char"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Note: The model uses a custom tokenizer which should be loaded via the repository scripts
# or by following the instructions in the GitHub repo.
```

For a complete inference example with the custom tokenizer, please refer to the [GitHub repository](https://github.com/l-pommeret/chess_char_test).

## Training Data

The model was trained on a dataset of standard chess games from **Lichess** (rated 2000+, September 2016 dump).

- **Source:** [Lichess Database](https://database.lichess.org/)
- **Filtering:** Minimum 20 moves, no time-outs or abandonments.
- **Preprocessing:** Games were converted to char-level tokens.

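
To illustrate what char-level tokenization of PGN movetext can look like, here is a minimal sketch. It is not the tokenizer shipped with this model (that one lives in the GitHub repository); the helper names `build_vocab`, `encode`, and `decode` and the two sample games are hypothetical and only show the general idea of mapping each character to an integer id.

```python
def build_vocab(games):
    """Map every distinct character in the corpus to an integer id."""
    chars = sorted(set("".join(games)))
    stoi = {ch: i for i, ch in enumerate(chars)}  # character -> id
    itos = {i: ch for ch, i in stoi.items()}      # id -> character
    return stoi, itos


def encode(text, stoi):
    """Turn a PGN string into a list of character ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Turn a list of character ids back into a PGN string."""
    return "".join(itos[i] for i in ids)


# Hypothetical sample movetext, used only for illustration.
games = ["1.e4 e5 2.Nf3 Nc6 3.Bb5 a6", "1.d4 d5 2.c4 e6 3.Nc3 Nf6"]
stoi, itos = build_vocab(games)
ids = encode(games[0], stoi)
print(len(stoi), ids[:8])
```

Because PGN movetext draws on a small alphabet (digits, file letters, piece letters, and a few symbols), a vocabulary built this way stays small, which is consistent with a character-level setup.
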
## Training Procedure

### Hyperparameters

The model was trained with the following configuration:

- **Architecture:** GPT-2
- **Layers:** 8
- **Heads:** 8
- **Embedding Dim:** 512
- **Context Size:** 1024
- **Vocab Size:** ~32 (Character-level PGN tokens)
- **Batch Size:** 64
- **Learning Rate:** 1e-3
- **Optimizer:** AdamW
- **Epochs:** 5
- **Mixed Precision:** FP16

## Evaluation

The model's performance is evaluated based on:

1. **Legal Move Rate:** Percentage of generated moves that are legal according to chess rules.
2. **Move Quality:** Comparison of move distributions against historical games and Stockfish evaluations (see paper).

## Limitations

- The model does not "know" the rules of chess explicitly; it only predicts the next character based on statistical patterns.
- While it achieves a high rate of legal moves (~98%), it may occasionally generate illegal moves or invalid PGN syntax, especially in long sequences.
- It is not a chess engine and does not optimize for winning, but for mimicking human play style found in the training data.
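
Since generated text may contain invalid PGN syntax, a lightweight check can be run on model output before further processing. The sketch below is an assumption-laden illustration, not the evaluation used for this model: the function name `san_syntax_rate` and the regular expression are hypothetical, and the check only tests whether each token looks like well-formed SAN. It does not test legality, which requires a rules engine that tracks the board state.

```python
import re

# Rough pattern for a SAN move: castling, or an optional piece letter,
# optional disambiguation, optional capture, target square, optional
# promotion, and an optional check or mate suffix.
SAN_MOVE = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$"
)
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def san_syntax_rate(movetext):
    """Return the fraction of move tokens that are well-formed SAN."""
    moves = []
    for token in movetext.split():
        # Drop a leading move number such as "12." or "12...".
        token = re.sub(r"^\d+\.+", "", token)
        if not token or token in RESULTS:
            continue  # skip bare move numbers and game results
        moves.append(token)
    if not moves:
        return 0.0
    ok = sum(1 for m in moves if SAN_MOVE.match(m))
    return ok / len(moves)


print(san_syntax_rate("1.e4 e5 2.Nf3 Nc6 3.Bb5 a6 1-0"))
print(san_syntax_rate("1.e4 e5 2.Nz9 Nc6"))
```

A token such as `Nz9` fails this check because `z9` is not a square, while a syntactically valid but illegal move (for example, a knight jumping to an unreachable square) would still pass, so this is only a first filter.
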