---
language:
- en
license: mit
tags:
- pytorch
- chess
- gpt2
- text-generation
datasets:
- lichess
---

[![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/l-pommeret/chess_char_test) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/129J3E6uJASrDLH7TsgOzFdSjAtvNPVRY?usp=sharing)


# Zual/chess_char

## Model Description

`Zual/chess_char` is a GPT-2-based model trained to generate chess games in PGN (Portable Game Notation) format. It treats chess as a language modeling task: the model learns to predict the next character in a PGN sequence.

## Intended Use

This model is intended for research into how well Transformer models can learn structured, rule-based systems (such as chess) purely from observational data.

**Primary Use Case:** Generating valid PGN chess game continuations from a given prefix.

## Usage

You can use this model directly with the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM

model_name = "Zual/chess_char"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
model.eval()  # inference mode: disables dropout

# Note: the model uses a custom character-level tokenizer. Load it via the
# repository scripts or by following the instructions in the GitHub repo.
```

For a complete inference example with the custom tokenizer, please refer to the [GitHub repository](https://github.com/l-pommeret/chess_char_test).

## Training Data

The model was trained on a dataset of standard chess games from **Lichess** (rated 2000+, September 2016 dump).
- **Source:** [Lichess Database](https://database.lichess.org/)
- **Filtering:** Minimum 20 moves, no time-outs or abandonments.
- **Preprocessing:** Games were converted to character-level tokens.
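
The filtering and preprocessing steps above can be sketched roughly as follows. This is an illustration, not the repository's actual pipeline: the function names are hypothetical, "20 moves" is assumed to mean full moves, "rated 2000+" is assumed to apply to both players, and the vocabulary is built from the data rather than taken from the real tokenizer (which may add special tokens).

```python
import re

MIN_FULL_MOVES = 20  # assumption: "20 moves" counts full moves, not plies
MIN_ELO = 2000       # assumption: both players must be rated 2000+
# Values of the Lichess "Termination" header treated as time-outs/abandonments.
EXCLUDED_TERMINATIONS = {"Time forfeit", "Abandoned"}


def keep_game(headers: dict, movetext: str) -> bool:
    """Return True if a game passes the filters described above."""
    try:
        white, black = int(headers["WhiteElo"]), int(headers["BlackElo"])
    except (KeyError, ValueError):
        return False  # missing or non-numeric rating such as "?"
    if min(white, black) < MIN_ELO:
        return False
    if headers.get("Termination") in EXCLUDED_TERMINATIONS:
        return False
    # Drop {...} comments (e.g. engine evals), then read White's move numbers
    # ("12." but not "12..."); the last one is the game length in full moves.
    stripped = re.sub(r"\{[^}]*\}", "", movetext)
    numbers = re.findall(r"\b(\d+)\.(?!\.)", stripped)
    return bool(numbers) and int(numbers[-1]) >= MIN_FULL_MOVES


def build_vocab(texts):
    """Map every distinct character in the corpus to an integer id."""
    chars = sorted(set("".join(texts)))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    return "".join(itos[i] for i in ids)


if __name__ == "__main__":
    game = "1. e4 e5 2. Nf3 Nc6"
    stoi, itos = build_vocab([game])
    ids = encode(game, stoi)
    print(len(stoi), "distinct characters;", len(ids), "tokens")
    print(decode(ids, itos) == game)
```

With one token per character, a full game becomes a sequence of a few hundred tokens drawn from a very small vocabulary, which is why the vocabulary size is tiny compared with a subword tokenizer.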

## Training Procedure

### Hyperparameters

The model was trained with the following configuration:

- **Architecture:** GPT-2
- **Layers:** 8
- **Heads:** 8
- **Embedding Dim:** 512
- **Context Size:** 1024
- **Vocab Size:** ~32 (Character-level PGN tokens)
- **Batch Size:** 64
- **Learning Rate:** 1e-3
- **Optimizer:** AdamW
- **Epochs:** 5
- **Mixed Precision:** FP16
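
As a rough sanity check on model size, the parameter count implied by these hyperparameters can be estimated. The sketch below assumes the standard GPT-2 block layout (4x MLP expansion, biases on all linear layers, learned position embeddings, output head tied to the token embeddings) and a vocabulary of exactly 32; the released checkpoint may differ slightly.

```python
def gpt2_param_count(n_layer: int, d_model: int, n_ctx: int, vocab_size: int) -> int:
    """Estimate trainable parameters of a standard GPT-2 with tied embeddings."""
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Attention: fused QKV projection and output projection (weights + biases).
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: expand to 4 * d_model, then project back (weights + biases).
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a bias vector.
    layer_norms = 2 * 2 * d_model
    final_layer_norm = 2 * d_model
    return embeddings + n_layer * (attn + mlp + layer_norms) + final_layer_norm


total = gpt2_param_count(n_layer=8, d_model=512, n_ctx=1024, vocab_size=32)
head_dim = 512 // 8  # embedding dimension split evenly across attention heads

print(f"{total:,} parameters")          # 25,760,768
print(f"{head_dim} dimensions per head")  # 64
```

Under these assumptions the model has roughly 26M parameters, almost all of them in the Transformer blocks, since the character-level vocabulary makes the token embedding table negligible.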

## Evaluation

The model's performance is evaluated based on:
1.  **Legal Move Rate:** Percentage of generated moves that are legal according to chess rules.
2.  **Move Quality:** Comparison of move distributions against historical games and Stockfish evaluations (see paper).
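
A legal move rate of this kind could be computed along the following lines. This is a minimal sketch, not the evaluation script used for the reported numbers: the function names are hypothetical, and it assumes that a game stops being scored at its first illegal move (which is counted as illegal), since the position is undefined afterwards. The legality check is pluggable; `python_chess_validator` shows one option using the third-party `python-chess` package (`pip install chess`).

```python
import re
from typing import Callable, Iterable, List

RESULT_TOKENS = {"1-0", "0-1", "1/2-1/2", "*"}


def san_moves(movetext: str) -> List[str]:
    """Extract SAN moves from PGN movetext, dropping move numbers and results."""
    moves = []
    for token in movetext.split():
        token = re.sub(r"^\d+\.+", "", token)  # "12.Nf3" -> "Nf3", "12." -> ""
        if token and token not in RESULT_TOKENS:
            moves.append(token)
    return moves


def legal_move_rate(
    games: Iterable[str],
    new_validator: Callable[[], Callable[[str], bool]],
) -> float:
    """Fraction of attempted moves that were legal, over all games."""
    legal = attempted = 0
    for movetext in games:
        try_move = new_validator()  # fresh starting position for each game
        for san in san_moves(movetext):
            attempted += 1
            if not try_move(san):
                break  # assumption: stop scoring a game at its first illegal move
            legal += 1
    return legal / attempted if attempted else 0.0


def python_chess_validator() -> Callable[[str], bool]:
    """Validator backed by python-chess; imported lazily so it stays optional."""
    import chess  # third-party package

    board = chess.Board()

    def try_move(san: str) -> bool:
        try:
            board.push_san(san)  # raises a ValueError subclass on bad moves
        except ValueError:
            return False
        return True

    return try_move
```

For example, `legal_move_rate(generated_games, python_chess_validator)` would score a list of generated movetext strings, where `generated_games` is a placeholder for the model's decoded outputs.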

## Limitations

- The model does not "know" the rules of chess explicitly; it only predicts the next character based on statistical patterns.
- While it achieves a high rate of legal moves (~98%), it may occasionally generate illegal moves or invalid PGN syntax, especially in long sequences.
- It is not a chess engine: it is trained to mimic the human playing style found in the training data, not to win.