GameTheory-Solver
A QLoRA fine-tuned adapter for Qwen2.5-7B-Instruct, specialized in solving game theory problems with rigorous step-by-step mathematical reasoning.

Model Description
GameTheory-Solver is a LoRA adapter trained on the GameTheory-Bench dataset, the first comprehensive, computationally verified game theory dataset for LLM training. The adapter turns Qwen2.5-7B-Instruct into a specialized solver that produces detailed, step-by-step solutions with mathematical proofs and clear final answers.
Key result: The fine-tuned model achieves 94% overall accuracy (up from 82% base) and 94.4% on hard problems (up from 66.7% base), representing a +12pp overall and +27.7pp hard-problem improvement.
Capabilities
| Capability | Details |
| --- | --- |
| Nash Equilibrium Computation | Pure and mixed strategies for 2×2, 3×3, 3×4, and 4×4 games |
| Dominant Strategy Analysis | IESDS (Iterated Elimination of Strictly Dominated Strategies) |
| Zero-Sum Game Solving | Minimax theorem, saddle point detection, mixed strategies |
| Sequential Game Analysis | Backward induction, subgame perfect equilibrium (up to 3 stages) |
| Bayesian Game Equilibria | Incomplete information, BNE, signaling games |
| Cooperative Game Theory | Shapley value computation, core analysis |
| Auction Theory | First-price, second-price (Vickrey), all-pay, revenue equivalence |
| Mechanism Design | VCG mechanisms, incentive compatibility analysis |
Benchmark Results
Evaluated on a diverse benchmark spanning all 10 categories and 3 difficulty levels.
Overall Performance: Base vs. Solver
| Metric | Base (Qwen2.5-7B) | Solver (Fine-tuned) | Δ Improvement |
| --- | --- | --- | --- |
| Overall Accuracy | 82% | 94% | +12 pp |
| Hard Problems | 66.7% | 94.4% | +27.7 pp |
Per-Category Accuracy
| Category | Base | Solver | Δ | Trend |
| --- | --- | --- | --- | --- |
| Normal Form 2×2 | 100% | 80% | −20 pp | Down |
| Normal Form 3×3 | 80% | 60% | −20 pp | Down |
| Normal Form 3×4 | 100% | 100% | 0 pp | Flat |
| Normal Form 4×4 | 100% | 100% | 0 pp | Flat |
| Zero-Sum | 100% | 100% | 0 pp | Flat |
| Sequential Game | 100% | 100% | 0 pp | Flat |
| Auction Theory | 80% | 100% | +20 pp | Up |
| Bayesian Game | 0% | 100% | +100 pp | Up |
| Cooperative Game | 100% | 100% | 0 pp | Flat |
| Mechanism Design | 60% | 100% | +40 pp | Up |
Highlight: The largest gains come in the categories where the base model was weakest: Bayesian Games (0% → 100%) and Mechanism Design (60% → 100%). The solver also keeps perfect scores on zero-sum, sequential, and cooperative games.
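The headline numbers can be cross-checked against the per-category table. The sketch below assumes every category contributes the same number of benchmark problems (the card does not state per-category counts), so that overall accuracy equals the unweighted mean of the category scores. The variable and function names are illustrative.

```python
# Per-category accuracy (%) copied from the table above: (base, solver).
# Assumption: all categories contain the same number of benchmark problems.
per_category = {
    "Normal Form 2x2": (100, 80),
    "Normal Form 3x3": (80, 60),
    "Normal Form 3x4": (100, 100),
    "Normal Form 4x4": (100, 100),
    "Zero-Sum": (100, 100),
    "Sequential Game": (100, 100),
    "Auction Theory": (80, 100),
    "Bayesian Game": (0, 100),
    "Cooperative Game": (100, 100),
    "Mechanism Design": (60, 100),
}


def macro_average(scores):
    """Unweighted mean of a sequence of percentage scores."""
    scores = list(scores)
    return sum(scores) / len(scores)


base_overall = macro_average(base for base, _ in per_category.values())
solver_overall = macro_average(solver for _, solver in per_category.values())

# Categories where the adapter scores lower than the base model.
regressions = [
    name for name, (base, solver) in per_category.items() if solver < base
]

print(f"base={base_overall:.1f}%  solver={solver_overall:.1f}%")
print("regressions:", regressions)
```

Under that assumption the means come out to 82% and 94%, matching the reported overall accuracy, and the only regressions are the 2×2 and 3×3 normal-form categories.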
Usage
Installation
pip install transformers peft bitsandbytes accelerate torch
Loading the Model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization, matching the QLoRA training configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model, then attach the LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "2reb/GameTheory-Solver")
tokenizer = AutoTokenizer.from_pretrained("2reb/GameTheory-Solver")
Solving a Game Theory Problem
# Requires `model` and `tokenizer` from the loading step above
messages = [
    {
        "role": "system",
        "content": (
            "You are a game theory expert. Solve the given problem "
            "step-by-step, showing all mathematical reasoning. "
            "Provide the final answer clearly."
        ),
    },
    {
        "role": "user",
        "content": (
            "Consider the following game:\n\n"
            "Player 1 \\ Player 2 | Left | Right\n"
            "--- | --- | ---\n"
            "Up | (3,1) | (0,0)\n"
            "Down | (1,1) | (2,3)\n\n"
            "Find all Nash Equilibria."
        ),
    },
]

# Render the chat template to input token IDs on the model's device
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; no gradients are needed for inference
with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
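Because the adapter's answers are generated text, it can be useful to check them against an independent computation. The following sketch is not part of the model or its dataset pipeline. It enumerates the pure-strategy equilibria of the example game with best-response checks, then solves the indifference conditions for the fully mixed equilibrium of a 2×2 game using exact fractions. The function names are illustrative.

```python
from fractions import Fraction

# Payoffs for the example game: payoffs[row][col] = (player 1, player 2).
# Rows: Up, Down. Columns: Left, Right.
payoffs = [
    [(3, 1), (0, 0)],
    [(1, 1), (2, 3)],
]
rows, cols = ["Up", "Down"], ["Left", "Right"]


def pure_nash(payoffs):
    """Return (row, col) index pairs where neither player gains by deviating."""
    equilibria = []
    for r in range(len(payoffs)):
        for c in range(len(payoffs[0])):
            u1, u2 = payoffs[r][c]
            best_row = all(payoffs[r2][c][0] <= u1 for r2 in range(len(payoffs)))
            best_col = all(payoffs[r][c2][1] <= u2 for c2 in range(len(payoffs[0])))
            if best_row and best_col:
                equilibria.append((r, c))
    return equilibria


def mixed_nash_2x2(payoffs):
    """Solve the indifference conditions of a 2x2 game.

    Returns (p, q) with p = P(row 0) for player 1 and q = P(col 0) for
    player 2, or None if there is no unique interior solution.
    """
    (a1, a2), (b1, b2) = payoffs[0]
    (c1, c2), (d1, d2) = payoffs[1]
    # Player 2 indifferent between columns: p*a2 + (1-p)*c2 = p*b2 + (1-p)*d2
    denom_p = a2 - b2 - c2 + d2
    # Player 1 indifferent between rows: q*a1 + (1-q)*b1 = q*c1 + (1-q)*d1
    denom_q = a1 - b1 - c1 + d1
    if denom_p == 0 or denom_q == 0:
        return None
    p = Fraction(d2 - c2, denom_p)
    q = Fraction(d1 - b1, denom_q)
    if 0 < p < 1 and 0 < q < 1:
        return p, q
    return None


pure = [(rows[r], cols[c]) for r, c in pure_nash(payoffs)]
mixed = mixed_nash_2x2(payoffs)
print("Pure equilibria:", pure)
print("Mixed equilibrium (P(Up), P(Left)):", mixed)
```

For this game the check gives two pure equilibria, (Up, Left) and (Down, Right), and one mixed equilibrium in which Player 1 plays Up with probability 2/3 and Player 2 plays Left with probability 1/2.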
Training Details
Base Model
Qwen/Qwen2.5-7B-Instruct
Dataset
| Parameter | Value |
| --- | --- |
| Dataset | 2reb/GameTheory-Bench |
| Train Split | 2,767 examples |
| Eval Split | 146 examples (5% held out) |
QLoRA Configuration
| Parameter | Value |
| --- | --- |
| LoRA rank (r) | 64 |
| LoRA alpha (α) | 128 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 with double quantization |
| Compute dtype | bfloat16 |
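To give a sense of what this configuration means in practice, the sketch below estimates the number of trainable LoRA parameters and the scaling factor α/r. The layer dimensions are assumptions taken from the published Qwen2.5-7B configuration (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128), not from this card, and the scaling assumes standard LoRA rather than a rank-stabilized variant, so treat the output as an estimate.

```python
# LoRA settings from the table above.
RANK, ALPHA = 64, 128

# Assumed Qwen2.5-7B dimensions (not stated in this card).
HIDDEN, INTERMEDIATE, NUM_LAYERS = 3584, 18944, 28
KV_DIM = 4 * 128  # 4 key/value heads x head dimension 128

# (in_features, out_features) of each targeted linear layer.
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """A LoRA pair adds A (rank x in) and B (out x rank) matrices."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in target_shapes.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update B @ A

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapter has roughly 161 million trainable parameters and a scaling factor of 2.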
Training Hyperparameters
| Parameter | Value |
| --- | --- |
| Epochs | 3 |
| Batch size (per device) | 2 |
| Gradient accumulation steps | 8 |
| Effective batch size | 16 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max sequence length | 2,048 |
| Packing | Enabled |
| Optimizer | paged_adamw_8bit |
| Gradient checkpointing | Enabled |
Training Results
| Metric | Value |
| --- | --- |
| Train loss | 0.1613 |
| Eval loss | 0.0873 |
| Token accuracy | 96.1% |
| Total steps | 135 |
| Training runtime | ~2 hours |
| Hardware | 2× NVIDIA RTX 3090 (24 GB each) |
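The learning-rate settings can be made concrete with a small sketch. It assumes the common linear-warmup-then-cosine schedule (as in the Hugging Face `get_cosine_schedule_with_warmup` helper) and that the warmup length is the warmup ratio times the total step count, rounded up. The card does not state which implementation was used, so the exact values are illustrative.

```python
import math

PEAK_LR = 2e-4
TOTAL_STEPS = 135
WARMUP_RATIO = 0.05
# Assumption: warmup length is ratio * total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Effective batch size as reported: per-device batch x accumulation steps.
effective_batch = 2 * 8

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

With these assumptions the rate climbs to its 2e-4 peak after 7 warmup steps and decays to zero by step 135.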
Limitations
- Small-matrix regression: Accuracy on 2×2 and 3×3 normal-form games decreased after fine-tuning (100% → 80% and 80% → 60%, respectively). The base model already handled these well; the adapter loses 20 pp on each of these simpler categories while substantially improving harder ones.
- Mixed-strategy precision: Complex mixed-strategy Nash Equilibria involving irrational numbers may have floating-point precision issues.
- Context length: Max sequence length of 2,048 tokens may truncate very large game matrices or extremely detailed solutions.
- Synthetic training data: The model was trained on programmatically generated problems; real-world game theory scenarios with ambiguous framing may require additional prompting.
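Links
- Adapter: https://huggingface.co/2reb/GameTheory-Solver
- Dataset: 2reb/GameTheory-Bench
- Base model: Qwen/Qwen2.5-7B-Instruct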
License
This adapter is released under the Apache 2.0 License.
Citation
@misc{gametheory-solver-2025,
  title     = {GameTheory-Solver: QLoRA Fine-tuned Qwen2.5-7B for Game Theory},
  author    = {2reb},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/2reb/GameTheory-Solver}
}