# 🎯 GameTheory-Solver

A QLoRA fine-tuned adapter for Qwen2.5-7B-Instruct, specialized in solving game theory problems with rigorous step-by-step mathematical reasoning.



## 📋 Model Description

GameTheory-Solver is a LoRA adapter trained on the GameTheory-Bench dataset, the first comprehensive, computationally verified game theory dataset for LLM training. The adapter transforms Qwen2.5-7B-Instruct into a specialized solver that produces detailed, step-by-step solutions with mathematical proofs and clear final answers.

**Key result:** The fine-tuned model achieves 94% overall accuracy (up from 82% base) and 94.4% on hard problems (up from 66.7% base), a +12 pp overall and +27.7 pp hard-problem improvement.

## 🧠 Capabilities

| Capability | Details |
|---|---|
| Nash Equilibrium Computation | Pure and mixed strategies for 2×2, 3×3, 3×4, and 4×4 games |
| Dominant Strategy Analysis | IESDS (Iterated Elimination of Strictly Dominated Strategies) |
| Zero-Sum Game Solving | Minimax theorem, saddle-point detection, mixed strategies |
| Sequential Game Analysis | Backward induction, subgame perfect equilibrium (up to 3 stages) |
| Bayesian Game Equilibria | Incomplete information, BNE, signaling games |
| Cooperative Game Theory | Shapley value computation, core analysis |
| Auction Theory | First-price, second-price (Vickrey), all-pay, revenue equivalence |
| Mechanism Design | VCG mechanisms, incentive-compatibility analysis |
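As a concrete illustration of the first capability, here is a minimal NumPy sketch (not the model's own code) that enumerates pure-strategy Nash equilibria of a bimatrix game and solves the 2×2 fully mixed case via indifference conditions, using the example game from the Usage section:

```python
import itertools

import numpy as np

# Payoff matrices for the 2x2 example game in the Usage section:
# rows = Player 1 strategies (Up, Down), cols = Player 2 strategies (Left, Right).
A = np.array([[3, 0], [1, 2]])  # Player 1 payoffs
B = np.array([[1, 0], [1, 3]])  # Player 2 payoffs


def pure_nash(A, B):
    """Enumerate pure-strategy Nash equilibria: cells where each player
    best-responds to the other's strategy."""
    eqs = []
    for i, j in itertools.product(range(A.shape[0]), range(A.shape[1])):
        if A[i, j] == A[:, j].max() and B[i, j] == B[i, :].max():
            eqs.append((i, j))
    return eqs


def mixed_nash_2x2(A, B):
    """Fully mixed equilibrium of a nondegenerate 2x2 game via indifference.

    P2 plays column 0 with probability q chosen so P1 is indifferent
    between rows; P1 plays row 0 with probability p chosen so P2 is
    indifferent between columns.
    """
    q = (A[1, 1] - A[0, 1]) / (A[0, 0] - A[0, 1] - A[1, 0] + A[1, 1])
    p = (B[1, 1] - B[1, 0]) / (B[0, 0] - B[1, 0] - B[0, 1] + B[1, 1])
    return p, q
```

For this game the sketch finds the pure equilibria (Up, Left) and (Down, Right), plus the mixed equilibrium where Player 1 plays Up with probability 2/3 and Player 2 plays Left with probability 1/2.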

## 📊 Benchmark Results

Evaluated on a diverse benchmark spanning all 10 categories and 3 difficulty levels.

### Overall Performance: Base vs. Solver

| Metric | Base (Qwen2.5-7B) | Solver (Fine-tuned) | Δ Improvement |
|---|---|---|---|
| Overall Accuracy | 82% | 94% | +12 pp ✅ |
| Hard Problems | 66.7% | 94.4% | +27.7 pp 🚀 |

### Per-Category Accuracy

| Category | Base | Solver | Δ | Trend |
|---|---|---|---|---|
| Normal Form 2×2 | 100% | 80% | −20 pp | 📉 |
| Normal Form 3×3 | 80% | 60% | −20 pp | 📉 |
| Normal Form 3×4 | 100% | 100% | — | ➡️ |
| Normal Form 4×4 | 100% | 100% | — | ➡️ |
| Zero-Sum | 100% | 100% | — | ➡️ |
| Sequential Game | 100% | 100% | — | ➡️ |
| Auction Theory | 80% | 100% | +20 pp | 📈 |
| Bayesian Game | 0% | 100% | +100 pp | 🚀 |
| Cooperative Game | 100% | 100% | — | ➡️ |
| Mechanism Design | 60% | 100% | +40 pp | 📈 |

**Highlight:** The most dramatic gains come in the previously weakest categories, Bayesian Games (0% → 100%) and Mechanism Design (60% → 100%), while perfect scores are maintained across zero-sum, sequential, and cooperative games.


## 🚀 Usage

### Installation

```bash
pip install transformers peft bitsandbytes accelerate torch
```

### Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Quantization config (matches training)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load base model + adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "2reb/GameTheory-Solver")
tokenizer = AutoTokenizer.from_pretrained("2reb/GameTheory-Solver")
```

### Solving a Game Theory Problem

```python
messages = [
    {
        "role": "system",
        "content": (
            "You are a game theory expert. Solve the given problem "
            "step-by-step, showing all mathematical reasoning. "
            "Provide the final answer clearly."
        ),
    },
    {
        "role": "user",
        "content": (
            "Consider the following game:\n\n"
            "Player 1 \\ Player 2 | Left | Right\n"
            "--- | --- | ---\n"
            "Up | (3,1) | (0,0)\n"
            "Down | (1,1) | (2,3)\n\n"
            "Find all Nash Equilibria."
        ),
    },
]

# return_dict=True gives both input_ids and attention_mask,
# which avoids the attention-mask warning during generation.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, not the prompt.
prompt_len = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)
```
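For downstream evaluation it can help to isolate the final answer from the step-by-step reasoning. The helper below is a hypothetical sketch: it assumes the model ends its solution with a line like `Final Answer: ...`, which matches the system prompt's instruction but is not guaranteed by the adapter, and falls back to the last line otherwise.

```python
import re


def extract_final_answer(text: str) -> str:
    """Pull the answer from a solution that ends with 'Final Answer: ...'.

    Falls back to the last non-empty line if no such marker is found.
    (The marker format is an assumption about the model's output style.)
    """
    match = re.search(r"(?im)^final answer\s*[:\-]\s*(.+)$", text)
    if match:
        return match.group(1).strip()
    return text.strip().splitlines()[-1]
```
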

๐Ÿ‹๏ธ Training Details

Base Model

Parameter Value
Model Qwen/Qwen2.5-7B-Instruct
Total Parameters 7.6B
Trainable Parameters 161M (2.1% of total)
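The 161M figure can be cross-checked from the LoRA rank and target modules listed in the QLoRA Configuration table, together with the published Qwen2.5-7B architecture (hidden size 3584, 28 layers, GQA with 4 KV heads of head dim 128, MLP intermediate size 18944; these numbers come from the base model's config, not from this card):

```python
# LoRA adds r * (d_in + d_out) trainable parameters per adapted Linear(d_in, d_out).
r = 64                                  # LoRA rank from the config table below
hidden, layers = 3584, 28               # Qwen2.5-7B hidden size / layer count
kv_dim = 4 * 128                        # GQA: 4 KV heads x head dim 128
inter = 18944                           # MLP intermediate size

per_layer = (
    r * (hidden + hidden)   # q_proj: 3584 -> 3584
    + r * (hidden + kv_dim) # k_proj: 3584 -> 512
    + r * (hidden + kv_dim) # v_proj: 3584 -> 512
    + r * (hidden + hidden) # o_proj: 3584 -> 3584
    + r * (hidden + inter)  # gate_proj: 3584 -> 18944
    + r * (hidden + inter)  # up_proj:   3584 -> 18944
    + r * (inter + hidden)  # down_proj: 18944 -> 3584
)
total = per_layer * layers
print(total)  # 161480704, i.e. ~161M, ~2.1% of 7.6B
```
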

### Dataset

| Parameter | Value |
|---|---|
| Dataset | 2reb/GameTheory-Bench |
| Train Split | 2,767 examples |
| Eval Split | 146 examples (5% held out) |

### QLoRA Configuration

| Parameter | Value |
|---|---|
| LoRA rank (r) | 64 |
| LoRA alpha (α) | 128 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 with double quantization |
| Compute dtype | bfloat16 |
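The table above maps directly onto a PEFT `LoraConfig`. A sketch of the implied configuration (`task_type` and `bias` are assumptions not stated in the table, though they are the usual choices for causal-LM SFT):

```python
from peft import LoraConfig

# Adapter configuration implied by the QLoRA table above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumption: standard for instruction SFT
    bias="none",            # assumption: PEFT default
)
```
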

### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size (per device) | 2 |
| Gradient accumulation steps | 8 |
| Effective batch size | 16 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Weight decay | 0.01 |
| Max sequence length | 2,048 |
| Packing | Enabled |
| Optimizer | paged_adamw_8bit |
| Gradient checkpointing | Enabled |
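These hyperparameters correspond to a TRL `SFTConfig` along the following lines. This is a sketch under assumptions: `output_dir` and `bf16` are not in the table, the card does not say TRL was used, and in recent TRL releases the sequence-length argument was renamed from `max_seq_length` to `max_length`.

```python
from trl import SFTConfig

# Training arguments implied by the hyperparameter table above.
sft_config = SFTConfig(
    output_dir="gametheory-solver",   # assumption: not stated in the card
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,    # effective batch size 2 * 8 = 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_seq_length=2048,              # `max_length` in newer TRL versions
    packing=True,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    bf16=True,                        # assumption: matches bfloat16 compute dtype
)
```
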

### Training Results

| Metric | Value |
|---|---|
| Train loss | 0.1613 |
| Eval loss | 0.0873 |
| Token accuracy | 96.1% |
| Total steps | 135 |
| Training runtime | ~2 hours |
| Hardware | 2× NVIDIA RTX 3090 (24 GB each) |

โš ๏ธ Limitations

  • Small-matrix regression: Accuracy on 2ร—2 and 3ร—3 normal-form games decreased after fine-tuning (100% โ†’ 80% and 80% โ†’ 60% respectively). The base model already handled these well; the adapter slightly regresses on simpler subcategories while dramatically improving harder ones.
  • Mixed-strategy precision: Complex mixed-strategy Nash Equilibria involving irrational numbers may have floating-point precision issues.
  • Context length: Max sequence length of 2,048 tokens may truncate very large game matrices or extremely detailed solutions.
  • Synthetic training data: The model was trained on programmatically generated problems; real-world game theory scenarios with ambiguous framing may require additional prompting.

## 🔗 Links

| Resource | Link |
|---|---|
| 📊 Dataset | 2reb/GameTheory-Bench |
| 🎮 Live Demo | GameTheory-Solver-Demo |
| 🏠 Base Model | Qwen/Qwen2.5-7B-Instruct |

## 📄 License

This adapter is released under the Apache 2.0 License.

๐Ÿ“ Citation

@misc{gametheory-solver-2025,
  title   = {GameTheory-Solver: QLoRA Fine-tuned Qwen2.5-7B for Game Theory},
  author  = {2reb},
  year    = {2025},
  publisher = {Hugging Face},
  url     = {https://huggingface.co/2reb/GameTheory-Solver}
}