OpenFront RL Agent

PPO-trained agent for OpenFront.io, a multiplayer territory control game.

Model Version: v13b

Current best model trained with normalized elimination reward and winner bonus.

Training Details

  • Algorithm: PPO (Proximal Policy Optimization)
  • Architecture: Actor-Critic with shared backbone (512→512→256)
  • Observation dim: 80 (16 player stats + 16 neighbors × 4 features)
  • Action space: MultiDiscrete [17 action types, 16 targets, 5 troop fractions]
  • Maps: plains, big_plains, world, giantworldmap, ocean_and_land, half_land_half_ocean (random per episode)
  • Parallel envs: 16
  • Learning rate: 1.5e-4 (constant)
  • Rollout steps: 1024
  • Batch size: 16,384
  • Value function coefficient: 0.5
  • Updates trained: 1550 (ongoing)
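
The sizes listed above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not code from the repository: the constant and function names are hypothetical, and the layout of the flat observation (player stats first, then four consecutive features per neighbor) is an assumption. The batch size of 16,384 happens to equal 16 envs × 1024 rollout steps, which suggests that one update consumes a full rollout, though the source does not state this directly.

```python
# Sizes taken from the Training Details list; names are illustrative only.
NUM_PLAYER_STATS = 16
MAX_NEIGHBORS = 16
FEATURES_PER_NEIGHBOR = 4
NUM_ENVS = 16
ROLLOUT_STEPS = 1024
ACTION_DIMS = (17, 16, 5)  # action types, targets, troop fractions


def obs_dim():
    """Length of the flat observation vector."""
    return NUM_PLAYER_STATS + MAX_NEIGHBORS * FEATURES_PER_NEIGHBOR


def split_observation(obs):
    """Split a flat observation into player stats and per-neighbor rows.

    Assumes player stats come first and neighbor features are stored
    contiguously per neighbor; the real ordering may differ.
    """
    if len(obs) != obs_dim():
        raise ValueError(f"expected {obs_dim()} values, got {len(obs)}")
    stats = obs[:NUM_PLAYER_STATS]
    flat = obs[NUM_PLAYER_STATS:]
    neighbors = [
        flat[i:i + FEATURES_PER_NEIGHBOR]
        for i in range(0, len(flat), FEATURES_PER_NEIGHBOR)
    ]
    return stats, neighbors


def samples_per_rollout():
    """Transitions collected across all parallel envs in one rollout."""
    return NUM_ENVS * ROLLOUT_STEPS


def joint_action_count():
    """Number of distinct (type, target, fraction) combinations."""
    total = 1
    for dim in ACTION_DIMS:
        total *= dim
    return total
```

A MultiDiscrete action space keeps three small categorical heads (17, 16, and 5 logits) instead of a single head over every joint combination, which is the usual motivation for factoring the action this way.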

Reward Design (v13)

Normalized elimination reward. The total reward sums to +1.0 on a full win, regardless of opponent count:

  • Per-kill: +1/N per opponent eliminated (N = starting opponents)
  • Winner bonus: remaining alive opponents credited as aliveCount/N when game.getWinner() fires
  • Death penalty: -1.0
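
The three terms above can be combined into an episode return as in the sketch below. This is an illustration under stated assumptions rather than the repository's reward code: the function and parameter names are hypothetical, and it assumes every opponent elimination is credited to the agent regardless of who caused it, since that is what makes the total reach +1.0 on a win.

```python
DEATH_PENALTY = -1.0


def episode_return(n_start, eliminated, won, alive_at_win=0, died=False):
    """Total episode reward under the normalized elimination scheme.

    n_start:      number of opponents at the start of the game (N)
    eliminated:   opponents eliminated before the game ended
    won:          whether the agent was declared the winner
    alive_at_win: opponents still alive when the winner was declared
    died:         whether the agent itself was eliminated
    """
    if n_start <= 0:
        raise ValueError("n_start must be positive")
    total = eliminated / n_start           # per-kill term: +1/N each
    if won:
        total += alive_at_win / n_start    # winner bonus: aliveCount/N
    if died:
        total += DEATH_PENALTY
    return total
```

For example, a win against 15 opponents with 6 eliminated and 9 still alive yields 6/15 + 9/15 = 1.0, the same total as a win against 2 opponents.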

Curriculum

Win-rate-gated 12-stage curriculum advancing through Easy → Medium → Hard difficulty and 2 → 15 opponents. A stage advances only when the rolling win rate over 200 episodes exceeds that stage's threshold (75% down to 45%).
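
The gating rule can be sketched with a fixed-size window of recent results. The class and helper below are hypothetical and only illustrate the mechanism. The source gives only the endpoints of the threshold range, so the linear schedule between 75% and 45% is an assumption, as are the choices to wait for a full 200-episode window and to clear it after each promotion.

```python
from collections import deque


def linear_thresholds(n_stages=12, start=0.75, end=0.45):
    """Assumed schedule: thresholds spaced evenly from start to end."""
    step = (start - end) / (n_stages - 1)
    return [start - i * step for i in range(n_stages)]


class WinRateCurriculum:
    """Advance one stage when the rolling win rate exceeds the threshold."""

    def __init__(self, thresholds, window=200):
        self.thresholds = list(thresholds)
        self.window = window
        self.stage = 0
        self.results = deque(maxlen=window)  # most recent episode outcomes

    def record(self, won):
        """Record one episode result; return True if the stage advanced."""
        self.results.append(bool(won))
        if self.stage >= len(self.thresholds) - 1:
            return False  # already at the final stage
        if len(self.results) < self.window:
            return False  # not enough episodes for a full window yet
        win_rate = sum(self.results) / len(self.results)
        if win_rate > self.thresholds[self.stage]:
            self.stage += 1
            self.results.clear()  # assumption: restart the window per stage
            return True
        return False
```

A training loop would call `record(won)` at the end of each episode and use `stage` to look up the difficulty and opponent count for the next one.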

Eval Results

  • Easy/2 opponents: 100% win rate (20/20 games)

Usage

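from train import ActorCritic  # requires train.py on the import path
import torch

# Rebuild the network with the same sizes used during training.
model = ActorCritic(obs_dim=80, max_neighbors=16, hidden_sizes=[512, 512, 256])
# weights_only=False unpickles arbitrary objects, so only load checkpoints you trust.
checkpoint = torch.load("best_model.pt", map_location="cpu", weights_only=False)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # switch to inference mode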

Repository

Trained from josh-freeman/openfront-rl.
