jaxgmg2_3phase_optim_state_patt

Note: Einar trained these models and the description below is uncertain.

Checkpoints of RL agents from jaxgmg2_3phase_optim_state, retrained using patterning. Patterning resumes training from a stored checkpoint and shifts the initial state distribution Lambda along a direction delta_Lambda, with the size of the shift (in L1 norm) set by h:

Lambda' = Lambda + h * delta_Lambda / ||delta_Lambda||_1
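The update above can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the training code: the function name perturb_distribution and the list-based representation of Lambda are assumptions, and whether the real implementation clips or renormalizes the result is not stated in this card.

```python
def perturb_distribution(lam, delta, h):
    """Return Lambda' = Lambda + h * delta / ||delta||_1 (hypothetical helper).

    lam   : list of floats, the initial state distribution Lambda
    delta : list of floats, the perturbation direction delta_Lambda
    h     : float, perturbation magnitude
    """
    if len(lam) != len(delta):
        raise ValueError("lam and delta must have the same length")
    l1_norm = sum(abs(d) for d in delta)
    if l1_norm == 0.0:
        raise ValueError("delta must be non-zero")
    # No clipping or renormalization is applied here (assumption: the card
    # does not say whether the original code does either).
    return [p + h * d / l1_norm for p, d in zip(lam, delta)]


# Toy example: uniform distribution over 4 states, direction summing to zero.
lam = [0.25, 0.25, 0.25, 0.25]
delta = [1.0, -1.0, 2.0, -2.0]
lam_new = perturb_distribution(lam, delta, h=0.12)
```

Because the direction is normalized to unit L1 norm, the L1 distance between Lambda' and Lambda equals h. If the entries of delta_Lambda sum to zero, as in the toy example, the total mass of Lambda' is unchanged.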

WandB: https://wandb.ai/devinterp/jaxgmg2_patt

Patterning Modes

diff: Shifts alpha (uniform vs corner mixing weight)
rsv: Perturbs along the first right singular vector of the susceptibility matrix
mp-inv: Perturbs along chi+ @ delta, where chi+ is the Moore-Penrose pseudo-inverse and delta is the cluster separation vector between phase-2b-rewarded and non-rewarded states
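As a rough illustration of the rsv and mp-inv directions, the NumPy sketch below computes both for a small made-up matrix. The shapes are assumptions: chi is taken to be an (m, n) array whose n columns index initial states, and delta a length-m vector. The function names are hypothetical and are not taken from the experiment code.

```python
import numpy as np


def rsv_direction(chi):
    """First right singular vector of chi (length n).

    np.linalg.svd returns singular values in descending order, so row 0 of
    vt belongs to the largest one. Its overall sign is arbitrary.
    """
    _, _, vt = np.linalg.svd(chi, full_matrices=False)
    return vt[0]


def mp_inv_direction(chi, delta):
    """chi+ @ delta, with chi+ the Moore-Penrose pseudo-inverse of chi."""
    return np.linalg.pinv(chi) @ delta


# Toy susceptibility matrix and separation vector (made-up values).
chi = np.array([[3.0, 0.0],
                [0.0, 1.0]])
delta = np.array([3.0, 1.0])

v_rsv = rsv_direction(chi)           # +/- [1, 0] for this diagonal example
v_mp = mp_inv_direction(chi, delta)  # solves chi @ v = delta in the least-squares sense
```

Either direction would then be normalized to unit L1 norm and scaled by h, as in the formula for Lambda'.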

Sweep

Base model: jaxgmg2_3phase_optim_state/al_0.6_g_0.98_id_17_seed_980617 (phase3 checkpoint)

Parameter sweep:

patt_mode: diff, rsv, mp-inv
patt_h: 0.01, 0.03, 0.05, 0.08, 0.12, 0.16, 0.2, 0.26, 0.32, 0.38, 0.44, 0.5
resume_optim: True, False

72 total sweep configurations.
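The count of 72 follows from the full Cartesian product of the three swept parameters (3 modes x 12 values of h x 2 optimizer settings). A short sketch that enumerates the grid is given below; the dictionary keys mirror the parameter names above, but the actual layout of patt_sweep.yaml is not shown here and may differ.

```python
from itertools import product

PATT_MODES = ["diff", "rsv", "mp-inv"]
PATT_HS = [0.01, 0.03, 0.05, 0.08, 0.12, 0.16, 0.2, 0.26, 0.32, 0.38, 0.44, 0.5]
RESUME_OPTIM = [True, False]

# One dict per sweep configuration (full grid, no sampling).
configs = [
    {"patt_mode": mode, "patt_h": h, "resume_optim": resume}
    for mode, h, resume in product(PATT_MODES, PATT_HS, RESUME_OPTIM)
]

print(len(configs))  # 72
```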

Shared Hyperparams

rl_action=train
alpha=0.6
discount_rate=0.98
lr=5e-05
num_total_env_steps=7372800000
num_rollout_steps=64
num_levels=9600
cheese_loc=any
env_layout=open
mask_type=first_episode
use_prev_action=False
grad_acc_per_chunk=4
log_optimizer_state=True
resume=jaxgmg2_3phase_optim_state/al_0.6_g_0.98_id_17_seed_980617
resume_id=phase3
eval_schedule=0:1,250:2,500:5,2000:10
wandb_project=jaxgmg2_patt
use_wandb=True
use_hf=True
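As a sanity check on the step budget, the arithmetic below converts num_total_env_steps into a number of rollout batches. It assumes that each batch collects num_rollout_steps steps from each of num_levels parallel environments; that interpretation is a guess and is not confirmed by this card.

```python
num_total_env_steps = 7_372_800_000
num_rollout_steps = 64
num_levels = 9600

# Assumption: one batch = num_rollout_steps steps in each of num_levels envs.
steps_per_batch = num_rollout_steps * num_levels
num_batches, remainder = divmod(num_total_env_steps, steps_per_batch)

print(steps_per_batch, num_batches, remainder)  # 614400 12000 0
```

Under that assumption the budget divides evenly into 12000 batches.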

Naming Schema

Checkpoints are saved to: {resume_repo}_patt/{model_id}/patt_{mode}_h_{h}_ld-opt_{0|1}/
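A small helper that fills in this schema might look like the sketch below. The function name checkpoint_dir is hypothetical, and two details are assumptions: that ld-opt is 1 when resume_optim is True, and that h is written with Python's default float formatting.

```python
def checkpoint_dir(resume_repo, model_id, mode, h, resume_optim):
    """Build a checkpoint path following the naming schema (hypothetical helper)."""
    ld_opt = int(bool(resume_optim))  # assumption: ld-opt_1 means resume_optim=True
    return f"{resume_repo}_patt/{model_id}/patt_{mode}_h_{h}_ld-opt_{ld_opt}/"


path = checkpoint_dir(
    resume_repo="jaxgmg2_3phase_optim_state",
    model_id="al_0.6_g_0.98_id_17_seed_980617",
    mode="rsv",
    h=0.01,
    resume_optim=True,
)
print(path)
# jaxgmg2_3phase_optim_state_patt/al_0.6_g_0.98_id_17_seed_980617/patt_rsv_h_0.01_ld-opt_1/
```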

Usage

# From projects/rl:
python -m experiments.patterning.main --patt-mode diff --patt-h 0.01 --alpha 0.6
python -m experiments.patterning.main --patt-mode rsv --patt-h 0.01 --alpha 0.6 --resume-optim True
python -m experiments.patterning.main --patt-mode mp-inv --patt-h 0.01 --alpha 0.6 --resume-id 3810

Reproduced with

See train.yaml in this repository. Run as a WandB sweep using projects/rl/experiments/patterning/sweeps/patt_sweep.yaml from the timaeus monorepo.
