jaxgmg2_3phase_optim_state_patt
Note: Einar trained these models and the description below is uncertain.
Checkpoints of RL agents from jaxgmg2_3phase_optim_state, retrained using patterning: a technique that resumes training from a stored checkpoint and perturbs the initial state distribution Lambda by magnitude h along a direction delta_Lambda:
Lambda' = Lambda + h * delta_Lambda / ||delta_Lambda||_1
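As a rough illustration of the update rule above, here is a minimal NumPy sketch. The function name `pattern_distribution` and the 4-state example are hypothetical and not taken from the training code. The sketch applies the formula literally and does not clip or renormalise the result; whether the actual implementation does so is not stated here.

```python
import numpy as np


def pattern_distribution(lam, delta, h):
    """Return lam + h * delta / ||delta||_1 (formula applied literally)."""
    lam = np.asarray(lam, dtype=float)
    delta = np.asarray(delta, dtype=float)
    l1 = np.abs(delta).sum()  # L1 norm of the perturbation direction
    if l1 == 0.0:
        raise ValueError("delta must be non-zero")
    return lam + h * delta / l1


# Hypothetical 4-state example: uniform start, mass moved from state 3 to state 0.
lam = np.full(4, 0.25)
delta = np.array([1.0, 0.0, 0.0, -1.0])
lam_new = pattern_distribution(lam, delta, h=0.1)
```

Because of the L1 normalisation, the L1 distance between Lambda' and Lambda equals h. If delta_Lambda sums to zero, as in this example, the total probability mass is also preserved.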
WandB: https://wandb.ai/devinterp/jaxgmg2_patt
Patterning Modes
- diff: Shifts alpha (uniform vs corner mixing weight)
- rsv: Perturbs along the first right singular vector of the susceptibility matrix
- mp-inv: Perturbs along chi+ @ delta, where chi+ is the Moore-Penrose pseudo-inverse and delta is the cluster separation vector between phase-2b-rewarded and non-rewarded states
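The rsv and mp-inv directions could be computed along the following lines. This is a sketch under stated assumptions, not the repository's code: it assumes the susceptibility matrix chi has shape (num_observables, num_states), so that perturbation directions live in state space, and it uses random placeholder values. The helper names `rsv_direction` and `mp_inv_direction` are hypothetical.

```python
import numpy as np


def rsv_direction(chi):
    """First right singular vector of chi (row 0 of Vh from the SVD)."""
    _, _, vh = np.linalg.svd(chi, full_matrices=False)
    return vh[0]


def mp_inv_direction(chi, delta):
    """Direction chi+ @ delta, with chi+ the Moore-Penrose pseudo-inverse."""
    return np.linalg.pinv(chi) @ delta


# Placeholder data (assumed shapes): 3 observables, 5 states.
rng = np.random.default_rng(0)
chi = rng.normal(size=(3, 5))
delta = rng.normal(size=3)

v = rsv_direction(chi)            # unit vector in state space
d = mp_inv_direction(chi, delta)  # minimum-norm solution of chi @ d = delta
```

Note that the sign of a singular vector is arbitrary, so an rsv perturbation of magnitude h is only defined up to sign unless the training code fixes a convention. Either direction would then be scaled as in the Lambda' formula above.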
Sweep
Base model: jaxgmg2_3phase_optim_state/al_0.6_g_0.98_id_17_seed_980617 (phase3 checkpoint)
Parameter sweep:
- patt_mode: diff, rsv, mp-inv
- patt_h: 0.01, 0.03, 0.05, 0.08, 0.12, 0.16, 0.2, 0.26, 0.32, 0.38, 0.44, 0.5
- resume_optim: True, False
72 total sweep configurations.
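The count follows from the full grid: 3 modes x 12 values of h x 2 optimizer settings. A short sketch that enumerates the grid (the dictionary layout is illustrative, not the WandB sweep format):

```python
from itertools import product

patt_modes = ["diff", "rsv", "mp-inv"]
patt_hs = [0.01, 0.03, 0.05, 0.08, 0.12, 0.16, 0.2, 0.26, 0.32, 0.38, 0.44, 0.5]
resume_optims = [True, False]

# One dict per sweep configuration (full Cartesian product).
configs = [
    {"patt_mode": mode, "patt_h": h, "resume_optim": resume}
    for mode, h, resume in product(patt_modes, patt_hs, resume_optims)
]

print(len(configs))  # 72
```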
Shared Hyperparams
rl_action=train
alpha=0.6
discount_rate=0.98
lr=5e-05
num_total_env_steps=7372800000
num_rollout_steps=64
num_levels=9600
cheese_loc=any
env_layout=open
mask_type=first_episode
use_prev_action=False
grad_acc_per_chunk=4
log_optimizer_state=True
resume=jaxgmg2_3phase_optim_state/al_0.6_g_0.98_id_17_seed_980617
resume_id=phase3
eval_schedule=0:1,250:2,500:5,2000:10
wandb_project=jaxgmg2_patt
use_wandb=True
use_hf=True
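The eval_schedule value is a comma-separated list of colon-separated integer pairs. The sketch below only splits it into pairs; the helper `parse_eval_schedule` is hypothetical, and the meaning of each pair (for example, a step threshold and an evaluation interval) is not stated in this card, so treat any interpretation as an assumption.

```python
def parse_eval_schedule(spec):
    """Split 'a:b,c:d,...' into a list of (int, int) pairs."""
    pairs = []
    for item in spec.split(","):
        left, right = item.split(":")
        pairs.append((int(left), int(right)))
    return pairs


schedule = parse_eval_schedule("0:1,250:2,500:5,2000:10")
```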
Naming Schema
Checkpoints are saved to:
{resume_repo}_patt/{model_id}/patt_{mode}_h_{h}_ld-opt_{0|1}/
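A small sketch of how the schema expands for one configuration. The helper `checkpoint_path` is hypothetical, and two details are assumptions: that the ld-opt flag corresponds to resume_optim, and that h is written with Python's default float formatting.

```python
def checkpoint_path(resume_repo, model_id, mode, h, resume_optim):
    """Fill in the naming schema for a single sweep configuration."""
    ld_opt = 1 if resume_optim else 0  # assumed mapping from resume_optim
    return f"{resume_repo}_patt/{model_id}/patt_{mode}_h_{h}_ld-opt_{ld_opt}/"


path = checkpoint_path(
    resume_repo="jaxgmg2_3phase_optim_state",
    model_id="al_0.6_g_0.98_id_17_seed_980617",
    mode="rsv",
    h=0.01,
    resume_optim=True,
)
print(path)
```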
Usage
# From projects/rl:
python -m experiments.patterning.main --patt-mode diff --patt-h 0.01 --alpha 0.6
python -m experiments.patterning.main --patt-mode rsv --patt-h 0.01 --alpha 0.6 --resume-optim True
python -m experiments.patterning.main --patt-mode mp-inv --patt-h 0.01 --alpha 0.6 --resume-id 3810
Reproduced with
See train.yaml in this repository. Run as a WandB sweep using
projects/rl/experiments/patterning/sweeps/patt_sweep.yaml from the
timaeus monorepo.