Project: RL1/RL2 (obsolete)
Collection
Older models that are no longer useful for anything in RL1 or RL2, or that are now unused because experimentation was discontinued. • 16 items • Updated
A sweep to dial in suitable values of alpha, learning rate (lr), and discount rate after the learning rate bug was fixed. All models were trained with the same seed (100), so they are not especially useful for experiments, but the sweep was useful for tuning hyperparameters.
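As a rough illustration of what a random sweep over this search space draws, the sketch below samples a few configurations using the ranges listed in sweep.yaml. It assumes each parameter is sampled independently (uniformly over the listed values, and uniformly over the interval for alpha); the `sample_config` helper is hypothetical and is not part of the training code or of wandb.

```python
import random

# Search space copied from sweep.yaml.
# The sampling logic below is an illustrative assumption, not wandb's implementation.
LRS = [5e-05, 0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005]
DISCOUNT_RATES = [0.99, 0.975]
ALPHA_MIN, ALPHA_MAX = 0.2, 0.9
SEED = 100  # fixed for every run in this sweep


def sample_config(rng):
    """Draw one hypothetical sweep configuration (hypothetical helper)."""
    return {
        "alpha": rng.uniform(ALPHA_MIN, ALPHA_MAX),
        "discount_rate": rng.choice(DISCOUNT_RATES),
        "lr": rng.choice(LRS),
        "seed": SEED,
    }


rng = random.Random(0)  # local RNG so the sketch is reproducible
configs = [sample_config(rng) for _ in range(5)]
for cfg in configs:
    print(cfg)
```

Because the seed is fixed at 100, differences between runs come only from alpha, lr, and the discount rate, which is why the runs help with tuning but not with seed-variance experiments.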
Trained with now-obsolete code at commit 8b73ab2736efca9a494348d2e68a62442e219aab.
Wandb: https://wandb.ai/devinterp/jaxgmg2_3phase_fast
Ran with:
wandb sweep sweep.yaml
Contents of sweep.yaml:
command:
  - env
  - WANDB_AGENT_MAX_INITIAL_FAILURES=1
  - /root/timaeus/.venv/bin/python
  - ${program}
  - ${args}
  - --use-wandb
  - --use-hf
entity: devinterp
method: random
parameters:
  alpha:
    distribution: uniform
    max: 0.9
    min: 0.2
  cheese-loc:
    value: any
  ckpt-dir:
    value: jaxgmg2_3phase_fast
  discount-rate:
    values:
      - 0.99
      - 0.975
  eval-schedule:
    value: 0:1,250:2,500:5
  f-str-ckpt:
    value: al_{alpha:.3f}_g_{discount_rate}_seed_{seed}_pa_{num_prev_actions}_lr_{lr:.0e}
  grad-acc-per-chunk:
    value: 5
  lr:
    values:
      - 5e-05
      - 0.0001
      - 0.0002
      - 0.0005
      - 0.001
      - 0.002
      - 0.005
  mask-type:
    value: first_episode
  ntfy:
    value: david_jaxgmg
  num-levels:
    value: 9600
  num-prev-actions:
    value: 1
  num-total-env-steps:
    value: 2000000000
  seed:
    value: 100
  wandb-project:
    value: jaxgmg2_3phase_fast
program: /root/timaeus/projects/rl/main_train.py
project: jaxgmg2_3phase_fast
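To help map a checkpoint name back to its hyperparameters, here is a minimal sketch of how the f-str-ckpt template expands. It assumes the training script fills the template with Python's `str.format` using these field names, which the obsolete code is not available here to confirm; the `checkpoint_name` helper and the alpha value of 0.5 are hypothetical.

```python
# Template copied verbatim from the f-str-ckpt entry in sweep.yaml.
F_STR_CKPT = "al_{alpha:.3f}_g_{discount_rate}_seed_{seed}_pa_{num_prev_actions}_lr_{lr:.0e}"


def checkpoint_name(alpha, discount_rate, seed, num_prev_actions, lr):
    """Expand the template (hypothetical helper; assumes str.format semantics)."""
    return F_STR_CKPT.format(
        alpha=alpha,
        discount_rate=discount_rate,
        seed=seed,
        num_prev_actions=num_prev_actions,
        lr=lr,
    )


# alpha=0.5 is an arbitrary illustrative value inside the sweep range [0.2, 0.9].
example = checkpoint_name(
    alpha=0.5, discount_rate=0.99, seed=100, num_prev_actions=1, lr=5e-05
)
print(example)  # al_0.500_g_0.99_seed_100_pa_1_lr_5e-05
```

Under this assumption alpha is rounded to three decimals and lr is written in scientific notation with no fractional digits, so lr values of 0.0001 and 0.002 would appear as `1e-04` and `2e-03`.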