OBSOLETE
A series of models to test the viability of removing the vestigial dependency the IMPALA model had on the previous action. This dependency was inherited from Matt's original repo. The environments are Markovian, so in principle the model shouldn't need the previous action, but in practice it does help in environments with walls: the agent can "remember" in which direction it is walking down a narrow corridor, or from which direction it entered an intersection, which is useful for exploration.
Given we now only work with "open mazes" (mazes that don't have any walls), the previous action no longer seemed to be of any help, and so it was deprecated.
These models were trained only to verify that removing the previous action didn't fundamentally change the training dynamics.
Wandb runs here. Filter for *_pa_5 to see runs with previous action (the previous action dimension is 5: null, up, left, right, down) and for *_pa_1 to see runs without previous action (the previous action dimension is 1: null only).
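As a rough illustration of what the two dimensions mean, the sketch below encodes the previous action as a one-hot vector. It is not taken from the training code: the function name `encode_prev_action`, the index assigned to each action, and the use of a one-hot encoding are all assumptions made for this example. Only the action list and the two dimensions (5 and 1) come from the description above.

```python
# Hypothetical sketch: the action order follows the list in the text,
# but the actual index assignment in the training code is an assumption.
ACTIONS = ("null", "up", "left", "right", "down")


def encode_prev_action(action, num_prev_actions):
    """Return a one-hot list of length num_prev_actions for the previous action.

    With num_prev_actions=1 every action maps onto the single "null" slot,
    so the model receives no information about the previous action.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if num_prev_actions == 1:
        return [1.0]
    if num_prev_actions != len(ACTIONS):
        raise ValueError("num_prev_actions must be 1 or 5")
    vec = [0.0] * num_prev_actions
    vec[ACTIONS.index(action)] = 1.0
    return vec


if __name__ == "__main__":
    print(encode_prev_action("left", 5))
    print(encode_prev_action("left", 1))
```

Under these assumptions the pa_1 setting keeps the input shape of the model intact while feeding it a constant, which is one way the dependency could be removed without changing the architecture.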
See config.cfg for hyperparams. Most are shared; the main differences are the choices of alpha, discount_rate and num_prev_actions, the last of which now defaults to 1 as a result of this experiment.
Hyperparams swept over
num_prev_actions=?
discount_rate=?
alpha=?
seed=?
Model name format
al_{alpha}_g_{discount_rate}_seed_{seed}_pa_{num_prev_actions}
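The name format above can be built and parsed with a few lines of Python. This is a minimal sketch, not part of the repo: the helper names `format_name`, `parse_name` and `uses_prev_action` are hypothetical, and the regular expression assumes that alpha and discount_rate are written as plain decimals with no underscores.

```python
import re

# Same template as the model name format above.
NAME_FORMAT = "al_{alpha}_g_{discount_rate}_seed_{seed}_pa_{num_prev_actions}"

# Assumes no field value contains an underscore.
NAME_PATTERN = re.compile(
    r"^al_(?P<alpha>[^_]+)_g_(?P<discount_rate>[^_]+)"
    r"_seed_(?P<seed>\d+)_pa_(?P<num_prev_actions>\d+)$"
)


def format_name(alpha, discount_rate, seed, num_prev_actions):
    """Build a model name from the swept hyperparameters."""
    return NAME_FORMAT.format(
        alpha=alpha,
        discount_rate=discount_rate,
        seed=seed,
        num_prev_actions=num_prev_actions,
    )


def parse_name(name):
    """Split a model name back into its hyperparameters."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid model name: {name!r}")
    return {
        "alpha": float(match["alpha"]),
        "discount_rate": float(match["discount_rate"]),
        "seed": int(match["seed"]),
        "num_prev_actions": int(match["num_prev_actions"]),
    }


def uses_prev_action(name):
    """True for *_pa_5 models, False for *_pa_1 models."""
    return parse_name(name)["num_prev_actions"] > 1


if __name__ == "__main__":
    name = format_name(0.47, 0.95, 105, 1)
    print(name)
    print(parse_name(name))
```

For example, the checkpoint name al_0.47_g_0.95_seed_105_pa_1 parses to alpha 0.47, discount_rate 0.95, seed 105 and num_prev_actions 1.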
Shared Hyperparams
rl_action=train
num_rollout_steps=64
lr=5e-05
eff_horizon=None
eval_every=1
use_wandb=True
use_hf=True
use_log=True
num_total_env_steps=5000000000
checkpoint=al_0.47_g_0.95_seed_105_pa_1
render_sixel=False
sixel_idx=60
seed=105
mask_type=first_episode
penalize_time=False
optim=adam
live_monitor=False
use_bf16=False
deterministic=True
eval_schedule=0:1,250:2,500:5,1000:10,2000:20
grad_acc_per_chunk=5
num_rollout_chunks=1
cheese_loc=any
env_layout=open
env_size=13
num_levels=9600
f_str_ckpt=al_{alpha}_g_{discount_rate}_seed_{seed}_pa_{num_prev_actions}
wandb_project=jaxgmg2_3phase_seed
ckpt_dir=jaxgmg2_3phase_seed
duplication_factor=-1
smoke=False
compile=True
num_chains=6
num_draws=3000
num_steps_bw_draws=1
on_policy=True
llc_nbeta=3000
localization=10
exact_solver_each_draw=False
llc_optimizer=sgld
iw_clip_eps=None
rmsprop_burnin_steps=20
llc_data_file=llc_scan_open_reinforce.pkl
llc_checkpoint_index=None
llc_checkpoint_number=None
sink=None
repo_id=davidquarel/jaxgmg_ckpt_zip
use_shuffled_checkpoints=False
force_re_download=False
off_distribution_data=False
weight_restrictions=None
weight_restrictions_invert=False
evaluate_every_position=False
use_prev_action=False
ntfy=david_jaxgmg