# pi05-build-block-tower-rlt-6mix-retain-alpha05

RL Token (RLT) encoder-decoder trained on the 6-dataset build-block-tower mixture, on top of the `retain/step_49999/alpha_0.5` checkpoint from `pi05-build-block-tower-6mix`.
## What is this?
This model is a lightweight transformer encoder-decoder that takes inputs from a frozen Pi-05 VLA backbone. The encoder compresses the VLA's final-layer prefix embeddings into a single RL token via a learned query. The decoder autoregressively reconstructs the original embeddings from only this token, forcing it to act as an information bottleneck. See Xu et al. (2026), *Precise Manipulation with Efficient Online RL*, for the method.
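As a rough illustration (not the openpi implementation, which is in JAX), here is a minimal PyTorch sketch of the bottleneck. The attention-pooling encoder, teacher-forced decoding, and MSE reconstruction loss are assumptions beyond the description above; the dimensions come from the Training section below.

```python
import torch
import torch.nn as nn

class RLTokenBottleneck(nn.Module):
    """Compress VLA prefix embeddings into one RL token, then reconstruct them."""

    def __init__(self, embed_dim=2048, num_heads=8, mlp_dim=8192, num_layers=2):
        super().__init__()
        # Learned query that pools the frozen prefix into a single token.
        self.query = nn.Parameter(torch.randn(1, 1, embed_dim) * 0.02)
        self.pool = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        layer = nn.TransformerDecoderLayer(
            embed_dim, num_heads, dim_feedforward=mlp_dim, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, prefix_emb):  # prefix_emb: (B, T, D) from the frozen VLA
        B, T, _ = prefix_emb.shape
        # Encoder: cross-attend the learned query over the prefix -> (B, 1, D).
        rl_token, _ = self.pool(self.query.expand(B, -1, -1), prefix_emb, prefix_emb)
        # Decoder: teacher-forced autoregressive reconstruction conditioned
        # only on the RL token (shifted inputs + causal mask).
        shifted = torch.cat([rl_token, prefix_emb[:, :-1]], dim=1)
        causal = nn.Transformer.generate_square_subsequent_mask(T)
        recon = self.decoder(shifted, memory=rl_token, tgt_mask=causal)
        loss = ((recon - prefix_emb) ** 2).sum(-1).mean()  # MSE reconstruction
        return rl_token, loss
```

Because the decoder sees nothing but the single token, driving the reconstruction loss down forces task-relevant prefix information through the bottleneck, which is what makes the RL token useful as a compact state for downstream RL.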
This variant uses the `retain/alpha_0.5` backbone: a checkpoint produced by applying representation-level retention (alpha=0.5 interpolation toward the pre-fine-tuning weights) to the 6mix baseline at step 49999.
## Training
- Config: `pi05_rlt_build_block_tower_6mix`
- VLA backbone: `pravsels/pi05-build-block-tower-6mix` at `retain/step_49999/alpha_0.5` (frozen, `rl_vla_loss_weight=0.0`)
- Encoder-decoder: 2-layer transformer, 8 heads, 8192 MLP dim, 2048 embedding dim
- Dataset: 6 LeRobot v2.1 datasets (build_block_tower + DAgger 1.0.0–1.4.0)
- Batch size: 36
- LR: 5e-5, cosine decay with 1k warmup (see the sketch after this list)
- Steps: 50,000 (initial 20k + resumed 30k)
- Runtime: ~14h total on 4x GH200 (Isambard)
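A minimal sketch of the stated schedule (5e-5 peak, 1k linear warmup, cosine decay over the remaining steps) using PyTorch schedulers; the real run uses openpi's training stack, and the warmup shape and final LR are assumptions.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

model = torch.nn.Linear(8, 8)  # stand-in; the real model is the RLT encoder-decoder
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)  # peak LR from the config

sched = SequentialLR(
    opt,
    schedulers=[
        LinearLR(opt, start_factor=1e-3, total_iters=1_000),  # 1k linear warmup
        CosineAnnealingLR(opt, T_max=49_000),  # cosine decay toward 0 by step 50k
    ],
    milestones=[1_000],  # hand off from warmup to cosine at step 1,000
)

for step in range(50_000):
    ...  # forward/backward and opt.step() would go here
    sched.step()
```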
## Loss progression
| Step | Train Loss | Val Loss | Gap |
|---|---|---|---|
| 0 | – | 11354.7 | – |
| 1,000 | – | 2507.0 | – |
| 5,000 | 640.8 | 701.6 | 60.8 |
| 10,000 | 472.8 | 536.0 | 63.2 |
| 15,000 | 399.9 | 487.8 | 87.9 |
| 19,999 | 356.3 | 464.8 | 108.5 |
| 25,000 | 326.6 | 446.8 | 120.2 |
| 30,000 | 304.9 | 439.3 | 134.4 |
| 35,000 | 288.4 | 432.7 | 144.3 |
| 40,000 | 275.6 | 423.3 | 147.7 |
| 45,000 | 265.4 | 414.3 | 148.9 |
| 49,000 | 259.8 | 425.4 | 165.6 |
| 49,900 | 256.4 | – | – |
Val loss decreased steadily, reaching a minimum of 414.3 at step 45,000 before beginning to rise, while the train/val gap widened throughout, consistent with mild overfitting late in training. The step-45,000 checkpoint is recommended for deployment since it has the lowest validation loss.
## Checkpoints
| Step | Val Loss | Recommended | Params SHA256 |
|---|---|---|---|
| 19999 | 464.8 | – | `493ae11e5c95be5340e9106e54cac3f2219f6d1407a9081fc1c35595f5143cdb` |
| 45000 | 414.3 | ✓ | `1af50d87b765942801fd6be6afb5df3bcc69065636b614702f0b1f34fd3daec1` |
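To pull just the recommended checkpoint locally, a hedged sketch using `huggingface_hub`; the repo id is an assumption based on the model name above.

```python
from huggingface_hub import snapshot_download

# Repo id assumed from the model name; adjust patterns to fetch other steps.
local_dir = snapshot_download(
    repo_id="pravsels/pi05-build-block-tower-rlt-6mix-retain-alpha05",
    allow_patterns=["checkpoints/45000/params/*", "assets/*", "*.md"],
)
print(local_dir)
```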
### Verifying checkpoint hashes

```sh
cd checkpoints/<step> && find params -type f | sort | xargs sha256sum | sha256sum
```

This hashes every file under `params/` in sorted order, then hashes the list of per-file digests, so the result is independent of filesystem traversal order.
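The same check in Python, for convenience; this sketch assumes `sha256sum`'s two-space line format and a plain lexicographic path sort, which is what makes the digests line up.

```python
import hashlib
from pathlib import Path

def checkpoint_digest(step_dir: str) -> str:
    """Mirror `find params -type f | sort | xargs sha256sum | sha256sum`."""
    root = Path(step_dir)
    lines = []
    for f in sorted(p for p in (root / "params").rglob("*") if p.is_file()):
        digest = hashlib.sha256(f.read_bytes()).hexdigest()  # whole-file read
        # sha256sum prints "<hash>  <path>" (two spaces), path relative to step_dir.
        lines.append(f"{digest}  {f.relative_to(root)}\n")
    return hashlib.sha256("".join(lines).encode()).hexdigest()

print(checkpoint_digest("checkpoints/45000"))  # compare with the table above
```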
## Repo layout

```
assets/                     # Norm stats, valid indices, episode split
checkpoints/19999/params/   # Step 19999 model weights
checkpoints/45000/params/   # Step 45000 model weights (recommended)
TRAINING_LOG.md             # Training log
```
## W&B
Training curves: https://wandb.ai/pravsels/pi05-build-block-tower-rlt-6mix-retain-alpha05/runs/g5myo76p
## Usage

```python
import numpy as np

import openpi.models.model as _model
import openpi.training.config as _config

config = _config.get_config("pi05_rlt_build_block_tower_6mix")
params = _model.restore_params("checkpoints/45000/params", restore_type=np.ndarray)
model = config.model.load(params)
```