pi05-build-block-tower-rlt-6mix-retain-alpha05

RL Token (RLT) encoder-decoder trained on the 6-dataset build-block-tower mixture, on top of the retain/step_49999/alpha_0.5 checkpoint from pi05-build-block-tower-6mix.

What is this?

This model is a lightweight transformer encoder-decoder that takes inputs from a frozen Pi-05 VLA backbone. The encoder compresses the VLA's final-layer prefix embeddings into a single RL token via a learned query; the decoder autoregressively reconstructs the original embeddings from this token alone, forcing it to act as an information bottleneck. See Xu et al. (2026), "Precise Manipulation with Efficient Online RL", for the method.
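
The compression step can be pictured as single-query attention pooling. This is a minimal NumPy sketch of that idea only; all names here are illustrative, and the actual model is the 2-layer transformer described under Training.

```python
import numpy as np

def rl_token_pool(prefix_emb: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Compress T prefix embeddings (T, d) into one RL token (d,)
    by attending against a single learned query (illustrative names)."""
    d = prefix_emb.shape[-1]
    scores = prefix_emb @ query / np.sqrt(d)   # (T,) similarity per position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over positions
    return weights @ prefix_emb                # (d,) bottleneck token

rng = np.random.default_rng(0)
E = rng.normal(size=(32, 2048)).astype(np.float32)  # stand-in prefix embeddings
q = rng.normal(size=2048).astype(np.float32)        # stand-in learned query
token = rl_token_pool(E, q)
assert token.shape == (2048,)
```

The decoder then has to reconstruct all 32 input embeddings from this one 2048-dim vector, which is what makes the token an information bottleneck.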

This variant uses the retain/alpha_0.5 backbone: a checkpoint produced by applying representation-level retention (alpha=0.5 interpolation toward the pre-fine-tuning weights) to the 6mix baseline at step 49999.
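
For concreteness, a per-parameter linear interpolation at alpha=0.5 looks like the sketch below. The weighting convention (which endpoint alpha points toward) is an assumption; only the alpha=0.5 interpolation toward the pre-fine-tuning weights is stated on this card.

```python
import numpy as np

def retain_interpolate(theta_ft: dict, theta_pre: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate fine-tuned weights toward pre-fine-tuning
    weights; the direction of the alpha weighting is an assumption."""
    return {k: (1.0 - alpha) * theta_ft[k] + alpha * theta_pre[k]
            for k in theta_ft}

theta_ft = {"w": np.array([2.0, 4.0])}   # toy fine-tuned parameters
theta_pre = {"w": np.array([0.0, 0.0])}  # toy pre-fine-tuning parameters
blended = retain_interpolate(theta_ft, theta_pre, alpha=0.5)
# at alpha=0.5 both conventions coincide: the midpoint [1.0, 2.0]
```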

Training

  • Config: pi05_rlt_build_block_tower_6mix
  • VLA backbone: pravsels/pi05-build-block-tower-6mix retain/step_49999/alpha_0.5 (frozen, rl_vla_loss_weight=0.0)
  • Encoder-decoder: 2-layer transformer, 8 heads, 8192 MLP dim, 2048 embedding dim
  • Dataset: 6 LeRobot v2.1 datasets (build_block_tower + dAgger 1.0.0–1.4.0)
  • Batch size: 36
  • LR: 5e-5 cosine (1k warmup)
  • Steps: 50,000 (initial 20k + resumed 30k)
  • Runtime: ~14h total on 4x GH200 (Isambard)
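
The schedule above (5e-5 cosine with 1k warmup over 50k steps) can be sketched as follows; the linear warmup shape and a floor of zero are assumptions, since the card only states the base LR, warmup length, and step count.

```python
import math

def lr_at(step: int, base_lr: float = 5e-5, warmup: int = 1_000,
          total: int = 50_000, min_lr: float = 0.0) -> float:
    """Linear warmup then cosine decay to min_lr (sketch of the
    stated 5e-5 cosine schedule with 1k warmup over 50k steps)."""
    if step < warmup:
        return base_lr * (step + 1) / warmup       # linear ramp to base_lr
    t = (step - warmup) / max(1, total - warmup)   # decay progress in [0, 1]
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```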

Loss progression

Step     Train Loss   Val Loss   Gap
0        —            11354.7    —
1,000    —            2507.0     —
5,000    640.8        701.6      60.8
10,000   472.8        536.0      63.2
15,000   399.9        487.8      87.9
19,999   356.3        464.8      108.5
25,000   326.6        446.8      120.2
30,000   304.9        439.3      134.4
35,000   288.4        432.7      144.3
40,000   275.6        423.3      147.7
45,000   265.4        414.3      148.9
49,000   259.8        425.4      165.6
49,900   256.4        —          —

Validation loss decreased steadily to a minimum of 414.3 at step 45,000, then began to rise while the train-val gap kept widening, an early sign of overfitting. The step-45,000 checkpoint is recommended for deployment as it has the lowest validation loss.
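
Selecting the deployment checkpoint is just an argmin over the logged validation losses from the table above:

```python
# Validation losses by step, copied from the loss-progression table.
val_loss = {
    5_000: 701.6, 10_000: 536.0, 15_000: 487.8, 19_999: 464.8,
    25_000: 446.8, 30_000: 439.3, 35_000: 432.7, 40_000: 423.3,
    45_000: 414.3, 49_000: 425.4,
}

# Step with the lowest validation loss -> recommended checkpoint.
best_step = min(val_loss, key=val_loss.get)
assert best_step == 45_000 and val_loss[best_step] == 414.3
```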

Checkpoints

Step    Val Loss   Recommended   Params SHA256
19999   464.8                    493ae11e5c95be5340e9106e54cac3f2219f6d1407a9081fc1c35595f5143cdb
45000   414.3      ✓             1af50d87b765942801fd6be6afb5df3bcc69065636b614702f0b1f34fd3daec1

Verifying checkpoint hashes

cd checkpoints/<step> && find params -type f | sort | xargs sha256sum | sha256sum
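
The pipeline hashes every file under params/, sorts the per-file digest lines, and hashes that text again. A Python equivalent, assuming paths sort the same way as `sort` in the C locale:

```python
import hashlib
from pathlib import Path

def tree_sha256(root: str) -> str:
    """Aggregate SHA256 over a directory tree: hash each file, emit
    sha256sum-style lines in sorted path order, then hash that text."""
    lines = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path}\n")  # mimics sha256sum output
    return hashlib.sha256("".join(lines).encode()).hexdigest()
```

Run from inside checkpoints/&lt;step&gt; as `tree_sha256("params")` so the relative paths match those produced by `find params -type f`.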

Repo layout

assets/                       # Norm stats, valid indices, episode split
checkpoints/19999/params/     # Step 19999 model weights
checkpoints/45000/params/     # Step 45000 model weights (recommended)
TRAINING_LOG.md               # Training log

W&B

Training curves: https://wandb.ai/pravsels/pi05-build-block-tower-rlt-6mix-retain-alpha05/runs/g5myo76p

Usage

import numpy as np

import openpi.models.model as _model
import openpi.training.config as _config

config = _config.get_config("pi05_rlt_build_block_tower_6mix")
params = _model.restore_params("checkpoints/45000/params", restore_type=np.ndarray)
model = config.model.load(params)