FloydARC (ARC-AGI Reasoning)
Model Summary
FloydARC is a neural algorithmic reasoning model adapted from FloydNet for the ARC-AGI benchmark. This checkpoint is trained primarily on ARC-style synthetic and curated data rather than large-scale web pretraining, and it solves ARC tasks via iterative refinement and test-time adaptation.
Among models trained mainly on ARC-like data, FloydARC achieves state-of-the-art performance on both ARC-AGI-1 and ARC-AGI-2, significantly narrowing the gap to very large proprietary models.
Performance
FloydARC demonstrates strong generalization on ARC benchmarks under standard evaluation protocols.
ARC-AGI benchmark results:
| Model | #Params | ARC-AGI-1 (%) | ARC-AGI-2 (%) |
|---|---|---|---|
| VARC | 73M | 60.4 | 11.1 |
| Loop-ViT | 11.2M | 61.2 | 10.3 |
| HRM | 27M | 40.3 | 5.0 |
| FloydARC | 153.7M | 70.5 | 15.3 |
Model Details
- Model ID: ocxlabs/FloydARC
- Task: Abstraction and Reasoning Corpus (ARC-AGI)
- Architecture: FloydNet-based global relational reasoning with looped refinement (see the illustrative sketch after this list)
- Input / Output: ARC grid-based visual reasoning (query canvas → predicted answer canvas)
- License: Apache 2.0
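FloydNet's internals are not documented here; for readers unfamiliar with looped refinement, the following is a rough, generic sketch of the idea (a weight-tied block applied repeatedly to a grid canvas). All module choices, sizes, and the loop count are hypothetical and are not FloydARC's actual architecture:

```python
import torch
import torch.nn as nn

class LoopedGridRefiner(nn.Module):
    """Illustrative only: one shared block applied T times to a grid canvas."""

    def __init__(self, num_colors: int = 10, dim: int = 128, loops: int = 8):
        super().__init__()
        self.loops = loops
        self.embed = nn.Embedding(num_colors + 1, dim)  # +1 for a padding color
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=4 * dim, batch_first=True
        )  # weight-tied block reused on every iteration
        self.head = nn.Linear(dim, num_colors)  # per-cell color logits

    def forward(self, canvas: torch.Tensor) -> torch.Tensor:
        # canvas: (batch, H, W) integer color indices on a padded query canvas
        b, h, w = canvas.shape
        x = self.embed(canvas).view(b, h * w, -1)  # flatten cells into tokens
        for _ in range(self.loops):                # iterative refinement
            x = self.block(x)
        return self.head(x).view(b, h, w, -1)      # predicted answer-canvas logits

# Toy usage on a random 30x30 query canvas
model = LoopedGridRefiner()
pred = model(torch.randint(0, 10, (1, 30, 30)))
print(pred.shape)  # torch.Size([1, 30, 30, 10])
```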
Usage: Inference & Evaluation
This checkpoint is intended for research and evaluation use on ARC-AGI. Full reproduction of reported results requires multi-GPU inference with test-time training.
1. Download checkpoint
Download the pretrained checkpoint from Hugging Face:
https://huggingface.co/ocxlabs/FloydARC
Place the downloaded folder anywhere on disk and pass its path via --ckpt_path.
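For example, the checkpoint folder can be fetched programmatically with the huggingface_hub library (the local_dir below is just an example location):

```python
from huggingface_hub import snapshot_download

# Download the full FloydARC checkpoint folder; pass the returned path as --ckpt_path.
ckpt_path = snapshot_download(repo_id="ocxlabs/FloydARC", local_dir="./floydarc_ckpt")
print(ckpt_path)
```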
2. Prepare ARC evaluation data
Place the original ARC JSON files under rawdata/, then preprocess:
python -m scripts.process_data \
--input_dir ./rawdata/ARC-AGI-1_evaluation/ \
--output_dir ./preprocessed/arc1 \
--split test
Repeat with the ARC-AGI-2_evaluation directory (writing to a separate output directory, e.g. ./preprocessed/arc2) for ARC-AGI-2.
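Before preprocessing, it can help to sanity-check the raw files. Each ARC task JSON contains "train" demonstration pairs and "test" queries, where every grid is a list of rows of color indices 0-9; the path and filename pattern below are illustrative:

```python
import json
from pathlib import Path

# Illustrative: inspect the first task file under rawdata/ before preprocessing.
task_file = next(Path("rawdata/ARC-AGI-1_evaluation").glob("*.json"))
task = json.loads(task_file.read_text())

# Print the grid sizes of each demonstration pair and count the test queries.
for pair in task["train"]:
    grid_in, grid_out = pair["input"], pair["output"]
    print(f"demo: {len(grid_in)}x{len(grid_in[0])} -> {len(grid_out)}x{len(grid_out[0])}")
print("test queries:", len(task["test"]))
```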
3. Run inference with Test-Time Training (recommended)
python -m scripts.TTT \
--ckpt_path /path/to/floydarc_ckpt \
--subset arc1 \
--output_dir ./output/TTT_results
Notes:
- Default configuration uses 8 GPUs on a single node
- LoRA-based TTT is enabled by default and recommended (a conceptual sketch of the idea follows these notes)
- For ARC-AGI-2, set --subset arc2
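For context, LoRA-based test-time training adapts a small set of low-rank adapter weights on each task's demonstration pairs while the base model stays frozen, then predicts the test output with the adapted weights. The sketch below is a generic illustration of that idea, not FloydARC's actual TTT code; all names and hyperparameters are made up:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a small trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # base weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)     # start as a zero (identity) update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_b(self.lora_a(x))

def test_time_train(model: nn.Module, demo_pairs, steps: int = 50, lr: float = 1e-3):
    """Fit only the LoRA parameters on one task's demonstration pairs."""
    lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
    opt = torch.optim.Adam(lora_params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for x, y in demo_pairs:                # x: input features, y: target color ids
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# Toy usage: a single LoRA-wrapped layer adapted on random "demonstrations".
model = LoRALinear(nn.Linear(16, 10))
demos = [(torch.randn(32, 16), torch.randint(0, 10, (32,))) for _ in range(3)]
test_time_train(model, demos)
```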
4. Ensembling & visualization
For reproducible evaluation and qualitative inspection:
python -m scripts.analyze \
--result-folder ./output/TTT_results \
--subset arc1 \
--out-html output/arc1_results.html
Multiple result folders can be passed to enable max-voting ensembling.
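Max-voting picks, for each task, the candidate answer grid that appears most often across runs. A generic illustration of that selection rule (not the analyze script's actual implementation):

```python
from collections import Counter

def max_vote(candidate_grids):
    """Return the grid predicted most often across independent runs.

    candidate_grids: list of grids, each a list of rows of color indices.
    """
    # Nested lists are unhashable, so key each grid on a tuple-of-tuples form.
    keys = [tuple(map(tuple, g)) for g in candidate_grids]
    best_key, _ = Counter(keys).most_common(1)[0]
    return [list(row) for row in best_key]

# Toy usage: two runs agree on the first grid, one dissents.
runs = [[[1, 1], [2, 2]], [[1, 1], [2, 2]], [[0, 0], [2, 2]]]
print(max_vote(runs))  # [[1, 1], [2, 2]]
```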