FloydARC (ARC-AGI Reasoning)

Model Summary

FloydARC is a neural algorithmic reasoning model adapted from FloydNet for the ARC-AGI benchmark. This checkpoint is trained primarily on ARC-style synthetic and curated data, and is designed to solve ARC tasks via iterative refinement and test-time adaptation, rather than large-scale web pretraining.

Among models trained mainly on ARC-like data, FloydARC achieves state-of-the-art performance on both ARC-AGI-1 (70.5) and ARC-AGI-2 (15.3), narrowing the gap to very large proprietary models.


Performance

FloydARC demonstrates strong generalization on ARC benchmarks under standard evaluation protocols.

ARC-AGI benchmark results:

Model      #Params   ARC-AGI-1   ARC-AGI-2
VARC       73M       60.4        11.1
Loop-ViT   11.2M     61.2        10.3
HRM        27M       40.3        5.0
FloydARC   153.7M    70.5        15.3
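
To make the comparison concrete, the short sketch below computes FloydARC's margin over the strongest listed baseline on each benchmark. The scores are copied from the table; the function name and the dictionary layout are illustrative choices, not part of the released code.

```python
# Scores copied from the results table: (ARC-AGI-1, ARC-AGI-2).
scores = {
    "VARC": (60.4, 11.1),
    "Loop-ViT": (61.2, 10.3),
    "HRM": (40.3, 5.0),
    "FloydARC": (70.5, 15.3),
}


def margin_over_best_baseline(scores, model, column):
    """Return (best_baseline_name, margin) for one benchmark column."""
    baselines = {name: s[column] for name, s in scores.items() if name != model}
    best = max(baselines, key=baselines.get)
    # Round to one decimal to match the precision of the reported scores.
    return best, round(scores[model][column] - baselines[best], 1)


for column, bench in enumerate(("ARC-AGI-1", "ARC-AGI-2")):
    best, margin = margin_over_best_baseline(scores, "FloydARC", column)
    print(f"{bench}: +{margin} over {best}")
```

On these numbers the margin is 9.3 points over Loop-ViT on ARC-AGI-1 and 4.2 points over VARC on ARC-AGI-2.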

Model Details

  • Model ID: ocxlabs/FloydARC
  • Task: Abstraction and Reasoning Corpus (ARC-AGI)
  • Architecture: FloydNet-based global relational reasoning with looped refinement
  • Input / Output: ARC grid-based visual reasoning (query canvas → predicted answer canvas)
  • License: Apache 2.0

Usage: Inference & Evaluation

This checkpoint is intended for research and evaluation use on ARC-AGI. Full reproduction of reported results requires multi-GPU inference with test-time training.

1. Download checkpoint

Download the pretrained checkpoint from Hugging Face:

https://huggingface.co/ocxlabs/FloydARC

Place the downloaded folder anywhere on disk and pass its path via --ckpt_path.
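
If you prefer to script the download, one option is the `huggingface_hub` library. The sketch below assumes that package is installed; the helper name `download_checkpoint` and the default folder `./floydarc_ckpt` are hypothetical, and any directory works.

```python
from pathlib import Path

REPO_ID = "ocxlabs/FloydARC"  # model ID from the Model Details section


def download_checkpoint(local_dir="./floydarc_ckpt"):
    """Fetch all files of the model repo into local_dir and return that path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = Path(local_dir)  # hypothetical location; choose any directory
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target
```

Calling `download_checkpoint()` returns the folder to pass via --ckpt_path.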


2. Prepare ARC evaluation data

Place the original ARC JSON files under rawdata/, then preprocess:

python -m scripts.process_data \
  --input_dir ./rawdata/ARC-AGI-1_evaluation/ \
  --output_dir ./preprocessed/arc1 \
  --split test

Repeat with ARC-AGI-2_evaluation for ARC-AGI-2.
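
Before preprocessing, it can help to sanity-check the raw files. The sketch below assumes the public ARC layout: one task per JSON file, each with "train" and "test" lists of pairs whose "input" and "output" grids are rectangular lists of integers 0-9, at most 30x30. The function names are illustrative and are not part of scripts.process_data.

```python
import json
from pathlib import Path

MAX_SIDE = 30             # ARC grids are at most 30x30
VALID_COLORS = range(10)  # cell values are integers 0-9


def check_grid(grid):
    """Raise ValueError unless grid is a rectangular list of rows of ints 0-9."""
    if not grid or not all(isinstance(row, list) and row for row in grid):
        raise ValueError("grid must be a non-empty list of non-empty rows")
    width = len(grid[0])
    if len(grid) > MAX_SIDE or width > MAX_SIDE:
        raise ValueError("grid exceeds 30x30")
    for row in grid:
        if len(row) != width:
            raise ValueError("rows have unequal length")
        if any(not isinstance(c, int) or c not in VALID_COLORS for c in row):
            raise ValueError("cell values must be integers 0-9")


def check_task(task):
    """Validate one task dict; return (num_train_pairs, num_test_pairs)."""
    for split in ("train", "test"):
        for pair in task[split]:
            check_grid(pair["input"])
            if "output" in pair:  # test outputs may be withheld
                check_grid(pair["output"])
    return len(task["train"]), len(task["test"])


def check_dir(input_dir):
    """Validate every *.json task file in input_dir; return the file count."""
    paths = sorted(Path(input_dir).glob("*.json"))
    for path in paths:
        check_task(json.loads(path.read_text()))
    return len(paths)
```

For example, `check_dir("./rawdata/ARC-AGI-1_evaluation/")` raises on the first malformed task and otherwise returns the number of task files found.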


3. Run inference with Test-Time Training (recommended)

python -m scripts.TTT \
  --ckpt_path /path/to/floydarc_ckpt \
  --subset arc1 \
  --output_dir ./output/TTT_results

Notes:

  • Default configuration uses 8 GPUs on a single node
  • LoRA-based TTT is enabled by default and recommended
  • For ARC-AGI-2, set --subset arc2
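
To evaluate both benchmarks in one go, the command above can be generated per subset. This is a minimal sketch that uses only the flags shown in this section; the helper name and the per-subset output folder naming are hypothetical, chosen so the two runs do not overwrite each other.

```python
import shlex


def ttt_command(ckpt_path, subset, output_root="./output"):
    """Build the scripts.TTT invocation for one subset ('arc1' or 'arc2')."""
    if subset not in ("arc1", "arc2"):
        raise ValueError(f"unknown subset: {subset}")
    return [
        "python", "-m", "scripts.TTT",
        "--ckpt_path", ckpt_path,
        "--subset", subset,
        # Hypothetical folder naming: one results folder per subset.
        "--output_dir", f"{output_root}/TTT_results_{subset}",
    ]


# Print the commands instead of running them, so they can be reviewed first.
for subset in ("arc1", "arc2"):
    print(shlex.join(ttt_command("/path/to/floydarc_ckpt", subset)))
```

To execute a command rather than print it, pass the list to `subprocess.run(cmd, check=True)` from the repository root.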

4. Ensembling & visualization

For reproducible evaluation and qualitative inspection:

python -m scripts.analyze \
  --result-folder ./output/TTT_results \
  --subset arc1 \
  --out-html output/arc1_results.html

Multiple result folders can be passed to enable max-voting ensembling.
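
The idea behind max-voting can be sketched as follows: for each test input, collect the grid predicted by every run and keep the one that occurs most often. This is an illustration of the general technique only; how scripts.analyze actually stores, weights, or breaks ties between predictions is not documented here, and the function names are hypothetical.

```python
from collections import Counter


def grid_key(grid):
    """Convert a list-of-lists grid into a hashable tuple-of-tuples."""
    return tuple(tuple(row) for row in grid)


def max_vote(candidates, top_k=1):
    """Return the top_k most frequent grids among candidates.

    Ties go to the grid seen first, because Counter.most_common keeps
    first-encountered order for equal counts.
    """
    counts = Counter(grid_key(g) for g in candidates)
    return [[list(row) for row in key] for key, _ in counts.most_common(top_k)]


# Three hypothetical runs predicting the same test grid; two of them agree.
runs = [
    [[1, 0], [0, 1]],
    [[1, 1], [0, 1]],
    [[1, 0], [0, 1]],
]
print(max_vote(runs))  # [[[1, 0], [0, 1]]]
```

Setting `top_k=2` returns the two most-voted grids, which is useful when the evaluation allows two attempts per test input.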
