---
library_name: coreml
pipeline_tag: image-to-image
tags:
  - super-resolution
  - apple-silicon
  - neural-engine
  - ane
  - coreml
  - real-time
  - video-upscaling
  - macos
license: apache-2.0
datasets:
  - eugenesiow/Div2k
metrics:
  - psnr
  - ssim
model-index:
  - name: PiperSR-2x
    results:
      - task:
          type: image-super-resolution
          name: Image Super-Resolution
        dataset:
          type: Set5
          name: Set5
        metrics:
          - type: psnr
            value: 37.54
            name: PSNR
      - task:
          type: image-super-resolution
          name: Image Super-Resolution
        dataset:
          type: Set14
          name: Set14
        metrics:
          - type: psnr
            value: 33.21
            name: PSNR
      - task:
          type: image-super-resolution
          name: Image Super-Resolution
        dataset:
          type: BSD100
          name: BSD100
        metrics:
          - type: psnr
            value: 31.98
            name: PSNR
      - task:
          type: image-super-resolution
          name: Image Super-Resolution
        dataset:
          type: Urban100
          name: Urban100
        metrics:
          - type: psnr
            value: 31.38
            name: PSNR
---

# PiperSR-2x: ANE-Native Super Resolution for Apple Silicon

Real-time 2× AI upscaling on Apple's Neural Engine: 44.4 FPS at 720p on an M2 Max, a 928 KB model, and every op running natively on the ANE with zero CPU/GPU fallback.

This is not a converted PyTorch model; it is an architecture designed from ANE hardware measurements. Every dimension, operation, and data type is dictated by Neural Engine characteristics.

## Key Results

All values are PSNR in dB on 2× super-resolution benchmarks (higher is better).

| Model | Params | Set5 | Set14 | BSD100 | Urban100 |
|-------------|----------|-----------|-----------|-----------|-----------|
| Bicubic | — | 33.66 | 30.24 | 29.56 | 26.88 |
| FSRCNN | 13K | 37.05 | 32.66 | 31.53 | 29.88 |
| **PiperSR** | **453K** | **37.54** | **33.21** | **31.98** | **31.38** |
| SAFMN | 228K | 38.00 | ~33.7 | ~32.2 | — |

PiperSR beats FSRCNN across all four benchmarks and sits within 0.46 dB of SAFMN on Set5, below the perceptual threshold for most content.
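The PSNR figures above follow the standard definition, PSNR = 10 · log10(MAX² / MSE). A minimal NumPy sketch of that metric (illustrative only; published benchmarks typically crop borders and evaluate on the luminance channel, which this sketch omits):

```python
import numpy as np

def psnr(reference: np.ndarray, upscaled: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((reference.astype(np.float64) - upscaled.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

# A uniform error of 0.1 on [0, 1] images gives MSE = 0.01, i.e. 20 dB.
ref = np.zeros((3, 128, 128))
out = ref + 0.1
print(round(psnr(ref, out), 2))  # 20.0
```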
## Performance

| Configuration | FPS | Hardware | Notes |
|-----------------------------------|-------|----------|----------------------------------|
| Full-frame 640×360 → 1280×720 | 44.4 | M2 Max | ANE predict 20.8 ms |
| 128×128 tiles (static weights) | 125.6 | M2 | Baked weights, 2.82× vs dynamic |
| 128×128 tiles (dynamic weights) | 44.5 | M2 | CoreML default |

Real-time 2× upscaling at 30+ FPS on any Mac with Apple Silicon. The ANE sits idle during video playback; PiperSR puts it to work.

## Architecture

A 453K-parameter network: 6 residual blocks at 64 channels with BatchNorm and SiLU activations, upscaling via PixelShuffle.

```
Input (128×128×3 FP16)
  → Head: Conv 3×3 (3 → 64)
  → Body: 6× ResBlock [Conv 3×3 → BatchNorm → SiLU → Conv 3×3 → BatchNorm → Residual Add]
  → Tail: Conv 3×3 (64 → 12)
  → PixelShuffle(2)
Output (256×256×3)
```

Compiles to 5 MIL ops: `conv`, `add`, `silu`, `pixel_shuffle`, `const`. All verified ANE-native.

### Why ANE-native matters

Off-the-shelf super-resolution models (SPAN, Real-ESRGAN) were designed for CUDA GPUs and converted to CoreML after the fact. They waste the ANE:

- **Misaligned channels** (48 instead of 64) waste 25%+ of each ANE tile
- **Monolithic full-frame tensors** serialize the ANE's parallel compute lanes
- **Silent CPU fallback** from unsupported ops can increase latency 5-10×
- **No batched tiles** means 60× dispatch overhead

PiperSR addresses every one of these by designing around ANE constraints.

## Model Variants

| File | Use Case | Input → Output |
|------------------------------------|--------------------------------|-------------------|
| `PiperSR_2x.mlpackage` | Static images (128px tiles) | 128×128 → 256×256 |
| `PiperSR_2x_video_720p.mlpackage` | Video (full-frame, BN-fused) | 640×360 → 1280×720 |
| `PiperSR_2x_256.mlpackage` | Static images (256px tiles) | 256×256 → 512×512 |

## Usage

### With ToolPiper (recommended)

PiperSR is integrated into [ToolPiper](https://modelpiper.com), a local macOS AI toolkit.
Install ToolPiper, enable the MediaPiper browser extension, and every 720p video on the web is upscaled to 1440p in real time.

```bash
# Via MCP tool
mcp__toolpiper__image_upscale image=/path/to/image.png

# Via REST API
curl -X POST http://127.0.0.1:9998/v1/images/upscale \
  -F "image=@input.png" \
  -o upscaled.png
```

### With CoreML (Swift)

```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine // NOT .all (23.6% slower for this model)
let model = try PiperSR_2x(configuration: config)

let input = try PiperSR_2xInput(x: pixelBuffer)
let output = try model.prediction(input: input)
// output.var_185 contains the 2× upscaled image
```

> **Important:** Use `.cpuAndNeuralEngine`, not `.all`. CoreML's `.all` silently misroutes pure-ANE ops onto the GPU, causing a 23.6% slowdown for this model.

### With coremltools (Python)

```python
import coremltools as ct
from PIL import Image
import numpy as np

model = ct.models.MLModel("PiperSR_2x.mlpackage")
img = Image.open("input.png").resize((128, 128))
arr = np.array(img).astype(np.float32) / 255.0
arr = np.transpose(arr, (2, 0, 1))[np.newaxis]  # NCHW
result = model.predict({"x": arr})
```

## Training

Trained on DIV2K (800 training images) with L1 loss and random augmentation (flips, rotations). Total training cost: ~$6 on RunPod A6000 instances. The full training journey is documented, from 33.46 dB to 37.54 dB across 12 experiment findings.
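The augmentation and objective described above are simple enough to sketch. The snippet below shows random flips/rotations applied identically to a low-resolution/high-resolution patch pair, plus the L1 loss, in NumPy; it is an illustration of the recipe, not the actual training pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(lr: np.ndarray, hr: np.ndarray):
    """Apply the same random flip/rotation to a square LR/HR patch pair (C, H, W)."""
    if rng.random() < 0.5:                 # horizontal flip
        lr, hr = lr[:, :, ::-1], hr[:, :, ::-1]
    if rng.random() < 0.5:                 # vertical flip
        lr, hr = lr[:, ::-1, :], hr[:, ::-1, :]
    k = int(rng.integers(0, 4))            # rotate both patches by k * 90°
    lr = np.rot90(lr, k, axes=(1, 2))
    hr = np.rot90(hr, k, axes=(1, 2))
    return lr.copy(), hr.copy()

def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error, the training objective."""
    return float(np.mean(np.abs(pred - target)))

lr = rng.random((3, 128, 128), dtype=np.float32)   # 2x scale: 128px LR patch
hr = rng.random((3, 256, 256), dtype=np.float32)   # paired 256px HR patch
lr_aug, hr_aug = augment(lr, hr)
assert lr_aug.shape == (3, 128, 128) and hr_aug.shape == (3, 256, 256)
```

Because the patches are square, every flip and 90° rotation preserves their shape, so the LR/HR pairing stays valid after augmentation.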
## Technical Details

- **Compute units:** `.cpuAndNeuralEngine` (ANE primary, CPU for I/O only)
- **Precision:** Float16
- **Input format:** NCHW, normalized to [0, 1]
- **Output format:** NCHW, [0, 1]
- **Model size:** 928 KB (compiled `.mlmodelc`)
- **Parameters:** 453K
- **ANE ops used:** conv, batch_norm (fused at inference), silu, add, pixel_shuffle, const
- **CPU fallback ops:** None

## License

Apache 2.0

## Citation

```bibtex
@software{pipersr2025,
  title={PiperSR: ANE-Native Super Resolution for Apple Silicon},
  author={ModelPiper},
  year={2025},
  url={https://huggingface.co/ModelPiper/PiperSR-2x}
}
```