---
library_name: coreml
pipeline_tag: image-to-image
tags:
  - super-resolution
  - apple-silicon
  - neural-engine
  - ane
  - coreml
  - real-time
  - video-upscaling
  - macos
license: apache-2.0
datasets:
  - eugenesiow/Div2k
metrics:
  - psnr
  - ssim
model-index:
  - name: PiperSR-2x
    results:
      - task:
          type: image-super-resolution
          name: Image Super-Resolution
        dataset:
          type: Set5
          name: Set5
        metrics:
          - type: psnr
            value: 37.54
            name: PSNR
      - task:
          type: image-super-resolution
          name: Image Super-Resolution
        dataset:
          type: Set14
          name: Set14
        metrics:
          - type: psnr
            value: 33.21
            name: PSNR
      - task:
          type: image-super-resolution
          name: Image Super-Resolution
        dataset:
          type: BSD100
          name: BSD100
        metrics:
          - type: psnr
            value: 31.98
            name: PSNR
      - task:
          type: image-super-resolution
          name: Image Super-Resolution
        dataset:
          type: Urban100
          name: Urban100
        metrics:
          - type: psnr
            value: 31.38
            name: PSNR
---

# PiperSR-2x: ANE-Native Super Resolution for Apple Silicon

Real-time 2× AI upscaling on Apple's Neural Engine: 44.4 FPS at 720p on an M2 Max, a 928 KB model, and every op running natively on the ANE with zero CPU/GPU fallback.

This is not a converted PyTorch model; it is an architecture designed from ANE hardware measurements. Every dimension, operation, and data type is dictated by Neural Engine characteristics.

## Key Results

All values are PSNR in dB on 2× super-resolution benchmarks (higher is better).

| Model | Params | Set5 | Set14 | BSD100 | Urban100 |
|-------------|----------|-----------|-----------|-----------|-----------|
| Bicubic | — | 33.66 | 30.24 | 29.56 | 26.88 |
| FSRCNN | 13K | 37.05 | 32.66 | 31.53 | 29.88 |
| **PiperSR** | **453K** | **37.54** | **33.21** | **31.98** | **31.38** |
| SAFMN | 228K | 38.00 | ~33.7 | ~32.2 | — |

PiperSR beats FSRCNN across all four benchmarks and sits within 0.46 dB of SAFMN on Set5, below the perceptual threshold for most content.
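The PSNR figures above follow the standard definition, PSNR = 10 · log10(MAX² / MSE). A minimal NumPy sketch of that metric (illustrative only; published benchmarks typically crop borders and evaluate on the luminance channel, which this sketch omits):

```python
import numpy as np

def psnr(reference: np.ndarray, upscaled: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((reference.astype(np.float64) - upscaled.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

# A uniform error of 0.1 on [0, 1] images gives MSE = 0.01, i.e. 20 dB.
ref = np.zeros((3, 128, 128))
out = ref + 0.1
print(round(psnr(ref, out), 2))  # 20.0
```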
## Performance

| Configuration | FPS | Hardware | Notes |
|-----------------------------------|-------|----------|----------------------------------|
| Full-frame 640×360 → 1280×720 | 44.4 | M2 Max | ANE predict 20.8 ms |
| 128×128 tiles (static weights) | 125.6 | M2 | Baked weights, 2.82× vs dynamic |
| 128×128 tiles (dynamic weights) | 44.5 | M2 | CoreML default |

Real-time 2× upscaling at 30+ FPS on any Mac with Apple Silicon. The ANE sits idle during video playback; PiperSR puts it to work.

## Architecture

A 453K-parameter network: 6 residual blocks at 64 channels with BatchNorm and SiLU activations, upscaling via PixelShuffle.

```
Input (128×128×3 FP16)
  → Head: Conv 3×3 (3 → 64)
  → Body: 6× ResBlock [Conv 3×3 → BatchNorm → SiLU → Conv 3×3 → BatchNorm → Residual Add]
  → Tail: Conv 3×3 (64 → 12)
  → PixelShuffle(2)
Output (256×256×3)
```

Compiles to 5 MIL ops: `conv`, `add`, `silu`, `pixel_shuffle`, `const`. All verified ANE-native.

### Why ANE-native matters

Off-the-shelf super-resolution models (SPAN, Real-ESRGAN) were designed for CUDA GPUs and converted to CoreML after the fact. They waste the ANE:

- **Misaligned channels** (48 instead of 64) waste 25%+ of each ANE tile
- **Monolithic full-frame tensors** serialize the ANE's parallel compute lanes
- **Silent CPU fallback** from unsupported ops can increase latency 5-10×
- **No batched tiles** means 60× dispatch overhead

PiperSR addresses every one of these by designing around ANE constraints.

## Model Variants

| File | Use Case | Input → Output |
|------------------------------------|--------------------------------|-------------------|
| `PiperSR_2x.mlpackage` | Static images (128px tiles) | 128×128 → 256×256 |
| `PiperSR_2x_video_720p.mlpackage` | Video (full-frame, BN-fused) | 640×360 → 1280×720 |
| `PiperSR_2x_256.mlpackage` | Static images (256px tiles) | 256×256 → 512×512 |

## Usage

### With ToolPiper (recommended)

PiperSR is integrated into [ToolPiper](https://modelpiper.com), a local macOS AI toolkit.
Install ToolPiper, enable the MediaPiper browser extension, and every 720p video on the web is upscaled to 1440p in real time.

```bash
# Via MCP tool
mcp__toolpiper__image_upscale image=/path/to/image.png

# Via REST API
curl -X POST http://127.0.0.1:9998/v1/images/upscale \
  -F "image=@input.png" \
  -o upscaled.png
```

### With CoreML (Swift)

```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine // NOT .all (23.6% slower for this model)
let model = try PiperSR_2x(configuration: config)

let input = try PiperSR_2xInput(x: pixelBuffer)
let output = try model.prediction(input: input)
// output.var_185 contains the 2× upscaled image
```

> **Important:** Use `.cpuAndNeuralEngine`, not `.all`. CoreML's `.all` silently misroutes pure-ANE ops onto the GPU, causing a 23.6% slowdown for this model.

### With coremltools (Python)

```python
import coremltools as ct
from PIL import Image
import numpy as np

model = ct.models.MLModel("PiperSR_2x.mlpackage")
img = Image.open("input.png").resize((128, 128))
arr = np.array(img).astype(np.float32) / 255.0
arr = np.transpose(arr, (2, 0, 1))[np.newaxis]  # NCHW
result = model.predict({"x": arr})
```

## Training

Trained on DIV2K (800 training images) with L1 loss and random augmentation (flips, rotations). Total training cost: ~$6 on RunPod A6000 instances. The full training journey is documented, from 33.46 dB to 37.54 dB across 12 experiment findings.
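The augmentation and objective described above are simple enough to sketch. The snippet below shows random flips/rotations applied identically to a low-resolution/high-resolution patch pair, plus the L1 loss, in NumPy; it is an illustration of the recipe, not the actual training pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(lr: np.ndarray, hr: np.ndarray):
    """Apply the same random flip/rotation to a square LR/HR patch pair (C, H, W)."""
    if rng.random() < 0.5:                 # horizontal flip
        lr, hr = lr[:, :, ::-1], hr[:, :, ::-1]
    if rng.random() < 0.5:                 # vertical flip
        lr, hr = lr[:, ::-1, :], hr[:, ::-1, :]
    k = int(rng.integers(0, 4))            # rotate both patches by k * 90°
    lr = np.rot90(lr, k, axes=(1, 2))
    hr = np.rot90(hr, k, axes=(1, 2))
    return lr.copy(), hr.copy()

def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error, the training objective."""
    return float(np.mean(np.abs(pred - target)))

lr = rng.random((3, 128, 128), dtype=np.float32)   # 2x scale: 128px LR patch
hr = rng.random((3, 256, 256), dtype=np.float32)   # paired 256px HR patch
lr_aug, hr_aug = augment(lr, hr)
assert lr_aug.shape == (3, 128, 128) and hr_aug.shape == (3, 256, 256)
```

Because the patches are square, every flip and 90° rotation preserves their shape, so the LR/HR pairing stays valid after augmentation.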
## Technical Details

- **Compute units:** `.cpuAndNeuralEngine` (ANE primary, CPU for I/O only)
- **Precision:** Float16
- **Input format:** NCHW, normalized to [0, 1]
- **Output format:** NCHW, [0, 1]
- **Model size:** 928 KB (compiled `.mlmodelc`)
- **Parameters:** 453K
- **ANE ops used:** conv, batch_norm (fused at inference), silu, add, pixel_shuffle, const
- **CPU fallback ops:** None

## License

Apache 2.0

## Citation

```bibtex
@software{pipersr2025,
  title={PiperSR: ANE-Native Super Resolution for Apple Silicon},
  author={ModelPiper},
  year={2025},
  url={https://huggingface.co/ModelPiper/PiperSR-2x}
}
```