CLaRa models
Recent Activity
Papers
- Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting
- RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning
MobileCLIP2: Mobile-friendly image-text models with SOTA zero-shot capabilities trained on DFNDR-2B
A collection of AIMv2 vision encoders covering several input resolutions, a native-resolution variant, and a distilled checkpoint.
- apple/aimv2-large-patch14-224 • Image Feature Extraction • 0.3B • Updated • 1.48k downloads • 62 likes
- apple/aimv2-huge-patch14-224 • Image Feature Extraction • 0.7B • Updated • 46 downloads • 13 likes
- apple/aimv2-1B-patch14-224 • Image Feature Extraction • 1B • Updated • 185 downloads • 8 likes
- apple/aimv2-3B-patch14-224 • Image Feature Extraction • 3B • Updated • 36 downloads • 4 likes
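The AIMv2 checkpoints above are image feature extractors. A minimal loading sketch, assuming the Hub repos work with transformers' AutoModel/AutoImageProcessor remote-code path as their model cards describe; the image path is a placeholder.

```python
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

ckpt = "apple/aimv2-large-patch14-224"  # any checkpoint from the list above
processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt, trust_remote_code=True)

image = Image.open("image.jpg")  # placeholder: any local RGB image
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # patch-level image features
```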
- apple/OpenELM-270M-Instruct • Text Generation • 0.3B • Updated • 2.25k downloads • 145 likes
- apple/OpenELM-450M-Instruct • Text Generation • 0.5B • Updated • 755 downloads • 51 likes
- apple/OpenELM-1_1B-Instruct • Text Generation • Updated • 1.5M downloads • 74 likes
- apple/OpenELM-3B-Instruct • Text Generation • 3B • Updated • 3.4k downloads • 339 likes
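A minimal generation sketch for the OpenELM Instruct checkpoints, assuming the repos' remote code works with AutoModelForCausalLM and that a Llama-2-compatible tokenizer is available (the official setup pairs OpenELM with the gated meta-llama/Llama-2-7b-hf tokenizer).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M-Instruct", trust_remote_code=True
)
# Assumption: OpenELM reuses the Llama-2 tokenizer, which requires gated access.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```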
MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities.
DataCompDR: Improved datasets for training SOTA image-text models.
- MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training • Paper • arXiv:2311.17049 • Published • 6 upvotes
- apple/mobileclip_s0_timm • Image Classification • Updated • 63 downloads • 12 likes
- apple/mobileclip_s1_timm • Image Classification • Updated • 55 downloads • 3 likes
- apple/mobileclip_s2_timm • Image Classification • Updated • 53 downloads • 6 likes
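A hedged zero-shot classification sketch for the MobileCLIP checkpoints, assuming the *_timm repos expose an OpenCLIP-compatible config so they can be loaded via the hf-hub: prefix; the image path and prompts are placeholders.

```python
import torch
import open_clip
from PIL import Image

repo = "hf-hub:apple/mobileclip_s0_timm"  # assumption: repo is OpenCLIP-loadable
model, preprocess = open_clip.create_model_from_pretrained(repo)
tokenizer = open_clip.get_tokenizer(repo)

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # placeholder image
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)
print(probs)  # zero-shot class probabilities
```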
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
CLIP Models trained using DFN-2B/DFN-5B datasets
DCLM Models + Datasets
Efficient Vision Encoding for Vision Language Models
- FastVLM: Efficient Vision Encoding for Vision Language Models • Paper • arXiv:2412.13303 • Published • 75 upvotes
- FastVLM WebGPU • Space • Real-time video captioning powered by FastVLM • 445 likes
- apple/FastVLM-0.5B • Text Generation • 0.8B • Updated • 12.7k downloads • 388 likes
- apple/FastVLM-1.5B • Text Generation • 2B • Updated • 2.08k downloads • 79 likes
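A minimal checkpoint-loading sketch for FastVLM, assuming the Hub repos rely on transformers' trust_remote_code path; the full image-token prompt format is described on the model cards and is not reproduced here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/FastVLM-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
print(model.config.model_type)  # confirms the custom FastVLM model class loaded
```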
- apple/DiffuCoder-7B-cpGRPO • 8B • Updated • 1.82k downloads • 316 likes
- apple/DiffuCoder-7B-Instruct • 8B • Updated • 1.58k downloads • 61 likes
- DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation • Paper • arXiv:2506.20639 • Published • 31 upvotes
- apple/DiffuCoder-7B-Base • 8B • Updated • 967 downloads • 29 likes
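A minimal loading sketch for a DiffuCoder checkpoint, assuming the repos ship remote code for transformers; the masked-diffusion decoding loop itself is documented on the model cards and is not reproduced here.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "apple/DiffuCoder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, torch_dtype=torch.bfloat16, trust_remote_code=True)
print(sum(p.numel() for p in model.parameters()))  # roughly 8B parameters, matching the listing
```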
- apple/coreml-depth-anything-v2-small • Depth Estimation • Updated • 673 downloads • 93 likes
- apple/coreml-depth-anything-small • Depth Estimation • Updated • 279 downloads • 40 likes
- apple/coreml-detr-semantic-segmentation • Image Segmentation • Updated • 266 downloads • 32 likes
- apple/coreml-FastViT-T8 • Image Classification • Updated • 40 downloads • 17 likes
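A hedged sketch for fetching and inspecting one of the Core ML packages above with coremltools; the .mlpackage file name is a placeholder, so check the repo's file listing for the real one (running predictions additionally requires macOS).

```python
import coremltools as ct
from huggingface_hub import snapshot_download

repo_dir = snapshot_download("apple/coreml-FastViT-T8")
# Placeholder file name; the actual .mlpackage path is listed in the repo.
# skip_model_load lets the spec be inspected without a macOS runtime.
model = ct.models.MLModel(f"{repo_dir}/FastViTT8F16.mlpackage", skip_model_load=True)
print(model.get_spec().description)
```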
A benchmark for designing efficient continual learning of image-text models over multiple years.
- TiC-CLIP: Continual Training of CLIP Models • Paper • arXiv:2310.16226 • Published • 10 upvotes
- apple/TiC-DataComp • Preview • Updated • 2.07k downloads • 4 likes
- apple/TiC-CLIP-basic-cumulative • Zero-Shot Image Classification • Updated • 135 downloads • 3 likes
- apple/TiC-CLIP-basic-oracle • Zero-Shot Image Classification • Updated • 14
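A hedged sketch of loading a TiC-CLIP checkpoint with OpenCLIP, assuming the repo stores an OpenCLIP-format state dict; both the architecture name and checkpoint file name below are assumptions, so consult the model card for the exact values.

```python
import open_clip
from huggingface_hub import hf_hub_download

# Placeholder file and architecture names; see the model card for the real ones.
ckpt_path = hf_hub_download("apple/TiC-CLIP-basic-cumulative", "checkpoint.pt")
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16", pretrained=ckpt_path)
tokenizer = open_clip.get_tokenizer("ViT-B-16")
```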
- apple/coreml-stable-diffusion-mixed-bit-palettization • Updated • 64 downloads • 30 likes
- apple/coreml-stable-diffusion-xl-base • Text-to-Image • Updated • 99 downloads • 70 likes
- apple/coreml-stable-diffusion-2-1-base • Text-to-Image • Updated • 224 downloads • 55 likes
- pcuenq/coreml-stable-diffusion-2-1-base • Text-to-Image • Updated • 4
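A hedged sketch of fetching one variant of the Core ML Stable Diffusion weights above; inference then goes through Apple's ml-stable-diffusion tooling (the python_coreml_stable_diffusion pipeline or the Swift package) rather than this script. The allow_patterns value is a placeholder for one variant folder.

```python
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    "apple/coreml-stable-diffusion-2-1-base",
    allow_patterns=["original/*"],  # placeholder: download only one variant folder
)
print(local_dir)
# Follow-up (documented in the apple/ml-stable-diffusion repo):
#   python -m python_coreml_stable_diffusion.pipeline \
#       --prompt "an astronaut riding a horse" -i <model dir> -o <output dir>
```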
AIM: Autoregressive Image Models