Spatial-SSRL Spatial Reasoning
Spatial reasoning with vision-language models
New Ghibli EasyControl model released
An Agentic Framework with Tools for Complex Reasoning
View the LMArena language model leaderboard
ElevenLabs Italian demo
FitDiT is a high-fidelity virtual try-on model.
Easily expand image boundaries
Upgraded to v1.0!
Add a logo to anything
Audio-Conditioned LipSync with Latent Diffusion Models
Colorize grayscale photos using AI-generated captions
Generate app code from your idea
Generate new person images with swapped clothes or poses
Convert images of screens to structured elements
Fill and modify images using a mask and prompt
Generate 3D textured mesh from a single image
Segment and extract subjects from images
Place text behind image subjects, using birefnet-lite for background removal