4M: Massively Multimodal Masked Modeling
Generate images from text prompts
Compare different visual question answering models
Convert a screenshot to HTML code and preview the result