Papers
arxiv:2604.08364

MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping

Published on Apr 9
· Submitted by
Gaojunyao
on Apr 10
#3 Paper of the day
Authors:
,
,
,
,
,
,
,

Abstract

MegaStyle presents a scalable data curation pipeline for creating high-quality, style-consistent datasets using large generative models and proposes style-supervised contrastive learning for effective style representation extraction.

AI-generated summary

In this paper, we introduce MegaStyle, a novel and scalable data curation pipeline that constructs an intra-style consistent, inter-style diverse and high-quality style dataset. We achieve this by leveraging the consistent text-to-image style mapping capability of current large generative models, which can generate images in the same style from a given style description. Building on this foundation, we curate a diverse and balanced prompt gallery with 170K style prompts and 400K content prompts, and generate a large-scale style dataset MegaStyle-1.4M via content-style prompt combinations. With MegaStyle-1.4M, we propose style-supervised contrastive learning to fine-tune a style encoder MegaStyle-Encoder for extracting expressive, style-specific representations, and we also train a FLUX-based style transfer model MegaStyle-FLUX. Extensive experiments demonstrate the importance of maintaining intra-style consistency, inter-style diversity and high-quality for style dataset, as well as the effectiveness of the proposed MegaStyle-1.4M. Moreover, when trained on MegaStyle-1.4M, MegaStyle-Encoder and MegaStyle-FLUX provide reliable style similarity measurement and generalizable style transfer, making a significant contribution to the style transfer community. More results are available at our project website https://jeoyal.github.io/MegaStyle/.

Community

Paper author Paper submitter

teaser
Visualizations of our style dataset (a)MegaStyle-1.4M and the stylized results produced by our style transfer model (b)MegaStyle-FLUX. MegaStyle-1.4M contains style pairs that share the style but have different content (intra-style consistency), as well as a large number of diverse styles (inter-style diversity). Trained on MegaStyle-1.4M, MegaStyle-FLUX effectively captures nuances—such as color, light, texture and brushwork—across various styles.

Great work!

·

Thanks for supporting!

Paper author Paper submitter

The dataset, model, and code are expected to be publicly released on April 20.

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2604.08364
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2604.08364 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2604.08364 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2604.08364 in a Space README.md to link it from this page.

Collections including this paper 1