RealGRPO FLUX DiT Weights
This repository provides DiT weights fine-tuned from FLUX.1-dev with GRPO using the RealGRPO strategy.
RealGRPO targets a common post-training issue in image generation: reward hacking (e.g., over-smoothing, over-saturation, and synthetic-looking artifacts).
Compared with vanilla FLUX and standard GRPO baselines, these weights are optimized to better preserve prompt intent while reducing reward-driven artifacts.
What Is Included
- Fine-tuned FLUX DiT weights (GRPO post-training).
- Training objective based on contrastive positive/negative style guidance.
- Compatibility with the RealGRPO codebase inference scripts.
Method (Brief)
RealGRPO uses an LLM to generate prompt-specific style pairs:
- positive style cues (pos_style)
- negative style cues (neg_style)
The reward encourages similarity to positive cues while penalizing negative cues, helping the model avoid artifact-prone shortcuts during alignment.
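The reward idea above can be illustrated with a minimal sketch. This is not the RealGRPO implementation: the card does not give the exact formula, so the sketch assumes the image and the two style cues have already been embedded into a shared vector space by some image-text encoder, and it scores an image as its cosine similarity to pos_style minus a weighted cosine similarity to neg_style. The function names and the neg_weight parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def contrastive_style_reward(image_emb, pos_emb, neg_emb, neg_weight=1.0):
    """Hypothetical reward: reward similarity to pos_style, penalize similarity to neg_style.

    image_emb, pos_emb, neg_emb are assumed to come from the same encoder;
    neg_weight is an illustrative knob, not a documented RealGRPO parameter.
    """
    return cosine(image_emb, pos_emb) - neg_weight * cosine(image_emb, neg_emb)


if __name__ == "__main__":
    # Toy 2-D embeddings: the image is aligned with the positive cue only.
    print(contrastive_style_reward([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
```

Under this assumed form, an image that drifts toward the negative cue loses reward even if it still resembles the positive cue, which is the behavior the method description relies on to discourage artifact-prone shortcuts.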
Note: This release contains DiT alignment weights, not a standalone full pipeline package. You need to download black-forest-labs/FLUX.1-dev and replace the contents of its transformer directory with the contents of this repository.
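The replacement step in the note can be scripted. The sketch below is one possible way to do it, under these assumptions: both repositories are downloaded with snapshot_download from the huggingface_hub package, the local folder names under checkpoints are arbitrary, hidden entries such as download caches are skipped when copying, and you have already accepted the FLUX.1-dev license and logged in, since that repository is gated. The helper names are hypothetical.

```python
import shutil
from pathlib import Path


def replace_transformer_dir(flux_dir, weights_dir):
    """Replace the contents of <flux_dir>/transformer with the contents of weights_dir."""
    flux_dir, weights_dir = Path(flux_dir), Path(weights_dir)
    if not weights_dir.is_dir():
        raise FileNotFoundError(f"weights directory not found: {weights_dir}")

    target = flux_dir / "transformer"
    # Drop the original FLUX transformer files so no stale shards remain.
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True)

    for entry in sorted(weights_dir.iterdir()):
        # Assumption: hidden entries (.git, .cache, ...) are download metadata, not weights.
        if entry.name.startswith("."):
            continue
        dest = target / entry.name
        if entry.is_dir():
            shutil.copytree(entry, dest)
        else:
            shutil.copy2(entry, dest)
    return target


def download_and_patch(local_root="checkpoints"):
    """Download both repositories, then patch the FLUX transformer directory."""
    # Imported lazily so the copy helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    flux_dir = snapshot_download(
        "black-forest-labs/FLUX.1-dev", local_dir=f"{local_root}/FLUX.1-dev"
    )
    weights_dir = snapshot_download(
        "YangZhou24/RealGRPO", local_dir=f"{local_root}/RealGRPO"
    )
    return replace_transformer_dir(flux_dir, weights_dir)
```

Calling download_and_patch() leaves a patched FLUX.1-dev folder under checkpoints, which can then be passed to the RealGRPO codebase inference scripts as the model path.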
Model tree for YangZhou24/RealGRPO
Base model: black-forest-labs/FLUX.1-dev