RealGRPO FLUX DiT Weights

This repository provides DiT weights fine-tuned from FLUX.1-dev with GRPO using the RealGRPO strategy.

RealGRPO targets a common post-training issue in image generation: reward hacking (e.g., over-smoothing, over-saturation, and synthetic-looking artifacts).
Compared with vanilla FLUX and standard GRPO baselines, these weights are optimized to better preserve prompt intent while reducing reward-driven artifacts.

What Is Included

  • Fine-tuned FLUX DiT weights (GRPO post-training).
  • Training objective based on contrastive positive/negative style guidance.
  • Compatibility with the RealGRPO codebase inference scripts.

Method (Brief)

RealGRPO uses an LLM to generate prompt-specific style pairs:

  • positive style cues (pos_style)
  • negative style cues (neg_style)

The reward encourages similarity to positive cues while penalizing negative cues, helping the model avoid artifact-prone shortcuts during alignment.
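The contrastive reward described above can be sketched as a similarity difference in a shared image-text embedding space (e.g. a CLIP-style encoder). The embedding source, the `alpha` weighting, and the function names below are illustrative assumptions, not the released training configuration:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def style_reward(img_emb, pos_emb, neg_emb, alpha=1.0):
    # Reward similarity to pos_style cues, penalize similarity to neg_style
    # cues. All three inputs are assumed to come from the same encoder;
    # `alpha` (hypothetical) trades off the negative-style penalty.
    return cosine(img_emb, pos_emb) - alpha * cosine(img_emb, neg_emb)
```

A sample whose embedding aligns with the positive cues and is orthogonal to the negative cues receives a high reward, steering GRPO updates away from artifact-prone shortcuts.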

Note: This release contains DiT alignment weights only, not a standalone full pipeline package. You need to download black-forest-labs/FLUX.1-dev and replace the contents of its transformer directory with the contents of this repository.
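The replacement step can be scripted. The sketch below assumes both repositories have already been downloaded locally (e.g. with `huggingface-cli download`); the function name and directory layout beyond the standard `transformer/` folder are assumptions:

```python
import shutil
from pathlib import Path

def replace_transformer(flux_dir, realgrpo_dir):
    """Copy the RealGRPO DiT weight files over the contents of
    FLUX.1-dev's transformer/ directory.

    flux_dir:     local path to the downloaded FLUX.1-dev snapshot
    realgrpo_dir: local path to this repository's files
    Returns the names of the files copied.
    """
    target = Path(flux_dir) / "transformer"
    copied = []
    for src in sorted(Path(realgrpo_dir).iterdir()):
        if src.is_file():
            shutil.copy2(src, target / src.name)
            copied.append(src.name)
    return copied
```

After copying, the patched snapshot can be loaded as usual (e.g. with `diffusers.FluxPipeline.from_pretrained` pointed at the local directory).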
