JoyAI-Image-Edit
Awakening Spatial Intelligence in Unified Multimodal Understanding and Generation

🐶 JoyAI-Image-Edit

JoyAI-Image-Edit is a multimodal foundation model specialized in instruction-guided image editing. It enables precise and controllable edits by leveraging strong spatial understanding, including scene parsing, relational grounding, and instruction decomposition, allowing complex modifications to be applied accurately to specified regions.

🚀 Quick Start

1. Environment Setup

Requirements: Python >= 3.10, CUDA-capable GPU

Clone the repository, create a conda environment, and install the package:

git clone https://github.com/jd-opensource/JoyAI-Image
cd JoyAI-Image
conda create -n joyai python=3.10 -y
conda activate joyai

pip install -e .

Note on Flash Attention: flash-attn >= 2.8.0 is listed as a dependency for best performance.

Core Dependencies

Package	Version	Purpose
`torch`	>= 2.8	PyTorch
`transformers`	>= 4.57.0, < 4.58.0	Text encoder
`diffusers`	>= 0.34.0	Pipeline utilities
`flash-attn`	>= 2.8.0	Fast attention kernel
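
As a quick sanity check after installation, the version ranges in the table above can be compared against what is actually installed. The following is a minimal sketch, not part of the repository: the helper names (`version_tuple`, `satisfies`, `check_environment`) are hypothetical, and the version parsing is deliberately simple (numeric components only; local suffixes such as `+cu128` are ignored).

```python
from importlib import metadata
from typing import Optional

# Minimum versions copied from the Core Dependencies table.
MIN_VERSIONS = {
    "torch": "2.8",
    "transformers": "4.57.0",
    "diffusers": "0.34.0",
    "flash-attn": "2.8.0",
}
# transformers is additionally required to stay below this version.
MAX_EXCLUSIVE = {"transformers": "4.58.0"}


def version_tuple(version: str) -> tuple:
    """Turn '2.8.0+cu128' into (2, 8, 0); non-numeric suffixes are dropped."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def _pad(a: tuple, b: tuple) -> tuple:
    """Right-pad both tuples with zeros so that (2, 8) compares equal to (2, 8, 0)."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(installed: str, minimum: str, max_exclusive: Optional[str] = None) -> bool:
    """Return True if minimum <= installed (and installed < max_exclusive, if given)."""
    inst, low = _pad(version_tuple(installed), version_tuple(minimum))
    if inst < low:
        return False
    if max_exclusive is not None:
        inst, high = _pad(version_tuple(installed), version_tuple(max_exclusive))
        if inst >= high:
            return False
    return True


def check_environment() -> dict:
    """Map each package to 'ok', 'missing', or 'version mismatch (<found>)'."""
    report = {}
    for name, minimum in MIN_VERSIONS.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = satisfies(found, minimum, MAX_EXCLUSIVE.get(name))
        report[name] = "ok" if ok else f"version mismatch ({found})"
    return report
```

Calling `check_environment()` inside the `joyai` environment returns one status string per package, which makes a `transformers` release outside the pinned range easy to spot.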

2. Inference

Image Editing

python inference.py \
  --ckpt-root /path/to/ckpts_infer \
  --prompt "Turn the plate blue" \
  --image test_images/test_1.jpg \
  --output outputs/result.png \
  --seed 123 \
  --steps 30 \
  --guidance-scale 5.0 \
  --basesize 1024
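
To apply several edits in a row, the command above can be driven from Python. This is a sketch under stated assumptions: `build_edit_command` and `run_edits` are hypothetical helper names, only flags shown in this document are used, and the script is assumed to be launched from the repository root so that `inference.py` resolves. Each call starts a fresh process, so the checkpoint is presumably reloaded every time.

```python
import subprocess
import sys
from pathlib import Path


def build_edit_command(ckpt_root, prompt, image, output,
                       seed=123, steps=30, guidance_scale=5.0, basesize=1024):
    """Return the argv list for one inference.py editing run."""
    return [
        sys.executable, "inference.py",
        "--ckpt-root", str(ckpt_root),
        "--prompt", prompt,
        "--image", str(image),
        "--output", str(output),
        "--seed", str(seed),
        "--steps", str(steps),
        "--guidance-scale", str(guidance_scale),
        "--basesize", str(basesize),
    ]


def run_edits(ckpt_root, jobs, out_dir="outputs"):
    """Run one edit per (image, prompt) pair; raises on the first failing run."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for index, (image, prompt) in enumerate(jobs):
        output = out_dir / f"result_{index:03d}.png"
        cmd = build_edit_command(ckpt_root, prompt, image, output)
        # Passing a list (no shell) keeps quotes and newlines in the prompt intact.
        subprocess.run(cmd, check=True)
```

For example, `run_edits("/path/to/ckpts_infer", [("test_images/test_1.jpg", "Turn the plate blue")])` would write `outputs/result_000.png`.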

CLI Reference (`inference.py`)

Argument	Type	Default	Description
`--ckpt-root`	str	required	Checkpoint root
`--prompt`	str	required	Edit instruction or text-to-image (T2I) prompt
`--image`	str	None	Input image path (required for editing, omit for T2I)
`--output`	str	`example.png`	Output image path
`--steps`	int	50	Denoising steps
`--guidance-scale`	float	5.0	Classifier-free guidance scale
`--seed`	int	42	Random seed for reproducibility
`--neg-prompt`	str	`""`	Negative prompt
`--basesize`	int	1024	Bucket base size for input image resizing (256/512/768/1024)
`--config`	str	auto	Config path; defaults to `<ckpt-root>/infer_config.py`
`--rewrite-prompt`	flag	off	Enable LLM-based prompt rewriting
`--rewrite-model`	str	`gpt-5`	Model name for prompt rewriting
`--hsdp-shard-dim`	int	1	FSDP shard dimension for multi-GPU (set to GPU count)

Spatial Editing Reference

JoyAI-Image supports three spatial editing prompt patterns: Object Move, Object Rotation, and Camera Control. For the most stable behavior, we recommend following the prompt templates below as closely as possible.

1. Object Move

Use this pattern when you want to move a target object into a specified region.

Prompt template:

Move the <object> into the red box and finally remove the red box.

Rules:

Replace <object> with a clear description of the target object to be moved.
The red box indicates the target destination in the image.
The phrase "finally remove the red box" means the guidance box should not appear in the final edited result.

Example:

Move the apple into the red box and finally remove the red box.
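
The Object Move pattern needs an input image that already contains the red box. One way to prepare it is sketched below with Pillow. The helper names are hypothetical, and the box style is an assumption: this model card does not specify whether the box should be an outline, how thick it should be, or which exact shade of red is expected.

```python
def object_move_prompt(obj: str) -> str:
    """Fill the Object Move template with a description of the target object."""
    obj = obj.strip()
    if not obj:
        raise ValueError("object description must not be empty")
    return f"Move the {obj} into the red box and finally remove the red box."


def draw_red_box(src_path, dst_path, box, width=4):
    """Save a copy of the image with a red rectangle at box=(left, top, right, bottom).

    Assumption: an unfilled pure-red outline is used; the expected box style
    and line width are not documented.
    """
    from PIL import Image, ImageDraw  # Pillow, imported lazily

    image = Image.open(src_path).convert("RGB")
    ImageDraw.Draw(image).rectangle(box, outline=(255, 0, 0), width=width)
    image.save(dst_path)
```

The annotated image is then passed as `--image` and the string returned by `object_move_prompt("apple")` as `--prompt`.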

2. Object Rotation

Use this pattern when you want to rotate an object to a specific canonical view.

Prompt template:

Rotate the <object> to show the <view> side view.

Supported <view> values:

front
right
left
rear
front right
front left
rear right
rear left

Rules:

Replace <object> with a clear description of the object to rotate.
Replace <view> with one of the supported directions above.
This instruction is intended to change the object orientation, while keeping the object identity and surrounding scene as consistent as possible.

Examples:

Rotate the chair to show the front side view.
Rotate the car to show the rear left side view.
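
Because only eight view names are supported, it can help to validate the view before building the prompt. A minimal sketch follows; the function name `object_rotation_prompt` is hypothetical, and the view list is copied from the section above.

```python
SUPPORTED_VIEWS = (
    "front", "right", "left", "rear",
    "front right", "front left", "rear right", "rear left",
)


def object_rotation_prompt(obj: str, view: str) -> str:
    """Fill the Object Rotation template, rejecting unsupported views."""
    view = " ".join(view.lower().split())  # normalize case and spacing
    if view not in SUPPORTED_VIEWS:
        raise ValueError(
            f"unsupported view {view!r}; expected one of {SUPPORTED_VIEWS}"
        )
    return f"Rotate the {obj.strip()} to show the {view} side view."
```

For instance, `object_rotation_prompt("car", "Rear  Left")` yields the second example prompt above, while a view such as `"top"` raises `ValueError`.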

3. Camera Control

Use this pattern when you want to change only the camera viewpoint while keeping the 3D scene itself unchanged.

Prompt template:

Move the camera.
- Camera rotation: Yaw {y_rotation}°, Pitch {p_rotation}°.
- Camera zoom: in/out/unchanged.
- Keep the 3D scene static; only change the viewpoint.

Rules:

{y_rotation} specifies the yaw rotation angle in degrees.
{p_rotation} specifies the pitch rotation angle in degrees.
Camera zoom must be one of:
- in
- out
- unchanged
The last line is important: it explicitly tells the model to preserve the 3D scene content and geometry, and only adjust the camera viewpoint.

Examples:

Move the camera.
- Camera rotation: Yaw 45°, Pitch 0°.
- Camera zoom: in.
- Keep the 3D scene static; only change the viewpoint.

Move the camera.
- Camera rotation: Yaw -90°, Pitch 20°.
- Camera zoom: unchanged.
- Keep the 3D scene static; only change the viewpoint.
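
The multi-line Camera Control prompt is easy to mistype, so it may be convenient to generate it. The sketch below uses a hypothetical helper name, `camera_control_prompt`, and reproduces the template line by line. No angle limits are enforced because this document does not state a valid range for yaw or pitch.

```python
ZOOM_VALUES = ("in", "out", "unchanged")


def camera_control_prompt(yaw: float, pitch: float, zoom: str = "unchanged") -> str:
    """Build the four-line Camera Control prompt from yaw/pitch (degrees) and zoom."""
    if zoom not in ZOOM_VALUES:
        raise ValueError(f"zoom must be one of {ZOOM_VALUES}, got {zoom!r}")
    return "\n".join([
        "Move the camera.",
        # ':g' prints 45 as '45' and 22.5 as '22.5', with no trailing zeros.
        f"- Camera rotation: Yaw {yaw:g}°, Pitch {pitch:g}°.",
        f"- Camera zoom: {zoom}.",
        "- Keep the 3D scene static; only change the viewpoint.",
    ])
```

`camera_control_prompt(45, 0, "in")` reproduces the first example above. When the result is passed to `inference.py` from Python as a list element of `subprocess.run`, the embedded newlines need no shell quoting.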

License Agreement

JoyAI-Image is licensed under Apache 2.0.

☎️ We're Hiring!

We are actively hiring Research Scientists, Engineers, and Interns to join us in building next-generation generative foundation models and bringing them into real-world applications. If you’re interested, please send your resume to: huanghaoyang.ocean@jd.com
