Multi-Modal Example Architecture
=================================
Last updated: 04/28/2025.
Introduction
------------
verl now supports multi-modal training. You can use FSDP together with
vLLM or SGLang to start a multi-modal RL task. Megatron support is also
on the way.

Follow the steps below to quickly start a multi-modal RL task.
Step 1: Prepare dataset
-----------------------
.. code:: bash

   # the dataset will be saved in the $HOME/data/geo3k folder
   python examples/data_preprocess/geo3k.py
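Before moving on, it can help to confirm that the preprocessing step actually produced output. The following is a minimal sketch, not part of verl: it assumes the script writes Parquet files directly into ``$HOME/data/geo3k``, and the helper name ``list_parquet_files`` is hypothetical.

```python
from pathlib import Path


def list_parquet_files(data_dir: Path) -> list[str]:
    """Return the sorted names of the .parquet files directly inside data_dir."""
    if not data_dir.is_dir():
        return []
    return sorted(p.name for p in data_dir.glob("*.parquet"))


if __name__ == "__main__":
    # Assumption: the preprocessing script writes Parquet files into this folder.
    geo3k_dir = Path.home() / "data" / "geo3k"
    found = list_parquet_files(geo3k_dir)
    if found:
        print(f"Found in {geo3k_dir}: {', '.join(found)}")
    else:
        print(f"No .parquet files in {geo3k_dir}; rerun the preprocessing script.")
```

The check uses only the standard library, so it runs before any training dependencies are installed.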
Step 2: Download Model
----------------------
.. code:: bash

   # download the model from huggingface
   python3 -c "import transformers; transformers.pipeline(model='Qwen/Qwen2.5-VL-7B-Instruct')"
Step 3: Perform GRPO training with multi-modal model on Geo3K Dataset
---------------------------------------------------------------------
.. code:: bash

   # run the task
   bash examples/grpo_trainer/run_qwen2_5_vl-7b.sh
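The commands in this guide use paths relative to the repository, so they presumably need to be run from the root of a verl checkout. As a rough pre-flight sketch under that assumption, the snippet below reports which of the two scripts referenced above are missing from the current directory; ``missing_paths`` and ``REQUIRED_SCRIPTS`` are illustrative names, not part of verl.

```python
from pathlib import Path

# Script paths taken from the steps in this guide, relative to the repo root.
REQUIRED_SCRIPTS = [
    "examples/data_preprocess/geo3k.py",
    "examples/grpo_trainer/run_qwen2_5_vl-7b.sh",
]


def missing_paths(repo_root: Path, relative_paths: list[str]) -> list[str]:
    """Return the entries of relative_paths that are not files under repo_root."""
    return [rel for rel in relative_paths if not (repo_root / rel).is_file()]


if __name__ == "__main__":
    # Assumption: the current working directory is the root of the checkout.
    missing = missing_paths(Path.cwd(), REQUIRED_SCRIPTS)
    if missing:
        print("Missing scripts (wrong directory?):", ", ".join(missing))
    else:
        print("All scripts found; the steps can be run from here.")
```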