Model Engine
============
Author: `Chi Zhang <https://github.com/vermouth1992>`_

Last updated: 09/25/2025.

Current Support Matrix
----------------------

+-----------+--------------+---------------------+---------------+-----------------------------+
| Backend   | Model        | Scalability         | Model         | Pain points                 |
|           | support      |                     | definition    |                             |
+===========+==============+=====================+===============+=============================+
| FSDP +    | Day-1        | - Dense models: OK  | Hugging Face  | Monkey patches are easily   |
| Ulysses   | support for  | - MoE models: poor  | + monkey      | broken by ``transformers``  |
|           | HF models    |                     | patch         | version changes             |
+-----------+--------------+---------------------+---------------+-----------------------------+
| MCore     | Limited      | Best                | GPTModel      | Supporting new models is    |
|           |              |                     | (one model    | difficult                   |
|           |              |                     | for all)      |                             |
+-----------+--------------+---------------------+---------------+-----------------------------+

- We monkey patch the attention function to support Ulysses sequence
  parallelism (sketched below).
- We monkey patch VLM models so that FSDP can handle mixed batches of
  samples with and without images.
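
For intuition, here is a minimal, self-contained sketch of the Ulysses
idea behind that attention patch: all-to-all the query/key/value tensors
from a sequence-sharded layout to a head-sharded layout, attend over the
full sequence, then all-to-all back. This is illustrative only, not
verl's actual patch; the tensor layout and the use of
``scaled_dot_product_attention`` are assumptions.

.. code-block:: python

    import torch
    import torch.distributed as dist
    import torch.nn.functional as F

    def ulysses_attention(q, k, v, sp_group):
        # q, k, v: [batch, seq/sp, heads, dim], sharded along the sequence
        # dim. Assumes heads is divisible by the sequence-parallel size.
        sp = dist.get_world_size(sp_group)

        def seq_to_head(x):
            # [b, s/sp, h, d] -> [b, s, h/sp, d]: swap seq shards for heads.
            parts = [p.contiguous() for p in x.chunk(sp, dim=2)]
            out = [torch.empty_like(p) for p in parts]
            dist.all_to_all(out, parts, group=sp_group)
            return torch.cat(out, dim=1)

        def head_to_seq(x):
            # [b, s, h/sp, d] -> [b, s/sp, h, d]: the inverse exchange.
            parts = [p.contiguous() for p in x.chunk(sp, dim=1)]
            out = [torch.empty_like(p) for p in parts]
            dist.all_to_all(out, parts, group=sp_group)
            return torch.cat(out, dim=2)

        q, k, v = seq_to_head(q), seq_to_head(k), seq_to_head(v)
        # Each rank attends over the full sequence with a slice of heads.
        o = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2),
            is_causal=True,
        ).transpose(1, 2)
        return head_to_seq(o)

In verl this exchange is spliced into the Hugging Face model by patching
its attention forward at runtime, which is why ``transformers`` version
bumps can break it.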

Class Hierarchy
---------------

Note that all the workers and trainers run in **SPMD** mode. The
SFT/DPO/RM trainer is directly invoked by ``torchrun``. The Actor/Critic
worker can also be invoked by a RayWorkerGroup, and provides APIs to a
single controller.

- Base engine level: implements model init, optimizer init, lr scheduler
  init, sharding, and the checkpoint manager.
- Full engine level: subclasses the base engine and implements
  ``forward_step``.
- Worker/SPMD trainer level: **engine agnostic**; implements training
  logic using the abstract engine APIs (see the sketch after this list).
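
A minimal sketch of how these three levels might stack. The class and
method names are illustrative, not verl's actual API (the real base
class lives in ``verl/workers/engine/base.py``):

.. code-block:: python

    from abc import ABC, abstractmethod

    class BaseEngine(ABC):
        """Base engine level: model/optimizer/lr-scheduler init,
        sharding, checkpoint manager (names are hypothetical)."""

        def init_model(self): ...
        def init_optimizer(self): ...
        def init_lr_scheduler(self): ...
        def save_checkpoint(self, path: str): ...
        def load_checkpoint(self, path: str): ...

        @abstractmethod
        def forward_step(self, batch, loss_fn):
            """One forward (and backward, in train mode) pass."""

    class FSDPEngine(BaseEngine):
        """Full engine level: subclass implementing forward_step."""

        def forward_step(self, batch, loss_fn):
            ...  # backend-specific sharded forward/backward

    class SFTTrainer:
        """Worker/SPMD trainer level: engine agnostic. It only calls
        the abstract engine API, so any backend plugs in."""

        def __init__(self, engine: BaseEngine):
            self.engine = engine

        def fit(self, dataloader, loss_fn):
            for batch in dataloader:
                self.engine.forward_step(batch, loss_fn)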

The RL trainer utilizes workers to construct the HybridFlow program. This
is out of the scope of the model engine.

Existing Model Types
--------------------

========== ====================== ======================
Model type Language model         Value model
========== ====================== ======================
Input      text/image/video/audio text/image/video/audio
Output     logits for next token  logits as value
========== ====================== ======================

Currently, we have two model types: the language model and the value
model. We expect to expand the category to include the Qwen-Omni family
(which outputs both text and audio) and VLA models.
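
To make the "logits as value" row concrete, here is a toy illustration
of the only structural difference between the two heads (all sizes here
are made up):

.. code-block:: python

    import torch
    import torch.nn as nn

    hidden = torch.randn(2, 16, 1024)             # [batch, seq, hidden]

    lm_head = nn.Linear(1024, 32000, bias=False)  # language model head
    value_head = nn.Linear(1024, 1, bias=False)   # value model head

    print(lm_head(hidden).shape)     # [2, 16, 32000]: next-token logits
    print(value_head(hidden).shape)  # [2, 16, 1]: one value per token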

Data Format
-----------

Currently, verl adopts a left-right padding data format in the RL
trainer. This creates massive padding when response lengths vary widely.
We will start to implement a no-padding format throughout the whole
system.

.. image:: https://github.com/vermouth1992/verl-data/blob/master/images/data_format.png?raw=true
   :alt: Data Format

Here is the migration plan (a sketch of the packed format follows the
list):

- Implement the no-padding format in the engine.
- Add a transformation layer in the Actor/Critic worker.
- Replace the Actor/Critic worker in the RL trainer.
- Implement no-padding throughout the whole system.
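
As an illustration of what that transformation layer has to do, here is
a minimal sketch that packs a left-right padded batch into a no-padding
("packed") layout. The ``cu_seqlens`` convention is an assumption
borrowed from flash-attention-style variable-length APIs, not
necessarily verl's final format:

.. code-block:: python

    import torch

    def pack_padded_batch(input_ids, attention_mask):
        """Pack a padded [batch, seq] batch into one 1-D token stream,
        with positions and cumulative sequence lengths (cu_seqlens)."""
        mask = attention_mask.bool()
        packed = input_ids[mask]                  # [total_real_tokens]
        seqlens = mask.sum(dim=1)                 # real tokens per sample
        cu_seqlens = torch.zeros(len(seqlens) + 1, dtype=torch.int32)
        cu_seqlens[1:] = torch.cumsum(seqlens, dim=0)
        # Positions restart at 0 for every sample in the packed stream.
        positions = torch.cat([torch.arange(n) for n in seqlens.tolist()])
        return packed, positions, cu_seqlens

    # Toy batch: 0 is the pad id; prompts are left padded.
    ids = torch.tensor([[0, 0, 5, 6], [7, 8, 9, 3]])
    msk = torch.tensor([[0, 0, 1, 1], [1, 1, 1, 1]])
    packed, pos, cu = pack_padded_batch(ids, msk)
    print(packed.tolist())  # [5, 6, 7, 8, 9, 3]
    print(cu.tolist())      # [0, 2, 6] -> sample boundaries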

Checkpoint System
-----------------

.. image:: https://github.com/vermouth1992/verl-data/blob/master/images/verl-ckpt.png?raw=true
   :alt: Model Engine Checkpoint System

The engine constructs the model using the Hugging Face config, then loads
weights from the Hugging Face checkpoint. If the engine directly uses the
Hugging Face model definition, it can use the functions provided by
``transformers``. Otherwise, each engine has to write its own checkpoint
loading logic (e.g., `mbridge <https://github.com/ISEEKYAN/mbridge>`__).
During model training, each engine has to implement ``save_checkpoint``
and ``load_checkpoint``, which save/load the intermediate sharded
checkpoint, including model, optimizer, and lr scheduler states. Each
engine also has to implement a checkpoint merge script that merges the
intermediate sharded checkpoint back into Hugging Face format.
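
A minimal sketch of those two duties, assuming the intermediate sharded
checkpoint is written with ``torch.distributed.checkpoint`` (backends
may use their own formats; the mixin and attribute names here are
hypothetical):

.. code-block:: python

    import torch.distributed.checkpoint as dcp

    class CheckpointMixin:
        """Assumes the engine already owns self.module, self.optimizer,
        and self.lr_scheduler."""

        def _state_dict(self):
            return {
                "model": self.module.state_dict(),
                "optimizer": self.optimizer.state_dict(),
                "lr_scheduler": self.lr_scheduler.state_dict(),
            }

        def save_checkpoint(self, path: str):
            # Every rank writes only its own shard of the state.
            dcp.save(self._state_dict(), checkpoint_id=path)

        def load_checkpoint(self, path: str):
            # Shards are read back in place into the existing state.
            state = self._state_dict()
            dcp.load(state, checkpoint_id=path)
            self.module.load_state_dict(state["model"])
            self.optimizer.load_state_dict(state["optimizer"])
            self.lr_scheduler.load_state_dict(state["lr_scheduler"])

The merge script then reads these shards offline and reassembles a plain
Hugging Face checkpoint.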

API
---

A tentative model engine API can be found at:
https://github.com/volcengine/verl/blob/main/verl/workers/engine/base.py#L24

Extension
---------

Add a new backend
~~~~~~~~~~~~~~~~~

- Create a new folder under ``verl/workers/engine``. Then, implement
  ``transformer_impl.py`` (a hypothetical skeleton follows this list).
  If you want to implement a non-transformer model, please contact us
  in advance.
- Add the engine config to the GSM8k SFT trainer script:
  https://github.com/volcengine/verl/blob/main/tests/special_e2e/sft/run_sft_engine_gsm8k.sh
- Invoke the tests with your backend:
  https://github.com/volcengine/verl/blob/main/tests/special_e2e/sft/test_sft_engine_all.sh.
  This test script runs various backends and configurations, and
  compares the loss and grad norm of the first step to make sure they
  are close.
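
A hypothetical skeleton for the new ``transformer_impl.py``. The import
path and method set are assumptions; check
``verl/workers/engine/base.py`` for the real contract:

.. code-block:: python

    # verl/workers/engine/my_backend/transformer_impl.py (hypothetical)
    from verl.workers.engine.base import BaseEngine  # path may differ

    class MyBackendEngine(BaseEngine):
        """Transformer engine for a new training backend."""

        def __init__(self, config):
            self.config = config

        def init_model(self):
            # Build the model from the Hugging Face config and shard it
            # with this backend's parallelism strategy.
            ...

        def forward_step(self, batch, loss_fn):
            # One forward (and backward, in train mode) micro-batch step.
            ...

        def save_checkpoint(self, path: str):
            ...

        def load_checkpoint(self, path: str):
            ...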

Add a new model type
~~~~~~~~~~~~~~~~~~~~

- This is mainly reserved for models whose output is not just text
  (e.g., Qwen3-Omni). Please discuss with us before you proceed.