# Text2MotionPrompter

Text2MotionPrompter is a large language model fine-tuned for text-to-motion prompt enhancement, rewriting, and motion duration prediction.
Given a text description of a human action, Text2MotionPrompter will:
- reorganize the key motion information into a more readable structure;
- make implicit motion attributes explicit (e.g., subject, pose, tempo, temporal order, and spatial relations);
- improve logical consistency and reduce ambiguity or conflicting constraints;
- predict a plausible motion duration for the described action.
## Quickstart

We advise you to use the latest version of `transformers`. With `transformers<4.51.0`, you will encounter the following error:

```
KeyError: 'qwen3_moe'
```

The following code snippet illustrates how to use the model to generate content from a given input.
````python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Text2MotionPrompter/Text2MotionPrompter"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
template = """
# Role
You are an expert in 3D motion analysis, animation timing, and choreography. Your task is to analyze textual action descriptions to estimate execution time and standardize the language for motion generation systems.
# Task
Analyze the user-provided [Input Action] and generate a structured JSON response containing a duration estimate and a refined caption.
# Instructions
### 1. Duration Estimation (frame_count)
- Analyze the complexity, speed, and physical constraints of the described action.
- Estimate the time required to perform the action in a **smooth, natural, and realistic manner**.
- Calculate the total duration in frames based on a **30 fps** (frames per second) standard.
- Output strictly as an Integer.
### 2. Caption Refinement (short_caption)
- Generate a refined, grammatically correct version of the input description in **English**.
- **Strict Constraints**:
  - You must **PRESERVE** the original sequence of events (chronological order).
  - You must **RETAIN** all original spatial modifiers (e.g., "left," "upward," "quickly").
  - **DO NOT** add new sub-actions or hallucinate details not present in the input.
  - **DO NOT** delete any specific movements.
- The goal is to improve clarity and flow while maintaining 100% semantic fidelity to the original request.
### 3. Output Format
- Return **ONLY** a raw JSON object.
- Do not use Markdown formatting (i.e., do not use ```json ... ```).
- Ensure the JSON is valid and parsable.
# JSON Structure
{{
    "duration": <Integer, frames at 30fps>,
    "short_caption": "<String, the refined English description>"
}}
# Input
{}
"""
messages = [
    {"role": "user", "content": template.format("走路")}  # "走路" is Chinese for "walking"
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
````
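Because the model is instructed to return a raw JSON object, the response can be parsed directly and the frame count converted to seconds using the 30 fps standard from the prompt. A minimal sketch (the sample response string below is illustrative, not actual model output):

```python
import json

# Illustrative response string; real model output will vary.
content = '{"duration": 90, "short_caption": "A person walks forward."}'

result = json.loads(content)
frames = result["duration"]         # integer frame count at 30 fps
caption = result["short_caption"]   # refined English description

# Convert the frame count to seconds (prompt specifies a 30 fps standard).
seconds = frames / 30.0
print(f"{caption} ({frames} frames = {seconds:.1f} s)")
```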
For deployment, you can use `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:

```shell
vllm serve Text2MotionPrompter/Text2MotionPrompter --max-model-len 8192
```
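Once the server is running, the endpoint can be queried like any OpenAI-compatible Chat Completions API. A minimal sketch of a request body (the localhost URL and port assume vllm's defaults, and the prompt string here is abbreviated; in practice, fill in the full template from the Quickstart):

```python
import json

# Request body for POST http://localhost:8000/v1/chat/completions
# (assumes vllm's default host/port; adjust to your deployment).
payload = {
    "model": "Text2MotionPrompter/Text2MotionPrompter",
    "messages": [
        # Abbreviated prompt; use the full template in practice.
        {"role": "user", "content": "# Input\nwalk forward"}
    ],
    "max_tokens": 1024,
}
body = json.dumps(payload)
# Send with e.g. requests.post(url, data=body,
#     headers={"Content-Type": "application/json"})
```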
## Model tree for billyabs/Text_Motion_Prompter

Base model: Qwen/Qwen3-30B-A3B-Instruct-2507