arxiv:2604.09132

Strips as Tokens: Artist Mesh Generation with Native UV Segmentation

Published on Apr 10 · Submitted by Huaijin Pi on Apr 14

Abstract

AI-generated summary: SATO introduces a novel token ordering strategy for autoregressive transformers that preserves edge flow and semantic layout in mesh generation through triangle strip-based sequences.

Recent advancements in autoregressive transformers have demonstrated remarkable potential for generating artist-quality meshes. However, the token ordering strategies employed by existing methods typically fail to meet professional artist standards: coordinate-based sorting yields inefficiently long sequences, and patch-based heuristics disrupt the continuous edge flow and structural regularity essential for high-quality modeling. To address these limitations, we propose Strips as Tokens (SATO), a novel framework with a token ordering strategy inspired by triangle strips. By constructing the sequence as a connected chain of faces that explicitly encodes UV boundaries, our method naturally preserves the organized edge flow and semantic layout characteristic of artist-created meshes. A key advantage of this formulation is its unified representation, enabling the same token sequence to be decoded into either a triangle or quadrilateral mesh. This flexibility facilitates joint training on both data types: large-scale triangle data provides fundamental structural priors, while high-quality quad data enhances the geometric regularity of the outputs. Extensive experiments demonstrate that SATO consistently outperforms prior methods in terms of geometric quality, structural coherence, and UV segmentation.

Community

Paper submitter

Today we're releasing Strips as Tokens (SATO), a new autoregressive framework for artist mesh generation with native UV segmentation.

Most existing mesh generators use token orderings that do not match how artists actually build meshes. Coordinate-based sequences are often too long, while patch-based heuristics can break edge flow, topological regularity, and UV structure.

SATO takes a different path.
Inspired by triangle strips, it represents a mesh as a connected chain of faces that also encodes UV boundaries directly in the token sequence. This gives the model a much more natural way to capture organized edge flow, semantic structure, and clean UV layouts during generation.
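
To make the idea concrete, here is a minimal sketch of how a strip-style token stream with UV boundary markers could be decoded into faces. It is an illustration under stated assumptions, not SATO's actual tokenizer: the marker names `NEW_STRIP` and `NEW_CHART`, the function `decode_strips`, and the use of plain vertex indices as tokens are all hypothetical. The only part taken as given is the classic triangle-strip rule, where each new vertex forms a triangle with the previous two and the winding flips on every other face.

```python
from typing import List, Tuple, Union

# Hypothetical special tokens (not taken from the paper).
NEW_STRIP = "<strip>"  # start a new strip inside the current UV chart
NEW_CHART = "<uv>"     # start a new UV chart (and therefore a new strip)

Token = Union[int, str]
Triangle = Tuple[int, int, int]


def decode_strips(tokens: List[Token]) -> List[Tuple[Triangle, int]]:
    """Decode a token stream into (triangle, uv_chart_id) pairs."""
    faces: List[Tuple[Triangle, int]] = []
    chart = 0
    strip: List[int] = []
    for tok in tokens:
        if tok == NEW_STRIP or tok == NEW_CHART:
            strip = []  # the next vertex begins a fresh strip
            if tok == NEW_CHART:
                chart += 1  # faces that follow belong to a new UV island
            continue
        strip.append(tok)
        if len(strip) >= 3:
            i = len(strip) - 3
            a, b, c = strip[i], strip[i + 1], strip[i + 2]
            # Flip the winding on odd steps so all faces share one orientation.
            tri = (a, b, c) if i % 2 == 0 else (b, a, c)
            faces.append((tri, chart))
    return faces


if __name__ == "__main__":
    stream = [0, 1, 2, 3, NEW_CHART, 3, 2, 4, 5]
    for tri, chart_id in decode_strips(stream):
        print(chart_id, tri)
```

In this toy stream, the first two triangles land in chart 0 and the last two in chart 1, so the UV island boundary is recoverable from the sequence alone, with no separate segmentation pass.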

A key feature of SATO is that the same token sequence can be decoded into either a triangle mesh or a quad mesh. This provides one unified representation for both mesh types, making it possible to jointly learn from large-scale triangle data and high-quality quad data in a single framework.
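
One way to see why a strip can serve both mesh types: consecutive triangle pairs in a strip share an edge, and dropping that shared edge leaves a quad. The sketch below shows this with a plain list of vertex indices. It is only an illustration of the general strip idea. The function names are hypothetical and SATO's real decoder may differ.

```python
from typing import List, Tuple

Triangle = Tuple[int, int, int]
Quad = Tuple[int, int, int, int]


def strip_to_triangles(strip: List[int]) -> List[Triangle]:
    """Expand a strip into triangles, flipping the winding on odd steps."""
    tris = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        tris.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return tris


def strip_to_quads(strip: List[int]) -> List[Quad]:
    """Read the same strip two vertices at a time, yielding one quad per step.

    Each quad is the union of two consecutive strip triangles with their
    shared diagonal removed; its orientation matches the triangle decoding.
    """
    quads = []
    for i in range(0, len(strip) - 3, 2):
        v0, v1, v2, v3 = strip[i:i + 4]
        quads.append((v0, v1, v3, v2))
    return quads


if __name__ == "__main__":
    strip = [0, 1, 2, 3, 4, 5]
    print(strip_to_triangles(strip))
    print(strip_to_quads(strip))
```

Here a six-vertex strip yields four triangles or two quads from the same sequence, which is the property that allows triangle and quad data to share one representation.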

Given an input point cloud, SATO autoregressively generates artist-style meshes together with native UV segmentation. In other words, the model produces more than geometry: it also generates UV charts with clean, meaningful island boundaries, making the outputs far more practical for downstream texturing and real content-creation workflows.

Across extensive experiments, SATO achieves strong results on triangle mesh generation, quad mesh generation, and UV-aware mesh generation within one unified framework. We believe high-quality mesh generation should move closer to real artist workflows, where geometry and UV layout are designed together rather than treated separately.

Geometry matters. Topology matters. UVs matter too.
SATO is a step toward generative models that understand all three.

🎥 Video (YouTube): https://youtu.be/Mc9skirm8cg

🎥 Video (Bilibili): https://www.bilibili.com/video/BV13eQ8BAEiA/

🌐 Project Page: https://ruixu.me/html/SATO/index.html

📄 Paper: https://arxiv.org/abs/2604.09132

💻 Code: https://github.com/Xrvitd/SATO

🤗 Models / Demo: Coming soon

the strip-based tokenization, where faces grow along shared edges like a zipper and the uv island boundaries are baked into the token stream, is the clever core here. my worry is how the 512^3 quantization interacts with real artists' fine edge details and whether tiny uv seam noise could ripple into topology or tri/quad decoding ambiguity. the arxivLens breakdown helped me parse the token vocabulary and the start-of-strip and uv-transition tokens, which is nontrivial to implement cleanly in practice. an ablation worth doing would be varying quantization levels and decoding stride to see if the unified tri/quad output stays robust under leaner token budgets.
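
For a rough sense of scale on the quantization question, the sketch below measures the round-trip error of uniform coordinate quantization at a few resolutions. It assumes "512^3" means 512 uniform levels per axis on coordinates normalized to [0, 1], which is a guess about the setup rather than something confirmed by the paper. All function names are illustrative.

```python
def quantize(x: float, levels: int = 512) -> int:
    """Map a coordinate in [0, 1] to an integer bin in [0, levels - 1]."""
    x = min(max(x, 0.0), 1.0)  # clamp out-of-range inputs
    return round(x * (levels - 1))


def dequantize(q: int, levels: int = 512) -> float:
    """Map a bin index back to a coordinate in [0, 1]."""
    return q / (levels - 1)


def max_roundtrip_error(levels: int, samples: int = 10001) -> float:
    """Largest |x - dequantize(quantize(x))| over an even grid of samples."""
    worst = 0.0
    for i in range(samples):
        x = i / (samples - 1)
        worst = max(worst, abs(dequantize(quantize(x, levels), levels) - x))
    return worst


if __name__ == "__main__":
    for levels in (128, 256, 512):
        bound = 0.5 / (levels - 1)  # half a bin width
        print(levels, max_roundtrip_error(levels), bound)
```

Under these assumptions the per-axis error is bounded by half a bin width, about 0.1% of the bounding-box extent at 512 levels. Edge details finer than that would collapse onto the same grid points.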
