# Constrained Decoding in LiteRT-LM

LiteRT-LM supports constrained decoding, allowing you to enforce specific
structures on the model's output. This is particularly useful for tasks like:

-   **Function Calling**: Ensuring the model outputs a valid function call
    matching a specific schema.
-   **Structured Data Extraction**: Forcing the output to match a given format
    (e.g., a regex pattern).
-   **Grammar Enforcement**: Using context-free grammars (via Lark) to guide
    generation.

This document explains how to enable, configure, and use constrained decoding in
your application.

## Enabling Constrained Decoding

To use constrained decoding, you must enable it in the `ConversationConfig` when
creating your `Conversation` instance.

```cpp
#include "runtime/conversation/conversation.h"

// ...

ConversationConfig::Builder builder;
builder.SetEnableConstrainedDecoding(true);

// Set a ConstraintProviderConfig in the ConversationConfig::Builder.
// This line sets the constraint provider to LLGuidance with default settings.
builder.SetConstraintProviderConfig(LlGuidanceConfig());

auto config = builder.Build(*engine);
```

### Constraint Providers

LiteRT-LM supports different backends for constrained decoding, configured via
`ConstraintProviderConfig`:

1.  **LLGuidance (`LlGuidanceConfig`)**: Uses the
    [LLGuidance](https://github.com/guidance-ai/llguidance) library. Supports
    Regex, JSON Schema, and Lark grammars.
2.  **External (`ExternalConstraintConfig`)**: Allows passing a pre-constructed
    `Constraint` object per-request. Useful for custom C++ constraint
    implementations.

## Using Constraints in `SendMessage`

Once enabled, you can apply constraints to individual messages using the
`decoding_constraint` field in the `OptionalArgs` struct passed to `SendMessage`
or `SendMessageAsync`. This field is of type `std::optional<ConstraintArg>`.

### 1. LLGuidance Constraints

LLGuidance constraints can be specified as Regex, JSON Schema, or Lark grammars.

#### Regex Constraint

Constrain the output to match a regular expression.

```cpp
#include "runtime/components/constrained_decoding/llg_constraint_config.h"

// ...

LlGuidanceConstraintArg constraint_arg;
constraint_arg.constraint_type = LlgConstraintType::kRegex;
// Example: Force output to be a sequence of 'a's followed by 'b's
constraint_arg.constraint_string = "a+b+";

auto response = conversation->SendMessage(
    user_message,
    {.decoding_constraint = constraint_arg}
);
```
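Because the constraint is applied during decoding, a quick check of the returned text can help catch setup mistakes, such as constrained decoding not being enabled. The following is a minimal sketch using only the C++ standard library. `MatchesConstraint` is a hypothetical helper, not part of LiteRT-LM, and `std::regex` (ECMAScript syntax) may not accept exactly the same dialect as LLGuidance, so treat it as a sanity check for simple patterns only.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of LiteRT-LM): returns true when `text`
// matches `pattern` in full, with no leading or trailing extra characters.
bool MatchesConstraint(const std::string& text, const std::string& pattern) {
  const std::regex re(pattern);
  return std::regex_match(text, re);
}
```

Once the response text has been extracted into a `std::string`, calling `MatchesConstraint(text, "a+b+")` should return `true` if the constraint was honored.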

#### JSON Schema Constraint

Constrain the output to be a valid JSON object matching a schema.

```cpp
LlGuidanceConstraintArg constraint_arg;
constraint_arg.constraint_type = LlgConstraintType::kJsonSchema;
// Example: Simple JSON object with a "name" field
constraint_arg.constraint_string = R"({
  "type": "object",
  "properties": {
    "name": {"type": "string"}
  },
  "required": ["name"]
})";

auto response = conversation->SendMessage(
    user_message,
    {.decoding_constraint = constraint_arg}
);
```

#### Lark Grammar Constraint

Constrain the output to follow a Lark grammar.

```cpp
LlGuidanceConstraintArg constraint_arg;
constraint_arg.constraint_type = LlgConstraintType::kLark;
// Example: A simple calculator grammar
constraint_arg.constraint_string = R"(
    start: expr
    expr: atom
        | expr "+" atom
        | expr "-" atom
        | expr "*" atom
        | expr "/" atom
        | "(" expr ")"
    atom: /[0-9]+/
    WS: /[ \t\n\f]+/
    %ignore WS
)";

auto response = conversation->SendMessage(
    user_message,
    {.decoding_constraint = constraint_arg}
);
```

### 2. External Constraints

If you have a custom implementation of the `Constraint` interface (e.g., a
highly specialized C++ state machine), you can use `ExternalConstraintArg`.

Prerequisite: The `Conversation` must have been created with a
`ConversationConfig` whose constraint provider is `ExternalConstraintConfig`.

```cpp
// 1. Initialize with ExternalConstraintConfig
auto config = ConversationConfig::Builder()
    .SetEnableConstrainedDecoding(true)
    .SetConstraintProviderConfig(ExternalConstraintConfig())
    .Build(*engine);
auto conversation = Conversation::Create(*engine, config);

// 2. Create your custom constraint (must implement litert::lm::Constraint)
class MyCustomConstraint : public Constraint {
    // Implement Start, ComputeNext, etc.
};
auto my_constraint = std::make_unique<MyCustomConstraint>();

// 3. Pass it to SendMessage
ExternalConstraintArg external_constraint;
external_constraint.constraint = std::move(my_constraint);

auto response = conversation->SendMessage(
    user_message,
    {.decoding_constraint = std::move(external_constraint)}
);
```
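To make the "state machine" idea concrete, here is a standalone sketch of the logic such a constraint might encode for the language `a+b+`. It deliberately does not implement `litert::lm::Constraint`, whose method signatures are not shown in this document. The three-token vocabulary and the names `Token`, `State`, `AllowedTokens`, and `Advance` are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>

// Toy vocabulary for illustration only: 0 = "a", 1 = "b", 2 = end-of-sequence.
enum Token : int { kA = 0, kB = 1, kEos = 2 };
enum class State { kStart, kInA, kInB, kDone };
using Mask = std::array<bool, 3>;  // Indexed by Token; true means "allowed".

// Returns which tokens may be sampled next in `state` for the language a+b+.
Mask AllowedTokens(State state) {
  switch (state) {
    case State::kStart: return {true, false, false};  // Must begin with "a".
    case State::kInA:   return {true, true, false};   // More "a"s or first "b".
    case State::kInB:   return {false, true, true};   // More "b"s or stop.
    case State::kDone:  return {false, false, false}; // Nothing after EOS.
  }
  return {false, false, false};
}

// Consumes one sampled token and returns the next state.
// The token must be one that AllowedTokens(state) permits.
State Advance(State state, Token token) {
  assert(AllowedTokens(state)[token]);
  if (token == kEos) return State::kDone;
  return token == kA ? State::kInA : State::kInB;
}
```

A real constraint would apply the same pattern over the model's actual vocabulary: compute a mask of permitted token IDs before each sampling step, then advance the state with the token that was chosen.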

## API Reference

### `ConstraintProviderConfig`

A variant configuration passed to `ConversationConfig`.

-   `LlGuidanceConfig`: Configures LLGuidance.
    -   `eos_id`: Optional override for the End-of-Sequence token ID.
-   `ExternalConstraintConfig`: Empty struct (marker) to enable external
    constraints.

### `ConstraintArg`

A variant argument passed via `OptionalArgs` to `SendMessage` or
`SendMessageAsync`.

-   `LlGuidanceConstraintArg`:
    -   `constraint_type`: `kRegex`, `kJsonSchema`, or `kLark`.
    -   `constraint_string`: The pattern/schema/grammar string.
-   `ExternalConstraintArg`:
    -   `constraint`: `std::unique_ptr<Constraint>`. Ownership is transferred to
        the decoder for that request.
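The ownership note explains why the external-constraint example wraps its argument in `std::move`: a variant with a `std::unique_ptr` alternative cannot be copied. The sketch below illustrates this with stand-in types. `DemoConstraint`, `DemoLlgArg`, `DemoExternalArg`, `DemoConstraintArg`, and `Describe` are hypothetical and are not the LiteRT-LM definitions; the sketch also assumes the variant behaves like `std::variant`.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Stand-in types for illustration only; NOT the LiteRT-LM definitions.
struct DemoConstraint {
  virtual ~DemoConstraint() = default;
};
struct DemoLlgArg {
  std::string constraint_string;
};
struct DemoExternalArg {
  std::unique_ptr<DemoConstraint> constraint;  // Makes the variant move-only.
};
using DemoConstraintArg = std::variant<DemoLlgArg, DemoExternalArg>;

// Takes the argument by value, so a caller holding a named DemoExternalArg
// must pass it with std::move; ownership of the constraint moves with it.
std::string Describe(DemoConstraintArg arg) {
  if (std::holds_alternative<DemoLlgArg>(arg)) {
    return "llguidance: " + std::get<DemoLlgArg>(arg).constraint_string;
  }
  return std::get<DemoExternalArg>(arg).constraint ? "external: set"
                                                   : "external: null";
}
```

Passing a named `DemoExternalArg` without `std::move` would fail to compile, which mirrors the `std::move(external_constraint)` call in the external-constraint example.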