# Constrained Decoding in LiteRT-LM
LiteRT-LM supports constrained decoding, allowing you to enforce specific
structures on the model's output. This is particularly useful for tasks like:
- **Function Calling**: Ensuring the model outputs a valid function call
  matching a specific schema.
- **Structured Data Extraction**: Forcing the output to follow a given format
  (e.g., a regex pattern).
- **Grammar Enforcement**: Using context-free grammars (via Lark) to guide
  generation.
This document explains how to enable, configure, and use constrained decoding in
your application.
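
Conceptually, constrained decoding restricts each decoding step to the tokens the constraint currently permits. The sketch below is a toy illustration of that idea, not LiteRT-LM's implementation: `PickConstrainedToken` and its parameters are hypothetical names, and a real backend derives the allowed set from a regex, schema, or grammar instead of receiving it as an argument.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Toy greedy decoding step (hypothetical helper, not a LiteRT-LM API).
// `logits[i]` is the model's score for token i; `allowed[i]` says whether the
// constraint permits token i at this step.
// Returns the index of the best allowed token, or -1 if none is allowed.
int PickConstrainedToken(const std::vector<float>& logits,
                         const std::vector<bool>& allowed) {
  int best = -1;
  float best_score = -std::numeric_limits<float>::infinity();
  for (std::size_t i = 0; i < logits.size() && i < allowed.size(); ++i) {
    if (!allowed[i]) continue;  // Masked out by the constraint.
    if (best == -1 || logits[i] > best_score) {
      best = static_cast<int>(i);
      best_score = logits[i];
    }
  }
  return best;
}
```

With scores `{0.1, 2.0, 1.5}` and a mask that forbids token 1, the function picks token 2 even though token 1 has the highest score.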
## Enabling Constrained Decoding
To use constrained decoding, you must enable it in the `ConversationConfig` when
creating your `Conversation` instance.
```cpp
#include "runtime/conversation/conversation.h"
// ...
ConversationConfig::Builder builder;
builder.SetEnableConstrainedDecoding(true);
// Set a ConstraintProviderConfig in the ConversationConfig::Builder.
// This line sets the constraint provider to LLGuidance with default settings.
builder.SetConstraintProviderConfig(LlGuidanceConfig());
auto config = builder.Build(*engine);
```
### Constraint Providers
LiteRT-LM supports different backends for constrained decoding, configured via
`ConstraintProviderConfig`:
1. **LLGuidance (`LlGuidanceConfig`)**: Uses the
   [LLGuidance](https://github.com/guidance-ai/llguidance) library. Supports
   Regex, JSON Schema, and Lark grammars.
2. **External (`ExternalConstraintConfig`)**: Allows passing a pre-constructed
   `Constraint` object per-request. Useful for custom C++ constraint
   implementations.
## Using Constraints in `SendMessage`
Once enabled, you can apply constraints to individual messages using the
`decoding_constraint` field in the `OptionalArgs` struct passed to `SendMessage`
or `SendMessageAsync`. This field is of type `std::optional<ConstraintArg>`.
### 1. LLGuidance Constraints
LLGuidance constraints can be specified as Regex, JSON Schema, or Lark grammars.
#### Regex Constraint
Constrain the output to match a regular expression.
```cpp
#include "runtime/components/constrained_decoding/llg_constraint_config.h"
// ...
LlGuidanceConstraintArg constraint_arg;
constraint_arg.constraint_type = LlgConstraintType::kRegex;
// Example: Force output to be one or more 'a's followed by one or more 'b's
constraint_arg.constraint_string = "a+b+";
auto response = conversation->SendMessage(
    user_message,
    {.decoding_constraint = constraint_arg}
);
```
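
To preview which strings a pattern such as `a+b+` admits before using it as a constraint, a quick standalone check with `std::regex` can help. This is only a sketch: `MatchesPattern` is a hypothetical helper, and `std::regex` (ECMAScript syntax) may differ from the regex dialect LLGuidance accepts for more complex patterns.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`.
// std::regex_match requires a full-string match, which is the assumed
// behavior of a constraint that covers the entire output.
bool MatchesPattern(const std::string& text, const std::string& pattern) {
  return std::regex_match(text, std::regex(pattern));
}
```

For `a+b+`, this accepts `"aabbb"` and rejects `"ba"`, `"aaa"`, and the empty string.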
#### JSON Schema Constraint
Constrain the output to be a valid JSON object matching a schema.
```cpp
LlGuidanceConstraintArg constraint_arg;
constraint_arg.constraint_type = LlgConstraintType::kJsonSchema;
// Example: Simple JSON object with a "name" field
constraint_arg.constraint_string = R"({
  "type": "object",
  "properties": {
    "name": {"type": "string"}
  },
  "required": ["name"]
})";
auto response = conversation->SendMessage(
    user_message,
    {.decoding_constraint = constraint_arg}
);
```
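
When the set of fields is only known at runtime, the schema string can be assembled programmatically. The helper below is a minimal sketch and not part of LiteRT-LM: `BuildStringObjectSchema` is a hypothetical name, it produces only string-typed required properties, and it assumes field names contain no characters that need JSON escaping.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a JSON Schema for an object whose listed fields
// are all required strings. Field names are inserted verbatim (no escaping).
std::string BuildStringObjectSchema(const std::vector<std::string>& fields) {
  std::string properties;
  std::string required;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (i > 0) {
      properties += ",";
      required += ",";
    }
    properties += "\"" + fields[i] + "\":{\"type\":\"string\"}";
    required += "\"" + fields[i] + "\"";
  }
  return "{\"type\":\"object\",\"properties\":{" + properties +
         "},\"required\":[" + required + "]}";
}
```

Calling `BuildStringObjectSchema({"name"})` yields a compact form of the schema shown above, suitable for assigning to `constraint_string`.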
#### Lark Grammar Constraint
Constrain the output to follow a Lark grammar.
```cpp
LlGuidanceConstraintArg constraint_arg;
constraint_arg.constraint_type = LlgConstraintType::kLark;
// Example: A simple calculator grammar
constraint_arg.constraint_string = R"(
start: expr
expr: atom
    | expr "+" atom
    | expr "-" atom
    | expr "*" atom
    | expr "/" atom
    | "(" expr ")"
atom: /[0-9]+/
WS: /[ \t\n\f]+/
%ignore WS
)";
auto response = conversation->SendMessage(
    user_message,
    {.decoding_constraint = constraint_arg}
);
```
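
To get a feel for which strings the calculator grammar admits, the hand-written recognizer below mirrors its rules. It is a standalone sketch and does not use LLGuidance: `Accepts` and its helpers are hypothetical names, and the left-recursive `expr` rule is unrolled into a base (an atom or a parenthesized expression) followed by zero or more operator-atom pairs.

```cpp
#include <cstddef>
#include <string>

// Matches the WS terminal: /[ \t\n\f]+/
bool IsWs(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\f'; }

// Skips ignored whitespace, mirroring `%ignore WS`.
void SkipWs(const std::string& s, std::size_t& pos) {
  while (pos < s.size() && IsWs(s[pos])) ++pos;
}

// atom: /[0-9]+/
bool ParseAtom(const std::string& s, std::size_t& pos) {
  SkipWs(s, pos);
  const std::size_t start = pos;
  while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9') ++pos;
  return pos > start;
}

// expr: atom | expr OP atom | "(" expr ")"
bool ParseExpr(const std::string& s, std::size_t& pos) {
  SkipWs(s, pos);
  if (pos < s.size() && s[pos] == '(') {
    ++pos;
    if (!ParseExpr(s, pos)) return false;
    SkipWs(s, pos);
    if (pos >= s.size() || s[pos] != ')') return false;
    ++pos;
  } else if (!ParseAtom(s, pos)) {
    return false;
  }
  for (;;) {  // Zero or more `OP atom` pairs.
    SkipWs(s, pos);
    if (pos >= s.size()) return true;
    const char c = s[pos];
    if (c != '+' && c != '-' && c != '*' && c != '/') return true;
    ++pos;
    if (!ParseAtom(s, pos)) return false;
  }
}

// True if the whole string is derivable from `start`.
bool Accepts(const std::string& s) {
  std::size_t pos = 0;
  if (!ParseExpr(s, pos)) return false;
  SkipWs(s, pos);
  return pos == s.size();
}
```

One consequence worth noting: the right-hand operand of every operator must be an `atom`, so `(1+2)*3` is accepted while `2*(3+4)` is not. If nested operands on the right are needed, the grammar itself has to be extended.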
### 2. External Constraints
If you have a custom implementation of the `Constraint` interface (e.g., a
highly specialized C++ state machine), you can use `ExternalConstraintArg`.
Prerequisite: You must have initialized `Conversation` with
`ExternalConstraintConfig`.
```cpp
// 1. Initialize with ExternalConstraintConfig
auto config = ConversationConfig::Builder()
                  .SetEnableConstrainedDecoding(true)
                  .SetConstraintProviderConfig(ExternalConstraintConfig())
                  .Build(*engine);
auto conversation = Conversation::Create(*engine, config);
// 2. Create your custom constraint (must implement litert::lm::Constraint)
class MyCustomConstraint : public Constraint {
  // Implement Start, ComputeNext, etc.
};
auto my_constraint = std::make_unique<MyCustomConstraint>();
// 3. Pass it to SendMessage
ExternalConstraintArg external_constraint;
external_constraint.constraint = std::move(my_constraint);
auto response = conversation->SendMessage(
    user_message,
    {.decoding_constraint = std::move(external_constraint)}
);
```
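
The kind of state machine a custom constraint might wrap can be sketched independently of the library. The class below is illustrative only: it does not implement `litert::lm::Constraint` (whose method signatures are not shown here), it works on characters where a real constraint would work on token IDs, and `AbStateMachine`, `AllowedNext`, `Advance`, and `IsAccepting` are hypothetical names. It tracks the pattern `a+b+`.

```cpp
#include <string>

// Character-level recognizer for a+b+ (hypothetical; not the LiteRT-LM
// Constraint interface).
class AbStateMachine {
 public:
  // Characters that may legally come next from the current state.
  std::string AllowedNext() const {
    switch (state_) {
      case State::kStart: return "a";
      case State::kInA: return "ab";
      case State::kInB: return "b";
    }
    return "";
  }

  // Consumes one character. Returns false and leaves the state unchanged if
  // the character is not allowed.
  bool Advance(char c) {
    if (AllowedNext().find(c) == std::string::npos) return false;
    state_ = (c == 'a') ? State::kInA : State::kInB;
    return true;
  }

  // True once at least one 'a' followed by at least one 'b' has been consumed.
  bool IsAccepting() const { return state_ == State::kInB; }

 private:
  enum class State { kStart, kInA, kInB };
  State state_ = State::kStart;
};
```

A real implementation would expose the same information (which continuations are legal, and whether the output may end) through the methods that the `Constraint` interface requires.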
## API Reference
### `ConstraintProviderConfig`
A variant configuration passed to `ConversationConfig`.
- `LlGuidanceConfig`: Configures LLGuidance.
  - `eos_id`: Optional override for the End-of-Sequence token ID.
- `ExternalConstraintConfig`: Empty struct (marker) to enable external
  constraints.
### `ConstraintArg`
A variant argument passed via `OptionalArgs` to `SendMessage`.
- `LlGuidanceConstraintArg`:
  - `constraint_type`: `kRegex`, `kJsonSchema`, or `kLark`.
  - `constraint_string`: The pattern/schema/grammar string.
- `ExternalConstraintArg`:
  - `constraint`: `std::unique_ptr<Constraint>`. Ownership is transferred to
    the valid decoder for that request.
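
Because `ExternalConstraintArg` holds a `std::unique_ptr`, it is move-only, which is why the external example passes it with `std::move`. The sketch below illustrates that ownership transfer with stand-in types. It assumes `ConstraintArg` is a `std::variant`, and the struct definitions and `ConsumeArg` are simplified placeholders, not the real LiteRT-LM declarations.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Stand-ins for illustration only; the real types live in LiteRT-LM headers.
struct Constraint {
  virtual ~Constraint() = default;
};
struct LlGuidanceConstraintArg {
  std::string constraint_string;
};
struct ExternalConstraintArg {
  std::unique_ptr<Constraint> constraint;
};
using ConstraintArg =
    std::variant<LlGuidanceConstraintArg, ExternalConstraintArg>;

// Hypothetical consumer: takes the argument by value, so a move-only
// alternative must be moved in. Returns true if it received a constraint
// object and took ownership of it.
bool ConsumeArg(ConstraintArg arg) {
  if (auto* external = std::get_if<ExternalConstraintArg>(&arg)) {
    std::unique_ptr<Constraint> owned = std::move(external->constraint);
    return owned != nullptr;
  }
  return false;
}
```

After `ConsumeArg(std::move(ext))`, the caller's `ext.constraint` is null, so a fresh constraint object has to be created for each request.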