# LiteRT-LM Python API
The Python API of LiteRT-LM is available for **Linux and macOS** (Windows support is upcoming).
Features like **multi-modality** and **tool use** are supported, while **GPU
acceleration** is upcoming.
## Introduction
Here is a sample terminal chat app built with the Python API:
```python
import litert_lm

litert_lm.set_min_log_severity(litert_lm.LogSeverity.ERROR)  # Hide log for TUI app

with litert_lm.Engine("path/to/model.litertlm") as engine:
    with engine.create_conversation() as conversation:
        while True:
            user_input = input("\n>>> ")
            for chunk in conversation.send_message_async(user_input):
                print(chunk["content"][0]["text"], end="", flush=True)
```

## Getting Started
LiteRT-LM is available as a Python library. You can install the nightly version from PyPI:
```bash
# Using pip
pip install litert-lm-nightly
# Using uv
uv pip install litert-lm-nightly
```
### 1. Initialize the Engine
The `Engine` is the entry point to the API. It handles model loading and resource management. Using it as a context manager (with the `with` statement) ensures that native resources are released promptly.
**Note:** Initializing the engine can take several seconds to load the model.
```python
import litert_lm

# Initialize with the model path and optionally specify the backend.
# backend can be Backend.CPU (default). GPU support is upcoming.
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.CPU,
    # Optional: Pick a writable dir for caching compiled artifacts.
    # cache_dir="/tmp/litert-lm-cache"
) as engine:
    # ... Use the engine to create a conversation ...
    pass
```
### 2. Create a Conversation
A `Conversation` manages the state and history of your interaction with the model.
```python
# Optional: Configure system instruction and initial messages
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
]

# Create the conversation
with engine.create_conversation(messages=messages) as conversation:
    # ... Interact with the conversation ...
    pass
```
### 3. Sending Messages
You can send messages synchronously or asynchronously (streaming).
**Synchronous Example:**
```python
# Simple string input
response = conversation.send_message("What is the capital of France?")
print(response["content"][0]["text"])
# Or with full message structure
# response = conversation.send_message({"role": "user", "content": "..."})
```
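A response's `content` list can hold more than one item. A small helper for joining all text parts can be sketched as follows; `response_text` is a hypothetical convenience function, not part of the LiteRT-LM API, but it operates on the response shape shown above:

```python
def response_text(response: dict) -> str:
    """Concatenate every text item in a response's content list."""
    return "".join(
        item["text"]
        for item in response.get("content", [])
        if item.get("type") == "text"
    )

# Example with a hand-built response of the documented shape:
sample = {"role": "model", "content": [{"type": "text", "text": "Paris."}]}
print(response_text(sample))  # Paris.
```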
**Asynchronous (Streaming) Example:**
```python
# send_message_async returns an iterator of response chunks
stream = conversation.send_message_async("Tell me a long story.")
for chunk in stream:
    # Chunks are dictionaries containing pieces of the response
    for item in chunk.get("content", []):
        if item.get("type") == "text":
            print(item["text"], end="", flush=True)
print()
```
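If you need the full response as a single string rather than printing it piece by piece, the chunks can be accumulated. This is a sketch against the documented chunk shape, demonstrated with a simulated stream rather than a live model; `collect_stream` is a hypothetical helper, not part of the LiteRT-LM API:

```python
def collect_stream(chunks) -> str:
    """Accumulate the text pieces of a chunk iterator into one string."""
    parts = []
    for chunk in chunks:
        for item in chunk.get("content", []):
            if item.get("type") == "text":
                parts.append(item["text"])
    return "".join(parts)

# Simulated chunk stream of the documented shape:
fake_stream = iter([
    {"content": [{"type": "text", "text": "Once upon "}]},
    {"content": [{"type": "text", "text": "a time."}]},
])
print(collect_stream(fake_stream))  # Once upon a time.
```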
### 4. Multi-Modality
Note: This requires models with multi-modality support, such as [Gemma3n](https://huggingface.co/google/gemma-3n-E2B-it-litert-lm).
```python
# Initialize with vision and/or audio backends if needed
with litert_lm.Engine(
    "path/to/multimodal_model.litertlm",
    audio_backend=litert_lm.Backend.CPU,
    # vision_backend=litert_lm.Backend.CPU  (GPU support is upcoming)
) as engine:
    with engine.create_conversation() as conversation:
        user_message = {
            "role": "user",
            "content": [
                {"type": "audio", "path": "/path/to/audio.wav"},
                {"type": "text", "text": "Describe this audio."},
            ],
        }
        response = conversation.send_message(user_message)
        print(response["content"][0]["text"])
```
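Building these message dictionaries by hand gets repetitive. A small factory for audio-plus-text messages could look like the sketch below; `audio_message` is a hypothetical helper (not part of the LiteRT-LM API) that just assembles the content-part shape shown above:

```python
def audio_message(audio_path: str, prompt: str) -> dict:
    """Build a user message pairing an audio file with a text prompt."""
    return {
        "role": "user",
        "content": [
            {"type": "audio", "path": audio_path},
            {"type": "text", "text": prompt},
        ],
    }

msg = audio_message("/path/to/audio.wav", "Describe this audio.")
print(msg["content"][1]["text"])  # Describe this audio.
```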
### 5. Defining and Using Tools
Note: This requires models with tool support, such as [FunctionGemma](https://huggingface.co/google/functiongemma-270m-it).
You can define Python functions as tools that the model can call automatically.
```python
def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b

# Register the tool in the conversation
tools = [add_numbers]
with engine.create_conversation(tools=tools) as conversation:
    # The model will call add_numbers automatically if it needs to sum values
    response = conversation.send_message("What is 123 + 456?")
    print(response["content"][0]["text"])
```
LiteRT-LM uses the function's docstring and type hints to generate the tool schema for the model.
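To see what kind of information is available for schema generation, here is a rough standard-library sketch of deriving a minimal description from a function's type hints and docstring. This is an illustration only, not LiteRT-LM's actual implementation, and the `tool_schema` helper and its output format are assumptions:

```python
import inspect
from typing import get_type_hints

def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b

def tool_schema(fn) -> dict:
    """Derive a minimal tool schema from a function's hints and docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # Only parameters go into the schema
    return {
        "name": fn.__name__,
        "description": (inspect.getdoc(fn) or "").split("\n")[0],
        "parameters": {name: t.__name__ for name, t in hints.items()},
    }

print(tool_schema(add_numbers))
# {'name': 'add_numbers', 'description': 'Adds two numbers.', 'parameters': {'a': 'float', 'b': 'float'}}
```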