Spaces:

FINAL-Bench
/

LiteRT-LM

Running

File size: 16,526 Bytes

5f923cd

# Tool Use

LiteRT-LM handles tool calling in the [Conversation API](./conversation.md). The
Conversation API is a high-level API that represents a multi-turn conversation
with an LLM.

TIP: This page describes how to use tool calling in the C++ Conversation API.
For the Android API, see
[LiteRT-LM Android API: Defining and Using Tools](https://github.com/google-ai-edge/LiteRT-LM/blob/main/android/README.md#6-defining-and-using-tools)

## Concepts

### Tool Calling Flow

Tool calling involves three main entities:

1.  **Application**: The application code written by the developer, using the
    LiteRT-LM library.
2.  **Model**: The LLM that is being called.
3.  **User**: The end-user of the application.

Tool use typically follows these steps:

1.  The application declares the tools that are available to the model. A tool
    declaration consists of a name, parameters, and description. These are
    specified in a JSON object defined by the application.
2.  When the user sends a message to the application, e.g. by typing in a chat
    box, the application sends the message to LiteRT-LM, which feeds the message
    to the model and initiates auto-regressive generation.
3.  The model outputs a string indicating a tool call.
4.  LiteRT-LM detects the tool call and parses the tool call into a JSON object.
5.  The application uses the tool call JSON object to execute the tool call,
    which can perform real-world actions and then return a result. The
    implementation of the tool is provided by the application.
6.  The application sends the tool result back to the model.
7.  The model outputs a natural language response based on the tool result or
    makes another tool call.
8.  Repeat from #2.

![Tool Calling Flow](images/tool-call-flow.png)

-   Application:
    -   Provides a specification for each tool, i.e. name, parameters, and
        description.
    -   Implements the tools and executes them when requested.
    -   Manages the chat loop with the user.
-   LiteRT-LM:
    -   Translates human-readable messages into the format the model was trained
        on.
    -   Runs inference on the model given a prompt.
    -   Detects and parses tool calls.
    -   Maintains the conversation history between the user, model, and tools.

## How to Use

### Tool Declarations {#tool-declarations}

When you create a `Conversation` you set a `Preface` object that defines the
initial context for the LLM. This includes the system message and the **tool
declarations**.

To declare the tools available to the model, set the `tools` field of the
`Preface` object to a JSON array of tool declarations. Each tool declaration is
a [JSON schema](https://json-schema.org/learn) containing the tool's name,
description, and parameters.

For example, the following code defines two tools: `get_weather` and
`get_stock_price`:

```c++
constexpr absl::string_view kToolString = R"([
{
  "name": "get_weather",
  "description": "Returns the weather for a given location.",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The location to get the weather for."
      }
    },
    "required": [
      "location"
    ]
  }
},
{
  "name": "get_stock_price",
  "description": "Returns the stock price for a given stock symbol.",
  "parameters": {
    "type": "object",
    "properties": {
      "stock_symbol": {
        "type": "string",
        "description": "The stock symbol to get the price for."
      }
    },
    "required": [
      "stock_symbol"
    ]
  }
}
])";

JsonPreface preface;
preface.tools = nlohmann::ordered_json::parse(kToolString);
```

The `Preface` is passed to `ConversationConfig::Builder` when you create the
`Conversation` object:

```c++
// Set model file path and backend.
std::string model_path = absl::GetFlag(FLAGS_model_path);
ASSIGN_OR_RETURN(ModelAssets model_assets, ModelAssets::Create(model_path));
ASSIGN_OR_RETURN(
  EngineSettings engine_settings,
  EngineSettings::CreateDefault(std::move(model_assets), Backend::CPU));

// Create `Engine`.
ASSIGN_OR_RETURN(
    std::unique_ptr<litert::lm::Engine> engine,
    litert::lm::Engine::CreateEngine(std::move(engine_settings)));

// Create `Conversation`.
auto session_config = litert::lm::SessionConfig::CreateDefault();
ASSIGN_OR_RETURN(auto conversation_config,
                   ConversationConfig::Builder()
                       .SetSessionConfig(session_config)
                       .SetPreface(preface)
                       .Build(*engine));
ASSIGN_OR_RETURN(std::unique_ptr<Conversation> conversation,
                   Conversation::Create(*engine, conversation_config));
```

### Tool Calls

Once tools have been declared, the model may respond to a user message with a
tool call, instead of or in addition to natural language text.

Example:

```c++
// Construct the user message as a JSON object.
JsonMessage user_message = JsonMessage::parse(R"({
  "role": "user",
  "content": {
    "type": "text",
    "text"" "How is the weather in Paris?"
  }
})")

// Send the user message to the model.
ASSIGN_OR_RETURN(Message model_message, conversation->SendMessage(user_message));
```

After the code above runs, `model_message` will contain the following JSON
object:

```json
{
  "tool_calls": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": {
          "location": "Paris"
        }
      }
    }
  ]
}
```

### Tool Execution

Calling the function is the responsibility of your application. For example,
`get_weather` could call the following function:

```c++
// Returns random weather conditions.
nlohmann::ordered_json GetWeather(const nlohmann::ordered_json& arguments) {
    std::string location = arguments.value("location", "Unknown");
    absl::BitGen gen;
    int temperature = absl::Uniform(gen, 50, 91);
    int humidity = absl::Uniform(gen, 20, 81);
    constexpr std::string weather_conditions[] = {"Sunny", "Cloudy", "Rainy",
                                                  "Windy"};
    std::string condition = weather_conditions[absl::Uniform(
        gen, 0, static_cast<int>(std::size(weather_conditions)))];

  return {
      {"tool_name", "get_weather"},
      {"location", location},
      {"temperature", temperature},
      {"unit", "F"},
      {"humidity", humidity},
      {"condition", condition},
  };
}

// Call the function using the tool call object.
nlohmann::ordered_json arguments = message["tool_calls"][0]["function"]["arguments"];
nlohmann::ordered_json weather_report = GetWeather(arguments);
```

For this code example, let's assume `GetWeather` returns the following JSON
object:

```json
{
  "tool_name": "get_weather",
  "location":"Paris",
  "temperature":72,
  "unit":"F",
  "humidity":50,
  "condition":"Sunny"
}
```

TIP: The
[Android API](https://github.com/google-ai-edge/LiteRT-LM/blob/main/android/README.md#6-defining-and-using-tools)
supports **automatic tool calling**. This allows you to define a class with
annotated methods and LiteRT-LM will automatically call those methods when tool
calls are generated by the model.

### Tool Response

Once your application has called the real-world function, the model needs to
know the result. Pass the tool result as a message with the `role` set to
`tool`:

```c++
// Construct the tool message containing the result.
JsonMessage tool_message = {{"role", "tool"}, {"content", weather_report}};

// Send the tool message to the model.
ASSIGN_OR_RETURN(model_message, conversation->SendMessage(tool_message));
```

After the code above runs, `model_message` will contain the following JSON
object, which includes a natural language interpretation of the tool result:

```json
{
  "content": [
    {
      "type": "text",
      "text": "The weather in Paris is sunny with a temperature of 72°F and humidity of 50%."
    }
  ]
}
```

The application could then display the model's natural language response (*"The
weather in Paris is sunny with a temperature of 72°F and humidity of 50%."*) to
the user.

### Tool Calling Loop

In your application, you will usually want to allow the user to converse with
the LLM in a loop, and to enable the LLM to call tools sequentially before
returning a natural language response to the user.

Create the `Conversation` object by following the instructions in
[Tool Declarations](#tool-declarations).

Next, let's define a class that looks up the tool name and calls the
corresponding function:

```c++
class Tools {
 public:
  Tools() {
    tools_["get_weather"] = absl::bind_front(&Tools::GetWeather, this);
    tools_["get_stock_price"] = absl::bind_front(&Tools::GetStockPrice, this);
  }

  nlohmann::ordered_json CallTool(const std::string& name,
                                  const nlohmann::ordered_json& arguments) {
    auto it = tools_.find(name);
    if (it == tools_.end()) {
      return {{"tool_name", name}, {"error", "Tool not found"}};
    }
    nlohmann::ordered_json tool_response = it->second(arguments);
    return tool_response;
  }

 private:
  // Returns random weather conditions.
  nlohmann::ordered_json GetWeather(const nlohmann::ordered_json& arguments) {
    std::string location = arguments.value("location", "Unknown");
    absl::BitGen gen;
    int temperature = absl::Uniform(gen, 50, 91);
    int humidity = absl::Uniform(gen, 20, 81);
    constexpr std::string weather_conditions[] = {"Sunny", "Cloudy", "Rainy",
                                                  "Windy"};
    std::string condition = weather_conditions[absl::Uniform(
        gen, 0, static_cast<int>(std::size(weather_conditions)))];

    return {
        {"tool_name", "get_weather"}, {"location", location},
        {"temperature", temperature}, {"unit", "F"},
        {"humidity", humidity},       {"condition", condition},
    };
  }

  // Returns a random stock price.
  nlohmann::ordered_json GetStockPrice(
      const nlohmann::ordered_json& arguments) {
    std::string stock_symbol = arguments.value("stock_symbol", "Unknown");
    absl::BitGen gen;
    double price = std::round(absl::Uniform(gen, 100.0, 400.0) * 100.0) / 100.0;
    return {
        {"tool_name", "get_stock_price"},
        {"stock_symbol", stock_symbol},
        {"price", price},
        {"currency", "USD"},
    };
  }

  absl::flat_hash_map<std::string, std::function<nlohmann::ordered_json(
                                       const nlohmann::ordered_json&)>>
      tools_;
};
```

The chat loop will consist of two loops:

-   The outer loop between the *user* and the *model*:
    1.  Takes the user's text input from the terminal.
    2.  Constructs a `message` from the user's text input.
-   The inner loop between the *model* and the *application*:
    1.  Sends the `message` to the model.
    2.  Receives the response from the model.
    3.  Checks the response for tool calls.
    4.  Calls the tools specified in the model's response.
    5.  Constructs a message containing the tool results.
    6.  Loops back to #1.

<!-- end list -->

```c++
// The tools we defined above.
Tools tools;

// This string will hold the next prompt to be sent to the model.
std::string input_prompt;

// Chat loop between user and model.
while (true) {
  // Get input from the user.
  std::cout << "Please enter the prompt (or press Enter to end): "
            << std::flush;
  std::getline(std::cin, input_prompt);

  // Exit if the user pressed Enter.
  if (input_prompt.empty()) {
    break;
  }

  // Construct the user message.
  JsonMessage input_message = {
      {"role", "user"},
      {"content", {{{"type", "text"}, {"text", input_prompt}}}}};

  // Tool calling loop between application and model.
  while (true) {
    // Send the user message to the model.
    ASSIGN_OR_RETURN(Message message,
                      conversation->SendMessage(input_message));

    // Get the JSON message from the model's response.
    if (std::holds_alternative<json>(message)) {
      JsonMessage message_json =
          std::get<nlohmann::ordered_json>(message);

      // Check for tool calls.
      if (message_json.contains("tool_calls") &&
          message_json["tool_calls"].is_array() &&
          !message_json["tool_calls"].empty()) {
        // This JSON array will hold the tool response messages.
        nlohmann::ordered_json tool_messages = nlohmann::ordered_json::array();

        // For each tool call, call the tool and add the response.
        for (const auto& tool_call : message_json["tool_calls"]) {
          JsonMessage tool_message = {{"role", "tool"},
                                                  {"content", {}}};
          const nlohmann::ordered_json& function = tool_call["function"];
          tool_message["content"] =
              tools.CallTool(function["name"], function["arguments"]);
          tool_messages.push_back(tool_message);
        }

        // The next input message is the tool response.
        input_message = tool_messages;
      } else {
        // If there are no tool calls, print the model's response and exit the
        // tool calling loop.
        for (const auto& item : message_json["content"]) {
          if (item.contains("type") && item["type"] == "text") {
            std::cout << item["text"].get<std::string>() << std::endl;
          }
        }

        break;
      }
    }
  }
}
```

The chat loop above will run until the user presses `Enter` at the prompt.

### Tool Calling with `SendMessageAsync`

Tool calling works with `Conversation::SendMessageAsync`.

When you call `SendMessageAsync` to send a message to the model asynchronously:

-   Text chunks are streamed to the callback as usual.
-   When the start of a tool call is encountered, the following happens:
    -   LiteRT-LM will wait for the remainder of the tool call to be generated,
    -   Parse the full tool call expression
    -   Send the parsed tool call JSON to the callback in the `tool_calls` field
        of the message.

To use `SendMessageAsync` in the chat loop example above, you would replace the
inner tool calling loop with the following code:

```c++
// Tool calling loop between application and model in asynchronous mode.
while (true) {
  // This Notification is used to signal when the model is done decoding.
  absl::Notification done;

  // This stores the tool calls.
  nlohmann::ordered_json tool_calls;

  // This is the callback that is called with each message chunk as the model is
  // generating tokens.
  auto user_callback = [&done,
                        &tool_calls](absl::StatusOr<Message> message) {
    if (!message.ok()) {
      // If message is not OK, it means there was an error.
      done.Notify();
      return;
    }

    if (!std::holds_alternative<nlohmann::json>(*message)) {
      return;
    }

    // Get JSON from the message.
    JsonMessage message_json = std::get<JsonMessage>(*message);

    // An empty message indicates the model is done generating.
    if (message_json.is_null()) {
      std::cout << std::endl << std::flush;
      done.Notify();
      return;
    }

    // Print any text content.
    if (message_json.contains("content") &&
        message_json["content"].is_array()) {
      for (const auto& item : message_json["content"]) {
        if (item.contains("text")) {
          std::cout << item["text"] << std::endl << std::flush;
        }
      }
    }

    // Collect any tool calls, if present.
    if (message_json.contains("tool_calls") &&
        message_json["tool_calls"].is_array() &&
        !message_json["tool_calls"].empty()) {
      for (const auto& tool_call : message_json["tool_calls"]) {
        tool_calls.push_back(tool_call);
      }
    }
  };

  // Send message to the model asynchronously.
  RETURN_IF_ERROR(conversation->SendMessageAsync(
      input_message, std::move(user_callback)));

  // Wait for model to finish generating.
  done.WaitForNotification();

  // Handle tool calls.
  for (const auto& tool_call : tool_calls) {
    // Call tools, get results, etc.
    // ...
  }
}
```

## How It Works

Most of the work for tool calling is done in the `ModelDataProcessor`
implementation for the model you're using.

-   Tool declarations are formatted by `ModelDataProcessor::FormatTools`.
-   Tool calls are parsed by `ModelDataProcessor::ToMessage`.
-   Tool calls and responses are formatted in
    `ModelDataProcessor::MessageToTemplateInput`.
    -   Additional formatting may be done inside the chat template.

![Tool Format and Parse](images/tool-format-and-parse.png)