
Code Documentation

This document provides an overview of the llm_node.py script, which is the core of the ros_llm package.

LlamaClientNode

The LlamaClientNode class is a ROS2 node that acts as a client to an OpenAI-compatible LLM server, such as the one provided by Llama.cpp; any other server that implements the OpenAI chat completions API will also work.

It exposes ROS2 services to interact with the language model.

Parameters

The node exposes the following ROS2 parameters (an example launch-file override is sketched after the table):

| Parameter | Type | Description | Default Value |
| --- | --- | --- | --- |
| server_url | string | The URL of the Llama.cpp server's chat completions endpoint. | http://127.0.0.1:5001/v1/chat/completions |
| temperature | double | Controls the randomness of the output. Lower values make the output more deterministic, while higher values make it more creative. | 1.0 |
| n_predict | integer | The maximum number of tokens to generate in the response. A value of -1 means no limit. | -1 |
| history_max_length | integer | The maximum number of user/assistant conversation turns to keep in the history. | 10 |
| rag_enabled | boolean | Toggles Retrieval-Augmented Generation (RAG) mode. Note: this feature is currently a placeholder. | false |
| rag_system_prompt_template | string | A template for the system prompt when RAG is enabled. It must contain a {context} placeholder. | Use the following context to answer the user's question. If the context doesn't contain the answer, say you don't know.\n\n--- Context ---\n{context}\n--- End Context --- |
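
These parameters can be overridden at launch time like any other ROS2 parameters. Below is a minimal launch-file sketch; the package and executable names used here (llm and llm_node) are assumptions and may need to be adjusted to the actual workspace:

```python
# llm_node.launch.py -- minimal sketch; package/executable/node names are assumptions
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    return LaunchDescription([
        Node(
            package='llm',             # assumption: adjust to the actual package name
            executable='llm_node',     # assumption: adjust to the actual executable name
            name='llama_client_node',  # assumption
            parameters=[{
                'server_url': 'http://127.0.0.1:5001/v1/chat/completions',
                'temperature': 0.7,        # more deterministic than the default 1.0
                'n_predict': 256,          # cap the response length
                'history_max_length': 10,
                'rag_enabled': False,
            }],
        ),
    ])
```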

System Prompt

The node loads a system prompt from the system_prompt.md file located in the src/llm/resource directory.

This prompt is used to set the behavior and personality of the assistant.

Note that the current system prompt also includes OrpheusTTS-specific tags for emotion inference, so the model can emit them in its responses and the TTS node can later render them accordingly.
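
As an illustration, loading such a prompt file at startup could look like the sketch below. The use of ament_index_python and the package name llm are assumptions; the actual node may resolve the path (src/llm/resource) differently:

```python
# Minimal sketch of loading a system prompt file at node startup.
# Assumption: the file is installed into the package's share directory;
# the actual node may read it directly from the source tree instead.
from pathlib import Path

from ament_index_python.packages import get_package_share_directory


def load_system_prompt(package_name: str = 'llm') -> str:  # package name is an assumption
    prompt_path = Path(get_package_share_directory(package_name)) / 'resource' / 'system_prompt.md'
    return prompt_path.read_text(encoding='utf-8')
```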

Services

The node provides two services:

/llm

  • Type: ric_messages/srv/LLMChat
  • Description: This is the main service for interacting with the LLM. It takes a prompt from the user and returns the model's response (see the example client sketched after this list).
  • Request:
    • prompt (string): The user's input to the model.
  • Response:
    • response (string): The model's generated text.
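
As a usage example, a minimal client node that calls /llm could look like the following sketch. The node and function names are illustrative; the LLMChat service type and its prompt/response fields are taken from the description above:

```python
# Minimal sketch of a client node calling the /llm service.
import rclpy
from rclpy.node import Node
from ric_messages.srv import LLMChat  # service type used by this package


class LlmClientDemo(Node):
    def __init__(self):
        super().__init__('llm_client_demo')  # node name is illustrative
        self.client = self.create_client(LLMChat, '/llm')
        while not self.client.wait_for_service(timeout_sec=1.0):
            self.get_logger().info('Waiting for /llm service...')

    def ask(self, text: str) -> str:
        request = LLMChat.Request()
        request.prompt = text
        future = self.client.call_async(request)
        rclpy.spin_until_future_complete(self, future)
        return future.result().response


def main():
    rclpy.init()
    node = LlmClientDemo()
    print(node.ask('What can you do?'))
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```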

/clear_history

  • Type: std_srvs/srv/Empty
  • Description: This service clears the conversation history maintained by the node, which is useful for starting a new conversation without restarting the node (an example call is sketched after this list).
  • Request: (empty)
  • Response: (empty)
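
Because the service uses the standard Empty type, it can be called from any ROS2 client. A minimal sketch:

```python
# Minimal sketch: clearing the conversation history from another node.
import rclpy
from rclpy.node import Node
from std_srvs.srv import Empty


def clear_history():
    rclpy.init()
    node = Node('clear_history_demo')  # node name is illustrative
    client = node.create_client(Empty, '/clear_history')
    client.wait_for_service()
    future = client.call_async(Empty.Request())
    rclpy.spin_until_future_complete(node, future)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    clear_history()
```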

How it Works

  1. Initialization: The node starts, declares its parameters, loads the system prompt, and creates the services.
  2. Service Call: Another ROS2 node calls the /llm service with a prompt.
  3. Message Building: The completion_callback is triggered. It calls _build_messages to construct a list of messages, including the system prompt, the conversation history (from a deque), and the new user prompt.
  4. API Request: The node sends this list of messages as a JSON payload to the Llama.cpp server via an HTTP POST request (a simplified sketch of steps 3-5 follows this list).
  5. Response Handling:
    • If the server returns a successful response, the node extracts the generated text.
    • The user prompt and the assistant's response are added to the history deque.
    • The generated text is returned to the service caller.
    • If the server returns an error, an error message is logged and returned.
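
The following is a simplified, illustrative sketch of steps 3-5; the actual llm_node.py may differ in details such as parameter handling, payload fields, and error reporting:

```python
# Simplified sketch of the message-building and request flow (illustrative, not the actual source).
from collections import deque

import requests  # assumption: the node could equally use urllib or another HTTP client

# One conversation turn is a user message plus an assistant message,
# so a history of 10 turns holds at most 20 entries.
history = deque(maxlen=2 * 10)
system_prompt = 'You are a helpful robot assistant.'  # placeholder for system_prompt.md


def _build_messages(prompt: str) -> list[dict]:
    """Combine the system prompt, prior turns, and the new user prompt."""
    return (
        [{'role': 'system', 'content': system_prompt}]
        + list(history)
        + [{'role': 'user', 'content': prompt}]
    )


def complete(prompt: str, server_url: str, temperature: float = 1.0, n_predict: int = -1) -> str:
    payload = {
        'messages': _build_messages(prompt),
        'temperature': temperature,
        'max_tokens': n_predict,  # OpenAI-compatible field corresponding to n_predict
    }
    reply = requests.post(server_url, json=payload, timeout=60)
    reply.raise_for_status()  # the real node logs and returns an error message instead
    text = reply.json()['choices'][0]['message']['content']
    # Remember the exchange so later calls can build on it.
    history.append({'role': 'user', 'content': prompt})
    history.append({'role': 'assistant', 'content': text})
    return text
```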