Code documentation

The chat_node is a central component that orchestrates the entire chat workflow. It integrates multiple services to process an audio input and generate an audio response from it.

```mermaid
graph TD
    subgraph User
        A[Client]
    end

    subgraph "ROS2 Services"
        B(chat_node)
        C(STT Service /stt)
        D(LLM Service /llm)
        E(TTS Service /tts)
    end

    A -- "UI Audio" --> B
    B -- "UI Audio" --> C
    C -- "Transcription + Language" --> B
    B -- "Transcription" --> D
    D -- "Response" --> B
    B -- "Response + Language" --> E
    E -- "Response Audio" --> B
    B -- "Response Audio + Response + Language + Emotion" --> A
```

The chat_node performs the following steps:

  1. Receives an audio request: It listens on the /chat service (ric_messages/srv/Chat) for an incoming request containing audio data.
  2. Speech-to-Text (STT): It calls the /stt service (ric_messages/srv/AudioBytesToText) to transcribe the input audio into text and detect the language (supports English and German).
  3. Language Model (LLM): It sends the transcribed text to the /llm service (ric_messages/srv/LLMChat) to generate a meaningful and context-aware response. The node also extracts an emotion from the LLM's response (e.g., {calm}).
  4. Text-to-Speech (TTS): It takes the cleaned text response from the LLM and calls the /tts service (ric_messages/srv/TextToAudioBytes) to convert it back into audio.
  5. Returns Response: It sends the final generated audio, language, text, and emotion back to the original caller.
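The steps above can be sketched as a single async pipeline. This is an illustrative sketch, not the node's actual implementation: the three service clients are replaced by stub coroutines, the `{emotion}` tag format is assumed from the `{calm}` example, and `extract_emotion` is a hypothetical helper name.

```python
import asyncio
import re

EMOTION_PATTERN = re.compile(r"\{(\w+)\}")  # matches tags like "{calm}"

def extract_emotion(text, default="neutral"):
    """Hypothetical helper: split an LLM response into (cleaned_text, emotion)."""
    match = EMOTION_PATTERN.search(text)
    cleaned = re.sub(r"\s{2,}", " ", EMOTION_PATTERN.sub("", text).strip())
    return cleaned, (match.group(1) if match else default)

# Stubs standing in for the real /stt, /llm, and /tts service clients.
async def _send_stt_request(audio):
    return "hello robot", "en"            # step 2: transcription + language

async def _send_llm_request(text):
    return "Hi! How can I help? {calm}"   # step 3: response with emotion tag

async def _send_tts_request(text, language):
    return b"\x00\x01"                    # step 4: fake response audio bytes

async def chat_callback(audio: bytes) -> dict:
    text, language = await _send_stt_request(audio)
    raw_response = await _send_llm_request(text)
    response, emotion = extract_emotion(raw_response)   # strip "{...}" for TTS
    response_audio = await _send_tts_request(response, language)
    return {"audio": response_audio, "text": response,  # step 5: final reply
            "language": language, "emotion": emotion}

result = asyncio.run(chat_callback(b"..."))
```

Each `await` is a point where the real node would yield while the corresponding ROS2 service call is in flight.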

Asynchronous Processing

The chat_node is designed to be highly responsive and efficient, leveraging asynchronous programming to handle long-running tasks without blocking the main execution thread. This is crucial for a robotics system where nodes must remain responsive to other events.

  • The node uses Python's asyncio library to manage concurrent operations. The chat_callback and the service client calls (_send_stt_request, _send_llm_request, _send_tts_request) are defined as async functions. This allows the node to await the results of the service calls (which can take a significant amount of time) without freezing. While waiting for a response from one service, the node can still process other events.

  • The /chat service is created with a ReentrantCallbackGroup. This allows the chat_callback to be invoked again while a previous call is still await-ing a long-running operation (such as a call to the LLM service), which prevents deadlocks and lets the node serve multiple concurrent requests to the /chat service. A request that arrives while an earlier one is still being processed is therefore not dropped or "eaten up"; it is handled alongside the in-flight request.

  • The main and spin_node functions integrate the asyncio event loop with the ROS2 event system (rclpy). The spin_node function calls rclpy.spin_once() inside an async loop, so ROS2 callbacks and asyncio tasks are processed concurrently; the await asyncio.sleep(0.05) yields control back to the event loop so other asyncio tasks can run.
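The shape of that spin loop can be shown with plain asyncio. In this sketch, rclpy.spin_once() is replaced by a counting stub so the example runs without ROS2; the loop structure (drain ROS2 callbacks, then yield to asyncio) mirrors the pattern described above.

```python
import asyncio

SPIN_CALLS = 0  # counts how often the stub "processed ROS2 callbacks"

def spin_once_stub(timeout_sec: float = 0.0) -> None:
    # Stand-in for rclpy.spin_once(node, timeout_sec=...), which handles
    # pending ROS2 callbacks and returns promptly when none are ready.
    global SPIN_CALLS
    SPIN_CALLS += 1

async def spin_node(stop: asyncio.Event) -> None:
    """Alternate between ROS2 callback processing and asyncio tasks."""
    while not stop.is_set():
        spin_once_stub(timeout_sec=0.0)  # process any ready ROS2 callbacks
        await asyncio.sleep(0.05)        # yield so other asyncio tasks run

async def main() -> None:
    stop = asyncio.Event()
    spinner = asyncio.create_task(spin_node(stop))
    # A concurrent coroutine (e.g. a chat_callback awaiting the LLM service)
    # keeps making progress while the spinner loops.
    await asyncio.sleep(0.2)
    stop.set()
    await spinner

asyncio.run(main())
```

Because spin_once returns quickly and the loop then await-s, neither ROS2 callback handling nor pending asyncio tasks can starve the other.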