ROS2 Text to Speech
A ROS2 Jazzy client for an OpenAI API compatible TTS server using OrpheusTTS as an implementation.
The TTS models for specific languages are accessed through llama-swap, which runs them in instances of llama-server from llama.cpp.
Currently there is support only for German and English.
The response tokens from the TTS model are decoded into audio with SNAC (Multi-Scale Neural Audio Codec). This step may become unnecessary in the future, as llama.cpp is expected to gain support for TTS models.
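As an illustration, decoding SNAC codes into a waveform looks roughly like the sketch below. It uses the snac Python package; the model variant, the codebook size and the random codes are assumptions for illustration only, since in the real pipeline the codes come from the Orpheus model's output tokens.

```python
import torch
from snac import SNAC

# Load a SNAC codec (the 24 kHz variant used here is an assumption;
# check which SNAC model the TTS node is actually configured with).
model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

# In the real pipeline these codes are parsed from the Orpheus output tokens;
# random codes are used here purely to show the shape of the decode call.
frames = 64
codes = [
    torch.randint(0, 4096, (1, frames)),      # coarse level
    torch.randint(0, 4096, (1, frames * 2)),  # middle level
    torch.randint(0, 4096, (1, frames * 4)),  # fine level
]

with torch.inference_mode():
    audio = model.decode(codes)  # float waveform, shape [1, 1, samples], 24 kHz
```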
The TTS node exposes /tts as a ROS2 service, which is described in the services documentation.
This documentation provides all the information you need to get started with ros_tts, from setting up your environment to using the service and contributing to the project.
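A minimal client sketch for calling the service from rclpy is shown below. The interface ros_tts_interfaces/srv/TTS and its text field are hypothetical placeholders; use the actual service type listed in the services documentation.

```python
import rclpy
from rclpy.node import Node

# Hypothetical service interface; replace with the real one from the
# services documentation of this repository.
from ros_tts_interfaces.srv import TTS


def main():
    rclpy.init()
    node = Node("tts_client")

    client = node.create_client(TTS, "/tts")
    client.wait_for_service()

    request = TTS.Request()
    request.text = "Hallo, wie geht es dir?"  # German and English are supported

    future = client.call_async(request)
    rclpy.spin_until_future_complete(node, future)
    node.get_logger().info(f"TTS finished: {future.result()}")

    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```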
```mermaid
---
config:
  theme: redux
  look: neo
---
flowchart TB
  subgraph s1["Server"]
    direction TB
    n11["Chat Node"]
    subgraph s11["Whisper STT"]
      direction TB
      n112["STT Node"] <-- HTTP --> n111["whisper.cpp"]
    end
    subgraph s12["Gemma 3 12b"]
      direction TB
      n122["LLM Node"] <-- HTTP --> n121["llama.cpp"]
    end
    subgraph s13["Orpheus TTS"]
      direction TB
      n132["TTS Node"] <-- HTTP --> n131["llama-swap"]
    end
    n11 -- ROS2 --> n112
    n11 -- ROS2 --> n122
    n11 -- ROS2 --> n132
  end
  subgraph s21["Robot"]
    direction TB
    n21["UI Com"] <-- WebSocket --> n22["Emotion"]
  end
  n11 <-- "ROS2" --> n21
```
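The HTTP leg between the TTS node and llama-swap in the diagram above can be pictured as an OpenAI-compatible completion request, roughly as in the sketch below. The URL, model key, voice prefix and sampling parameters are assumptions and depend on your llama-swap configuration; llama-swap picks the llama-server instance based on the "model" field and proxies the request to it.

```python
import requests

# Assumed llama-swap address; adjust host/port to your deployment.
LLAMA_SWAP_URL = "http://localhost:8080/v1/completions"

response = requests.post(
    LLAMA_SWAP_URL,
    json={
        "model": "orpheus-tts-en",           # assumed model key in the llama-swap config
        "prompt": "tara: Hello from ROS2!",  # assumed Orpheus voice/prompt format
        "max_tokens": 1024,
        "temperature": 0.6,
    },
    timeout=120,
)
response.raise_for_status()

# The returned text contains the Orpheus audio tokens,
# which the TTS node then decodes into audio with SNAC.
token_text = response.json()["choices"][0]["text"]
```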
Requirements
Contributing
We welcome contributions to this project! Please see the contributing guidelines at contributing.md
in the root of this repository for more information on how to get started.