ROS2 Text to Speech
A ROS2 Jazzy client for an OpenAI API compatible TTS server using OrpheusTTS as an implementation.
The TTS models for specific languages are accessed through llama-swap, which runs them in instances of llama-server from llama.cpp.
Currently there is support only for German and English.
The response tokens from the TTS model are decoded into audio with SNAC (Multi-Scale Neural Audio Codec). This step may become unnecessary in the future, as llama.cpp is expected to gain support for TTS models.
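As an illustration, decoding SNAC codes into a waveform looks roughly like the sketch below. It uses the snac Python package; the model variant, the codebook size and the random codes are assumptions for illustration only, since in the real pipeline the codes come from the Orpheus model's output tokens.

```python
import torch
from snac import SNAC

# Load a SNAC codec (the 24 kHz variant used here is an assumption;
# check which SNAC model the TTS node is actually configured with).
model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

# In the real pipeline these codes are parsed from the Orpheus output tokens;
# random codes are used here purely to show the shape of the decode call.
frames = 64
codes = [
    torch.randint(0, 4096, (1, frames)),      # coarse level
    torch.randint(0, 4096, (1, frames * 2)),  # middle level
    torch.randint(0, 4096, (1, frames * 4)),  # fine level
]

with torch.inference_mode():
    audio = model.decode(codes)  # float waveform, shape [1, 1, samples], 24 kHz
```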
The TTS node exposes /tts as a ROS2 service, which is described in the services documentation.
This documentation provides all the information you need to get started with ros_tts, from setting up your environment to using the service and contributing to the project.
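A minimal client sketch for calling the service from rclpy is shown below. The interface ros_tts_interfaces/srv/TTS and its text field are hypothetical placeholders; use the actual service type listed in the services documentation.

```python
import rclpy
from rclpy.node import Node

# Hypothetical service interface; replace with the real one from the
# services documentation of this repository.
from ros_tts_interfaces.srv import TTS


def main():
    rclpy.init()
    node = Node("tts_client")

    client = node.create_client(TTS, "/tts")
    client.wait_for_service()

    request = TTS.Request()
    request.text = "Hallo, wie geht es dir?"  # German and English are supported

    future = client.call_async(request)
    rclpy.spin_until_future_complete(node, future)
    node.get_logger().info(f"TTS finished: {future.result()}")

    node.destroy_node()
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```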
```mermaid
---
config:
  theme: redux
  look: neo
---
flowchart TB
  subgraph s1["Server"]
    direction TB
    n11["Chat Node"]
    subgraph s11["Whisper STT"]
      direction TB
      n112["STT Node"] <-- HTTP --> n111["whisper.cpp"]
    end
    subgraph s12["Gemma 3 12b"]
      direction TB
      n122["LLM Node"] <-- HTTP --> n121["llama.cpp"]
    end
    subgraph s13["Orpheus TTS"]
      direction TB
      n132["TTS Node"] <-- HTTP --> n131["llama-swap"]
    end
    n11 -- ROS2 --> n112
    n11 -- ROS2 --> n122
    n11 -- ROS2 --> n132
  end
  subgraph s21["Robot"]
    direction TB
    n21["UI Com"] <-- WebSocket --> n22["Emotion"]
  end
  n11 <-- "ROS2" --> n21
```
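The HTTP leg between the TTS node and llama-swap in the diagram above can be pictured as an OpenAI-compatible completion request, roughly as in the sketch below. The URL, model key, voice prefix and sampling parameters are assumptions and depend on your llama-swap configuration; llama-swap picks the llama-server instance based on the "model" field and proxies the request to it.

```python
import requests

# Assumed llama-swap address; adjust host/port to your deployment.
LLAMA_SWAP_URL = "http://localhost:8080/v1/completions"

response = requests.post(
    LLAMA_SWAP_URL,
    json={
        "model": "orpheus-tts-en",           # assumed model key in the llama-swap config
        "prompt": "tara: Hello from ROS2!",  # assumed Orpheus voice/prompt format
        "max_tokens": 1024,
        "temperature": 0.6,
    },
    timeout=120,
)
response.raise_for_status()

# The returned text contains the Orpheus audio tokens,
# which the TTS node then decodes into audio with SNAC.
token_text = response.json()["choices"][0]["text"]
```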
Requirements
Contributing
We welcome contributions to this project! Please see the contributing guidelines at contributing.md
in the root of this repository for more information on how to get started.