# Running with Docker
This project can be run using Docker and Docker Compose. Install them if they are not already available.
There are two separate configurations available: one for running with NVIDIA GPU support and another for CPU-only execution.
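To verify that both are available, you can run the following (assuming a recent Docker Engine with the Compose plugin):

```bash
docker --version
docker compose version
```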
**IMPORTANT:** Make sure to also clone the `ric-messages` git submodule located in the `src` folder with:
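The exact command is not reproduced in this excerpt; a standard checkout, assuming `ric-messages` is registered as a submodule in `.gitmodules`, looks like this:

```bash
# Fetch and check out all registered submodules, including src/ric-messages
git submodule update --init --recursive
```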
## With GPU Support
To run the application with GPU acceleration, you will need to have the NVIDIA Container Toolkit installed on your system.
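A common sanity check for the toolkit is to run a throwaway container with GPU access (the image and command below are only an example):

```bash
# Should print the nvidia-smi table if the toolkit is set up correctly
docker run --rm --gpus all ubuntu nvidia-smi
```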
Once you have the toolkit installed, you can run the application using the following command:
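The command itself is not shown in this excerpt; assuming the GPU configuration is the default `compose.yaml` in the repository root, it would typically be:

```bash
# Build the images (if needed) and start all services of the GPU setup
docker compose up --build
```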
This will build and run the `tts` and `tts-node` services. The `tts` service will automatically download the specified TTS models and start the llama-swap server with GPU support.
**Important:** Note that the ROS2 node uses `rmw_zenoh` for ROS2 communication. Use the provided `zenoh_router` for this purpose.
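How the provided `zenoh_router` is started is documented in the repository; for reference, `rmw_zenoh` also ships a standalone router that can be launched on the host (assuming `rmw_zenoh_cpp` is installed in your ROS2 environment):

```bash
# Starts a Zenoh router listening on the default port 7447
ros2 run rmw_zenoh_cpp rmw_zenohd
```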
## CPU-Only
If you do not have a compatible NVIDIA GPU, you can run the application in CPU-only mode.
To do this, use the `compose.cpu.yaml` file:
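For example (the `--build` flag is optional if the images have already been built):

```bash
# Start the stack using the CPU-only Compose configuration
docker compose -f compose.cpu.yaml up --build
```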
This will start the same services, but the `tts` service will be configured to run entirely on the CPU.
Note that CPU-only execution is considerably slower than running on a GPU.
## Services
The Docker Compose configurations define three main services: `tts-model-downloader`, `tts`, and `tts-node`.
### The `tts-model-downloader` Service
This service is responsible for downloading the required Orpheus TTS models from Hugging Face.
- Uses a lightweight Python Alpine image to install `huggingface_hub[cli]`
- Downloads two models:
  - English model: `isaiahbjork/orpheus-3b-0.1-ft-Q4_K_M-GGUF`
  - German model: `TheVisitorX/3b-de-ft-research_release-Q4_K_M-GGUF`
- Models are stored in the `./.models` directory on the host system
- Runs as an initialization step before other services start
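Conceptually, the downloader performs something equivalent to the sketch below; the exact flags and target directories are defined in the Compose configuration and may differ (the `orpheus-en`/`orpheus-de` subdirectories are placeholders):

```bash
pip install "huggingface_hub[cli]"

# Download the English and German Orpheus GGUF models into ./.models
huggingface-cli download isaiahbjork/orpheus-3b-0.1-ft-Q4_K_M-GGUF --local-dir ./.models/orpheus-en
huggingface-cli download TheVisitorX/3b-de-ft-research_release-Q4_K_M-GGUF --local-dir ./.models/orpheus-de
```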
### The `tts` Service
This service runs the llama-swap server, which manages the TTS model instances and provides an OpenAI-compatible API endpoint.
- Uses pre-built Docker images from `ghcr.io/mostlygeek/llama-swap` (`cuda` for GPU, `cpu` for CPU-only)
- Manages multiple TTS models through the llama-swap configuration
- Uses `llama-swap.multi.config.yaml` for multi-model support
- Exposes port 8080 internally for the TTS API
- Includes health checks to ensure the proper startup sequence
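Because the endpoint is OpenAI-compatible, a completion request from another container on the internal Docker network would look roughly like the sketch below; the model alias (`english`) and the prompt are placeholders, and the real aliases come from `llama-swap.multi.config.yaml`:

```bash
curl http://tts:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "english", "prompt": "Hello world", "max_tokens": 512}'
```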
#### Environment Variables
| Variable | Description | Default Value |
|---|---|---|
| `LLAMA_ARG_N_PARALLEL` | Number of requests to process in parallel | `2` |
| `LLAMA_ARG_THREADS` | Number of threads to use (`-1` for all available) | `-1` |
| `LLAMA_ARG_N_GPU_LAYERS` | Number of model layers to offload to the GPU | `49` |
| `LLAMA_ARG_NO_WEBUI` | Disable the web interface | `true` |
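To see which values the service will actually receive after any overrides, you can inspect the resolved Compose configuration:

```bash
# Print the fully resolved configuration, including environment variables
docker compose config
# For the CPU-only variant:
docker compose -f compose.cpu.yaml config
```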
### The `tts-node` Service
This service runs the ROS2 client node that acts as a bridge between the ROS2 ecosystem and the `tts` service.
- Uses `harbor.hb.dfki.de/helloric/ros_tts:latest` (VPN required) or builds from the local Dockerfile
- Provides a ROS2 service at `/tts` that allows other ROS2 nodes to send text and receive audio
- Supports both English and German text-to-speech conversion
- Communicates with the `tts` service over the internal Docker network
- Configured to start only after the `tts` service is healthy and running
- Uses Zenoh as the RMW implementation by default
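To check that the node started correctly and reached both the `tts` service and the Zenoh router, inspect its logs:

```bash
docker compose logs -f tts-node
```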
#### Environment Variables
| Variable | Description | Default Value |
|---|---|---|
| `LLAMACPP_URL` | URL of the llama-swap server's completions endpoint | `http://tts:8080/v1/completions` |
| `PYTHONUNBUFFERED` | Prevents Python from buffering stdout and stderr | `1` |
| `RMW_IMPLEMENTATION` | ROS2 middleware implementation | `rmw_zenoh_cpp` |
| `ROS_AUTOMATIC_DISCOVERY_RANGE` | Disables automatic discovery in ROS2 | `OFF` |
| `ZENOH_ROUTER_CHECK_ATTEMPTS` | Number of attempts to check for the Zenoh router (`0` means wait indefinitely) | `0` |
| `ZENOH_CONFIG_OVERRIDE` | Zenoh configuration override, see `rmw_zenoh` | `mode="client";connect/endpoints=["tcp/host.docker.internal:7447"]` |
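Since the default configuration connects to port 7447 on the Docker host, it can be worth checking that a router is actually listening there before starting the stack (the tooling here is only an example):

```bash
# Succeeds if something is listening on the Zenoh router port
nc -vz localhost 7447
```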
## Usage
Create a ROS2 client for the `/tts` service and call it.
The service uses the `ric_messages/srv/TextToAudioBytes` interface.
For the exact definition, check out the `ric_messages` repository.
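A minimal way to exercise the service from the command line is sketched below, assuming the request carries only the text to synthesize; the field name `text` is illustrative, so check the `ric_messages` definition for the actual fields:

```bash
# Run from an environment that uses rmw_zenoh_cpp and is connected to the same Zenoh router
ros2 service call /tts ric_messages/srv/TextToAudioBytes "{text: 'Hello from ROS2'}"
```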
For usage examples, check out the service implementation in the repository.