# Running with Docker
This project can be run using Docker and Docker Compose. Install them (see the Docker documentation) if they are not already available.
There are two separate configurations available: one for running with NVIDIA GPU support and another for CPU-only execution.
IMPORTANT: Make sure to also clone the ric-messages git submodule located in the src folder, e.g. with the standard submodule command:
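```bash
# Fetch all git submodules, including src/ric-messages
git submodule update --init --recursive
```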
## With GPU Support
To run the application with GPU acceleration, you will need to have the NVIDIA Container Toolkit installed on your system.
Once you have the toolkit installed, you can run the application using the following command:
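Assuming the GPU configuration lives in the default compose file, this is typically:

```bash
# Build the images and start the GPU-enabled stack
docker compose up --build
```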
This will build and run the tts and tts-node services.
The tts service will automatically download the specified TTS models and start the llama-swap server with GPU support.
Important: The ROS2 node uses rmw_zenoh for ROS2 communication. Use the provided zenoh_router for this purpose.
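The node expects to reach a router at `host.docker.internal:7447` (see `ZENOH_CONFIG_OVERRIDE` below). If you need to start a router on the host manually, rmw_zenoh also ships a standalone one that can be launched from a sourced ROS2 environment (assuming `rmw_zenoh_cpp` is installed there):

```bash
# Start a Zenoh router on the host; it listens on TCP port 7447 by default
ros2 run rmw_zenoh_cpp rmw_zenohd
```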
## CPU-Only
If you do not have a compatible NVIDIA GPU, you can run the application in CPU-only mode.
To do this, use the compose.cpu.yaml file:
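A typical invocation is:

```bash
# Build and start the CPU-only stack using the dedicated compose file
docker compose -f compose.cpu.yaml up --build
```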
This will start the same services, but the tts service will be configured to run entirely on the CPU.
Note that CPU-only execution will be considerably slower than running on a GPU.
## Services
The Docker Compose configurations define three main services: tts-model-downloader, tts, and tts-node.
### The tts-model-downloader Service
This service is responsible for downloading the required Orpheus TTS models from Hugging Face.
- Uses a lightweight Python Alpine image to install `huggingface_hub[cli]`
- Downloads two models (roughly equivalent to the commands sketched below):
  - English model: `isaiahbjork/orpheus-3b-0.1-ft-Q4_K_M-GGUF`
  - German model: `TheVisitorX/3b-de-ft-research_release-Q4_K_M-GGUF`
- Models are stored in the `./.models` directory on the host system
- Runs as an initialization step before other services start
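For reference, the download step is roughly equivalent to running these commands manually (a sketch; the exact flags used in the compose file may differ):

```bash
# Install the Hugging Face CLI and pull both GGUF models into ./.models
pip install "huggingface_hub[cli]"
huggingface-cli download isaiahbjork/orpheus-3b-0.1-ft-Q4_K_M-GGUF --local-dir ./.models
huggingface-cli download TheVisitorX/3b-de-ft-research_release-Q4_K_M-GGUF --local-dir ./.models
```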
### The tts Service
This service runs the llama-swap server, which manages the TTS model instances and provides an OpenAI-compatible API endpoint.
- Uses pre-built Docker images from `ghcr.io/mostlygeek/llama-swap` (`cuda` for GPU, `cpu` for CPU-only)
- Manages multiple TTS models through llama-swap configuration
- Uses `llama-swap.multi.config.yaml` for multi-model support
- Exposes port 8080 internally for the TTS API
- Includes health checks to ensure proper startup sequence
#### Environment Variables
| Variable | Description | Default Value |
|---|---|---|
| `LLAMA_ARG_N_PARALLEL` | Number of requests to process in parallel | `2` |
| `LLAMA_ARG_THREADS` | Number of threads to use (`-1` for all available) | `-1` |
| `LLAMA_ARG_N_GPU_LAYERS` | Number of model layers to offload to GPU | `49` |
| `LLAMA_ARG_NO_WEBUI` | Disable the web interface | `true` |
### The tts-node Service
This service runs the ROS2 client node that acts as a bridge between the ROS2 ecosystem and the tts service.
- Uses `harbor.hb.dfki.de/helloric/ros_tts:latest` (VPN required) or builds from the local Dockerfile
- The node provides a ROS2 service at `/tts` that allows other ROS2 nodes to send text and receive audio
- Supports both English and German text-to-speech conversion
- Communicates with the `tts` service over the internal Docker network
- Configured to start only after the `tts` service is healthy and running
- Uses Zenoh as RMW implementation by default
#### Environment Variables
| Variable | Description | Default Value |
|---|---|---|
| `LLAMACPP_URL` | URL of the llama-swap server's completions endpoint | `http://tts:8080/v1/completions` |
| `PYTHONUNBUFFERED` | Prevents Python from buffering stdout and stderr | `1` |
| `RMW_IMPLEMENTATION` | ROS2 middleware implementation | `rmw_zenoh_cpp` |
| `ROS_AUTOMATIC_DISCOVERY_RANGE` | Disables automatic discovery in ROS2 | `OFF` |
| `ZENOH_ROUTER_CHECK_ATTEMPTS` | Number of attempts to check for the Zenoh router; `0` means wait indefinitely | `0` |
| `ZENOH_CONFIG_OVERRIDE` | Zenoh configuration override, see rmw_zenoh | `mode="client";connect/endpoints=["tcp/host.docker.internal:7447"]` |
## Usage
Create a ROS2 client for the `/tts` service and call it. The service uses the `ric_messages/srv/TextToAudioBytes` interface; for the exact definition, check out the ric_messages repository.
For usage examples, check out the service client.
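As a minimal sketch, a Python client might look like the following. The request and response field names used here (`text`, `audio_data`) are assumptions for illustration only; the actual fields are defined in the ric_messages repository.

```python
# Minimal sketch of a ROS2 client for the /tts service.
# NOTE: the request/response field names below (text, audio_data) are
# assumptions -- check ric_messages/srv/TextToAudioBytes for the real ones.
import rclpy
from rclpy.node import Node
from ric_messages.srv import TextToAudioBytes


def main():
    rclpy.init()
    node = Node('tts_client')
    client = node.create_client(TextToAudioBytes, '/tts')
    client.wait_for_service()

    request = TextToAudioBytes.Request()
    request.text = 'Hello from the TTS service'  # assumed field name

    future = client.call_async(request)
    rclpy.spin_until_future_complete(node, future)
    response = future.result()

    # The response is expected to carry the synthesized audio as bytes.
    node.get_logger().info(f'Received {len(response.audio_data)} audio bytes')  # assumed field name

    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```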