# Running with Docker
This project can be run using Docker and Docker Compose. Install them if they are not already available.
There are two separate configurations available: one for running with NVIDIA GPU support and another for CPU-only execution.
> **Important:** Make sure to also clone the `ric-messages` git submodule located in the `src` folder, for example with the command below.
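A minimal sketch of the usual invocation, run from the repository root (this assumes the submodule is already registered in `.gitmodules`):

```bash
# Clone and initialize all registered submodules, including ric-messages in src/
git submodule update --init --recursive
```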
## With GPU Support
To run the application with GPU acceleration, you will need to have the NVIDIA Container Toolkit installed on your system.
Once you have the toolkit installed, you can run the application using the following command:
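Assuming the GPU configuration is the default Compose file (the exact file name may differ in your checkout):

```bash
# Build the images and start the stt and stt-node services with GPU support
docker compose up --build
```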
This will build and run the `stt` and `stt-node` services. The `stt` service will automatically download the specified model and start the Whisper.CPP server with GPU support.
> **Info:** If the model download ever fails, the download script from Whisper.CPP in `models/download-ggml-model.sh` was probably moved or removed. In that case, download `ggml-large-v3-turbo-q5_0.bin` manually to `/app/models`, which automatically mounts to `.models`.
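A manual download could look like the following; the URL is an assumption based on the usual `ggerganov/whisper.cpp` model repository on Hugging Face:

```bash
# Fetch the quantized large-v3-turbo model directly from Hugging Face
curl -L -o /app/models/ggml-large-v3-turbo-q5_0.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo-q5_0.bin
```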
> **Important:** Note that the ROS2 node uses `rmw_zenoh` for ROS2 communication. Use the provided `zenoh_router` for this purpose.
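If you need to start a router manually instead, `rmw_zenoh` ships a standalone router; a typical invocation, assuming `rmw_zenoh_cpp` is installed and sourced, is:

```bash
# Start the Zenoh router that rmw_zenoh clients connect to (default port 7447)
ros2 run rmw_zenoh_cpp rmw_zenohd
```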
## CPU-Only
If you do not have a compatible NVIDIA GPU, you can run the application in CPU-only mode.
To do this, use the `compose.cpu.yaml` file:
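For example (flags may vary with your setup):

```bash
# Build and start the services using the CPU-only configuration
docker compose -f compose.cpu.yaml up --build
```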
This will start the same services, but the `stt` service will be configured to run entirely on the CPU. Note that CPU-only execution is much slower than with a GPU, though not as slow as an LLM.
## Services
The Docker Compose configurations define two main services, `stt` and `stt-node`, along with a helper service, `stt-model-downloader`.
### The `stt` Service
This service is responsible for running the `whisper.cpp` server, which performs the actual speech-to-text transcription.
- It is preceded by the `stt-model-downloader` service, which downloads the specified model from the internet. The model is determined by the `WHISPER_MODEL` variable in the `.env` file.
- The `stt` service uses a custom Docker image (`whisper.cuda.Dockerfile` for GPU) or the official `whisper.cpp` image (`compose.cpu.yaml` for CPU).
- It mounts the local `./.models` directory to `/models`, so downloaded models are persisted on the host.
- The server exposes its transcription service on port `8080` within the Docker network. To verify it manually, see the sketch after this list.
- A healthcheck runs every 30 seconds to ensure the `stt-node` only starts after the server is running.
- Check the official Whisper.CPP documentation for all available server arguments.
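You can send an audio file to the server's inference endpoint directly; this sketch assumes port `8080` is also published to the host and that a `sample.wav` exists in the current directory:

```bash
# Send a WAV file to the whisper.cpp server and receive the transcription as JSON
curl http://localhost:8080/inference \
  -F file=@sample.wav \
  -F response_format=json
```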
#### Environment

| Variable | Description | Default Value |
|---|---|---|
| `WHISPER_MODEL` | The name of the model to download. | `large-v3-turbo-q5_0` |
| `WHISPER_THREADS` | The number of threads to use for processing. | `8` |
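For reference, an illustrative `.env` using the defaults above:

```
WHISPER_MODEL=large-v3-turbo-q5_0
WHISPER_THREADS=8
```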
### The `stt-node` Service
This service runs the ROS2 node that acts as a bridge between the ROS2 ecosystem and the `stt` service.
- It builds from the local `Dockerfile`.
- The node provides a ROS2 service at `/stt` that allows other ROS2 nodes to send audio and receive transcribed text.
- It communicates with the `stt` service over the internal Docker network.
- It is configured to start only after the `stt` service is healthy and running. A quick sanity check is shown after this list.
- It uses Zenoh as the RMW implementation by default. To change it, refer to the `zenoh_router` documentation.
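Once both containers are up, you can check that the node is reachable from a host terminal; this assumes a sourced ROS2 environment with `rmw_zenoh_cpp` installed and the Zenoh router reachable:

```bash
# The /stt service should appear once stt-node has started
RMW_IMPLEMENTATION=rmw_zenoh_cpp ros2 service list | grep /stt
```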
#### Environment

| Variable | Description | Default Value |
|---|---|---|
| `WHISPER_URL` | URL of the whisper.cpp server endpoint. | `http://stt:8080/inference` |
| `PYTHONUNBUFFERED` | Prevents Python from buffering stdout and stderr. | `1` |
| `RMW_IMPLEMENTATION` | ROS2 middleware implementation. | `rmw_zenoh_cpp` |
| `ROS_AUTOMATIC_DISCOVERY_RANGE` | Disables automatic discovery in ROS2. | `OFF` |
| `ZENOH_ROUTER_CHECK_ATTEMPTS` | Number of attempts to check for the Zenoh router; `0` means wait indefinitely. | `0` |
| `ZENOH_CONFIG_OVERRIDE` | Zenoh configuration override, see rmw_zenoh. | `mode="client";connect/endpoints=["tcp/host.docker.internal:7447"]` |
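For example, to point the node at a Zenoh router on another machine, you could override the endpoint in `.env` (the address below is a made-up example):

```
ZENOH_CONFIG_OVERRIDE=mode="client";connect/endpoints=["tcp/192.168.0.10:7447"]
```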
## Usage
Create a ROS2 client for the `/stt` service and call it. The service uses the `ric_messages/srv/AudioBytesToText` interface.
For the exact definition, check out the `ric_messages` repository. For usage examples, check out the service.
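As a rough sketch, the service can also be exercised from the command line; the request body below is a placeholder, since the actual field names are defined by `AudioBytesToText` in the `ric_messages` repository:

```bash
# Hypothetical call; fill the request with the real fields
# from ric_messages/srv/AudioBytesToText.
ros2 service call /stt ric_messages/srv/AudioBytesToText "{}"
```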