Running the Service
This document explains how to run the tts service node and interact with it.
How to Run the Service
To run the text-to-speech service, first source your ROS2 environment and then use the ros2 run command.
You can run the service node with the following command. This will start the TTS client and make it available for receiving text-to-speech requests.
When running the service in docker, you can enter the container with the following command, where the above steps are already done:
With that, you should be able to follow the next instructions.
Parameters
You can customize the behavior of the node by passing the following ROS parameters.
| Argument | Description | Default |
|---|---|---|
| server_url | The URL of the llama.cpp server's completions endpoint for TTS inference. | http://localhost:8080/v1/completions |
| en_model | Model identifier for English TTS. | en |
| en_voice | Voice profile to use for English text-to-speech. | leah |
| en_max_tokens | Maximum number of tokens to generate for English TTS. | 10240 |
| en_temperature | Controls randomness in English TTS generation. Higher values increase creativity. | 0.6 |
| en_top_p | Nucleus sampling parameter for English TTS. Controls diversity of token selection. | 0.9 |
| en_repeat_penalty | Penalty for token repetition in English TTS to encourage more varied output. | 1.1 |
| de_model | Model identifier for German TTS. | de |
| de_voice | Voice profile to use for German text-to-speech. | max |
| de_max_tokens | Maximum number of tokens to generate for German TTS. | 10240 |
| de_temperature | Controls randomness in German TTS generation. Higher values increase creativity. | 0.6 |
| de_top_p | Nucleus sampling parameter for German TTS. Controls diversity of token selection. | 0.9 |
| de_repeat_penalty | Penalty for token repetition in German TTS to encourage more varied output. | 1.1 |
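To illustrate how the per-language parameters could map onto a request to the completions endpoint, here is a minimal Python sketch. It is not the node's actual implementation: the `LanguageParams` class, the `build_request` helper, and the request field names (`prompt`, `max_tokens`, `temperature`, `top_p`, `repeat_penalty`) are assumptions based on common llama.cpp server conventions. How the voice profile is encoded into the prompt is model-specific and is not shown.

```python
from dataclasses import dataclass


@dataclass
class LanguageParams:
    """Hypothetical container mirroring the per-language ROS parameters."""
    model: str
    voice: str
    max_tokens: int = 10240
    temperature: float = 0.6
    top_p: float = 0.9
    repeat_penalty: float = 1.1


# Defaults copied from the parameter table.
DEFAULTS = {
    "en": LanguageParams(model="en", voice="leah"),
    "de": LanguageParams(model="de", voice="max"),
}


def build_request(language: str, prompt: str) -> dict:
    """Build a request body for the completions endpoint (assumed field names)."""
    params = DEFAULTS[language]
    return {
        "model": params.model,
        "prompt": prompt,
        "max_tokens": params.max_tokens,
        "temperature": params.temperature,
        "top_p": params.top_p,
        "repeat_penalty": params.repeat_penalty,
    }
```

Under these assumptions, `build_request("de", "Hallo")` selects the German model identifier and sampling defaults, so overriding a parameter such as `de_temperature` would only affect German requests.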
Service Requests
/tts
To convert text to speech, call the /tts service.
It uses the ric_messages/srv/TextToAudioBytes service type. Replace ${text} with your desired text and ${language} with the target language.
ros2 service call /tts ric_messages/srv/TextToAudioBytes "{'text': '${text}', 'language': '${language}'}"
Supported Languages
The service currently supports the following languages:
- English: Use "english" or "en"
- German: Use "german" or "de"
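The alias handling above can be sketched as a small lookup. This is an illustrative helper, not code from the service: the name `normalize_language` is hypothetical, and treating identifiers as case-insensitive is an assumption the service may not share.

```python
# Accepted identifiers from the list above, mapped to a short language code.
LANGUAGE_ALIASES = {
    "english": "en",
    "en": "en",
    "german": "de",
    "de": "de",
}


def normalize_language(value: str) -> str:
    """Map a language identifier to "en" or "de"; raise on anything else."""
    key = value.strip().lower()  # assumption: case and padding are ignored
    if key not in LANGUAGE_ALIASES:
        raise ValueError(f"unsupported language: {value!r}")
    return LANGUAGE_ALIASES[key]
```

Validating the identifier on the caller side before issuing the request makes an unsupported language fail early with a clear message.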
Example Usage
Convert English text to speech:
ros2 service call /tts ric_messages/srv/TextToAudioBytes "{'text': 'Hello, how are you today?', 'language': 'english'}"
Convert German text to speech:
ros2 service call /tts ric_messages/srv/TextToAudioBytes "{'text': 'Hallo, wie geht es dir heute?', 'language': 'german'}"
Response
The service returns audio data in WAV format as a byte array in the audio field of the response.
This audio can be saved to a file or played directly by audio processing applications.
You can use the convert.py script as an example to save the audio response to a WAV file.
First, save the service response to a JSON file, then convert it:
# Call the service and save response to JSON
ros2 service call /tts ric_messages/srv/TextToAudioBytes "{'text': 'Hello world', 'language': 'english'}" > input.json
# Convert the JSON response to a WAV file
python3 convert.py
The convert.py script reads the JSON response, extracts the audio byte array, and saves it as output.wav.
After running the script, you can play the generated output.wav file with any audio player.
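For readers without access to convert.py, the following is a minimal sketch of what such a converter might look like. It is not the original script. It assumes that the saved response contains the audio field as a bracketed list of integer byte values (for example `audio=[82, 73, ...]` or `"audio": [82, 73, ...]`). The function names `extract_audio_bytes` and `convert` are hypothetical.

```python
import re
from pathlib import Path


def extract_audio_bytes(response_text: str) -> bytes:
    """Return the first integer list that follows the word 'audio' as bytes."""
    # Assumption: the byte array is printed as comma-separated integers in [].
    match = re.search(r"audio[^\[]*\[([0-9,\s]*)\]", response_text)
    if match is None:
        raise ValueError("no integer 'audio' array found in response")
    values = [int(tok) for tok in match.group(1).split(",") if tok.strip()]
    return bytes(values)  # raises ValueError if a value is outside 0..255


def convert(src: str = "input.json", dst: str = "output.wav") -> int:
    """Read the saved response from src and write the raw WAV bytes to dst."""
    audio = extract_audio_bytes(Path(src).read_text())
    Path(dst).write_bytes(audio)
    return len(audio)
```

Calling `convert()` in the directory that contains input.json would write output.wav and return the number of bytes written. If the response prints the byte array in a different form, the regular expression needs to be adapted.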
Troubleshooting
Common Issues
- Service not available: Ensure the TTS server is running and accessible at the configured URL
- Unsupported language: Check that you're using a supported language identifier ("en"/"english" or "de"/"german")
- Empty audio response: Verify that the input text is not empty and the model parameters are correctly configured
- GPU out of memory: If using GPU deployment, consider reducing LLAMA_ARG_N_GPU_LAYERS or switching to CPU deployment
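For the "Service not available" case, a quick way to check whether anything is listening at the configured server_url is a plain TCP connection test. The sketch below uses only the Python standard library. The helper names `host_and_port` and `server_reachable` are hypothetical, and a successful connection only shows that the port is open, not that the TTS model is loaded.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def host_and_port(url: str) -> Tuple[str, int]:
    """Extract host and port from a URL, falling back to the scheme default."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def server_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the default configuration, `server_reachable("http://localhost:8080/v1/completions")` returning False suggests the llama.cpp server is not running or is bound to a different address.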
Checking Service Status
To verify the service is running:
ros2 service list | grep /tts
To check service type:
ros2 service type /tts
Logs
View service logs when running in Docker: