Code Documentation
This document provides an overview of the `stt_node.py` script, which is the core of the `ros_stt` package.
SpeechToTextNode
The `SpeechToTextNode` class is a ROS2 node that acts as a client to an OpenAI-compatible STT server. It exposes a ROS2 service that transcribes audio into text.
Parameters
The node exposes the following ROS2 parameter:
| Parameter | Type | Description | Default Value |
|---|---|---|---|
| `server_url` | string | The URL of the whisper.cpp server endpoint. | `http://localhost:8080/inference` |
Services
The node provides one main service:
/stt
- Type: `ric_messages/srv/AudioBytesToText`
- Description: This service takes a raw audio byte array and returns the transcribed text along with the detected language.
- Request:
  - `audio` (`uint8[]`): The raw audio data to be transcribed.
- Response:
  - `text` (`string`): The text transcribed from the audio.
  - `language` (`string`): The language automatically detected by the server.
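Because the `audio` bytes are forwarded to the server as an uploaded file, a caller most likely needs to send a complete audio file rather than headerless samples. Assuming the server is whisper.cpp and accepts 16 kHz mono 16-bit WAV (its usual input), the sketch below shows one way a caller might package PCM samples for the request. `pcm16_to_wav_bytes` is a hypothetical helper, not part of `ros_stt`.

```python
import io
import struct
import wave


def pcm16_to_wav_bytes(samples, sample_rate=16000):
    """Pack signed 16-bit mono PCM samples into an in-memory WAV file.

    Hypothetical helper: assumes the server wants a WAV container,
    not raw PCM, in the service's `audio` field.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)  # 16 kHz assumed
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    # wave does not close a caller-supplied file object, so the buffer is still readable.
    return buffer.getvalue()


# One second of silence at 16 kHz.
wav_bytes = pcm16_to_wav_bytes([0] * 16000)
# uint8[] field contents as a list of ints in the range 0-255.
audio_field = list(wav_bytes)
```

Assigning `audio_field` to the request's `audio` field and calling `/stt` would then be done with an rclpy service client; that part is left out because it depends on the generated `ric_messages` Python types.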
How it Works
- Initialization: The node starts, declares its `server_url` parameter, and creates the `/stt` service.
- Service Call: Another ROS2 node calls the `/stt` service with a request containing the raw audio data as a `uint8` array.
- Callback Execution: The `speech_to_text_callback` method is triggered.
- Data Preparation: The incoming `uint8` array is wrapped in an `io.BytesIO` object so that it can be sent as a file in an HTTP request.
- API Request: The node sends the audio data in a `multipart/form-data` POST request to the whisper.cpp server URL given by `server_url`. It specifically requests a `verbose_json` response so that it receives the detected language in addition to the text.
- Response Handling:
  - If the server returns a successful response (HTTP 200), the node parses the JSON payload.
  - It extracts the `text` and `language` fields from the response.
  - The extracted data is copied into the ROS service response object.
  - If the server returns an error or the transcription is empty, an appropriate error or warning is logged.
- Return to Caller: The ROS service response, containing the text and language, is returned to the original caller.
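The Data Preparation, API Request, and Response Handling steps can be sketched with only the Python standard library. This is not the node's actual code: the document does not say which HTTP client the node uses, and the form field names `file` and `response_format`, the upload filename `audio.wav`, and the part content type are assumptions modeled on the whisper.cpp server's form fields. `build_inference_request` and `parse_inference_response` are hypothetical helpers.

```python
import io
import json
import urllib.request
import uuid


def build_inference_request(server_url, audio_bytes):
    """Wrap audio bytes in a multipart/form-data POST that asks for verbose_json."""
    # Mirrors the node's io.BytesIO step; with urllib the bytes are read straight back out.
    audio_file = io.BytesIO(bytes(audio_bytes))
    boundary = uuid.uuid4().hex
    parts = [
        # Assumed field name "file" and filename "audio.wav".
        b'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        b"Content-Type: application/octet-stream\r\n\r\n" + audio_file.read(),
        # verbose_json is requested so the reply includes the detected language.
        b'Content-Disposition: form-data; name="response_format"\r\n\r\nverbose_json',
    ]
    delimiter = b"--" + boundary.encode()
    body = b"".join(delimiter + b"\r\n" + part + b"\r\n" for part in parts)
    body += delimiter + b"--\r\n"  # closing delimiter
    return urllib.request.Request(
        server_url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def parse_inference_response(status, payload):
    """Return (text, language), or raise ValueError on an HTTP error or empty result."""
    if status != 200:
        raise ValueError(f"server returned HTTP {status}")
    result = json.loads(payload)
    text = (result.get("text") or "").strip()
    if not text:
        raise ValueError("transcription is empty")
    return text, result.get("language", "")


# Sending the request requires a running server, e.g.:
# request = build_inference_request("http://localhost:8080/inference", audio_bytes)
# with urllib.request.urlopen(request, timeout=30) as resp:
#     text, language = parse_inference_response(resp.status, resp.read())
```

`urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so in practice the error branch would take the status from the exception's `code` attribute. The helper raises `ValueError` rather than logging so it can be tested on its own; the node itself logs the problem, as described in the Response Handling step.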