LLM-Server
Introduction
This document provides comprehensive guidelines for setting up, using, and troubleshooting the LLM-Server, which integrates a Speech-to-Text (STT) feature using the Whisper library.
Setup Guide
Warning
This setup guide assumes you have completed the steps in the Setup Documentation. If you have not, complete them first and then continue here.
Dockerfile Overview
The Docker container is built using an Ubuntu 22.04 base image. Key dependencies include Python, pip, and necessary libraries for running the Whisper-based STT.
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install -y python3-pip
RUN pip install openai-whisper mkdocs mkdocs-material
Environment Variables
The Docker image defines environment variables that you can override when starting the container. The one relevant to the STT is:
- AUDIO_IN: Directory inside the container where audio files are stored.
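As a minimal sketch of how a script inside the container could pick up this variable: the helper name `resolve_audio_dir` and the fallback path `/audio_in` are assumptions made for illustration, not part of the LLM-Server code.

```python
import os
from pathlib import Path


def resolve_audio_dir(default: str = "/audio_in") -> Path:
    """Return the audio directory named by AUDIO_IN.

    The fallback value is a hypothetical placeholder, used only
    when the variable is not set.
    """
    return Path(os.environ.get("AUDIO_IN", default))
```

Reading the variable in one place keeps the rest of a script independent of how the container was started.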
Running the STT
To run the STT manually, start your Docker container. You can then write a small test script and execute it.
Example
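One possible shape for such a test script is sketched below. It relies only on the documented `audio_to_text(file)` method. The module name `stt` in the import, the model name `"base"`, and the choice to pass bare file names rather than full paths are assumptions that may need adjusting for your project.

```python
import os
from pathlib import Path
from typing import Dict


def transcribe_folder(stt, folder: str) -> Dict[str, str]:
    """Run audio_to_text on every file in folder and collect the results.

    Files for which audio_to_text returns None are skipped.
    """
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # Assumption: the file is addressed by its name inside the input folder.
        text = stt.audio_to_text(path.name)
        if text is not None:
            results[path.name] = text
    return results


def main() -> None:
    # Hypothetical import path; point it at wherever the STT class lives.
    from stt import STT

    folder = os.environ["AUDIO_IN"]
    stt = STT(folder, "base")  # "base" is one of the Whisper model names
    for name, text in transcribe_folder(stt, folder).items():
        print(f"{name}: {text}")
```

Calling `main()` inside the running container would print one line per recognized audio file.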
The STT Class
__init__(audio_input_folder, model)
Initializes the speech-to-text component and loads the desired Whisper model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `audio_input_folder` | `str` | The folder where audio files are placed and read from. | required |
| `model` | `str` | The desired Whisper model. More info here. | required |
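The following is a sketch of what a constructor with this signature might do, not the actual implementation. It assumes the OpenAI Whisper package, whose `whisper.load_model(name)` call loads a model by name. The extra `loader` parameter is hypothetical and exists only so the sketch can be exercised without Whisper installed.

```python
from typing import Any, Callable, Optional


class STT:
    """Constructor sketch only; the real class may differ."""

    def __init__(
        self,
        audio_input_folder: str,
        model: str,
        loader: Optional[Callable[[str], Any]] = None,  # hypothetical, not documented
    ) -> None:
        if loader is None:
            # Imported lazily so the module can be imported without Whisper.
            import whisper  # provided by the openai-whisper package

            loader = whisper.load_model
        self.audio_input_folder = audio_input_folder
        # The model is loaded once here so later calls can reuse it.
        self.model = loader(model)
```

Loading the model in the constructor means the loading cost is paid once per `STT` instance rather than once per audio file.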
audio_to_text(file)
Converts audio into text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file` | `str \| bytes` | The audio file to be converted into text. Can be a file or a direct bytestream. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `text` | `str` | The text the audio file contained, if Whisper could recognize the audio. |
| `none` | `None` | Nothing, if Whisper did not recognize what the speaker said. |
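The parameter and return contract above can be sketched as a standalone function. This is an illustration under assumptions, not the server's code: it assumes a model object with Whisper's `transcribe(path)` method returning a dict with a `"text"` key, resolves a `str` argument relative to the input folder, and writes a `bytes` argument to a temporary file before transcribing.

```python
import os
import tempfile
from typing import Any, Optional, Union


def audio_to_text(model: Any, audio_input_folder: str, file: Union[str, bytes]) -> Optional[str]:
    """Return the recognized text, or None if nothing was recognized."""
    if isinstance(file, bytes):
        # Assumption: raw bytes are stored in a temporary file for decoding.
        with tempfile.NamedTemporaryFile(
            dir=audio_input_folder, suffix=".audio", delete=False
        ) as tmp:
            tmp.write(file)
            path = tmp.name
        cleanup = True
    else:
        # Assumption: a str names a file inside the input folder.
        path = os.path.join(audio_input_folder, file)
        cleanup = False
    try:
        result = model.transcribe(path)
    finally:
        if cleanup:
            os.remove(path)
    # An empty or whitespace-only transcript is treated as "not recognized".
    text = result.get("text", "").strip()
    return text or None
```

Returning `None` for an empty transcript lets callers distinguish "no speech recognized" from a real result with a single `is None` check.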