LLM-Server

Introduction

This document provides comprehensive guidelines for setting up, using, and troubleshooting the LLM-Server, which integrates a Speech-to-Text (STT) feature using the Whisper library.

Setup Guide

Warning

This setup guide assumes you have completed the steps in the Setup Documentation. If you haven't done so already, complete those steps first and only then continue here.

Dockerfile Overview

The Docker container is built using an Ubuntu 22.04 base image. Key dependencies include Python, pip, and necessary libraries for running the Whisper-based STT.

    FROM ubuntu:22.04
    ENV DEBIAN_FRONTEND=noninteractive
    # ffmpeg is required by Whisper to decode audio files
    RUN apt update && apt install -y python3-pip ffmpeg
    # The OpenAI Whisper library is published on PyPI as "openai-whisper"
    RUN pip install openai-whisper mkdocs mkdocs-material

Environment Variables

The Docker container exposes several environment variables that the user can set. The one relevant to the STT is listed below, with a usage sketch afterwards:

  • AUDIO_IN: Directory inside the container where audio files are stored.
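
As a minimal sketch of how AUDIO_IN might be consumed (the exact wiring inside the server may differ; the fallback path is just the folder used in the example further down), you could read it from the environment and hand it to the STT class:

    import os

    from stt import STT

    # AUDIO_IN is assumed to point at the container directory holding the
    # audio files; fall back to a default folder if it is unset.
    audio_dir = os.environ.get('AUDIO_IN', 'stt/audio_files')

    # Hand the configured directory to the STT class (see "The STT Class").
    whisper = STT(audio_dir, 'large')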

Running the STT

To run the STT manually, start your Docker container and execute a small test script inside it, such as the one below.

Example

    from stt import STT

    # Load the 'large' Whisper model and point the STT at the audio folder.
    whisper = STT('stt/audio_files', 'large')

    # Transcribe an audio file from the configured folder.
    text = whisper.audio_to_text('1.mp3')

    if text is None:
        print('Audio could not be recognized.')
    else:
        print(f'Text: {text}')

The STT Class

__init__(audio_input_folder, model)

Initializes the Speech-to-Text, loading the desired Whisper model.

Parameters:

Name                 Type   Description                                               Default
audio_input_folder   str    The folder where audio files are stored and read from.   required
model                str    The desired Whisper model. More info here.                required
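
As a brief, hedged sketch (the model names below are the standard Whisper sizes; which of them are available depends on the underlying Whisper installation), a smaller model can be chosen when loading speed matters more than accuracy:

    from stt import STT

    # 'base' is one of the standard Whisper model sizes ('tiny', 'base',
    # 'small', 'medium', 'large'); smaller models load and run faster but
    # are less accurate than 'large'.
    whisper = STT('stt/audio_files', 'base')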

audio_to_text(file)

Converts audio into text.

Parameters:

Name   Type          Description                                                                          Default
file   str | bytes   The audio file to be converted into text. Can be a file path or a raw bytestream.    required

Returns:

Name   Type   Description
text   str    The text contained in the audio file, if Whisper could recognize it.
none   None   Nothing, if Whisper did not recognize what the speaker said.
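
Because file also accepts raw bytes, a hedged sketch of the bytestream path (assuming the bytes form a complete, decodable audio file) could look like this:

    from stt import STT

    whisper = STT('stt/audio_files', 'large')

    # Read an audio file into memory and pass the raw bytes instead of a
    # file name.
    with open('stt/audio_files/1.mp3', 'rb') as f:
        audio_bytes = f.read()

    text = whisper.audio_to_text(audio_bytes)

    if text is None:
        print('Audio could not be recognized.')
    else:
        print(f'Text: {text}')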