LLM-Server

Introduction

This document provides comprehensive guidelines for setting up, using, and troubleshooting the LLM-Server, which integrates a Speech-to-Text (STT) feature using the Whisper library.

Setup Guide

Warning

This setup guide assumes you have completed the steps in the Setup Documentation. If you haven't done so already, complete those steps first and only then continue here.

Dockerfile Overview

The Docker container is built using an Ubuntu 22.04 base image. Key dependencies include Python, pip, and necessary libraries for running the Whisper-based STT.

    FROM ubuntu:22.04
    ENV DEBIAN_FRONTEND=noninteractive
    # ffmpeg is required by Whisper to decode audio files
    RUN apt update && apt install -y python3-pip ffmpeg
    # The OpenAI Whisper library is published on PyPI as "openai-whisper"
    RUN pip install openai-whisper mkdocs mkdocs-material

Environment Variables

The Docker container exposes several environment variables that the user can set. The one relevant to the STT is listed below, with a usage sketch afterwards:

  • AUDIO_IN: Directory inside the container where audio files are stored.
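
As a minimal sketch of how AUDIO_IN might be consumed (the exact wiring inside the server may differ; the fallback path is just the folder used in the example further down), you could read it from the environment and hand it to the STT class:

    import os

    from stt import STT

    # AUDIO_IN is assumed to point at the container directory holding the
    # audio files; fall back to a default folder if it is unset.
    audio_dir = os.environ.get('AUDIO_IN', 'stt/audio_files')

    # Hand the configured directory to the STT class (see "The STT Class").
    whisper = STT(audio_dir, 'large')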

Running the STT

To run the STT manually, start your Docker container and execute a small test script inside it, such as the one below.

Example

    from stt import STT

    # Load the 'large' Whisper model and point the STT at the audio folder.
    whisper = STT('stt/audio_files', 'large')

    # Transcribe an audio file from the configured folder.
    text = whisper.audio_to_text('1.mp3')

    if text is None:
        print('Audio could not be recognized.')
    else:
        print(f'Text: {text}')

The STT Class

__init__(audio_input_folder, model)

Initializes the Speech-to-Text, loading the desired Whisper model.

Parameters:

Name                 Type   Description                                               Default
audio_input_folder   str    The folder where audio files are stored and read from.   required
model                str    The desired Whisper model. More info here.                required
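
As a brief, hedged sketch (the model names below are the standard Whisper sizes; which of them are available depends on the underlying Whisper installation), a smaller model can be chosen when loading speed matters more than accuracy:

    from stt import STT

    # 'base' is one of the standard Whisper model sizes ('tiny', 'base',
    # 'small', 'medium', 'large'); smaller models load and run faster but
    # are less accurate than 'large'.
    whisper = STT('stt/audio_files', 'base')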

audio_to_text(file)

Converts audio into text.

Parameters:

Name   Type          Description                                                                          Default
file   str | bytes   The audio file to be converted into text. Can be a file path or a raw bytestream.    required

Returns:

Name   Type   Description
text   str    The text contained in the audio file, if Whisper could recognize it.
none   None   Nothing, if Whisper did not recognize what the speaker said.
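
Because file also accepts raw bytes, a hedged sketch of the bytestream path (assuming the bytes form a complete, decodable audio file) could look like this:

    from stt import STT

    whisper = STT('stt/audio_files', 'large')

    # Read an audio file into memory and pass the raw bytes instead of a
    # file name.
    with open('stt/audio_files/1.mp3', 'rb') as f:
        audio_bytes = f.read()

    text = whisper.audio_to_text(audio_bytes)

    if text is None:
        print('Audio could not be recognized.')
    else:
        print(f'Text: {text}')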