Getting models

(This part is based on the setup of Ollama, the inference engine used by our LLM server, applied to our containerized environment.)

Unless specified otherwise, we preload our Docker image with dl-llama3, a version of Llama3 that was further trained on German data by DiscoResearch.

Pulling from Ollama's model library

While the container is running, you can pull from Ollama's model library with the simple command:

docker compose exec llm-server ollama pull [MODEL]
or by requesting a pull via the API (using curl as an example of a requesting application):
curl http://<IP-ADDRESS>:PORT/api/pull -d '{
    "name": "<MODEL>"
}'
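
To check which models are already pulled, you can also list them, either with Ollama's list command or its /api/tags endpoint (placeholders and service name as above):

docker compose exec llm-server ollama list
curl http://<IP-ADDRESS>:PORT/api/tags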

Making a modified model from an already pulled model

We create a Modelfile (similar to Docker's Dockerfile), which specifies how the given model is modified.

(taken from Ollama's documentation)

Format of the Modelfile

# comment
INSTRUCTION arguments
The available instructions are:

FROM (required): Defines the base model to use.
PARAMETER: Sets the parameters for how Ollama will run the model.
TEMPLATE: The full prompt template to be sent to the model.
SYSTEM: Specifies the system message that will be set in the template.
ADAPTER: Defines the (Q)LoRA adapters to apply to the model.
LICENSE: Specifies the legal license.
MESSAGE: Specifies the message history.

Example

A simple example of a Modelfile looks like this:

FROM mistral
# Sets the base model to be used as 'mistral'

PARAMETER temperature 1.3
# Sets the model's 'temperature', i.e. how creative its answers are; increasing it makes the answers more creative

SYSTEM """
You are a helpful robot called RICBot, working inside a research facility called the DFKI in Bremen, Germany.

You receive prompts in natural language, mostly in German.
"""
# Sets a 'system' prompt, which gives the model instructions on how to answer (comparable to giving it a personality or specifying an answer format)

Utilizing Modelfiles

After creating our Modelfile, we save it in a directory of our choice (currently <INSTALLATION DIR>/ollama/models).

Then we type:

docker compose exec llm-server ollama create [MYMODEL] -f ~/models/Modelfile
to add it to our list of available models.
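
To verify the new model, you can list the available models again and try it out interactively; [MYMODEL] is the name chosen in the create command above:

docker compose exec llm-server ollama list
docker compose exec llm-server ollama run [MYMODEL]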

Using custom models

Ollama, like other inference engines such as llama.cpp, uses either raw or quantized models in the GGUF (.gguf) format. To use one, you can also create a Modelfile for it, with the FROM instruction containing the file path to the model file:

FROM ./our-custom-model-13b.Q4_0.gguf
and then create it with the create command mentioned above.
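
A complete Modelfile for such a custom model might look like the following sketch; the file name, parameter value, and model name are only illustrative assumptions, and the .gguf file is assumed to sit next to the Modelfile:

FROM ./our-custom-model-13b.Q4_0.gguf
# Hypothetical quantized model file, placed next to this Modelfile
PARAMETER temperature 0.8
# Somewhat less creative answers than in the example above
SYSTEM """
You answer briefly and in German.
"""

docker compose exec llm-server ollama create our-custom-model -f ~/models/Modelfile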

Models in use by the LLM server

The model dl-llama3

DiscoResearch's further trained version of Llama3, with better German understanding, comes pre-created with a basic Modelfile. That Modelfile sits inside the container's /models/dl-llama3 directory, where it can be changed; other Modelfiles can build on the model via FROM dl-llama3.
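
If you want to derive a customized variant, a Modelfile can start from the pre-created model just like from a library model. The model name and system prompt below are only illustrative assumptions, and the Modelfile is assumed to be saved at the same path as above:

FROM dl-llama3
# Builds on the pre-created German-tuned model
SYSTEM """
You are RICBot and you answer prompts in German.
"""

docker compose exec llm-server ollama create dl-llama3-ricbot -f ~/models/Modelfile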

The model llama3.1

Meta's new version of their Llama LLM, Llama 3.1, currently shows better pattern recognition than the previously used dl-llama3. Before we switched to llama3.1, the rather subpar pattern recognition of dl-llama3 hurt our performance: in the worst case, up to 8 generation attempts per prompt were needed before the required criteria, such as correct usage of the Grammar, were met. Now far fewer errors involving our rather complicated Grammar occur, which improves speed and reliability at the cost of the higher-quality natural German communication.

(still better than stock mistral!)
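
llama3.1 is part of Ollama's model library, so it can be pulled like any other model (assuming the same compose service name as above):

docker compose exec llm-server ollama pull llama3.1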