# Getting models
(This part is based on the setup for Ollama, the inference engine used by our LLM server, and applied to our containerized environment.)

Unless specified otherwise, we preload our Docker image with Llama3 trained on German data by DiscoResearch, named `dl-llama3`.
## Pulling from Ollama's model library
While the container is running, you can pull from Ollama's model library with the simple command:
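For example, to pull Llama 3.1 from the library (the model name here is only an example; if the Ollama CLI is not installed on the host, run the command inside the container, e.g. via `docker exec`):

```sh
# Pull a model from Ollama's model library by name
ollama pull llama3.1
```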
Alternatively, you can request a pull through the API (using `curl` as an example of a requesting application):
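A minimal sketch, assuming Ollama listens on its default port 11434 on the local machine; the model name is again only an example:

```sh
# Request a pull of a library model through Ollama's HTTP API
curl http://localhost:11434/api/pull -d '{"name": "llama3.1"}'
```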
## Making a modified prompt from an already pulled model
We create a `Modelfile` (similar to Docker's `Dockerfile`), which specifies how the given model is modified.

(Taken from Ollama's documentation.)
### Format of the `Modelfile`
| Instruction | Description |
|---|---|
| `FROM` (required) | Defines the base model to use. |
| `PARAMETER` | Sets the parameters for how Ollama will run the model. |
| `TEMPLATE` | The full prompt template to be sent to the model. |
| `SYSTEM` | Specifies the system message that will be set in the template. |
| `ADAPTER` | Defines the (Q)LoRA adapters to apply to the model. |
| `LICENSE` | Specifies the legal license. |
| `MESSAGE` | Specify message history. |
### Example
A simple example of a `Modelfile` would be something like this:
```
FROM mistral
# Sets the base model to 'mistral'

PARAMETER temperature 1.3
# Sets the model's 'temperature', i.e. its creative capability - increasing it makes its answers more creative

SYSTEM """
You are a helping robot, called RICBot, inside a research facility called the DFKI in Bremen, Germany.
You receive prompts in natural language, mostly in German.
"""
# Sets a 'system' prompt, which gives the model instructions on how to answer (comparable to giving it a personality or specifying an answer syntax)
```
## Utilizing Modelfiles
After we have created our `Modelfile`, we save it as `Modelfile` in a directory of our choice (currently `<INSTALLATION DIR>/ollama/models`). Then we type:
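A sketch of the command; the model name `ricbot` is only illustrative, and `-f` points to the `Modelfile` we just saved:

```sh
# Register the Modelfile under a model name of our choice
ollama create ricbot -f ./Modelfile
```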
This inserts it into our available prompts.

## Using custom models
Ollama, like other inference engines such as llama.cpp, utilizes either raw or quantized models in the `.gguf` format. To use these, you can also create a `Modelfile` for them, with the `FROM` instruction containing the file path to the model file.
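A minimal sketch of such a `Modelfile`; the file path is only an example:

```
# Use a local (quantized) GGUF file as the base model
FROM ./models/my-model-q4_K_M.gguf
```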
## Models in use by the LLM server
### The model `dl-llama3`
DiscoResearch's pre-trained version of Llama3, aimed at better German recognition, comes pre-created with a basic Modelfile that you can adjust to your liking. It sits inside the container's `/models/dl-llama3` directory, where changes to its Modelfile can be made, and other Modelfiles can build on it via `FROM dl-llama3`.
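For example, a custom `Modelfile` building on it could look like this (the system prompt here is only illustrative):

```
FROM dl-llama3
SYSTEM "You are a helpful assistant and answer in German."
```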
### The model `llama3.1`
Meta's new version of their Llama LLM, Llama 3.1, currently has better pattern recognition than the previously used `dl-llama3`.
Prior to utilizing `llama3.1`, the rather subpar pattern recognition abilities of `dl-llama3` were detrimental to our performance, with a worst case of 8 attempted generations per prompt before the required criteria, like correct usage of the Grammar, were met. Now, far fewer errors regarding the usage of our rather complicated Grammar occur, which improves speed and reliability at the cost of the higher-quality natural German communication that `dl-llama3` offered.
(Still better than stock `mistral`!)