Ollama with Open WebUI (20240730)

Setup instructions for Linux hosts: installing Ollama with Open WebUI using Dockge, a self-hosted manager for Docker Compose stacks.

Jul 7, 2024
Revision: 20240730-0 (init: 20240707)
 
This post details the installation of Ollama and the Open WebUI using Dockge for docker-compose stacks to run LLMs on your Linux NVIDIA GPU host.
 
 
Ollama is a free and open-source tool designed to simplify running large language models (LLMs) locally on a machine (preferably with a GPU). It allows users to download, run, and manage open-source LLMs on their local systems without complex setups or cloud services. Ollama supports various open-source models, including Llama 3, Mistral, and Gemma.
Ollama runs models locally (the usable model size depends on the amount of memory available on the GPU), which ensures privacy and control over our queries. Because it exposes a REST API, many applications integrate with it; Open WebUI is one of them, as can be seen at https://github.com/ollama/ollama?tab=readme-ov-file#community-integrations
Open WebUI is an open-source web interface designed to work seamlessly with Ollama. It provides an intuitive graphical user interface for interacting with various AI models, with features like chat interfaces, model management, and prompt templates. This allows us to generate text, answer questions, and perform multiple language-related tasks. It helps experiment with or integrate language models into projects while maintaining control over privacy and data.

Ollama

We will use Dockge and create a new ollama stack with the following compose.yaml for this setup:
    services:
      ollama:
        image: ollama/ollama:latest
        container_name: ollama
        ports:
          - 11434:11434
        volumes:
          - ./ollama:/root/.ollama
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
        command: serve
        restart: unless-stopped
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities:
                    - gpu
        labels:
          - "com.centurylinklabs.watchtower.enable=true"
The serve command asks Ollama to answer API requests, making it available for other tools to use.
After starting the ollama stack, http://127.0.0.1:11434/ should answer with “Ollama is running”. This is the expected result: it confirms that API access is enabled for other tools to use.
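The same check can be scripted. The following is a minimal sketch, assuming Ollama is published on 127.0.0.1:11434 as in the compose.yaml above; the function name ollama_is_running is ours and is not part of Ollama.

```python
import http.client
import urllib.request


def ollama_is_running(base_url: str = "http://127.0.0.1:11434/", timeout: float = 5.0) -> bool:
    """Return True if base_url answers with the "Ollama is running" banner."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts all derive from OSError.
        return False
    return "Ollama is running" in body


if __name__ == "__main__":
    print("reachable" if ollama_is_running() else "not reachable")
```

Pass another base URL (for example a reverse proxy address) to run the same check against it.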
If an HTTPS reverse proxy is available, configure it to map https://ollama.example.com/ to this HTTP resource; this becomes an option in a later section of this writeup.

Obtaining some models

Here, the amount of memory of the GPU is going to be a limiting factor:
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
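As a rough illustration of that note, the sketch below encodes its three tiers and returns the largest listed model size that fits in a given amount of memory. It is only a lookup over the figures quoted above, not a sizing formula; the names NOTE_TIERS and largest_model_tier are hypothetical.

```python
from typing import Optional

# (model size in billions of parameters, minimum memory in GB), as quoted in the note above.
NOTE_TIERS = [(7, 8), (13, 16), (33, 32)]


def largest_model_tier(ram_gb: float) -> Optional[int]:
    """Return the largest model size (in billions) from the note that fits in ram_gb."""
    fitting = [size for size, needed in NOTE_TIERS if ram_gb >= needed]
    return max(fitting) if fitting else None


for ram in (4, 8, 24, 32):
    print(ram, "GB ->", largest_model_tier(ram))
```

With 24 GB, for instance, the lookup returns 13, since the 33B tier asks for 32 GB.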
From Dockge’s >_ bash window, ask ollama to pull some models. For example, ollama pull llama3:8b, which we can follow by ollama run llama3:8b and ask it a question.
Download some compatible models, as per https://ollama.com/library, while keeping in mind the memory limitation of such models for the GPU.
Each model will be downloaded locally into the /opt/stacks/ollama/ollama/models directory. Looking into /opt/stacks/ollama/ollama/models/manifests/registry.ollama.ai/library/ we will see the list of models installed locally. Inside a given model’s directory, we will see which variants of that model were obtained (7b, etc.).
Since this is a container, we can also get a bash shell within the running container (obtain the list of containers using docker container ls) and add more models by running docker exec -it <CONTAINERID> /bin/bash, then running ollama pull or ollama run commands. The ollama command also has other sub-commands; in particular, be aware of list and rm should you want to clean up some older downloaded models.
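The installed models can also be listed through the REST API, at /api/tags. The sketch below assumes the response is a JSON object with a models array whose entries carry name and size (in bytes) fields; the helper names and the sample values are illustrative only.

```python
import json
import urllib.request


def parse_models(tags_json: str) -> list:
    """Extract (name, size in GB) pairs from an /api/tags response body."""
    payload = json.loads(tags_json)
    return [
        (model["name"], round(model.get("size", 0) / 1e9, 1))
        for model in payload.get("models", [])
    ]


def list_models(base_url: str = "http://127.0.0.1:11434") -> list:
    """Fetch /api/tags from a running Ollama and return the parsed models."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_models(response.read().decode("utf-8"))


# Trimmed-down sample response (illustrative values), to show the parsing without a server.
sample = '{"models": [{"name": "llama3:8b", "size": 4661224676}]}'
print(parse_models(sample))
```

Calling list_models() against the running stack should return one entry per model pulled so far.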
After downloading a model or a few, let’s set up Open WebUI to access them.

Open WebUI

Setup using Ollama’s compose.yaml

This method adds open-webui to the already existing ollama stack.
By default, Docker Compose creates a network for services in the same compose.yaml to communicate with one another. When this is done, services end up on the same private subnetwork, and it is possible to use the service names to communicate (i.e. a service named ollama can be accessed using the ollama name).
This method will set OLLAMA_HOST to ask Ollama to listen on all available network interfaces. Because Docker containers operate within an abstracted network environment different from the host's network interfaces, containers are connected to a virtual network interface created by Docker. This interface is typically part of a bridge network, which isolates the container's network from the host's network while allowing communication between containers on the same bridge network (in this case, both services are being started within the same compose.yaml file).
By using OLLAMA_HOST=0.0.0.0:11434 in this setup, we request Ollama to answer requests beyond localhost only, and this will allow us to have the open-webui service talk to ollama directly.
The final compose.yaml is as follows:
    services:
      ollama:
        image: ollama/ollama:latest
        container_name: ollama
        ports:
          - 11434:11434
        volumes:
          - ./ollama:/root/.ollama
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
        command: serve
        environment:
          - OLLAMA_HOST=0.0.0.0:11434
        restart: unless-stopped
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities:
                    - gpu
        labels:
          - "com.centurylinklabs.watchtower.enable=true"
      open-webui:
        image: ghcr.io/open-webui/open-webui:cuda
        container_name: open-webui
        volumes:
          - ./open-webui:/app/backend/data
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
        ports:
          - 3030:8080
        depends_on:
          - ollama
        restart: unless-stopped
        environment:
          - OLLAMA_BASE_URL=http://ollama:11434
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities:
                    - gpu
        labels:
          - "com.centurylinklabs.watchtower.enable=true"
In addition to adding the open-webui service, which depends_on and communicates with the ollama service and listens on port 8080 (exposed on host port 3030), we altered the ollama service to add the environment section: the OLLAMA_HOST variable requests the service to listen on all interfaces, not just 127.0.0.1, which is only local to the ollama container.
💡
Each service in the compose.yaml file gets its own private IP on the private bridge subnet created for the stack. In a terminal, run docker network ls to see the list of private networks created by docker compose to isolate the services from the running host. Except for exposed ports, those communications stay internal to that subnet. Because the stack name is ollama (which is also the directory name in /opt/stacks), the network is named ollama_default and can be inspected using docker network inspect ollama_default. In our setup, the subnet is 172.23.0.0/16 and the ollama container runs on 172.23.0.2/16 while open-webui is on 172.23.0.3/16.
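The output of docker network inspect is JSON, so the addresses can be extracted with a few lines. The sketch below assumes the usual layout (a list of networks, each with a Containers map whose entries have Name and IPv4Address fields); the function names and the container IDs in the sample are made up for illustration.

```python
import json
import subprocess


def container_addresses(inspect_json: str) -> dict:
    """Map container names to their IPv4 address from `docker network inspect` output."""
    networks = json.loads(inspect_json)  # docker prints a JSON list, one entry per network
    addresses = {}
    for network in networks:
        # "Containers" can be missing or null when nothing is attached to the network.
        for container in (network.get("Containers") or {}).values():
            addresses[container["Name"]] = container["IPv4Address"]
    return addresses


def inspect_network(name: str = "ollama_default") -> dict:
    """Run `docker network inspect NAME` and return its container addresses."""
    result = subprocess.run(
        ["docker", "network", "inspect", name],
        check=True, capture_output=True, text=True,
    )
    return container_addresses(result.stdout)


# Sample shaped like the real output, using the addresses from our setup.
sample = json.dumps([{
    "Name": "ollama_default",
    "IPAM": {"Config": [{"Subnet": "172.23.0.0/16"}]},
    "Containers": {
        "aaa": {"Name": "ollama", "IPv4Address": "172.23.0.2/16"},
        "bbb": {"Name": "open-webui", "IPv4Address": "172.23.0.3/16"},
    },
}])
print(container_addresses(sample))
```

On the Docker host, inspect_network() runs the real command; the addresses it reports may differ from ours.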

Setup in a separate compose.yaml

This setup is done on the same host where Ollama is running. It does not require the OLLAMA_HOST variable to be set (i.e. the compose.yaml from the “Ollama” section is sufficient) but requires the use of host.docker.internal.
This host is a special DNS name used in Docker environments to allow containers to communicate with the host machine and access services running on the host machine's localhost (i.e., other exposed services). It resolves to the host's internal IP address within the Docker network. The host-gateway value is a reserved string used in Docker configurations to determine the host's IP address dynamically.
To use it, use the host exposed port (here also 11434), not the container port if those differ, and use two extra entries in the compose.yaml file:
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - host.docker.internal:host-gateway
Integrating those into the open-webui stack’s compose.yaml:
    services:
      open-webui:
        image: ghcr.io/open-webui/open-webui:cuda
        container_name: open-webui
        volumes:
          - ./open-webui:/app/backend/data
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
        ports:
          - 3030:8080
        restart: unless-stopped
        environment:
          - OLLAMA_BASE_URL=http://host.docker.internal:11434
        extra_hosts:
          - host.docker.internal:host-gateway
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities:
                    - gpu
        labels:
          - "com.centurylinklabs.watchtower.enable=true"
Note that we are using an alternate host port: 3030.
With this setup, there will be an open-webui directory in /opt/stacks, but the tool will only work if the ollama container has been started before the open-webui one.
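Because the two stacks are started independently, it can help to confirm that Ollama's host port accepts connections before starting open-webui. The following is a minimal sketch of such a wait loop; port_open and wait_for are hypothetical helpers, and an open port only indicates that something is listening, not that Ollama is fully ready.

```python
import socket
import time
from typing import Callable


def port_open(host: str = "127.0.0.1", port: int = 11434, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for(check: Callable[[], bool], attempts: int = 10, delay: float = 1.0,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling wait_for(port_open) returns True as soon as the port answers, or False after ten failed attempts.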

With Ollama behind an HTTPS reverse proxy

With Ollama configured to answer on an HTTPS reverse proxy, such as https://ollama.example.com/, we can bypass host.docker.internal in favor of direct access to that URL. The compose.yaml looks similar to the one in the previous section, but notice the alternate OLLAMA_BASE_URL:
    services:
      open-webui:
        image: ghcr.io/open-webui/open-webui:cuda
        container_name: open-webui
        volumes:
          - ./open-webui:/app/backend/data
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
        ports:
          - 3030:8080
        restart: unless-stopped
        environment:
          - OLLAMA_BASE_URL=https://ollama.example.com
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities:
                    - gpu
        labels:
          - "com.centurylinklabs.watchtower.enable=true"
Specifying OLLAMA_BASE_URL allows us to run the WebUI on a different host than the one running the Ollama service. By extension, in the absence of a reverse proxy, it can also be defined as OLLAMA_BASE_URL=http://<IP>:<PORT>.
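Before pointing Open WebUI at a given base URL, it can be useful to confirm that the URL answers a generation request. The sketch below uses Ollama's /api/generate endpoint with streaming disabled and assumes the reply is a JSON object with a response field; build_generate_request and ask are our own helper names.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Prepare a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

For example, ask("https://ollama.example.com", "llama3:8b", "Why is the sky blue?") exercises the reverse proxy, while the same call with http://<IP>:<PORT> as its first argument exercises direct access.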

Using Open WebUI

Admin Creation: The first account created on Open WebUI gains Administrator privileges, controlling user management and system settings.
User Registrations: Subsequent sign-ups start with Pending status, requiring Administrator approval for access.
Privacy and Data Security: All your data, including login details, is locally stored on your device. Open WebUI ensures strict confidentiality and no external requests for enhanced privacy and security.
After setting up an account (the email address is only used locally), we can investigate some of the tool's features and ask questions to the different models we have downloaded for Ollama.

Enabling WebSearch

Asking Open WebUI about its own WebSearch feature gives:
"WebSearch" in Open WebUI is a powerful feature that lets you perform searches directly through the interface using various search engines.
Here's what it provides:
  • Integrated Search: No need to leave Open WebUI to use a search engine. Just type your query into the search bar and get results from multiple sources.
  • Multiple Engines: Open WebUI supports various search engines like Google, DuckDuckGo, Bing, and others. You can choose your preferred engine or switch between them easily.
  • Direct Results: WebSearch pulls results directly from the chosen search engine, so you get the same information you'd find on their websites.
  • Relevance and Ranking: Open WebUI aims to present results in a relevant and well-ranked order, making it easier to find what you're looking for.
Many backends are possible for it; their setup is described at https://docs.openwebui.com/tutorial/web_search
Most of those can be enabled from the WebUI, as an admin user:
  • Click on your username (bottom left of the WebUI) to access a sub-menu
  • Select the “Admin Panel” option
  • Select the “Settings” tab
  • Select the “WebSearch” option
  • Enable it and configure an “engine”, the “search result count” and its “concurrent requests”
We will not discuss the many available options, as those are an end-user choice, but will note that DuckDuckGo is an excellent privacy-conscious option that can be enabled easily from the UI.

Adding a SearXNG stack

SearXNG is a free and open-source metasearch engine (a search engine that searches other search engines); we will install it as its own stack in Dockge (similar to “Setup in a separate compose.yaml” above), using the host.docker.internal:host-gateway method (it is easy to change this to a reverse proxy URL when available).
First, from the Dockge UI, “+ Compose” a new stack named searxng and just “Save” it; before using it, we need to populate the directory with a folder and three files that will be obtained from Open WebUI’s SearXNG WebSearch documentation at https://docs.openwebui.com/tutorial/web_search#searxng-docker
    # /opt/stacks is not readable by the default user, we need to become root (temporarily)
    sudo su
    cd /opt/stacks/searxng
    mkdir searxng
    nano searxng/settings.yml
    # fill in the content of the file from the documentation
    # feel free to modify the secret_key value
    nano searxng/limiter.toml
    # fill in the content of the file from the documentation
    nano searxng/uwsgi.ini
    # fill in the content of the file from the documentation
After exiting the root shell, from the Dockge WebUI, “Edit” the searxng stack and use the following for its compose.yaml:
    services:
      searxng:
        image: searxng/searxng:latest
        container_name: searxng
        ports:
          - 8234:8080
        volumes:
          - ./searxng:/etc/searxng
          - /etc/timezone:/etc/timezone:ro
          - /etc/localtime:/etc/localtime:ro
        restart: always
        labels:
          - "com.centurylinklabs.watchtower.enable=true"
After performing a “Save”, it is fine to “Start” the stack. Note that we have modified the exposed port to be 8234.
“Stop” then “Edit” the open-webui stack’s compose.yaml:
  • if it is not yet present, add
    extra_hosts:
      - host.docker.internal:host-gateway
  • in the environment: section, add:
      - ENABLE_RAG_WEB_SEARCH=true
      - RAG_WEB_SEARCH_ENGINE=searxng
      - RAG_WEB_SEARCH_RESULT_COUNT=3
      - RAG_WEB_SEARCH_CONCURRENT_REQUESTS=10
      - SEARXNG_QUERY_URL=http://host.docker.internal:8234/search?q=<query>
“Save” and “Start” it again.
On the Open WebUI page, as an admin, go back to the “WebSearch” option of the “Settings” tab in the “Admin Panel”, select “searxng” as the “engine”, and use http://host.docker.internal:8234/search?q=<query> for the “Query URL”.
After a “Save”, test in a chat by enabling the “Web Search” option.
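To check the SearXNG endpoint independently of Open WebUI, the <query> placeholder of the Query URL can be filled in by hand. The sketch below only illustrates that substitution with URL-encoded search terms; it is not Open WebUI's own code, and searxng_url is a hypothetical helper.

```python
from urllib.parse import quote_plus


def searxng_url(template: str, query: str) -> str:
    """Replace the <query> placeholder with the URL-encoded search terms."""
    if "<query>" not in template:
        raise ValueError("template must contain the <query> placeholder")
    return template.replace("<query>", quote_plus(query))


# From the Linux host itself, use the exposed port directly: host.docker.internal
# only resolves inside containers that declare the extra_hosts entry.
print(searxng_url("http://127.0.0.1:8234/search?q=<query>", "ollama open webui"))
```

This prints http://127.0.0.1:8234/search?q=ollama+open+webui, which can be opened in a browser on the host to confirm that the searxng stack answers.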

Further reading

Revision History

  • 20240730-0: Passed timezone and watchtower label to docker compose
  • 20240714-0: Moved to cuda container + added WebSearch and SearXNG content
  • 20240713-0: Added restart policy to Ollama container
  • 20240707-0: Initial release