How to Add Local Documents to Open WebUI with RAG and Ollama

Build a private document chatbot in Open WebUI with Ollama embeddings, local PDFs, and practical RAG tuning tips for better answers.

SelfHostedAI Admin | May 29, 2026 | 10 min read

If you already run Open WebUI and Ollama, the next big upgrade is turning your chat interface into a private document assistant. With retrieval-augmented generation (RAG), you can ask questions about manuals, runbooks, project notes, PDFs, and markdown files without sending anything to a cloud service.

This guide shows you how to build a practical local-docs workflow in Open WebUI, keep it accurate, and avoid the common mistakes that make RAG feel flaky. If you still need the base stack, start with "How to Deploy Open WebUI and Ollama on a Private LAN with Docker Compose" and "Docker Setup for Local AI Tools". For hardening tips, pair this with "How to Secure a Self-Hosted AI Server".

Why use Open WebUI for local document search?

Open WebUI gives you a simple front end for chat, document upload, and knowledge bases. Ollama handles the local model runtime, which means your prompts and your uploaded documents stay inside your own environment.

That matters for three reasons:

- You can keep sensitive PDFs, SOPs, and internal notes off third-party SaaS platforms.
- You can use the same interface for normal chat and document Q&A.
- You can update your knowledge base as your files change, instead of copying and pasting chunks into every prompt.

RAG works best when you treat it like a small information system, not magic. The model still needs clean files, a solid embedding model, and a sensible way to test whether retrieval is actually working.

What you need before you start

You need:

- A running Open WebUI instance
- A running Ollama instance
- At least one chat model, such as `llama3.1:8b` or another local model you trust
- One embedding model for document indexing
- A small set of documents to test with first

If your machine is memory-constrained, start with a modest chat model and a lightweight embedding model. A good first pass is:

```bash
ollama pull llama3.1:8b
ollama pull nomic-embed-text
```

If you prefer a larger embedding model, you can switch later, but start simple so you can debug the pipeline without extra variables.

Step 1: Make sure Ollama is reachable

Open WebUI needs to talk to Ollama over HTTP. Before you touch the UI, verify the Ollama API is alive and the models are installed.

```bash
curl http://localhost:11434/api/tags
```

You should see the models you pulled. If both services run in Docker, `localhost` inside the Open WebUI container points at that container itself, so use the Ollama service name (for example `http://ollama:11434`) instead.
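If you want to check for specific models instead of reading the JSON by eye, a small helper can do it. This is a rough sketch: `OLLAMA_URL`, `has_model`, and `check_models` are names made up for this example, and the text match on the `name` field is a shortcut (a JSON tool such as `jq` would be more robust).

```bash
# Assumed variable name; set it to http://ollama:11434 when running inside Docker.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# has_model NAME: reads an /api/tags JSON response on stdin and succeeds
# if some model name starts with NAME (so "nomic-embed-text" matches ":latest").
has_model() {
  grep -q "\"name\":[[:space:]]*\"$1"
}

# check_models NAME...: fetch the tag list once, then report each model.
check_models() {
  tags="$(curl -fsS "$OLLAMA_URL/api/tags")" || {
    echo "Ollama not reachable at $OLLAMA_URL"
    return 1
  }
  for m in "$@"; do
    if printf '%s' "$tags" | has_model "$m"; then
      echo "ok: $m"
    else
      echo "missing: $m"
    fi
  done
}

# Example: check_models llama3.1:8b nomic-embed-text
```

Running the example line against a healthy server should print one `ok:` line per model; a `missing:` line means you still need to pull that model.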

You can also test embeddings directly:

```bash
curl http://localhost:11434/api/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model":"nomic-embed-text","prompt":"Open WebUI can index local documents."}'
```

If that request fails, fix Ollama first. RAG only works when the embedding endpoint works reliably.
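A successful embedding response is a long array of numbers, which is hard to judge by eye. The sketch below counts the values instead, assuming the response contains a single `"embedding"` array as the request above returns. `embedding_dim` is a hypothetical helper name, not part of Ollama.

```bash
# embedding_dim: reads an embeddings JSON response on stdin and prints how
# many numbers the "embedding" array holds. 0 means no vector came back.
embedding_dim() {
  tr -d ' \n' \
    | sed -n 's/.*"embedding":\[\([^]]*\)].*/\1/p' \
    | tr ',' '\n' \
    | grep -c .
}

# Example:
# curl -s http://localhost:11434/api/embeddings \
#   -d '{"model":"nomic-embed-text","prompt":"test"}' | embedding_dim
```

Any count above zero tells you the endpoint returned a vector. A zero usually means the response was an error message, such as a model that has not been pulled.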

Step 2: Create a knowledge base in Open WebUI

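In recent Open WebUI releases, document collections live in the Knowledge section of the Workspace, where you can create a knowledge base and upload files. Menu names have shifted between versions, so look for a Knowledge or Documents area if yours differs.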

Use a clean naming scheme so collections stay understandable over time. Good examples:

- `engineering-runbooks`
- `home-lab-docs`
- `client-project-alpha`
- `hr-policies`

For your first test, keep the set small. Upload three to five documents that are easy to verify, such as:

- a short markdown file with known facts
- one PDF manual
- one text export from a notebook or wiki
- a runbook with step numbers and commands

That makes it obvious whether retrieval is working because you already know the answers.
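If you do not have a suitable file at hand, you can write one. The sketch below creates a tiny fixture. The directory name, file name, and every value in it are invented for testing, so swap in facts from your own environment.

```bash
# Create a small markdown file whose facts are easy to verify by eye.
# All values here are placeholders made up for this example.
mkdir -p rag-test-docs
cat > rag-test-docs/known-facts.md <<'EOF'
# Test service facts

- The service listens on port 8443.
- Nightly backups start at 02:30.
- Backups are kept for 14 days.
EOF
```

Upload the file to your test collection. Any answer that disagrees with these three lines points to a retrieval problem, not a knowledge gap.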

Step 3: Prepare documents for better retrieval

RAG quality depends heavily on the source material. If your documents are messy, the answers will be messy too.

Use clean text when possible

Markdown, plain text, and well-structured notes usually outperform giant unprocessed PDFs. Before uploading, convert whatever you can into a cleaner format.

Useful tools:

```bash
pandoc handbook.docx -t markdown -o handbook.md
pdftotext manual.pdf manual.txt
```

If you have scanned PDFs, OCR them first:

```bash
ocrmypdf --deskew --clean input-scan.pdf output-ocr.pdf
```
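For a whole folder, it helps to see which command applies to each file before running anything. The sketch below is a dry run that only prints commands. `convert_cmd` is a hypothetical helper, and it assumes simple file names without quotes or `$` characters.

```bash
# convert_cmd FILE: print the conversion command for FILE without running it.
convert_cmd() {
  base="${1%.*}"   # file name without its extension
  case "$1" in
    *.docx)     echo "pandoc \"$1\" -t markdown -o \"$base.md\"" ;;
    *.pdf)      echo "pdftotext \"$1\" \"$base.txt\"" ;;
    *.md|*.txt) echo "# already text: $1" ;;
    *)          echo "# no converter configured for: $1" ;;
  esac
}

# Example: for f in docs/*; do convert_cmd "$f"; done
```

Review the printed list, then run the commands you are happy with.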

Split huge documents

A 300-page PDF is not ideal as a first RAG source. If possible, split large books, vendor manuals, and policy bundles into smaller topic-specific files.

Instead of one giant `ops-manual.pdf`, try:

- `backups.md`
- `networking.md`
- `troubleshooting.md`
- `restore-procedures.md`

Smaller files make it easier for retrieval to land on the right passage.
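If the manual is already converted to markdown, you can script the split. The sketch below writes one file per top-level `# ` heading. `split_by_heading` is a made-up name. It drops any text before the first heading, and it will misread a `# comment` line inside a code block as a heading, so check the output.

```bash
# split_by_heading FILE OUTDIR: write one markdown file per top-level heading.
# File names come from the heading text: lowercased, other characters become "-".
split_by_heading() {
  mkdir -p "$2"
  awk -v dir="$2" '
    /^# / {
      if (out) close(out)
      name = tolower(substr($0, 3))
      gsub(/[^a-z0-9]+/, "-", name)
      gsub(/^-|-$/, "", name)
      out = dir "/" name ".md"
    }
    out { print > out }
  ' "$1"
}

# Example: split_by_heading ops-manual.md ops-manual-split
```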

Remove junk before upload

Before indexing, delete:

- repeated headers and footers
- page numbers
- duplicated tables of contents
- irrelevant screenshots with tiny text
- outdated drafts

The cleaner the input, the more useful the retrieved context.
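Page numbers are the easiest item on that list to automate. The filter below is a sketch for text files such as `pdftotext` output. `strip_page_noise` is a hypothetical name, and it removes any line that holds only a number, so check that it does not eat real content such as a table cell.

```bash
# strip_page_noise: filter stdin, dropping form feeds and lines that are only
# a page number ("12", "Page 12", "Page 12 of 300"), then collapsing runs of
# blank lines into one.
strip_page_noise() {
  tr -d '\f' \
    | grep -v -i -E '^[[:space:]]*(page[[:space:]]+)?[0-9]+([[:space:]]+of[[:space:]]+[0-9]+)?[[:space:]]*$' \
    | awk 'NF { blank = 0; print; next } !blank++ { print }'
}

# Example: strip_page_noise < manual.txt > manual-clean.txt
```

Repeated headers and footers vary too much between documents for a generic pattern, so handle those per file.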

Step 4: Upload documents and test with a simple question

After creating the collection, upload your files and ask a question with a very obvious answer.

Good first questions look like this:

- “What is the backup retention policy in this document?”
- “Which port does the service use?”
- “List the shutdown steps in order.”
- “What is the default username mentioned in the manual?”

You are testing retrieval, not the model’s creativity. Start with questions where the answer should be found in one or two passages.

If the answer is wrong, ask yourself:

1. Was the fact actually in the document?
2. Did the document upload finish successfully?
3. Is the embedding model working?
4. Is the collection too broad?
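The first question is quick to settle from the command line if you kept text copies of what you uploaded. A minimal sketch, with `fact_in_docs` as a made-up helper name:

```bash
# fact_in_docs TERM DIR: list file names and line numbers where TERM appears
# (case-insensitive, literal match). Exits non-zero when nothing matches.
fact_in_docs() {
  grep -r -n -i -F -- "$1" "$2"
}

# Example: fact_in_docs "retention" rag-test-docs
```

If this finds nothing, the model had nothing to retrieve, and the problem is the source material, not the RAG setup.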

Step 5: Ask better RAG questions

The quality of your question matters almost as much as the quality of your files.

A weak prompt:

> Tell me about backups.

A better prompt:

> In the uploaded backup runbook, what is the nightly backup schedule, where are the backups stored, and what is the restore verification step?

The second prompt gives the retriever a much clearer target. It also reduces the chance that the model answers from general knowledge instead of your documents.

A useful pattern is:

```text
Answer only from the uploaded documents.
If the documents do not contain the answer, say so clearly.
Quote the relevant section before summarizing it.
```

That framing nudges the model to stay grounded.

Step 6: Use a sensible Docker Compose baseline

If you want a repeatable setup, keep Open WebUI and Ollama on the same Docker network with persistent volumes.

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      # Reach Ollama by its service name on the Compose network
      - OLLAMA_BASE_URL=http://ollama:11434
      # Replace with a long random value before exposing the service
      - WEBUI_SECRET_KEY=change-me
    volumes:
      - open-webui:/app/backend/data

volumes:
  ollama:
  open-webui:
```

A few notes:

- Keep the data volumes persistent, or your documents and chat history will disappear when the containers are recreated.
- Put the containers on a private LAN or behind a reverse proxy if you expose the service.
- Set a strong secret key and apply your normal homelab security posture.

If you already have a working stack, you probably only need to confirm the `OLLAMA_BASE_URL` value and make sure the data volume is durable.
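For the secret key, any long random string will do. One way to produce one with standard tools is sketched below. `gen_secret` is a made-up function name, and it assumes `/dev/urandom` is available.

```bash
# gen_secret: print 32 random bytes as 64 lowercase hex characters.
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Example: echo "WEBUI_SECRET_KEY=$(gen_secret)" >> .env
```

If you store the value in a `.env` file next to the Compose file, you can reference it as `${WEBUI_SECRET_KEY}` in the `environment` section instead of hard-coding it.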

Step 7: Keep your knowledge base accurate over time

RAG gets stale if nobody maintains it. Treat your documents like code.

A good maintenance routine looks like this:

- delete or archive outdated PDFs
- version your runbooks
- re-upload only the documents that changed
- use one collection per topic or team when possible
- test the most important questions after every update

For example, if your backup policy changes, update the `backups.md` file and re-index that document before you trust the answers again.
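To find out which documents changed since the last upload, you can compare checksums against a saved manifest. This is a sketch, not an Open WebUI feature. `changed_docs` is a hypothetical helper, it assumes GNU `sha256sum` is installed, and the manifest should live outside the documents folder. Deleted files are not reported.

```bash
# changed_docs DIR MANIFEST: print files under DIR that are new or whose
# SHA-256 differs from the manifest, then refresh the manifest.
changed_docs() {
  dir="$1"
  manifest="$2"
  new="$(mktemp)"
  find "$dir" -type f -exec sha256sum {} + | sort > "$new"
  [ -f "$manifest" ] || : > "$manifest"
  # Lines present only in the new listing are new or modified files.
  comm -13 "$manifest" "$new" | sed 's/^[0-9a-f]*  //'
  mv "$new" "$manifest"
}

# Example: changed_docs ./docs ./docs-manifest.txt
```

The first run lists every file, because the manifest starts empty. Later runs list only what you need to re-upload.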

Troubleshooting Open WebUI RAG

The embedding model does not appear

Pull it manually in Ollama first:

```bash
ollama pull nomic-embed-text
```

Then restart Open WebUI, check that the document settings in the admin panel point at Ollama and this embedding model, and try the upload again.

Answers ignore the documents

Usually this means one of three things:

- the collection is empty
- retrieval returned poor chunks
- the prompt is too broad

Try a more specific question and a smaller document set.

PDFs produce bad answers

That usually happens when the PDF is a scan, has tiny text, or contains too much noise. Run OCR, convert to text, or split the file before uploading.
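A quick way to tell whether a PDF is a scan is to count the words that text extraction returns. The sketch below does that. `looks_scanned` is a made-up helper, and the default threshold of 50 words is an arbitrary assumption you should tune.

```bash
# looks_scanned [MIN_WORDS]: reads extracted text on stdin and succeeds when
# it holds fewer than MIN_WORDS words (default 50), which suggests the PDF
# has no usable text layer.
looks_scanned() {
  words="$(wc -w | tr -d ' ')"
  [ "$words" -lt "${1:-50}" ]
}

# Example: pdftotext manual.pdf - | looks_scanned && echo "run ocrmypdf first"
```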

The answer is partially correct but missing details

Ask for the exact section, not just the summary:

> Show me the paragraph that mentions the restore location, then explain it in one sentence.

That often improves retrieval and reduces hallucination.

My collection is growing too large

Create separate knowledge bases for separate purposes. A single bucket for every document is convenient at first, but it becomes harder to retrieve the right source later.

A simple test plan you can reuse

Whenever you add a new document collection, run this quick check:

1. Ask one question with a known answer.
2. Ask one question that should not be in the docs.
3. Ask one multi-part question that requires two facts from different sections.
4. Compare the model’s answer with the source file.

If the first answer is wrong, fix ingestion. If the second answer is confidently invented, tighten your prompt. If the third answer misses one fact, shorten the documents or split them more aggressively.
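If you rerun this plan often, you can save each answer to a text file and check it for the expected fact. This is a sketch of a manual workflow, not an Open WebUI feature. `check_answer` and the file names are invented for illustration.

```bash
# check_answer EXPECTED ANSWER_FILE: report PASS when the saved answer
# contains the expected text (case-insensitive, literal match).
check_answer() {
  if grep -q -i -F -- "$1" "$2"; then
    echo "PASS: $1"
  else
    echo "FAIL: $1"
    return 1
  fi
}

# Example: check_answer "14 days" answers/retention.txt
```

A substring match only confirms that the fact appears, so still read the answers for the question that should not be in the docs.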

FAQ

Is Open WebUI RAG fully private?

It can be fully private if your Open WebUI instance, Ollama server, and document storage stay inside your own infrastructure. Avoid public exposure unless you know exactly how your authentication and reverse proxy are configured.

Which embedding model should I use first?

Start with a lightweight local model such as `nomic-embed-text`. It is easy to pull, easy to test, and good enough for many homelab and small-team document workflows.

Can I use PDFs, DOCX, and markdown together?

Yes, but you will usually get the best results from clean markdown or text. PDFs work too, especially if they are text-based rather than scanned images.

How big should my knowledge base be?

Start small. A focused collection of a few documents is easier to test and easier to trust than a giant mixed library.

Do I need a vector database?

Not to get started. Open WebUI ships with a built-in vector store and handles the document retrieval workflow for you, so most homelab users do not need to run a separate vector database.

Final thoughts

Open WebUI becomes much more useful once you connect it to a carefully prepared set of local documents. The winning formula is simple: use a working Ollama embedding model, keep your source files clean, build small collections, and test with questions you can verify manually.

Once that foundation is solid, you can expand into backups, runbooks, internal wikis, and project docs without handing your knowledge base to a cloud platform.
