Advanced RAG Strategies: Reranking and Hybrid Search in Open WebUI

Improve local RAG answer quality by combining keyword search with semantic embeddings and adding a reranking stage in Open WebUI.

Robson Pereira · May 31, 2026 · 11 min read

Flow diagram of advanced RAG pipeline with hybrid search and reranking stages.

Basic RAG (embed documents, retrieve by vector similarity, feed the results to a model) works well for straightforward questions. The strategies below close the gaps that pure vector search leaves open.

This guide covers hybrid search, cross-encoder reranking, query expansion, and multi-stage retrieval in Open WebUI.

Why Basic Vector Search Is Not Enough

Vector search excels at finding conceptually related content but struggles with exact phrases, rare terminology, and short queries.

| Scenario | Vector search | Better approach |
|----------|--------------|----------------|
| Exact phrase match | May return conceptually similar text instead of the exact phrase | Keyword search finds exact matches |
| Rare terminology | Embeddings may miss rare terms | Keyword index covers every term |
| Short queries | Broad vectors miss context | Hybrid combines both |
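
To make the "keyword index covers every term" row concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. The documents, the error code `E4021`, and the whitespace tokenizer are made up for illustration, and this is not how Open WebUI implements its keyword index.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical corpus: only one document contains the rare token.
docs = [
    "restart the service after changing the config",
    "error E4021 means the license token expired",
    "general troubleshooting steps for login problems",
]
scores = bm25_scores("E4021", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```

Only the document that literally contains the rare token gets a non-zero score, which is the behavior an embedding model cannot guarantee for terms it rarely saw in training.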

If you are new to RAG, start with Build a Local RAG Pipeline That Actually Answers Questions.

Strategy 1: Hybrid Search

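Enable hybrid search in Admin Settings: Documents -> Search Method -> Hybrid (BM25 + Vector). The table below lists starting weights by document type; tune them against your own queries.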

| Document type | BM25 weight | Vector weight |
|--------------|------------|--------------|
| Technical docs | 0.4 | 0.6 |
| Legal contracts | 0.5 | 0.5 |
| General articles | 0.2 | 0.8 |
| Code repos | 0.5 | 0.5 |
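
One common way such weights are applied is a weighted sum of normalized scores. The sketch below assumes min-max normalization and uses the 0.4 / 0.6 split from the technical docs row. The function names and sample scores are hypothetical, and the fusion method Open WebUI uses internally may differ.

```python
def min_max(scores):
    """Rescale scores to [0, 1] so BM25 and cosine values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_rank(bm25, vector, bm25_weight, vector_weight):
    """Fuse two score dictionaries into one ranking by weighted sum."""
    bm25_n, vector_n = min_max(bm25), min_max(vector)
    doc_ids = set(bm25_n) | set(vector_n)
    fused = {
        d: bm25_weight * bm25_n.get(d, 0.0) + vector_weight * vector_n.get(d, 0.0)
        for d in doc_ids
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical raw scores from the two retrievers for chunks a-d.
bm25 = {"a": 12.0, "b": 3.0, "c": 0.5}
vector = {"b": 0.82, "c": 0.80, "d": 0.40}

ranking = hybrid_rank(bm25, vector, bm25_weight=0.4, vector_weight=0.6)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Chunk `b` ranks first because both retrievers found it, while `a` (keyword only) and `d` (vector only) fall behind. Rewarding agreement between the two signals is the point of the weighting.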

Strategy 2: Cross-Encoder Reranking

Reranking re-scores retrieved chunks using a dedicated cross-encoder model.

Enable in Admin Settings: Documents -> Reranking.

| Chunks reranked | Speed | Added latency | Quality gain |
|--------|-----------------|---------------|--------------------|
| 10 | Fast | +100-200 ms | Significant |
| 20 | Fast | +300-500 ms | Major |
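
The reranking step itself is small: score every (query, chunk) pair, sort, and keep the best few. The sketch below takes the scorer as a parameter. The `overlap_score` function is only a toy stand-in so the example runs without downloading a model; a real setup would call a cross-encoder there instead.

```python
from typing import Callable, List, Tuple

def rerank(
    query: str,
    chunks: List[str],
    score_pair: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Re-score each retrieved chunk against the query and keep the best top_n."""
    scored = [(chunk, score_pair(query, chunk)) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def overlap_score(query: str, chunk: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query tokens found in the chunk."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

# Hypothetical first-stage results, in the order the retriever returned them.
retrieved = [
    "hybrid search combines keyword and vector results",
    "rotate the api key every ninety days",
    "to rotate an api key open the admin settings",
]

top = rerank("how to rotate an api key", retrieved, overlap_score, top_n=2)
for chunk, score in top:
    print(round(score, 2), chunk)
```

Keeping the scorer pluggable also makes the latency trade-off in the table easy to see: cost grows with the number of pairs scored, so doubling the candidate list roughly doubles the reranking time.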

Strategy 3: Query Expansion

Enable query expansion in Admin Settings -> Documents -> Query Expansion.

Short queries get expanded into multiple search-friendly formulations.
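
As a rough illustration of the idea, the sketch below expands a one-word query into several variants, searches with each, and merges the hits by best score. The templates in `expand_query` are hypothetical placeholders (in practice a language model would write the variants), and the toy `search` function stands in for the real retriever.

```python
def expand_query(query):
    """Hypothetical expansion: in practice a language model would write these."""
    return [query, f"what is {query}", f"how to configure {query}"]

def search(query, index):
    """Toy retriever: score = number of query tokens found in the chunk."""
    q = set(query.lower().split())
    hits = {}
    for chunk_id, text in index.items():
        overlap = len(q & set(text.lower().split()))
        if overlap:
            hits[chunk_id] = overlap
    return hits

def expanded_search(query, index):
    """Search with every variant and keep each chunk's best score."""
    merged = {}
    for variant in expand_query(query):
        for chunk_id, score in search(variant, index).items():
            merged[chunk_id] = max(score, merged.get(chunk_id, 0))
    # Highest score first; ties broken by chunk id for a stable order.
    return sorted(merged, key=lambda cid: (-merged[cid], cid))

# Hypothetical chunk store.
index = {
    "c1": "reranking explained what it is and when to use it",
    "c2": "how to configure the reranker model",
    "c3": "release notes for version two",
}

print(sorted(search("reranking", index)))   # single-word query
print(expanded_search("reranking", index))  # expanded query
```

The bare query reaches only `c1`, while the expanded variants also surface `c2`. Merging by maximum score keeps one entry per chunk, so the same chunk is not passed to the model twice.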

Strategy 4: Multi-Stage Retrieval

1. Broad retrieval: 20-30 chunks at low threshold (0.5)

2. Coarse filtering: remove clearly irrelevant chunks

3. Reranking: pass 10-15 chunks through cross-encoder, keep top 3-5

4. Context assembly: arrange in original document order
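
The four stages above can be sketched as one function. The `Chunk` class, the minimum-length rule used for coarse filtering, and the token-overlap scorer are illustrative assumptions, not Open WebUI internals. The default sizes follow the numbers in the list.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    position: int      # index of the chunk in the source document
    text: str
    similarity: float  # first-stage retrieval score

def overlap_score(query, text):
    """Toy stand-in for a cross-encoder score."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0

def multi_stage(query, chunks, rerank_score, broad_k=30, threshold=0.5,
                rerank_k=15, keep=5, min_len=20):
    # 1. Broad retrieval: everything above a low threshold, best first.
    broad = sorted(
        (c for c in chunks if c.similarity >= threshold),
        key=lambda c: c.similarity,
        reverse=True,
    )[:broad_k]
    # 2. Coarse filtering (hypothetical rule): drop near-empty chunks.
    filtered = [c for c in broad if len(c.text.strip()) >= min_len]
    # 3. Reranking: re-score a smaller candidate set, keep the top few.
    candidates = filtered[:rerank_k]
    best = sorted(
        candidates, key=lambda c: rerank_score(query, c.text), reverse=True
    )[:keep]
    # 4. Context assembly: restore original document order.
    return sorted(best, key=lambda c: c.position)

# Hypothetical chunks with made-up similarity scores.
chunks = [
    Chunk(0, "Install the reranker model before enabling it.", 0.62),
    Chunk(1, "n/a", 0.90),
    Chunk(2, "Unrelated release notes about the user interface.", 0.30),
    Chunk(3, "Enable reranking in the admin settings panel.", 0.71),
    Chunk(4, "The weather section of the newsletter was popular.", 0.55),
]

result = multi_stage("enable reranking model", chunks, overlap_score, keep=2)
print([c.position for c in result])
```

Chunk 1 has the highest similarity but is dropped as noise, chunk 2 never passes the threshold, and the two survivors come back in document order rather than score order, so the model reads them in the sequence the author wrote them.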

Configuration Example

```bash
# Variable names are as given in this guide; confirm them against the
# Open WebUI environment variable reference for your version.
docker run -d \
  -p 3000:8080 \
  -e RAG_HYBRID_SEARCH_ENABLED=true \
  -e RAG_HYBRID_SEARCH_WEIGHT_BM25=0.3 \
  -e RAG_HYBRID_SEARCH_WEIGHT_VECTOR=0.7 \
  -e RAG_RERANKING_ENABLED=true \
  -e RAG_QUERY_EXPANSION_ENABLED=true \
  -v open-webui-data:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

Measuring Improvements

| Metric | Before | After | Change |
|--------|--------|-------|-------------|
| Recall@5 | 0.72 | 0.89 | +24% |
| Answer relevance | 3.2 | 4.1 | +28% |
| Hallucination rate | 18% | 8% | -56% |
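
To run the same comparison on your own data, the sketch below computes Recall@k for one query and the relative change between two measurements. The chunk ids are made up. The last three calls recompute the percentage column from the values in the table.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant chunk ids that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def relative_change(before, after):
    """Percentage change from before to after."""
    return (after - before) / before * 100

# Hypothetical single query: 2 of the 3 relevant chunks are in the top 5.
retrieved = ["c4", "c1", "c9", "c2", "c7", "c3"]
relevant = {"c1", "c2", "c3"}
recall = recall_at_k(retrieved, relevant, k=5)
print(round(recall, 2))

# Recompute the percentage column from the table's values.
print(round(relative_change(0.72, 0.89)))  # Recall@5
print(round(relative_change(3.2, 4.1)))    # Answer relevance
print(round(relative_change(18, 8)))       # Hallucination rate
```

Average Recall@k over a fixed set of test questions, and rerun the same set after each configuration change so the numbers stay comparable.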

For reducing hallucinations further, see Stop Hallucinations in Local RAG Systems.

Conclusion

Start with hybrid search — a single toggle in Open WebUI. Add reranking next for a measurable quality boost. Query expansion is a refinement for when you need every percentage point.

FAQ

Does hybrid search work with any embedding model?

Yes. BM25 keyword indexing is independent of the embedding model.

Will reranking slow responses?

Reranking adds roughly 100-500 ms per query, depending on how many chunks are reranked. That is usually small compared to generation time.

Do I need a separate vector database?

No. Open WebUI handles vector storage internally by default.
