Advanced RAG Strategies: Reranking and Hybrid Search in Open WebUI

Improve local RAG answer quality by combining keyword search with semantic embeddings and adding a reranking stage in Open WebUI.

Robson Pereira · May 31, 2026 · 11 min read
Figure: Flow diagram of the advanced RAG pipeline, showing the hybrid search and reranking stages.

Basic RAG (embed documents, retrieve by vector similarity, feed the results to a model) works well for straightforward questions. The strategies below close the gaps that plain vector search leaves: exact phrases, rare terminology, and short queries.

This guide covers hybrid search, cross-encoder reranking, query expansion, and multi-stage retrieval in Open WebUI.

Why Basic Vector Search Is Not Enough

Vector search excels at finding conceptually related content but struggles with exact phrases, rare terminology, and short queries.

| Scenario | Vector search | Better approach |
|----------|--------------|----------------|
| Exact phrase match | May find conceptually similar | Keyword finds exact matches |
| Rare terminology | Embeddings may miss rare terms | Keyword indexes every term |
| Short queries | Broad vectors miss context | Hybrid combines both |
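To make the "keyword indexes every term" row concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. The whitespace tokenizer, the sample documents, and the parameter defaults `k1=1.5` and `b=0.75` are illustrative assumptions, not Open WebUI's implementation.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Assumes a non-empty list of documents and naive whitespace tokenization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a high inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency is dampened and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical documents: only the second one contains the rare code E4012.
docs = [
    "restart the service after changing the configuration",
    "error E4012 indicates a corrupted index file",
    "the index is rebuilt when the service starts",
]
print(bm25_scores("E4012", docs))
```

Only the document that literally contains the rare identifier gets a non-zero score, which is the behavior an embedding model may fail to reproduce for terms it has rarely seen.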

If you are new to RAG, start with Build a Local RAG Pipeline That Actually Answers Questions.

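Strategy 1: Hybrid Search

Hybrid search runs BM25 keyword matching alongside vector similarity and merges the two result sets. Enable it in Admin Settings: Documents -> Search Method -> Hybrid (BM25 + Vector).

Suggested starting weights by document type: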

| Document type | BM25 weight | Vector weight |
|--------------|------------|--------------|
| Technical docs | 0.4 | 0.6 |
| Legal contracts | 0.5 | 0.5 |
| General articles | 0.2 | 0.8 |
| Code repos | 0.5 | 0.5 |
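One common way to apply weights like these is a weighted sum of normalized scores. The sketch below assumes min-max normalization and made-up scores for three chunks; how Open WebUI fuses the two result lists internally is not described here and may differ.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(bm25, vector, bm25_weight=0.4, vector_weight=0.6):
    """Combine per-chunk BM25 and vector scores with a weighted sum."""
    # BM25 scores are unbounded and cosine similarities are not, so both
    # lists are put on the same scale before weighting.
    bm25_n, vector_n = min_max(bm25), min_max(vector)
    return [bm25_weight * k + vector_weight * v
            for k, v in zip(bm25_n, vector_n)]

# Hypothetical scores for three chunks, using the "Technical docs" weights.
bm25 = [0.0, 7.2, 1.1]
vector = [0.81, 0.62, 0.78]
combined = hybrid_scores(bm25, vector)
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
print(ranking)
```

Raising the BM25 weight pushes chunks with exact keyword hits up the ranking, while raising the vector weight favors conceptual matches.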

Strategy 2: Cross-Encoder Reranking

Reranking re-scores retrieved chunks using a dedicated cross-encoder model, which reads the query and each chunk together instead of comparing precomputed embeddings.

Enable in Admin Settings: Documents -> Reranking.

| Chunks reranked | Latency without reranker | Added latency with reranker | Quality improvement |
|--------|-----------------|---------------|--------------------|
| 5 | Fast | Negligible | Moderate |
| 10 | Fast | +100-200 ms | Significant |
| 20 | Fast | +300-500 ms | Major |
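The reranking step itself is small: score every (query, chunk) pair, sort, and keep the best few. In the sketch below, `overlap_score` is a toy token-overlap function standing in for a real cross-encoder so the example runs without downloading a model; `rerank`, `score_fn`, and the sample chunks are hypothetical names and data.

```python
def overlap_score(query, chunk):
    """Toy stand-in for a cross-encoder: fraction of query tokens in the chunk."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def rerank(query, chunks, score_fn=overlap_score, top_n=3):
    """Score each (query, chunk) pair jointly and keep the best top_n chunks."""
    scored = [(score_fn(query, chunk), chunk) for chunk in chunks]
    # The sort is stable, so tied chunks keep their first-stage order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_n]]

# Hypothetical first-stage results, in the order the retriever returned them.
candidates = [
    "Backups run nightly at 02:00.",
    "To rotate the API key, open Settings and choose Regenerate.",
    "API usage is billed per request.",
]
print(rerank("how to rotate the api key", candidates, top_n=2))
```

Because the scorer runs once per chunk, the cost grows with the number of candidates, which is why the added latency in the table rises from 10 to 20 chunks.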

Strategy 3: Query Expansion

Query expansion rewrites a short query into several search-friendly formulations and searches with all of them, which gives short queries more context to match against.

Enable it in Admin Settings: Documents -> Query Expansion.
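As a rough illustration of the idea, the sketch below expands a query with a hand-written synonym table and merges the results of each variant. The synonym table, `expand_query`, `retrieve_expanded`, and the fake index are all hypothetical; a production system would more likely generate the variants with a language model.

```python
def expand_query(query, synonyms):
    """Return the original query plus one variant per known synonym."""
    variants = [query]
    words = query.split()
    for term, alternatives in synonyms.items():
        if term not in (w.lower() for w in words):
            continue
        for alt in alternatives:
            # Swap the matched term for the alternative, keep the other words.
            variants.append(" ".join(alt if w.lower() == term else w for w in words))
    return variants

def retrieve_expanded(query, synonyms, search_fn, k=5):
    """Search with every variant and merge chunk IDs without duplicates."""
    seen, merged = set(), []
    for variant in expand_query(query, synonyms):
        for chunk_id in search_fn(variant, k):
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged

# Hypothetical synonym table and a fake search function for demonstration.
synonyms = {"auth": ["authentication", "login"]}
fake_index = {
    "auth timeout": ["c1"],
    "authentication timeout": ["c2", "c1"],
    "login timeout": ["c3"],
}

def fake_search(q, k):
    return fake_index.get(q, [])[:k]

print(expand_query("auth timeout", synonyms))
print(retrieve_expanded("auth timeout", synonyms, fake_search))
```

The merged list keeps first-seen order, so results for the original wording stay ahead of results that only the expanded variants found.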

Strategy 4: Multi-Stage Retrieval

1. Broad retrieval: fetch 20-30 chunks at a low similarity threshold (0.5).
2. Coarse filtering: remove clearly irrelevant chunks.
3. Reranking: pass the remaining 10-15 chunks through the cross-encoder and keep the top 3-5.
4. Context assembly: arrange the kept chunks in original document order.
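The four stages can be sketched as one function. This is an illustration under stated assumptions rather than Open WebUI's pipeline: `similarity_fn` and `rerank_fn` are hypothetical callables supplied by the caller, chunks are `(position, text)` tuples, and the coarse filter is assumed to be a simple cut to the best `rerank_k` first-stage results.

```python
def multi_stage_retrieve(query, chunks, similarity_fn, rerank_fn,
                         broad_k=30, threshold=0.5, rerank_k=15, final_k=5):
    """Run the four stages over chunks given as (position, text) tuples."""
    # Stage 1: broad retrieval. Keep up to broad_k chunks above the low threshold.
    scored = [(similarity_fn(query, text), pos, text) for pos, text in chunks]
    broad = sorted((s for s in scored if s[0] >= threshold),
                   key=lambda s: s[0], reverse=True)[:broad_k]
    # Stage 2: coarse filtering. Assumed here to be a cut to the best rerank_k.
    survivors = broad[:rerank_k]
    # Stage 3: reranking. Re-score survivors with the slower scorer, keep final_k.
    best = sorted(survivors, key=lambda s: rerank_fn(query, s[2]),
                  reverse=True)[:final_k]
    # Stage 4: context assembly. Restore original document order.
    return [text for _, pos, text in sorted(best, key=lambda s: s[1])]

# Hypothetical scores: "intro" looks similar but the reranker rates it low.
chunks = [(0, "intro"), (1, "setup"), (2, "usage"), (3, "faq")]
similarity = {"intro": 0.9, "setup": 0.6, "usage": 0.7, "faq": 0.2}
relevance = {"intro": 0.1, "setup": 0.8, "usage": 0.9, "faq": 0.5}

context = multi_stage_retrieve(
    "q", chunks,
    similarity_fn=lambda q, t: similarity[t],
    rerank_fn=lambda q, t: relevance[t],
    final_k=2,
)
print(context)
```

Sorting by position at the end matters because the reranker's order reflects relevance, while the model usually reads context more easily in the order the document presents it.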

Configuration Example

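```bash
# Variable names are given as in this guide; confirm them against the
# environment variable reference for your Open WebUI version.
# The hybrid weights (0.3 BM25 / 0.7 vector) should sum to 1.
docker run -d \
  -p 3000:8080 \
  -e RAG_HYBRID_SEARCH_ENABLED=true \
  -e RAG_HYBRID_SEARCH_WEIGHT_BM25=0.3 \
  -e RAG_HYBRID_SEARCH_WEIGHT_VECTOR=0.7 \
  -e RAG_RERANKING_ENABLED=true \
  -e RAG_QUERY_EXPANSION_ENABLED=true \
  -v open-webui-data:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```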

Measuring Improvements

| Metric | Before | After | Relative change |
|--------|--------|-------|-------------|
| Recall@5 | 0.72 | 0.89 | +24% |
| Answer relevance | 3.2 | 4.1 | +28% |
| Hallucination rate | 18% | 8% | -56% |
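The last column is a relative change, (after - before) / before, not a difference in percentage points. The sketch below reproduces those figures and shows one straightforward way to compute Recall@k from a list of retrieved chunk IDs; `recall_at_k`, `relative_change`, and the sample IDs are hypothetical.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of the relevant chunk IDs that appear in the top k retrieved."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def relative_change(before, after):
    """Relative change in percent: (after - before) / before * 100."""
    return (after - before) / before * 100

# Figures from the table above.
print(round(relative_change(0.72, 0.89)))  # 24
print(round(relative_change(3.2, 4.1)))    # 28
print(round(relative_change(18, 8)))       # -56

# Hypothetical query: three relevant chunks, two of them in the top five.
print(recall_at_k(["a", "b", "c", "d", "e", "f"], ["a", "f", "c"], k=5))
```

Running the same labeled queries before and after each configuration change keeps the comparison fair, since the numbers depend heavily on which questions are in the test set.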

For reducing hallucinations further, see Stop Hallucinations in Local RAG Systems.

Conclusion

Start with hybrid search, which is a single toggle in Open WebUI. Add reranking next for a measurable quality boost. Treat query expansion as a later refinement, for when you need every percentage point.

FAQ

Does hybrid search work with any embedding model?

Yes. BM25 keyword indexing is independent of the embedding model.

Will reranking slow responses?

Reranking adds roughly 100-500 ms for 10-20 chunks, which is usually small compared to generation time.

Do I need a separate vector database?

No. Open WebUI handles vector storage internally.
