Tools
[GitHub] multi-llm-mcp: Bridge Claude Code and Codex with One MCP Server
A new open-source MCP server lets Claude Code call OpenAI Codex as a subagent — or route tasks across GPT, Kimi, DeepSeek, and Qwen — all from a single config.

[GitHub] multi-llm-mcp: Bridge Claude Code and Codex with One MCP Server
A new open-source project called **multi-llm-mcp** is gaining traction on GitHub — 32 stars in two days — as a Model Context Protocol (MCP) server that bridges Claude Code with OpenAI Codex, GPT, Kimi, DeepSeek, Qwen, and other models.
Created by developer **mai-yyy**, the tool lets you configure multiple LLM providers behind a single MCP interface, effectively turning Claude Code into a multi-model orchestration platform.
What it does
multi-llm-mcp is an MCP server built with FastMCP in Python that exposes each configured model as a named tool within the Model Context Protocol. Once set up, you can send tasks from your primary agent (Claude Code) to any supported model:
- **Claude Code → Codex** — Delegate a coding task to OpenAI's Codex agent while staying in Claude's interface
- **Claude Code → Kimi** — Route a research question to MoonshotAI's Kimi model
- **Claude Code → DeepSeek** — Use DeepSeek for specialised reasoning tasks
- **Claude Code → Qwen** — Leverage Alibaba's Qwen models for Chinese-language or multimodal tasks
- **Claude Code → GPT** — Fall back to OpenAI's GPT models when Claude's context is running low
The project works with any model that exposes an OpenAI-compatible API endpoint — which includes virtually every major LLM provider and local runtimes like Ollama, vLLM, and LiteLLM.
Why this matters for the self-hosted AI community
This tool addresses a pain point that has grown as the coding agent ecosystem has fragmented:
1. **Vendor lock-in is optional.** If Claude Code cannot solve a problem, you can route it to Codex or a local model without leaving your session. This is the multi-provider flexibility that defines the self-hosted approach.
2. **Model routing without infrastructure.** Instead of building a complex routing layer with LiteLLM or a custom proxy, multi-llm-mcp gives you model selection as an in-session tool call. Your primary agent decides which model to call based on the task.
3. **Works with local models.** Because it supports any OpenAI-compatible endpoint, you can point it at a locally running Ollama or vLLM instance for completely offline operation — keeping every query on your own hardware.
How to get started
multi-llm-mcp is available on GitHub:
```bash
git clone https://github.com/mai-yyy/multi-llm-mcp
cd multi-llm-mcp
pip install -r requirements.txt
```
Configure your API keys in a .env file:
```
OPENAI_API_KEY=sk-...
KIMI_API_KEY=...
DEEPSEEK_API_KEY=...
QWEN_API_KEY=...
```
Then add the server to your Claude Code MCP config and start routing tasks across models.
For more on running AI coding agents locally, see our guides on setting up OpenCode as a private coding agent and designing a two-tier AI stack.
**Sources:**
