Tools

[GitHub] multi-llm-mcp: Bridge Claude Code and Codex with One MCP Server

A new open-source MCP server lets Claude Code call OpenAI Codex as a subagent — or route tasks across GPT, Kimi, DeepSeek, and Qwen — all from a single config.

Robson PereiraMay 31, 20264 min read
Diagram showing multiple LLMs connected through an MCP server bridge.

[GitHub] multi-llm-mcp: Bridge Claude Code and Codex with One MCP Server

A new open-source project called **multi-llm-mcp** is gaining traction on GitHub — 32 stars in two days — as a Model Context Protocol (MCP) server that bridges Claude Code with OpenAI Codex, GPT, Kimi, DeepSeek, Qwen, and other models.

Created by developer **mai-yyy**, the tool lets you configure multiple LLM providers behind a single MCP interface, effectively turning Claude Code into a multi-model orchestration platform.

What it does

multi-llm-mcp is an MCP server built with FastMCP in Python that exposes each configured model as a named tool within the Model Context Protocol. Once set up, you can send tasks from your primary agent (Claude Code) to any supported model:

  • **Claude Code → Codex** — Delegate a coding task to OpenAI's Codex agent while staying in Claude's interface
  • **Claude Code → Kimi** — Route a research question to MoonshotAI's Kimi model
  • **Claude Code → DeepSeek** — Use DeepSeek for specialised reasoning tasks
  • **Claude Code → Qwen** — Leverage Alibaba's Qwen models for Chinese-language or multimodal tasks
  • **Claude Code → GPT** — Fall back to OpenAI's GPT models when Claude's context is running low

The project works with any model that exposes an OpenAI-compatible API endpoint — which includes virtually every major LLM provider and local runtimes like Ollama, vLLM, and LiteLLM.

Why this matters for the self-hosted AI community

This tool addresses a pain point that has grown as the coding agent ecosystem has fragmented:

1. **Vendor lock-in is optional.** If Claude Code cannot solve a problem, you can route it to Codex or a local model without leaving your session. This is the multi-provider flexibility that defines the self-hosted approach.

2. **Model routing without infrastructure.** Instead of building a complex routing layer with LiteLLM or a custom proxy, multi-llm-mcp gives you model selection as an in-session tool call. Your primary agent decides which model to call based on the task.

3. **Works with local models.** Because it supports any OpenAI-compatible endpoint, you can point it at a locally running Ollama or vLLM instance for completely offline operation — keeping every query on your own hardware.

How to get started

multi-llm-mcp is available on GitHub:

```bash

git clone https://github.com/mai-yyy/multi-llm-mcp

cd multi-llm-mcp

pip install -r requirements.txt

```

Configure your API keys in a .env file:

```

OPENAI_API_KEY=sk-...

KIMI_API_KEY=...

DEEPSEEK_API_KEY=...

QWEN_API_KEY=...

```

Then add the server to your Claude Code MCP config and start routing tasks across models.

For more on running AI coding agents locally, see our guides on setting up OpenCode as a private coding agent and designing a two-tier AI stack.

**Sources:**

Related articles