[GitHub] Breaking: LogicPipe, a New Open-Source Framework for Edge Multi-Device LLM Inference, Nears 200 Stars
LogicPipe is an open-source Python framework for running collaborative LLM inference across multiple edge devices with pipeline parallelism, DAG scheduling, and KV cache reuse.

LogicPipe: Open-Source Framework Brings Pipeline-Parallel LLM Inference to Edge Devices
A new open-source project called LogicPipe has appeared on GitHub, offering a Python-based framework for running collaborative LLM inference across multiple edge devices. The project has already gathered nearly 200 stars since its release yesterday.
What LogicPipe Does
LogicPipe splits a large language model's transformer layers into stages of contiguous layers and distributes them across multiple GPUs or edge devices. Each rank loads and executes only its own layer weights, passing activations and generated tokens between stages via `torch.distributed`.
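To make the stage split concrete, here is a minimal sketch of dividing layer indices into contiguous, near-equal stages, one per rank. The helper name `split_layers` and the even-split policy are assumptions for illustration, not LogicPipe's actual partitioning logic.

```python
def split_layers(num_layers: int, num_stages: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into contiguous, near-equal stages.

    Hypothetical helper for illustration; not part of LogicPipe's API.
    """
    if not 1 <= num_stages <= num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers")
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for rank in range(num_stages):
        # Earlier ranks absorb the remainder, one extra layer each.
        size = base + (1 if rank < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


# Example: a 32-layer model on 4 devices; rank 1 would own layers 8..15.
stages = split_layers(32, 4)
my_layers = stages[1]
```

In a real deployment each rank would load only the weights for the layers in its own range and forward the resulting activations to the next rank.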
The key innovation is a **DAG-based scheduler** that breaks complex requests into dependency-aware "points." Rather than leaving pipeline stages idle during sequential generation, LogicPipe uses structured outlines to discover parallelism opportunities, reuses contextual KV caches from completed points to accelerate successors, and integrates Medusa/MBSD speculative decoding directly into the pipeline.
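The dependency-aware part of this idea can be sketched with Python's standard-library `graphlib`. The outline, the point names, and the `schedule_waves` helper below are invented for illustration and do not reflect LogicPipe's internal data structures; the sketch only shows how points whose dependencies are satisfied can be grouped so that they are eligible to run together.

```python
from graphlib import TopologicalSorter


def schedule_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group points into waves; every point in a wave has all dependencies done.

    `deps` maps each point to the set of points it depends on.
    Hypothetical helper for illustration only.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the outline is not a DAG
    waves = []
    while sorter.is_active():
        # Points whose predecessors have all been marked done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Marking them done unlocks their successors.
        sorter.done(*ready)
    return waves


# A made-up outline: "pros" and "cons" both depend only on "intro",
# so they can be generated in parallel once "intro" completes.
outline = {
    "intro": set(),
    "pros": {"intro"},
    "cons": {"intro"},
    "summary": {"pros", "cons"},
}
waves = schedule_waves(outline)
```

A real scheduler would dispatch points as soon as they become ready rather than in lockstep waves, but the readiness rule is the same.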
Key Features
- **Offline pipeline planning** — generates reusable `PartitionPlan` artifacts based on layer count, device count, and compute/communication/memory estimates
- **Distributed weight loading** — `model_partition.py` splits full model weights into per-rank `stage.bin` files
- **Dependency-aware scheduling** — outlines parsed into DAGs; only points with satisfied dependencies enter the execution queue
- **Contextual KV cache reuse** — completed points export caches; successors inject predecessor context
- **Quantisation support** — `--load_in_4bit` and `--load_in_8bit` flags for memory-constrained environments
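As a rough illustration of what offline pipeline planning involves, the sketch below picks contiguous stage boundaries that minimise the cost of the slowest stage, given a per-layer cost estimate. The function name `plan_stages` and the single-number cost model are assumptions; the article does not describe the fields of `PartitionPlan` or the planner's algorithm, and this sketch ignores communication and memory limits entirely.

```python
from itertools import accumulate


def plan_stages(layer_costs: list[float], num_stages: int) -> list[tuple[int, int]]:
    """Return [start, end) layer ranges that minimise the costliest stage.

    Hypothetical planner for illustration; models compute cost only.
    """
    n = len(layer_costs)
    if not 1 <= num_stages <= n:
        raise ValueError("need 1 <= num_stages <= number of layers")
    prefix = [0.0, *accumulate(layer_costs)]  # prefix[i] = cost of layers 0..i-1
    inf = float("inf")
    # best[k][i]: smallest bottleneck when the first i layers form k stages.
    best = [[inf] * (n + 1) for _ in range(num_stages + 1)]
    cut = [[0] * (n + 1) for _ in range(num_stages + 1)]
    best[0][0] = 0.0
    for k in range(1, num_stages + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):  # last stage covers layers j..i-1
                cost = max(best[k - 1][j], prefix[i] - prefix[j])
                if cost < best[k][i]:
                    best[k][i], cut[k][i] = cost, j
    # Walk the recorded cut points backwards to recover the stage ranges.
    bounds, i = [], n
    for k in range(num_stages, 0, -1):
        j = cut[k][i]
        bounds.append((j, i))
        i = j
    return bounds[::-1]


# Four cheap layers followed by two expensive ones, split across 3 devices.
plan = plan_stages([1, 1, 1, 1, 4, 4], 3)
```

Because the plan depends only on the estimates and the device count, it can be computed once and reused across runs, which is the appeal of doing the planning offline.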
Why It Matters
For the self-hosted AI community, LogicPipe addresses a critical gap: running meaningful LLM inference on edge hardware without needing a single large GPU. By distributing the model across multiple smaller devices, it opens up pipeline parallelism for homelab setups, multi-GPU workstations, and edge clusters.
The project supports Medusa/MBSD decoding for pipeline-level acceleration, long prompt prefill with intra-sequence slicing, and offline planning so you can optimise the partition before running inference.
How to Get Started
LogicPipe requires Python 3.10+ and PyTorch. Install it from the GitHub repo, place the model weights in a `model/` directory, and launch multiple processes, each with a different rank assignment.
Source
Visit the repository: GitHub — fxyz666/LogicPipe

