[GitHub] Breaking: LogicPipe — New Open-Source Framework for Edge Multi-Device LLM Inference Hits 200 Stars

LogicPipe is an open-source Python framework for running collaborative LLM inference across multiple edge devices with pipeline parallelism, DAG scheduling, and KV cache reuse.

Robson Pereira · May 30, 2026 · 3 min read

Edge multi-device collaborative LLM inference with LogicPipe.

LogicPipe: Open-Source Framework Brings Pipeline-Parallel LLM Inference to Edge Devices

A new open-source project called LogicPipe has appeared on GitHub, offering a Python-based framework for running collaborative LLM inference across multiple edge devices. The project has already gathered nearly 200 stars since its release yesterday.

What LogicPipe Does

LogicPipe splits a large language model's transformer layers into contiguous stages and distributes them across multiple GPUs or edge devices. Each rank loads and executes only the weights for its own layers, passing activations and generated tokens between stages via `torch.distributed`.
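
To make the stage split concrete, here is a minimal sketch of dividing a layer stack into contiguous, near-equal stages. It is not LogicPipe's planner: the `Stage` class and `partition_layers` function are hypothetical names, and the even split ignores the compute, communication, and memory estimates the project says it uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical record of which layers one pipeline rank owns."""
    rank: int   # pipeline rank that owns this stage
    start: int  # index of the first layer (inclusive)
    end: int    # index one past the last layer (exclusive)


def partition_layers(num_layers: int, num_ranks: int) -> list[Stage]:
    """Split layers 0..num_layers-1 into contiguous, near-equal stages."""
    if num_ranks < 1 or num_layers < num_ranks:
        raise ValueError("need at least one layer per rank")
    base, extra = divmod(num_layers, num_ranks)
    stages: list[Stage] = []
    start = 0
    for rank in range(num_ranks):
        # Earlier ranks absorb the remainder, one extra layer each.
        size = base + (1 if rank < extra else 0)
        stages.append(Stage(rank, start, start + size))
        start += size
    return stages


if __name__ == "__main__":
    for stage in partition_layers(num_layers=32, num_ranks=3):
        print(stage)
```

A real plan would likely weight the split by each device's speed and memory rather than dividing evenly, so treat this as the simplest possible baseline.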

The key innovation is a **DAG-based scheduler** that breaks complex requests into dependency-aware "points." Rather than leaving pipeline stages idle during sequential generation, LogicPipe uses structured outlines to discover parallelism opportunities, reuses contextual KV caches from completed points to accelerate successors, and integrates Medusa/MBSD speculative decoding directly into the pipeline.
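
The dependency-aware idea can be illustrated with a small sketch that groups points into "waves", where every point in a wave has all of its dependencies finished and could therefore run in parallel. The function name `schedule_points`, the dictionary format, and the sample outline are assumptions made for illustration, not LogicPipe's actual scheduler or data model.

```python
def schedule_points(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group points into waves whose dependencies are already satisfied.

    `deps` maps each point to the set of points it depends on.
    """
    remaining = {point: set(parents) for point, parents in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A point is ready once every one of its parents has completed.
        ready = sorted(p for p, parents in remaining.items() if parents <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency in outline")
        waves.append(ready)
        done.update(ready)
        for point in ready:
            del remaining[point]
    return waves


if __name__ == "__main__":
    # Hypothetical outline: two middle points depend only on the intro.
    outline = {
        "intro": set(),
        "method": {"intro"},
        "results": {"intro"},
        "summary": {"method", "results"},
    }
    for wave in schedule_points(outline):
        print(wave)
```

In this toy outline, `method` and `results` land in the same wave, which is the kind of parallelism a pipeline could use to keep stages busy. A successor such as `summary` is also the natural place to reuse context from the points it depends on.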

Key Features

- **Offline pipeline planning**: generates reusable `PartitionPlan` artifacts based on layer count, device count, and compute/communication/memory estimates
- **Distributed weight loading**: `model_partition.py` splits full model weights into per-rank `stage.bin` files
- **Dependency-aware scheduling**: outlines parsed into DAGs; only points with satisfied dependencies enter the execution queue
- **Contextual KV cache reuse**: completed points export caches; successors inject predecessor context
- **Quantisation support**: `--load_in_4bit` and `--load_in_8bit` flags for memory-constrained environments
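
To see why quantisation and partitioning matter together on small devices, the following back-of-the-envelope sketch estimates weight storage per rank as parameters × bits ÷ 8 bytes. The helper names and the 7B-parameter, four-rank example are assumptions for illustration only. The estimate also assumes an even split across ranks and ignores quantisation metadata, activations, and KV caches, so real usage will be higher.

```python
GIB = 2**30  # bytes per gibibyte


def weight_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB: parameters x bits / 8 bytes."""
    return num_params * bits_per_weight / 8 / GIB


def per_rank_weight_gib(num_params: int, bits_per_weight: int, num_ranks: int) -> float:
    """Weights held by each rank, assuming an even split across ranks."""
    return weight_gib(num_params, bits_per_weight) / num_ranks


if __name__ == "__main__":
    params = 7_000_000_000  # illustrative 7B-parameter model, not a LogicPipe default
    for bits in (16, 8, 4):
        per_rank = per_rank_weight_gib(params, bits, num_ranks=4)
        print(f"{bits:>2}-bit: {per_rank:.2f} GiB per rank")
```

Halving the bits per weight halves the weight footprint, and splitting across ranks divides it again, which is why the two techniques are often combined on memory-constrained hardware.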

Why It Matters

For the self-hosted AI community, LogicPipe addresses a critical gap: running meaningful LLM inference on edge hardware without needing a single large GPU. By distributing the model across multiple smaller devices, it opens up pipeline parallelism for homelab setups, multi-GPU workstations, and edge clusters.

The project supports Medusa/MBSD decoding for pipeline-level acceleration, long prompt prefill with intra-sequence slicing, and offline planning so you can optimise the partition before running inference.

How to Get Started

LogicPipe requires Python 3.10+ and PyTorch. You install it from the GitHub repo, place model weights in a `model/` directory, and run multiple processes with different rank assignments.
