Models Articles | The Crazy Alpaca

Connected prompt chain workflow for multi-step local AI processing.

Models

Prompt Chaining: Connect Multiple Prompts for Better Local AI Results

With local models, chaining short, focused prompts produces better results than one giant instruction.

May 31, 2026 - 7 min read

Three open-weight AI model families compared for local deployment.

Models

Mistral vs Llama vs Qwen: Choosing the Best Open-Weight Model Family

Compare Mistral, Llama, and Qwen model families across performance, hardware fit, ecosystem support, and practical use cases.

May 31, 2026 - 10 min read

DeepSeek R1 reasoning model running locally on a self-hosted AI server.

Models

DeepSeek R1 Local Setup Guide: Run a Reasoning Model on Your Own Hardware

Install DeepSeek R1 locally, configure quantised variants for consumer GPUs, and build a private reasoning workflow that keeps data off third-party servers.

May 31, 2026 - 10 min read

Qwen 2.5 local AI model dashboard with multiple size variants.

Models

Qwen 2.5 Local Setup Guide: Alibaba's Versatile Open-Weight Model Family

Install and run Qwen 2.5 models locally with Ollama or vLLM, compare size variants, and deploy them for chat, coding, and multilingual tasks.

May 31, 2026 - 9 min read

Phi-4 compact local model running on a small self-hosted server.

Models

Phi-4: How Microsoft's Compact Model Changes Local AI Deployment

Run Phi-4 locally on modest hardware, understand why it punches above its weight despite its small size, and integrate it into practical workflows.

May 31, 2026 - 8 min read

Google Gemma 3 running locally on a private self-hosted AI server.

Models

Gemma 3 Local Setup: Run Google's Open-Weight Models on Your Own Hardware

Install Gemma 3 locally with Ollama or Hugging Face, compare sizes, and build privacy-first workflows on Google's efficient open-weight architecture.

May 31, 2026 - 9 min read

GGUF quantisation levels compared on GPU hardware for local AI inference.

Models

GGUF Quantisation Guide: Choosing the Right Format for Your Local LLM

Understand GGUF quantisation levels, choose Q2 through Q8 for your hardware, and balance quality against VRAM usage for every local model.

May 31, 2026 - 10 min read

Models

Best Embedding Models for Local RAG Systems in 2026

Compare embedding models for local retrieval-augmented generation, from BGE to E5 to Nomic Embed, and choose the right one for your document pipeline.

May 31, 2026 - 9 min read

Visual comparison of embedding vector spaces for different document domains.

Models

Optimising Embedding Models for Domain-Specific Document Retrieval

Match embedding models to your document domain — code, medical, legal, or technical — for significantly better local RAG retrieval quality.

May 31, 2026 - 10 min read

Liquid AI LFM2.5-8B-A1B model card artwork.

Models

[News] Liquid AI Releases LFM2.5-8B-A1B: On-Device MoE with 128K Context

Liquid AI's new LFM2.5-8B-A1B packs 8B total parameters (1B active) with a 128K context window, trained on 38 trillion tokens, and runs on llama.cpp, MLX, vLLM and SGLang from day one.

May 31, 2026 - 4 min read

Liquid AI LFM2.5-8B-A1B benchmark results on consumer hardware.

Models

Liquid AI LFM2.5-8B-A1B: The 1B-Active-Parameter MoE Model That Runs on Consumer Hardware

Liquid AI released LFM2.5-8B-A1B, a Mixture-of-Experts model with only 1B active parameters per token, a 128K context window, and day-one support for llama.cpp — making it one of the most efficient models for local inference.

May 30, 2026 - 9 min read

Liquid AI LFM2.5-8B-A1B MoE model architecture running on local hardware.

Models

How to Run Liquid AI LFM2.5-8B-A1B Locally: A New MoE Model for Consumer Hardware

Liquid AI's LFM2.5-8B-A1B is an 8B-parameter MoE model with only 1B active parameters, trained on 38T tokens with 128K context — and it runs on consumer hardware via llama.cpp and GGUF.

May 30, 2026 - 10 min read

Robot arm with AI vision processing overlay representing Qwen-VLA embodied AI.

Models

[ArXiv] Qwen-VLA: Unified Vision-Language-Action Model for Robotics

Alibaba's Qwen team releases Qwen-VLA, an embodied foundation model that unifies vision, language, and continuous action generation across diverse robot platforms.

May 30, 2026 - 3 min read

Two-model local AI workflow with speed and reasoning layers.

Models

Build a Two-Model Workflow with a Fast Model and a Reasoning Model

Combine a small fast model and a stronger reasoning model to balance speed, cost, and quality.

May 29, 2026 - 10 min read

Models

How to Choose the Right Local Model Size

Match model size to your hardware, latency target, and task before you chase benchmark hype.

May 27, 2026 - 8 min read

Visual guide to local AI quantisation levels and model compression.

Models

Quantisation Levels Explained for Real-World Local AI

Understand what 4-bit, 5-bit, and 8-bit quantisation actually mean for speed, quality, and memory.

May 26, 2026 - 9 min read

Embedding vectors flowing into a local document search system.

Models

Choosing the Best Embedding Model for Local Search

Compare embedding models for retrieval, semantic search, and document clustering on local hardware.

May 24, 2026 - 8 min read

Local AI prompt engineering workflow on a clean desktop.

Models

Prompt Engineering for Local AI That Produces Better Answers

Use clearer instructions, better context, and repeatable prompt patterns to improve local model output.

May 24, 2026 - 8 min read

Abstract local AI model library for beginners.

Models

Best Local AI Models for Beginners

A beginner-friendly map of local model types, sizes, and practical first choices.

May 13, 2026 - 8 min read