
Private AI Operations Dashboard with n8n, Grafana and Local Models

Monitor model performance, workflow health, token usage, and hardware metrics with a fully self-hosted observability stack.

Robson Pereira · May 31, 2026 · 14 min read
Grafana dashboard showing AI infrastructure metrics, model performance, and workflow health.

Running private AI in production means you need to know what your models and automations are doing. An operations dashboard gives you visibility into model performance, workflow health, token consumption, and hardware utilisation — all from your own infrastructure.

What to monitor in a private AI stack

| Category | Metrics | Why it matters |
|----------|---------|----------------|
| Model inference | Latency, tokens/sec, requests/min | Catch slowdowns before users notice |
| Hardware | GPU/CPU usage, VRAM, RAM, disk | Plan upgrades and detect bottlenecks |
| Workflows | Success rate, runtime, error count | Know which automations are failing |
| Token usage | Daily/weekly token counts, cost | Track consumption and optimisation |
| Storage | Index size, document count, backups | Prevent storage exhaustion |

For the hardware sizing fundamentals, see Best Hardware for Self-Hosted AI.

Building the stack

Layer 1: Data collection with n8n

Use n8n as the collection layer. Create scheduled workflows that scrape metrics from each component:

```yaml
# Outline of the workflow, not an importable n8n export
workflow: Hardware metrics collector
schedule: "*/5 * * * *"   # every 5 minutes
nodes:
  - "Schedule Trigger: fires on the cron expression above"
  - "SSH: nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv"
  - "SSH: free -m && df -h"
  - "HTTP Request: GET Ollama status endpoint (for example /api/ps)"
  - "Code (formerly Function): format results as InfluxDB line protocol"
  - "HTTP Request: POST to InfluxDB"
```
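The formatting step in the workflow above can be sketched in Python (the n8n Code node also accepts JavaScript). This is a minimal sketch, assuming the `nvidia-smi` data row looks like `37 %, 5120 MiB`; the measurement name `gpu`, the tag `host=ai-box-1`, and both function names are hypothetical, and the sketch does not escape special characters in tags or fields.

```python
from typing import Dict


def parse_gpu_csv(line: str) -> Dict[str, float]:
    """Parse one data row such as '37 %, 5120 MiB' (the CSV header row is skipped by the caller)."""
    util_raw, mem_raw = [part.strip() for part in line.split(",")]
    return {
        "gpu_util_percent": float(util_raw.split()[0]),
        "vram_used_mib": float(mem_raw.split()[0]),
    }


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, float], timestamp_ns: int) -> str:
    """Build 'measurement[,tag=value...] field=value[,...] timestamp' without escaping."""
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={value}" for key, value in sorted(fields.items()))
    prefix = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{prefix} {field_part} {timestamp_ns}"


# Hypothetical sample row and host tag, for illustration only.
row = parse_gpu_csv("37 %, 5120 MiB")
print(to_line_protocol("gpu", {"host": "ai-box-1"}, row, 1700000000000000000))
```

Tagging every point with the host name keeps the same measurement usable if more machines are added later.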

Layer 2: Storage with InfluxDB or SQLite

Store time-series performance metrics in InfluxDB and workflow-level events (success, failure, duration) in a SQLite database. Both run locally and need no external service.
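As an illustration of the workflow-event store, the sketch below uses Python's built-in `sqlite3` module. The table name `workflow_events`, its columns, and the helper functions are assumptions made for this example, not a schema that n8n provides.

```python
import sqlite3

# In-memory database for the example; point this at a file path in practice.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE workflow_events (
        id          INTEGER PRIMARY KEY,
        workflow    TEXT NOT NULL,
        status      TEXT NOT NULL CHECK (status IN ('success', 'failure')),
        duration_ms INTEGER NOT NULL,
        finished_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def record_event(workflow: str, status: str, duration_ms: int) -> None:
    """Insert one workflow run into the events table."""
    conn.execute(
        "INSERT INTO workflow_events (workflow, status, duration_ms) VALUES (?, ?, ?)",
        (workflow, status, duration_ms),
    )


def failure_rate(workflow: str) -> float:
    """Return failures divided by total runs, or 0.0 when there are no runs."""
    total, failures = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(status = 'failure'), 0) "
        "FROM workflow_events WHERE workflow = ?",
        (workflow,),
    ).fetchone()
    return failures / total if total else 0.0


# Hypothetical runs: three successes and one failure.
for status, ms in [("success", 820), ("success", 790), ("failure", 3100), ("success", 805)]:
    record_event("hardware-metrics-collector", status, ms)

print(failure_rate("hardware-metrics-collector"))
```

A single narrow table like this is usually enough to drive success/failure ratios and runtime trends per workflow.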

Layer 3: Visualisation with Grafana

Grafana connects to both stores: InfluxDB is a built-in data source, while SQLite requires a community data source plugin. Build dashboards for:

  • **Model overview** — real-time GPU utilisation, request latency, error rate
  • **Workflow health** — success/failure ratio per workflow, runtime trends
  • **Token economy** — daily token consumption by model and workflow
  • **Infrastructure** — disk usage, memory, temperature, uptime

Layer 4: Intelligent alerting with local LLMs

Instead of static thresholds, use a local LLM to analyse metric patterns:

  • **Anomaly detection** — flag unusual latency spikes or error patterns
  • **Trend analysis** — predict when VRAM or disk will reach capacity
  • **Root cause hints** — correlate workflow failures with hardware changes
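For the trend-analysis case, a simple numeric baseline is useful to pass to the model or to check its answer against. The sketch below is a plain least-squares line fit, not an LLM technique; the function name `hours_until_full` and the sample readings are hypothetical.

```python
from typing import List, Optional, Tuple


def hours_until_full(samples: List[Tuple[float, float]],
                     capacity: float = 100.0) -> Optional[float]:
    """Estimate hours from the last sample until usage reaches capacity.

    samples holds (hours_since_start, percent_used) pairs.
    Returns None when there is too little data or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - samples[-1][0])


# Hypothetical disk readings: 2 percentage points of growth per day.
readings = [(0.0, 70.0), (24.0, 72.0), (48.0, 74.0), (72.0, 76.0)]
print(hours_until_full(readings))  # roughly 288 hours, i.e. about 12 days
```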

Set up an n8n workflow that runs every hour: query InfluxDB for the last hour of metrics, pass them to your local model, and send a Slack alert only if the model flags something actionable.
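The model call in that hourly workflow might look like the following sketch. It assumes Ollama is listening on `localhost:11434` and uses its `/api/generate` endpoint with streaming disabled; the model name `llama3`, the prompt wording, the `ALERT`/`OK` reply convention, and all function names are assumptions for illustration. The Slack step is left out.

```python
import json
import urllib.request
from typing import Dict, List

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed local Ollama address


def build_payload(model: str, metrics: List[Dict[str, float]]) -> Dict[str, object]:
    """Create a non-streaming generate request that embeds recent metrics as JSON."""
    prompt = (
        "You are reviewing metrics from a private AI server. "
        "Reply with ALERT and one sentence if something needs action, "
        "otherwise reply OK.\n" + json.dumps(metrics, indent=2)
    )
    return {"model": model, "prompt": prompt, "stream": False}


def is_actionable(model_reply: str) -> bool:
    """Treat a reply as actionable only when it starts with ALERT."""
    return model_reply.strip().upper().startswith("ALERT")


def ask_local_model(payload: Dict[str, object]) -> str:
    """Send the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


# Hypothetical metrics; ask_local_model(payload) needs a running Ollama server.
payload = build_payload("llama3", [{"latency_sec": 4.2}, {"latency_sec": 13.8}])
print(is_actionable("ALERT: latency tripled within the hour"))
```

Forcing the model into a fixed reply prefix keeps the downstream alert decision a simple string check instead of free-text interpretation.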

Example: Model performance monitoring workflow

```bash
# List installed models with their size and last-modified time
curl -s http://localhost:11434/api/tags | jq '.models[] | {name, size, modified_at}'

# List models currently loaded in memory and their VRAM footprint
curl -s http://localhost:11434/api/ps | jq '.models[] | {name, size, "memory_usage": .size_vram}'
```
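Latency and tokens/sec can be derived from the timing fields that Ollama returns with a non-streaming `/api/generate` response, where durations are reported in nanoseconds. The sketch below assumes those fields (`total_duration`, `prompt_eval_count`, `eval_count`, `eval_duration`); the function name and the sample values are made up for illustration.

```python
from typing import Dict


def inference_stats(response: Dict[str, int]) -> Dict[str, float]:
    """Convert Ollama timing fields (nanoseconds) into dashboard-friendly numbers."""
    eval_seconds = response["eval_duration"] / 1e9
    return {
        "tokens_per_sec": response["eval_count"] / eval_seconds if eval_seconds > 0 else 0.0,
        "latency_sec": response["total_duration"] / 1e9,
        "total_tokens": float(response.get("prompt_eval_count", 0) + response["eval_count"]),
    }


# Hypothetical response fragment, not real measurements.
sample = {
    "total_duration": 5_000_000_000,
    "prompt_eval_count": 26,
    "eval_count": 200,
    "eval_duration": 4_000_000_000,
}
print(inference_stats(sample))
```

Writing these three values to the time-series store on every request covers the model inference and token usage rows of the monitoring table.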

Alert threshold recommendations

| Metric | Warning | Critical | Action |
|--------|---------|----------|--------|
| GPU utilisation | > 90% for 10 min | > 95% for 30 min | Scale or optimise |
| VRAM usage | > 80% | > 92% | Restart or upgrade |
| Inference latency | > 5s | > 15s | Check model or hardware |
| Disk usage | > 75% | > 90% | Clean or expand storage |
| Workflow failure rate | > 5% in 1 hour | > 15% in 1 hour | Investigate immediately |
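The instantaneous thresholds in the table translate into a small lookup. This is a sketch only: the metric keys and the `classify` function are hypothetical names, and GPU utilisation is omitted because its thresholds depend on a sustained duration, which needs a window of samples rather than a single value.

```python
from typing import Dict, Tuple

# (warning, critical) limits taken from the table above.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "vram_percent": (80.0, 92.0),
    "latency_sec": (5.0, 15.0),
    "disk_percent": (75.0, 90.0),
    "workflow_failure_percent": (5.0, 15.0),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning', or 'critical' for a single metric reading."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


print(classify("vram_percent", 85.0))
print(classify("disk_percent", 91.0))
```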

Building an operations runbook

Document your response procedures for common incidents. Store the runbook as a searchable document in your AnythingLLM knowledge base and link to it from the dashboard so it is one click away during an incident.

For more on operational readiness, read Operational Playbooks for Running Private AI Like a Service.

Conclusion

A private AI ops dashboard turns your stack from a collection of tools into a managed service. n8n collects the data, InfluxDB stores it, Grafana visualises it, and local LLMs help interpret it — all without sending operational data to a third party.

FAQ

Do I need all four layers?

No. Start with n8n, Grafana, and a simple SQLite backend. Add InfluxDB only when your metric volume demands it.

Can this monitor multiple machines?

Yes. Deploy n8n workers on each host and forward metrics to a central InfluxDB instance.

Is Grafana hard to set up?

Grafana with Docker is straightforward. The time investment is in dashboard design, not installation.
