Private AI Operations Dashboard with n8n, Grafana and Local Models
Monitor model performance, workflow health, token usage, and hardware metrics with a fully self-hosted observability stack.

Running private AI in production means you need to know what your models and automations are doing. An operations dashboard gives you visibility into model performance, workflow health, token consumption, and hardware utilisation — all from your own infrastructure.
What to monitor in a private AI stack
| Category | Metrics | Why it matters |
|----------|---------|----------------|
| Model inference | Latency, tokens/sec, requests/min | Catch slowdowns before users notice |
| Hardware | GPU/CPU usage, VRAM, RAM, disk | Plan upgrades and detect bottlenecks |
| Workflows | Success rate, runtime, error count | Know which automations are failing |
| Token usage | Daily/weekly token counts, cost | Track consumption and optimisation |
| Storage | Index size, document count, backups | Prevent storage exhaustion |
For the hardware sizing fundamentals, see Best Hardware for Self-Hosted AI.
Building the stack
Layer 1: Data collection with n8n
Use n8n as the collection layer. Create scheduled workflows that scrape metrics from each component:
```yaml
# Descriptive outline of the workflow, not an importable n8n export
workflow: Hardware metrics collector
schedule: every 5 minutes
nodes:
  - "Schedule trigger (cron: */5 * * * *)"
  - "SSH node → run nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv"
  - "SSH node → run free -m && df -h"
  - "HTTP node → GET Ollama /api/ps (loaded models and memory use)"
  - "Function node → format as InfluxDB line protocol"
  - "HTTP node → POST to InfluxDB"
```
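The formatting step in the outline above can be sketched as follows. This is a minimal illustration, not the author's node code: it assumes `nvidia-smi` is run with `--format=csv,noheader,nounits` so that each row is just two numbers, and the measurement name `gpu`, the tag names, and the field names are hypothetical. The logic is shown in Python for readability and would need porting to whatever language your n8n node runs.

```python
import csv
import io
import time


def gpu_csv_to_line_protocol(csv_text, host, timestamp_ns=None):
    """Convert nvidia-smi CSV rows into InfluxDB line protocol lines.

    Assumes one row per GPU in the form "utilization, memory_used"
    (output of --format=csv,noheader,nounits). The host value must not
    contain spaces or commas, since this sketch does no escaping.
    """
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    lines = []
    reader = csv.reader(io.StringIO(csv_text.strip()))
    for gpu_index, row in enumerate(reader):
        utilization, memory_used = (field.strip() for field in row)
        lines.append(
            # measurement,tag_set field_set timestamp
            f"gpu,host={host},gpu={gpu_index} "
            f"utilization_pct={float(utilization)},"
            f"memory_used_mib={float(memory_used)} "
            f"{timestamp_ns}"
        )
    return lines
```

Joining the returned lines with newlines gives a request body suitable for a line protocol write.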
Layer 2: Storage with InfluxDB or SQLite
Store time-series metrics in InfluxDB for performance data. Use a SQLite database for workflow-level events (success, failure, duration). Both run locally and require no external service.
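As a rough sketch of the SQLite side, the workflow-level events could live in a single table. The table name `workflow_events` and its columns are assumptions made for illustration; adapt them to whatever your workflows actually report.

```python
import sqlite3


def open_event_store(path=":memory:"):
    """Open (or create) the event database and ensure the table exists.

    The schema is hypothetical: one row per workflow execution.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS workflow_events (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            workflow    TEXT    NOT NULL,
            status      TEXT    NOT NULL
                        CHECK (status IN ('success', 'failure')),
            duration_ms INTEGER NOT NULL,
            finished_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def record_event(conn, workflow, status, duration_ms):
    """Insert one execution result; the with-block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO workflow_events (workflow, status, duration_ms) "
            "VALUES (?, ?, ?)",
            (workflow, status, duration_ms),
        )
```

The `CHECK` constraint rejects any status other than `success` or `failure`, which keeps later ratio queries simple.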
Layer 3: Visualisation with Grafana
Grafana connects to InfluxDB natively and to SQLite through a community data source plugin. Build dashboards for:
- **Model overview** — real-time GPU utilisation, request latency, error rate
- **Workflow health** — success/failure ratio per workflow, runtime trends
- **Token economy** — daily token consumption by model and workflow
- **Infrastructure** — disk usage, memory, temperature, uptime
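To make the workflow health panel concrete, here is one way the underlying query might look. It assumes a hypothetical SQLite table named `workflow_events` with `workflow`, `status` (`success` or `failure`), and `duration_ms` columns; the SQL itself could be pasted into a Grafana panel, and the Python wrapper is only there to show it running.

```python
import sqlite3

# Success percentage and average runtime per workflow.
# In SQLite, the comparison status = 'success' evaluates to 1 or 0,
# so summing it counts the successful runs.
WORKFLOW_HEALTH_SQL = """
SELECT workflow,
       COUNT(*)                                             AS runs,
       ROUND(100.0 * SUM(status = 'success') / COUNT(*), 1) AS success_pct,
       ROUND(AVG(duration_ms), 1)                           AS avg_duration_ms
FROM workflow_events
GROUP BY workflow
ORDER BY workflow
"""


def workflow_health(conn):
    """Return (workflow, runs, success_pct, avg_duration_ms) tuples."""
    return conn.execute(WORKFLOW_HEALTH_SQL).fetchall()
```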
Layer 4: Intelligent alerting with local LLMs
To complement static thresholds, use a local LLM to analyse metric patterns:
- **Anomaly detection** — flag unusual latency spikes or error patterns
- **Trend analysis** — predict when VRAM or disk will reach capacity
- **Root cause hints** — correlate workflow failures with hardware changes
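For the trend analysis item, a plain least-squares extrapolation makes a useful baseline to sanity-check whatever the model predicts. The sketch below is such a baseline, not the LLM-based method itself; it assumes samples arrive as `(hour, used)` pairs in the same unit as `capacity`.

```python
def hours_until_full(samples, capacity):
    """Estimate hours from the last sample until usage reaches capacity.

    Fits a straight line through (hour, used) pairs with ordinary least
    squares. Returns None when there is too little data or usage is
    flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    hour_full = (capacity - intercept) / slope
    last_hour = samples[-1][0]
    # Clamp at zero in case the fitted line already exceeds capacity.
    return max(0.0, hour_full - last_hour)
```

A linear fit will mislead on bursty workloads, so treat the result as a rough estimate.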
Set up an n8n workflow that runs every hour: query InfluxDB for the last hour of metrics, feed the data to your local model, and send a Slack alert if the model flags something actionable.
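The model-facing part of that hourly workflow might look like the sketch below. It assumes Ollama is listening on its default port and calls `/api/generate` with `stream` disabled and `format` set to `json`; the model name `llama3`, the prompt wording, the metric row shape, and the `actionable`/`summary` reply shape are all hypothetical choices for illustration. Sending the Slack message is left to a later n8n node.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local port


def build_prompt(metric_rows):
    """Render rows like {"time", "metric", "value"} into a review prompt."""
    lines = [f"{r['time']} {r['metric']}={r['value']}" for r in metric_rows]
    return (
        "You are reviewing the last hour of metrics from a private AI stack.\n"
        'Reply with JSON only: {"actionable": true or false, '
        '"summary": "<one sentence>"}\n\n' + "\n".join(lines)
    )


def ask_local_model(prompt, model="llama3"):
    """POST the prompt to Ollama and return the generated text."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False, "format": "json"}
    ).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


def parse_verdict(model_text):
    """Turn the model's reply into a verdict; unparsable output never alerts."""
    try:
        data = json.loads(model_text)
    except json.JSONDecodeError:
        return {"actionable": False, "summary": "unparsable model output"}
    if not isinstance(data, dict):
        return {"actionable": False, "summary": "unparsable model output"}
    return {
        "actionable": data.get("actionable") is True,
        "summary": str(data.get("summary", "")),
    }
```

Treating unparsable output as non-actionable avoids noisy alerts, at the cost of possibly missing one; logging those cases is a reasonable compromise.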
Example: Model performance monitoring workflow
```bash
# List installed models with their size and last-modified time
curl -s http://localhost:11434/api/tags | jq '.models[] | {name, size, modified_at}'
# List models currently loaded in memory and how much of each sits in VRAM
curl -s http://localhost:11434/api/ps | jq '.models[] | {name, size, "memory_usage": .size_vram}'
```
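If the `/api/ps` response is consumed in code rather than with `jq`, the same fields can be reduced to dashboard-friendly numbers. The sketch below relies only on the `name`, `size`, and `size_vram` fields of that response; the output key names are hypothetical, and reading the ratio of `size_vram` to `size` as the share of the model held on the GPU is an assumption.

```python
import json


def summarize_loaded_models(ps_json):
    """Summarise an Ollama /api/ps response body (a JSON string).

    Returns one dict per loaded model with VRAM use in MiB and the
    percentage of the model's total size that resides in VRAM.
    """
    models = json.loads(ps_json).get("models", [])
    summary = []
    for model in models:
        size = model.get("size", 0)
        size_vram = model.get("size_vram", 0)
        gpu_share = round(100 * size_vram / size, 1) if size else 0.0
        summary.append(
            {
                "name": model.get("name"),
                "size_vram_mib": round(size_vram / 2**20, 1),
                "gpu_share_pct": gpu_share,
            }
        )
    return summary
```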
Alert threshold recommendations
| Metric | Warning | Critical | Action |
|--------|---------|----------|--------|
| GPU utilisation | > 90% for 10 min | > 95% for 30 min | Scale or optimise |
| VRAM usage | > 80% | > 92% | Restart or upgrade |
| Inference latency | > 5s | > 15s | Check model or hardware |
| Disk usage | > 75% | > 90% | Clean or expand storage |
| Workflow failure rate | > 5% in 1 hour | > 15% in 1 hour | Investigate immediately |
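The table above could be turned into a small evaluation step along these lines. This is a sketch under assumptions: the metric keys are hypothetical names, the instantaneous thresholds come from the table, and the sustained check assumes samples arrive at the collector's 5-minute interval.

```python
# (warning, critical) limits taken from the table; keys are illustrative.
THRESHOLDS = {
    "vram_pct": (80, 92),
    "latency_s": (5, 15),
    "disk_pct": (75, 90),
}


def classify(metric, value):
    """Return "ok", "warning", or "critical" for an instantaneous value."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


def sustained_above(values, threshold, window_minutes, interval_minutes=5):
    """True when the latest samples spanning the window all exceed threshold.

    A 10-minute window at 5-minute spacing needs three samples
    (t-10, t-5, t), hence the + 1.
    """
    needed = window_minutes // interval_minutes + 1
    if len(values) < needed:
        return False
    return all(v > threshold for v in values[-needed:])
```

For example, the GPU warning row corresponds to `sustained_above(gpu_samples, 90, 10)`, where `gpu_samples` is the list of recent utilisation readings.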
Building an operations runbook
Document your response procedures for common incidents. Store the runbook as a searchable document in your AnythingLLM knowledge base and link to it from the dashboard so it is quick to reach during an incident.
For more on operational readiness, read Operational Playbooks for Running Private AI Like a Service.
Conclusion
A private AI ops dashboard turns your stack from a collection of tools into a managed service. n8n collects the data, InfluxDB stores it, Grafana visualises it, and local LLMs help interpret it — all without sending operational data to a third party.
FAQ
Do I need all four layers?
Start with n8n + Grafana and a simple SQLite backend. Add InfluxDB only when your metric volume demands it.
Can this monitor multiple machines?
Yes. Point the collector workflow's SSH nodes at each host, or run a separate n8n instance per host, and forward all metrics to a central InfluxDB instance.
Is Grafana hard to set up?
Grafana with Docker is straightforward. The time investment is in dashboard design, not installation.


