[ArXiv] Breaking: AgentDoG 1.5

AgentDoG 1.5 is a new lightweight safety framework for AI agents, with taxonomy-guided data, smaller open models, and an online guardrail mode.

Robson Pereira · May 30, 2026 · 4 min read

Overview

**AgentDoG 1.5** is a new arXiv paper on the growing safety problem around open-world AI agents. The paper argues that modern agents act across tools, environments, and workflows quickly enough that older safety frameworks no longer suffice.

The authors propose a lightweight alignment approach that is designed to be cheaper to train and easier to deploy than heavyweight closed models. The most interesting part for self-hosted AI builders is that the system is meant to work as both a training recipe and an online moderation layer.

Key facts

- The framework updates the agent safety taxonomy for newer execution scenarios.
- It uses a taxonomy-guided data engine with influence-function purification.
- The paper reports small open variants at **0.8B, 2B, 4B, and 8B** parameters.
- The training set is tiny by modern standards: **around 1k samples**.
- The authors also describe a training-free online guardrail for real-time moderation.

What stands out

The biggest claim is efficiency. Instead of requiring huge datasets and heavy infrastructure, AgentDoG 1.5 aims to deliver strong safety behaviour with a compact pipeline and lower deployment overhead. The paper also describes a dedicated environment for agentic safety SFT (supervised fine-tuning) and RL (reinforcement learning), which the authors say reduces Docker-level deployment overhead by two orders of magnitude.

Why it matters

This is relevant far beyond one research paper. Anyone running coding agents, workflow agents, or browser agents locally needs a way to keep those systems from drifting into unsafe actions. If the approach holds up in practice, it could become a useful pattern for self-hosted stacks that need local policy enforcement without sending data to a third party.
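
To make "local policy enforcement" concrete, here is a minimal sketch of where a guardrail could sit in a self-hosted agent loop: between the agent's proposed action and the tool that executes it. Everything in it is an assumption for illustration. `ToolCall`, `run_guarded`, and `keyword_judge` are hypothetical names, and the keyword check is only a stand-in for a real guard model. It is not the AgentDoG 1.5 API or method.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ToolCall:
    """One action an agent proposes to take (hypothetical shape)."""
    tool: str
    argument: str


# A judge takes a proposed call and returns True if it may run.
Judge = Callable[[ToolCall], bool]


def keyword_judge(call: ToolCall) -> bool:
    """Stand-in for a local guard model: rejects a few risky shell patterns."""
    risky = ("rm -rf", "curl ", "| sh")
    return not (call.tool == "shell" and any(p in call.argument for p in risky))


def run_guarded(call: ToolCall,
                tools: Dict[str, Callable[[str], str]],
                judge: Judge) -> str:
    """Execute the tool only if the judge approves; otherwise report a block."""
    if call.tool not in tools:
        return f"blocked: unknown tool {call.tool!r}"
    if not judge(call):
        return f"blocked: {call.tool} call rejected by guardrail"
    return tools[call.tool](call.argument)


# Fake tools so the sketch runs without touching the real system.
TOOLS = {
    "shell": lambda cmd: f"(pretend) ran: {cmd}",
    "read_file": lambda path: f"(pretend) contents of {path}",
}

if __name__ == "__main__":
    print(run_guarded(ToolCall("shell", "ls -la"), TOOLS, keyword_judge))
    print(run_guarded(ToolCall("shell", "rm -rf /"), TOOLS, keyword_judge))
```

Because the judge is just a callable, the keyword check could be swapped for a call to a locally hosted guard model without changing the agent loop, and no action data has to leave the machine.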

For builders already operating local agents, this connects directly to How to Secure a Self-Hosted AI Server, Network Segmentation for AI Homelabs with VLANs and Firewalls, and How to Set Up OpenCode as a Private AI Coding Agent with Local Models.

Details worth watching

1. Smaller models, broader deployment

The paper’s open model sizes suggest the authors are optimising for practical deployment, not just benchmark chasing. That matters if you want a guardrail that can run beside an agent in a local stack.
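
For a rough sense of what "run beside an agent" means in hardware terms, the sketch below estimates weight storage for the four reported sizes. The parameter counts come from the paper as summarized above. The precisions (fp16, int8, int4) are assumptions chosen for illustration, since this summary does not state which formats the released models use, and the figures cover weights only, not KV cache, activations, or runtime overhead.

```python
# Parameter counts reported in the paper, in billions.
MODEL_SIZES_B = [0.8, 2.0, 4.0, 8.0]

# Bytes per parameter for common weight formats (assumed, weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB, taking 1 GB = 1e9 bytes.

    Billions of parameters times bytes per parameter gives GB directly.
    """
    return params_billion * bytes_per_param


def footprint_table() -> dict:
    """Map each model size to its estimated weight memory per format."""
    return {
        size: {fmt: round(weight_memory_gb(size, b), 2)
               for fmt, b in BYTES_PER_PARAM.items()}
        for size in MODEL_SIZES_B
    }


if __name__ == "__main__":
    for size, row in footprint_table().items():
        cells = ", ".join(f"{fmt}: {gb} GB" for fmt, gb in row.items())
        print(f"{size}B -> {cells}")
```

Under these assumptions, the smallest variant needs under 2 GB for its weights at fp16 and the largest about 16 GB, which is the range where a guard model can plausibly share a single consumer GPU with the agent it monitors.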

2. Taxonomy-first safety engineering

Rather than treating safety as a generic refusal classifier, AgentDoG 1.5 tries to map the kinds of failures agents actually face in open-ended environments. That is a better fit for tool-using systems than plain chat moderation.
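
The difference between a generic refusal classifier and a taxonomy-first guardrail can be sketched in a few lines: the verdict carries a category label, and each category maps to its own response. The categories, policy table, and keyword classifier below are all hypothetical and are not the taxonomy or model from the paper. They only illustrate the shape of the idea.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RiskCategory(Enum):
    """Illustrative labels only; NOT the taxonomy from the paper."""
    DESTRUCTIVE_ACTION = "destructive_action"
    DATA_EXFILTRATION = "data_exfiltration"
    PROMPT_INJECTION = "prompt_injection"


class Response(Enum):
    ALLOW = "allow"
    ASK_HUMAN = "ask_human"
    BLOCK = "block"


# Per-category policy: a label lets each failure type get its own handling.
POLICY = {
    RiskCategory.DESTRUCTIVE_ACTION: Response.ASK_HUMAN,
    RiskCategory.DATA_EXFILTRATION: Response.BLOCK,
    RiskCategory.PROMPT_INJECTION: Response.BLOCK,
}


@dataclass(frozen=True)
class Verdict:
    category: Optional[RiskCategory]  # None means no risk was detected
    response: Response


def classify(action: str) -> Optional[RiskCategory]:
    """Toy keyword classifier standing in for a trained guard model."""
    text = action.lower()
    if "ignore previous instructions" in text:
        return RiskCategory.PROMPT_INJECTION
    if "upload" in text and "secret" in text:
        return RiskCategory.DATA_EXFILTRATION
    if "delete" in text or "drop table" in text:
        return RiskCategory.DESTRUCTIVE_ACTION
    return None


def judge(action: str) -> Verdict:
    """Label the action, then look up the response for that label."""
    category = classify(action)
    if category is None:
        return Verdict(None, Response.ALLOW)
    return Verdict(category, POLICY[category])


if __name__ == "__main__":
    for action in ("list files in repo", "delete the build directory"):
        print(action, "->", judge(action))
```

A binary classifier would have to treat a risky file deletion and a secret upload identically. With a label attached, the first can be routed to a human for confirmation while the second is blocked outright.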

3. Open release

The paper says the models and datasets are being released openly, which makes it more likely that the broader community can inspect, reproduce, and adapt the method.

Source

**ArXiv:** https://arxiv.org/abs/2605.29801