That Agent Demo Won't Survive Compliance

That agent demo that got applause at your last all-hands?

It would not survive a compliance review.

I build agentic AI platforms in regulated financial services. Here’s the uncomfortable truth I’ve learned: getting the agent to work was never the hard part. The hard part was getting compliance, risk, and audit to let it touch real customer workflows.

Before an agent ships to regulated production, it needs to pass all ten of these checks:

Audit trail — every LLM call and tool call logged and replayable, not just app logs
Provenance — every claim in the output traceable to a source
Per-claim confidence — calibrated scores as an output guardrail, not vibes
Human approval gates — before any consequential action, a human signs off
Safe action boundaries — an explicit allowlist of what the agent can touch
Rollback path — every write action must be reversible
Cost and token ceilings — hard stops, because runaway loops are real
Ground-truth evals — run against a golden dataset before every release
Defined failure behavior — what the agent does when it doesn’t know
Named ownership — one accountable human for the agent’s output
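
To make a few of these concrete, here is a minimal Python sketch of how four of the checks (audit trail, safe action boundaries, human approval gates, and token ceilings) could sit in a single wrapper around tool calls. Everything in it is hypothetical and for illustration only: GuardedToolRunner, GuardrailViolation, the approver callback, and the tool names are assumptions, not any framework's API. The in-memory log stands in for durable, replayable storage, and the lambda approver stands in for a real human sign-off step.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


class GuardrailViolation(Exception):
    """Raised when a call would break a boundary, gate, or ceiling."""


@dataclass
class GuardedToolRunner:
    # Hypothetical wrapper: every field name here is an assumption for illustration.
    tools: Dict[str, Callable[..., Any]]  # explicit allowlist: only these tools exist
    needs_approval: Set[str]              # tools with consequential side effects
    approver: Callable[[str, Dict[str, Any]], bool]  # stand-in for human sign-off
    token_ceiling: int                    # hard stop for the whole run
    tokens_used: int = 0
    audit_log: List[str] = field(default_factory=list)

    def _audit(self, event: str, **details: Any) -> None:
        # Append one JSON line per event. A real system would write to
        # durable, tamper-evident storage instead of an in-memory list.
        record = {"ts": time.time(), "event": event, **details}
        self.audit_log.append(json.dumps(record, sort_keys=True, default=str))

    def call(self, tool: str, args: Dict[str, Any], tokens: int) -> Any:
        # Log the request first so blocked attempts are auditable too.
        self._audit("requested", tool=tool, args=args, tokens=tokens)
        if tool not in self.tools:
            self._audit("blocked", tool=tool, reason="not_allowlisted")
            raise GuardrailViolation(f"{tool} is not on the allowlist")
        if self.tokens_used + tokens > self.token_ceiling:
            self._audit("blocked", tool=tool, reason="token_ceiling")
            raise GuardrailViolation("token ceiling reached")
        if tool in self.needs_approval and not self.approver(tool, args):
            self._audit("blocked", tool=tool, reason="approval_denied")
            raise GuardrailViolation(f"{tool} was not approved")
        self.tokens_used += tokens
        result = self.tools[tool](**args)
        self._audit("completed", tool=tool, result=result)
        return result


# Usage: reads pass straight through, refunds wait for a human.
runner = GuardedToolRunner(
    tools={
        "lookup_balance": lambda account_id: {"account_id": account_id, "balance": 100},
        "issue_refund": lambda account_id, amount: {"refunded": amount},
    },
    needs_approval={"issue_refund"},
    approver=lambda tool, args: False,  # deny by default in this sketch
    token_ceiling=1000,
)
runner.call("lookup_balance", {"account_id": "A1"}, tokens=200)
```

Every check in this sketch runs before the tool executes, and a blocked attempt leaves an audit record just like a completed one. The other six items (provenance, per-claim confidence, rollback, evals, failure behavior, ownership) need more than a wrapper and are not covered here.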

Most demos I’ve reviewed score 2 out of 10.

Regulated production requires 10 out of 10.

The gap between those two numbers is where agentic AI projects go to die — and where the most valuable engineering work is happening right now.

If you’re building agents in a regulated environment: which of these ten is the hardest at your org?

This is the first in a series on making agentic AI survive regulated production. Next up: per-claim confidence, the output guardrail almost nobody implements.