Research

Empirical work on agent firewalls, Claude Code's hook protocol, and where deterministic policy gates outperform model-side safety training.

2026-04-28 · 8 min read

Hooking into Claude Code: 39 paired runs and the 80/20 refusal asymmetry

Built a PreToolUse hook firewall for Claude Code and ran 39 paired chaos prompts. Without the hook: 8 critical incidents. With it: 0. The headline finding: Claude refuses ~80% of destructive shell commands the user types, ~20% of read-shaped exfiltration, and ~0% of cases where the agent invents the destructive action itself. The chaos sweep also surfaced two latent Windows bugs.

2026-04-28 · 3 min read · follow-up

Enact blocked Claude from editing its own gitignore — that's the feature

Live evidence from the same session: Enact's PreToolUse hook blocked Claude from editing its own .gitignore twice, once via the Edit tool and once via Bash redirection. The block was correct in principle (the policy stops agents from leaking secrets via gitignore edits) and inconvenient in practice (the agent's intent was benign). Accepting that tradeoff is exactly the architecture defended in the longer post.
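
For readers who want a feel for the mechanism, here is a minimal sketch of a deterministic .gitignore gate. It is not Enact's actual policy code, which this page does not show. It assumes the hook receives a JSON event with `tool_name` and `tool_input` fields (`file_path` for Edit, `command` for Bash) and that exit code 2 signals a block, with stderr passed back to the agent. The names `decide`, `run`, and `GITIGNORE_REDIRECT` are hypothetical.

```python
import json
import re
import sys

# Illustrative rule only: matches `>` or `>>` redirection into a .gitignore path.
# It does not cover other write routes such as `tee` or `sed -i`.
GITIGNORE_REDIRECT = re.compile(r">>?\s*(?:\S*/)?\.gitignore\b")


def decide(event: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for one PreToolUse event (assumed schema)."""
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input") or {}

    # File-editing tools: block when the target file is named .gitignore.
    if tool in {"Edit", "Write"}:
        path = str(tool_input.get("file_path", "")).replace("\\", "/")
        if path.rsplit("/", 1)[-1] == ".gitignore":
            return False, "policy: agents may not edit .gitignore"

    # Shell commands: block redirection that writes into .gitignore.
    if tool == "Bash":
        command = str(tool_input.get("command", ""))
        if GITIGNORE_REDIRECT.search(command):
            return False, "policy: shell redirection into .gitignore is blocked"

    return True, "ok"


def run(stdin_text: str) -> int:
    """Map a raw JSON event to a hook exit code: 0 allows, 2 blocks (assumed)."""
    allowed, reason = decide(json.loads(stdin_text))
    if not allowed:
        print(reason, file=sys.stderr)  # the reason is surfaced to the agent
        return 2
    return 0
```

A hook script built on this sketch would end with `sys.exit(run(sys.stdin.read()))`. The rule never inspects intent, so it blocks the benign edit as readily as the malicious one, which is the tradeoff described above.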

In the queue

· Sandbox-friction enrichment + the misinterpretation 5: round 2 of the methodology
· The fabrication detector — diffing Claude's narrative against the receipt log
· WebFetch URL policies + DNS-exfil coverage
· Cursor MCP integration: same engine, second IDE