May 29, 2026 · 8 min read · WHITEPAPER
Verifiable Audit of AI Agents — Methodology & Regulatory Defensibility
Most agent “audits” are a PDF you have to trust. This paper describes how to produce one that survives a hostile reviewer instead.
Abstract
An audit is only as good as its ability to withstand a skeptic trying to break it. We describe a construction built on three properties (completeness, integrity, and independence) that makes an AI-agent audit verifiable by a third party who does not trust the auditor. Findings are Merkle-committed with the finding count bound into the root; the report is content-hashed into an HMAC-signed proof (kind 30105); and the root is anchored to three independent witnesses (Base, Sepolia, Nostr). Findings are mapped to OWASP ASI01–10, NIST AI RMF, ISO 42001, CSA AICM, MAESTRO, and EU MiCA. The assessment runs on the open-source MASSAT framework, so the methodology itself is auditable.
1. The problem statement
AI agents increasingly take consequential actions — moving funds, releasing payments, screening counterparties, producing decision-grade output. The teams deploying them keep logs, but logs are mutable by the party that holds them. When an action is later challenged — in litigation, in a regulatory exam, in a customer's vendor review — the deploying party's own log is exactly the artifact a skeptic discounts. The unmet need is not observability (more telemetry for the operator) but attestation: a record an adversarial outside party will accept.
2. Threat model
We assume the reviewer trusts neither the agent's operator nor the auditor. The audit must therefore defend against three failure modes:
- Suppression — the auditor (or operator) silently drops an unflattering finding.
- Revision — the report is quietly edited after the fact.
- Fabrication / captured infrastructure — the “proof” lives only on systems the interested party controls.
3. The construction
| Property | Mechanism | Defeats |
|---|---|---|
| Completeness | Findings Merkle-committed with the count N bound into the root; salted, keyed-HMAC leaves | Suppression (dropping a finding changes N → root mismatch) |
| Integrity | Report content-hashed into an HMAC-signed ProofOfAuditReport (kind 30105), verifiable offline | Revision (any edit changes the hash, breaking the proof) |
| Independence | Root anchored to Base mainnet + Sepolia + Nostr (ProofOfStateAnchor, kind 30106) | Captured infrastructure (third party confirms root + timestamp on public rails) |
| Authorization | ProofOfDelegation (kind 30014) — HMAC-linked record of who authorized the agent to act | “Who is responsible?” ambiguity |
| Auditability of method | Assessment runs on open-source MASSAT (Apache-2.0) | Proprietary-framework opacity |
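The completeness row can be illustrated with a minimal sketch. It is not MASSAT's actual implementation: SHA-256, the 16-byte salt, the domain-separation bytes, the odd-level duplication rule, and the function names `leaf_hash`, `merkle_root`, and `commit` are all assumptions made for illustration. What it shows is why binding N matters: a tree that pads odd levels by duplicating the last node yields the same tree root for some lists of different length, and hashing the count into the final commitment separates them.

```python
import hashlib
import hmac

SALT_LEN = 16  # assumed fixed salt length, so salt and finding cannot be confused


def leaf_hash(key: bytes, salt: bytes, finding: bytes) -> bytes:
    """Salted, keyed leaf: HMAC-SHA256 over a leaf tag, the salt, and the finding."""
    return hmac.new(key, b"\x00" + salt + finding, hashlib.sha256).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Interior node: SHA-256 over a node tag and both children."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Plain Merkle root; odd levels are padded by repeating the last node."""
    if not leaves:
        return hashlib.sha256(b"\x02").digest()
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def commit(key: bytes, salts: list[bytes], findings: list[bytes]) -> bytes:
    """Commitment = H(tag || N || tree root), so the finding count is part of the root."""
    if len(salts) != len(findings):
        raise ValueError("one salt per finding is required")
    if any(len(s) != SALT_LEN for s in salts):
        raise ValueError("salts must be exactly SALT_LEN bytes")
    leaves = [leaf_hash(key, s, f) for s, f in zip(salts, findings)]
    count = len(findings).to_bytes(8, "big")
    return hashlib.sha256(b"\x03" + count + merkle_root(leaves)).digest()
```

A verifier who holds the published commitment recomputes `commit` over the disclosed findings; a report with a finding removed produces a different value, because both the leaf set and N have changed.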
4. The audit pipeline
agent execution evidence
→ MASSAT assessment (OWASP ASI01–10, risk score 0–100)
→ compliance map (NIST AI RMF · ISO 42001 · CSA AICM · MAESTRO · MiCA)
→ Merkle commitment of findings (N bound into root)
→ ProofOfAuditReport (kind 30105): content-hash + HMAC signature
→ ProofOfStateAnchor (kind 30106): root → Base + Sepolia + Nostr
→ passport audit_attestation block + verify-it-yourself recipe
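The ProofOfAuditReport step of the pipeline can be sketched as follows. Only the kind number 30105 comes from this paper; the field names (`content_hash`, `hmac`), the choice of SHA-256, the JSON canonicalization, and the functions `build_proof` and `verify_proof` are hypothetical. The sketch also assumes the verifier has been given the HMAC key, which is a key-distribution question this example does not address.

```python
import hashlib
import hmac
import json


def _payload(body: dict) -> bytes:
    # Deterministic serialization so signer and verifier hash identical bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def build_proof(report: bytes, key: bytes) -> dict:
    """Hash the report, then authenticate (kind, content_hash) with HMAC-SHA256."""
    body = {"kind": 30105, "content_hash": hashlib.sha256(report).hexdigest()}
    tag = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    return {**body, "hmac": tag}


def verify_proof(report: bytes, proof: dict, key: bytes) -> bool:
    """Offline check: the report must hash to content_hash and the tag must match."""
    body = {"kind": proof.get("kind"), "content_hash": proof.get("content_hash")}
    if body["content_hash"] != hashlib.sha256(report).hexdigest():
        return False  # the report was edited after the proof was issued
    expected = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    supplied = str(proof.get("hmac", ""))
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), supplied.encode())
```

Calling `verify_proof` needs no network access: it reads only the report bytes, the proof record, and the key, which is the sense in which the check is offline.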
5. Regulatory defensibility
The construction was designed against the questions reviewers actually ask. The crosswalk below shows where each control framework's requirement is satisfied. (See also our MASSAT × MiCA worked example.)
| Framework | Requirement it answers | Mechanism |
|---|---|---|
| NIST AI RMF | MEASURE / MANAGE — documented, traceable risk treatment | Framework-mapped findings + remediation roadmap |
| ISO 42001 | Clause-level AI management-system evidence | Per-clause mapping in the compliance map |
| CSA AICM / MAESTRO | Control-domain coverage for agentic systems | OWASP ASI → control-domain translation |
| EU MiCA | Article-level obligations for on-chain financial agents | ASI-finding → MiCA-article crosswalk |
6. Reproducibility
Every claim in this paper can be checked independently. The risk score and findings are regenerated by re-running the MASSAT orchestrator on the same target; the report hash is recomputed from the report itself; and the anchored root is looked up on the three public witnesses (Base, Sepolia, Nostr). The framework is public:
git clone https://github.com/craigmbrown/massat-framework
massat audit --target <agent> --frameworks owasp-asi,nist-ai-rmf
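Once the root has been recomputed locally and looked up on each witness, the comparison itself is simple. The sketch below assumes the three lookups have already been done by whatever means the reader prefers (block explorer, RPC node, Nostr relay) and that each yields a hex string or nothing; the function names and the `"match"`, `"mismatch"`, and `"missing"` labels are illustrative and not part of MASSAT.

```python
from typing import Optional


def _normalize(root: str) -> str:
    # Compare case-insensitively and ignore an optional 0x prefix.
    return root.strip().lower().removeprefix("0x")


def check_witnesses(expected_root: str, observed: dict[str, Optional[str]]) -> dict[str, str]:
    """Label each witness by comparing its anchored root to the locally recomputed one."""
    status = {}
    for name, root in observed.items():
        if root is None:
            status[name] = "missing"
        elif _normalize(root) == _normalize(expected_root):
            status[name] = "match"
        else:
            status[name] = "mismatch"
    return status


def all_agree(status: dict[str, str]) -> bool:
    """True only if at least one witness was checked and every one matched."""
    return bool(status) and all(s == "match" for s in status.values())
```

Treating a missing anchor as a failure rather than skipping it is a deliberate choice here: since anchoring is described as opt-in per audit, a reviewer would want to know which witnesses were actually used before relying on agreement among them.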
7. Scope & limitations
This methodology attests to what an audit found and that the record is intact; it is not a guarantee of agent behavior in unseen states. On-chain anchoring costs gas and is opt-in per audit. Observations that cannot be verified externally are labeled as such in the report and never presented as proven. We do not yet hold a SOC 2 attestation, and we say so rather than imply otherwise.
Companion to “Who Audits the Agents?”. Mechanics live today: proof kinds 30105 / 30106 / 30014, Merkle completeness commitments, open-source MASSAT. No external client or SOC 2 attestation claimed. Published 2026-05-29.
Operated by Craig M. Brown