Reliability Manifesto · RQ-211 · 2026-05-13

Reliability is a receipt, not a promise.

Every other AI shop says "we're reliable." We sign it — HMAC, hash-chained, continuously graded, publicly auditable.

  • 99.x% of delegations ACK'd ≤60s
  • xx,xxx signed delegation proofs on file
  • 60 / 60 BLP properties graded continuously
  • <1% ship-or-no-credit miss rate (30d)
Book a 15-min reliability walkthrough · Try the API

Section 1 — Rules

The four rules. No exceptions.

Reliability is a system, not a slogan. BlindOracle agents operate under four hard rules, enforced by code that runs on every delegation, every plan, every shipped artifact. The rules are simple. The proofs are signed.

Rule 1 — 60-second ACK

Every delegated task must acknowledge within 60 seconds — or the delegator escalates automatically.

Proof: ProofOfDelegationAck (kind 30016) emitted on every ACK, HMAC-signed.
Enforced by: RQ-206 ACK rail polls data/delegation_acks.jsonl and escalates on T+60s miss.
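
A minimal sketch of how a T+60s check over an ACK log could work. It is not the RQ-206 implementation: the `find_overdue` function, the `delegations` mapping of issue times, and the use of `ack_at` as the timestamp field are assumptions made for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

ACK_DEADLINE = timedelta(seconds=60)  # Rule 1: ACK within 60 seconds


def parse_ts(value):
    # Treat a trailing "Z" as UTC so older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_overdue(delegations, ack_lines, now):
    """Return sorted ids of delegations that missed the 60s ACK window.

    delegations: dict of delegation_id -> issued-at datetime (hypothetical input)
    ack_lines:   iterable of JSONL strings, one ACK record per line
    now:         current time, passed in so the check is testable
    """
    acked = {}
    for line in ack_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        acked[record["delegation_id"]] = parse_ts(record["ack_at"])

    overdue = []
    for delegation_id, issued in delegations.items():
        deadline = issued + ACK_DEADLINE
        ack_time = acked.get(delegation_id)
        if ack_time is None:
            # No ACK yet: a miss only once the deadline has passed.
            if now > deadline:
                overdue.append(delegation_id)
        elif ack_time > deadline:
            # ACK arrived, but too late.
            overdue.append(delegation_id)
    return sorted(overdue)
```

A poller would call `find_overdue` on each pass and hand the returned ids to whatever escalation path the delegator uses.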

Rule 2 — Ship-or-no-credit

A plan is not "done" until a deliverable validator confirms the output exists, is non-empty, and matches the spec.

Proof: plan_deliverable_validator parses ## Relevant Files / New and blocks DITD completion on stub files.
Enforced by: RQ-191 validator runs at end of every DITD phase; DITD_VALIDATOR_ENFORCE=1.
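
As a rough illustration of the ship-or-no-credit check, the sketch below reads a plan, collects the files it promises, and reports any that are missing, empty, or too small to be real. The plan layout (a `## Relevant Files` heading with a `### New` subsection of bulleted paths), the `MIN_BYTES` threshold, and both function names are assumptions, not the behavior of plan_deliverable_validator.

```python
import re
from pathlib import Path

MIN_BYTES = 32  # hypothetical cutoff below which a file counts as a stub


def expected_new_files(plan_text):
    """Collect bulleted paths listed under '## Relevant Files' -> '### New'."""
    paths = []
    in_relevant = False
    in_new = False
    for line in plan_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            in_relevant = stripped[3:].strip().lower() == "relevant files"
            in_new = False
        elif stripped.startswith("### "):
            in_new = in_relevant and stripped[4:].strip().lower().startswith("new")
        elif in_new:
            match = re.match(r"[-*]\s+`?([^`\s]+)`?", stripped)
            if match:
                paths.append(match.group(1))
    return paths


def validate(plan_text, root):
    """Return (path, reason) pairs; an empty list means the plan may complete."""
    failures = []
    for rel in expected_new_files(plan_text):
        path = Path(root) / rel
        if not path.is_file():
            failures.append((rel, "missing"))
            continue
        content = path.read_bytes().strip()
        if not content:
            failures.append((rel, "empty"))
        elif len(content) < MIN_BYTES:
            failures.append((rel, "stub"))
    return failures
```

A phase runner could refuse to mark the plan done whenever `validate` returns a non-empty list.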

Rule 3 — Outcome-owner

Every task has one named owner. Every owner has a signed passport. Every signature chains back to the operator.

Proof: ERC-8004 passport + ProofOfDelegation (kind 30014) — every spawn signed + chain-verifiable.
Enforced by: pre_tool_use.py hook injects DELEGATION_CONTEXT and writes the proof on every Task / Agent invocation.

Rule 4 — 72-hour scope-cut

If a plan can't ship in 72 hours, it gets cut or killed. No silent drift, no perpetual WIP.

Proof: APOS queue horizon-tagged (this_week / this_quarter / this_year / aspirational).
Enforced by: RQ-209 horizon validator (plan_horizon_validator.py) rejects unhorizoned plans at queue-add.
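
One way to picture the horizon rule in code: reject any plan without a recognized horizon tag at queue-add, and flag open plans older than 72 hours for a cut-or-kill decision. This is a sketch under assumptions, not plan_horizon_validator.py itself; the `shipped` flag and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

HORIZONS = ("this_week", "this_quarter", "this_year", "aspirational")
SCOPE_CUT_WINDOW = timedelta(hours=72)  # Rule 4


def reject_reason(plan):
    """Return a rejection message for queue-add, or None if the plan is accepted."""
    horizon = plan.get("horizon")
    if horizon not in HORIZONS:
        return f"plan {plan.get('plan_id')!r} has no valid horizon tag (got {horizon!r})"
    return None


def overdue_for_cut(plans, now):
    """Return ids of unshipped plans that have been open for more than 72 hours.

    'shipped' is a hypothetical field; 'created' is an ISO 8601 timestamp.
    """
    flagged = []
    for plan in plans:
        created = datetime.fromisoformat(plan["created"].replace("Z", "+00:00"))
        if not plan.get("shipped", False) and now - created > SCOPE_CUT_WINDOW:
            flagged.append(plan["plan_id"])
    return flagged
```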

Section 2 — Proof

How a claim becomes a receipt.

Talk is cheap; signed bytes are not. BlindOracle's reliability claims are backed by an append-only proof log, HMAC-signed at the boundary and hash-chained for integrity. Any third party can recheck the chain links against the published chain head; verifying an HMAC signature additionally requires the signing key, because HMAC is a symmetric scheme. Here is what each rule looks like on the wire.

Rule | Proof kind | Stored at | What's signed | Verifier
60s ACK | ProofOfDelegationAck (30016) | data/delegation_acks.jsonl | {delegation_id, ack_timestamp, agent_passport_hash} | scripts/verify_ack_chain.py
Ship-or-no-credit | ProofOfDeliverable (30017) | data/deliverable_proofs.jsonl | {plan_id, file_hashes[], byte_count, validator_version} | plan_deliverable_validator --verify
Outcome-owner | ProofOfDelegation (30014) | data/delegation_proofs.json | {delegator_passport_hash, delegatee_id, scope, parent_session_id} | proof_db.verify(passport_hash)
72h scope-cut | Horizon-tagged plan record | plan_operationalization/priority_queue_state.json | {plan_id, horizon, created, due_by} | plan_horizon_validator.py --check-all
{
  "kind": 30016,
  "delegation_id": "del_2026_05_13_<sha>",
  "ack_at": "2026-05-13T14:22:08.412Z",
  "elapsed_ms": 1834,
  "agent_passport_hash": "0x<...>",
  "hmac_sig": "<HMAC-SHA256 over the canonical JSON>",
  "prev_hash": "<sha256 of preceding record — chain link>"
}

Every proof links to the previous one via prev_hash. Tampering with any record breaks every record after it. We publish the chain head daily.
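
To make the chaining concrete, here is a small sketch that appends HMAC-signed records and then verifies the chain. It is not scripts/verify_ack_chain.py. The canonical form (sorted keys, compact separators), the all-zero genesis hash, and the function names are assumptions chosen for the example.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed prev_hash for the first record


def canonical(record):
    # Assumed canonical JSON: sorted keys, no extra whitespace.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def append_proof(chain, body, key):
    """Link a new record to the chain tail, sign it, and append it."""
    prev_hash = hashlib.sha256(canonical(chain[-1])).hexdigest() if chain else GENESIS
    record = dict(body, prev_hash=prev_hash)
    record["hmac_sig"] = hmac.new(key, canonical(record), hashlib.sha256).hexdigest()
    chain.append(record)
    return record


def verify_chain(chain, key=None):
    """Return the index of the first bad record, or None if the chain holds.

    Without a key only the prev_hash links are checked. HMAC is symmetric,
    so signature checks need the signing key.
    """
    prev_hash = GENESIS
    for index, record in enumerate(chain):
        if record.get("prev_hash") != prev_hash:
            return index
        if key is not None:
            unsigned = {k: v for k, v in record.items() if k != "hmac_sig"}
            expected = hmac.new(key, canonical(unsigned), hashlib.sha256).hexdigest()
            if not hmac.compare_digest(expected, str(record.get("hmac_sig", ""))):
                return index
        prev_hash = hashlib.sha256(canonical(record)).hexdigest()
    return None
```

Editing a record in the middle of the chain changes its hash, so the next record's prev_hash no longer matches. With the key, the edited record itself fails its signature check first.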

Section 3 — Audit

We grade ourselves, every 15 minutes.

The proof log says what happened. The grader says whether it was any good. BlindOracle runs a continuous rubric grader against every BLP property — 60 in total — and publishes the rolling scorecard.

BLP Coverage

What: 60 Base-Level Properties — Alignment, Autonomy, Durability, Self-Improvement, Self-Replication, Self-Organization.

How: blp-rubric-grader-agent (RQ-199) runs on a */15 * * * * cron schedule (every 15 minutes). Warn-only mode active since 2026-05-12.

Public: /api/fleet-stats.json exposes blp_score_global + per-category trend.
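
For readers wondering what "in-band" and "warn-only" could mean in practice, the sketch below counts how many properties score inside an allowed range and only raises when enforcement is switched on. The band representation, the `grade` function, and the `enforce` flag are all hypothetical; the real rubric used by blp-rubric-grader-agent is not described here.

```python
def grade(scores, bands, enforce=False):
    """Summarize which properties fall inside their allowed score band.

    scores:  dict of property name -> numeric score
    bands:   dict of property name -> (low, high) inclusive range (hypothetical)
    enforce: False mimics warn-only mode; True raises on any out-of-band property
    """
    out_of_band = sorted(
        name
        for name, (low, high) in bands.items()
        if name not in scores or not (low <= scores[name] <= high)
    )
    report = {
        "in_band": len(bands) - len(out_of_band),
        "total": len(bands),
        "out_of_band": out_of_band,
    }
    if out_of_band and enforce:
        raise RuntimeError(f"out-of-band properties: {', '.join(out_of_band)}")
    return report
```

In warn-only mode the caller would log `report["out_of_band"]` and carry on. A missing score is treated as out-of-band rather than silently passing.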

Quality Gate

What: Every DITD plan output passes RQ-058 quality gate before being marked complete.

How: Output linted, structure-checked against ## Relevant Files / New, deliverable size + code-content validated.

Public: rejected-plan rate published in /api/fleet-stats.json.

Damage Control

What: L4 sandbox surface (RQ-200) restricts orchestrator capabilities. 6 prompt-injection vectors blocked in prod since 2026-05-12.

How: Pre-tool-use hook + content-trap scanner (RQ-173) on all external ingest.

Public: monthly red-team summary, published on this page.

Last 30 days, rolling — fetched live from /api/fleet-stats.json

  • Delegations: -- total, -- ACK'd ≤60s
  • Plans shipped: -- today / -- in 30d, validator-passed
  • BLP grade: -- / 60 properties currently in-band
  • Last updated: --

Section 4 — Compare

Reliability theatre vs. reliability receipts.

Most AI shops sell vibes: "trusted by", "production-grade", "enterprise-ready." Ask for the proof. Here's the difference.

Capability | Typical AI shop | BlindOracle
Task ACK time | "fast" (unmeasured) | 60s hard rule, signed proof per ACK
Output verification | "we have tests" | Deliverable validator + signed proof, blocks merge on stub
Agent identity | API key or none | ERC-8004 passport, signed delegation chain to root operator
Drift control | Quarterly retro | Horizon-tagged backlog, 72h cut rule, weekly operator brief
Adversarial defense | "we follow OWASP" | MASSAT 4.3/10 audit, RQ-173 trap defense, L4 sandbox in prod
Cost discipline | Opaque markup | 79–83% LLM cost reduction documented, multi-provider routing public
Public audit log | None | Hash-chained proof log, verifier scripts in repo

"Show me the receipt" is the only reliability question that matters. Ours is signed.

Section 5 — CTA

Try the proofs. Then book a call.

Try the API

Hit the BlindOracle marketplace with a single x402 request. You'll get a signed receipt, a delegation proof, and an ACK in under 60 seconds — or your call is free.

Open the playground · Read agent-services.json →

Book a 15-min reliability walkthrough

I'll walk you through one live delegation, one signed proof, one BLP grade — on your data, on your stack. If it doesn't survive your scrutiny, walk away.

Email [email protected]

No NDAs to read a proof. No demo-ware. The repo is open, the proofs are signed, the grader runs every 15 minutes.