Agent chaos
Semantically wrong responses that pass structural validation. Tests whether agents validate content, not just shape.
/retry-mismatch: token-keyed retry chaos. The first call for a given id returns version 1; any subsequent call for the same id within a 60-second window returns version 2, with the same response shape but different data. Tests whether retry logic silently picks up changed data without noticing.
/semantic-drift: returns a structurally valid JSON response matching one of six common discovery schemas, where every URL points at something that doesn't exist and every claim about capabilities is a lie. The shape passes schema validators; the values fail any consumer that actually tries to resolve them. Set ?ai=true to have an edge LLM generate a fresh, different drifted document on every call.
Structural validation isn’t enough.
An agent that fetches a discovery document, runs it through a JSON schema
validator, and proceeds happily will pass /semantic-drift cleanly, right
up until it tries to actually use the values. The OIDC authorization_endpoint
returns 404. The webfinger links[0].href returns 404. The JWKS key cannot verify
any real token. The agent-card promises skills the agent has never had.
This is a common failure mode in agentic systems: the data parses, the schema matches, the orchestrator assumes the rest of the world also matches, and the failure manifests three calls deep with a misleading error.
The X-Chaos-Drift: semantic response header is a monitoring hint. A
well-behaved client can treat it as a signal that this is intentional chaos
rather than a real outage. Real consumers should validate by trying the
values: actually call the URL, actually verify a token with the key, and actually
exercise the skill before trusting the contract.
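To make "validate by trying the values" concrete, here is a minimal Python sketch that walks a parsed discovery document, collects every absolute-URL string, and tries to resolve each one. The probe callable is a hypothetical stand-in for your HTTP client: it takes a URL and returns True if the URL resolves to something real. What counts as "resolves" is a policy choice left to the probe (an OIDC authorization endpoint commonly answers a bare GET with 400 rather than 404, so treating only 404/410 and connection failures as broken is one reasonable rule). URL probing covers only part of the check; a JWKS key still has to verify a real token, and a skill still has to be exercised.

```python
from typing import Any, Callable, Dict, Iterator, List, Mapping, Tuple

def iter_urls(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, url) for every string value that looks like an absolute URL."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_urls(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_urls(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield path, node

def validate_by_use(doc: Mapping[str, Any], headers: Mapping[str, str],
                    probe: Callable[[str], bool]) -> Dict[str, Any]:
    """Resolve every URL in a schema-valid document instead of trusting its shape."""
    # HTTP header names are case-insensitive, so normalize before looking one up.
    h = {k.lower(): v for k, v in headers.items()}
    broken: List[Tuple[str, str]] = [
        (path, url) for path, url in iter_urls(doc) if not probe(url)
    ]
    return {
        # Monitoring hint only: the values are unusable whether or not this is set.
        "intentional_chaos": h.get("x-chaos-drift") == "semantic",
        "broken": broken,
        "usable": not broken,
    }
```

The JSON paths in the result point at the exact field that lied (for example $.links[0].href), which gives a much clearer error than a 404 surfacing three calls later.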
AI mode
/semantic-drift?ai=true swaps the hardcoded drift for drift generated fresh
by an edge LLM (Llama-3.1-8b) on every call. The schema stays the same, but the
invented agent names, hostnames, and skill descriptions change every time.
This is designed to break pattern-matching defenses that learn the tells of
the deterministic output.
The trade-off is non-determinism and latency. Tests that assert exact
response content should leave ai=true off. AI mode adds 100 ms to 10 s of
latency per call (the first call after a cold load is the slow case), and each
call consumes part of a shared daily inference budget.
When the budget nears exhaustion, the endpoint automatically
falls back to deterministic drift. The X-Chaos-Ai-Source response header
tells you which path served the request: ai (success) or fallback
(degraded). On fallback, X-Chaos-Ai-Fallback-Reason explains why
(budget-exhausted, invalid-json, ai-error, etc.).
Together, these make AI mode a safe opt-in: pass ?ai=true when you want
varied output, accept the added latency and inference cost that come with it,
and check the response headers to know when you are getting the deterministic
fallback instead.
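A client or test harness can read the two AI-mode headers before deciding what to assert. The Python sketch below reports which path served a response; the function and type names are hypothetical, and returning "unknown" when X-Chaos-Ai-Source is absent is an assumption of the sketch, not documented behavior.

```python
from typing import Mapping, NamedTuple, Optional

class AiProvenance(NamedTuple):
    source: str                     # "ai", "fallback", or "unknown" if the header is absent
    fallback_reason: Optional[str]  # e.g. "budget-exhausted"; None unless source == "fallback"

def ai_provenance(headers: Mapping[str, str]) -> AiProvenance:
    """Report which path served an ?ai=true response, based on the X-Chaos-Ai-* headers."""
    # Header names are case-insensitive in HTTP; normalize to lowercase keys.
    h = {k.lower(): v for k, v in headers.items()}
    source = h.get("x-chaos-ai-source", "unknown")
    reason = h.get("x-chaos-ai-fallback-reason") if source == "fallback" else None
    return AiProvenance(source, reason)

def expect_fresh_output(headers: Mapping[str, str]) -> bool:
    """Only assert 'this differs from the last call' when the AI path actually ran."""
    return ai_provenance(headers).source == "ai"
```

A test that checks for variation between two ?ai=true calls could skip, rather than fail, when either response reports fallback, logging the reason so budget exhaustion is visible instead of silent.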
Partial streaming
/partial-stream is the same idea applied at the transport layer: the response
is well-formed up to a point, then cuts off without warning. The HTTP layer
reports 200 OK. The Content-Type promises JSON. The body starts as JSON. It just
never finishes.
This differs from /truncate (which sends a fully self-consistent but shorter
body and lies about Content-Length) and from /drip (which eventually completes).
Here the request registers as a network success, but the payload is unusable.
Agents that buffer-then-parse will get a parse error. Agents that stream-parse will get an incremental document that never closes. Agents that handle either case gracefully (by falling back to a cached response, retrying once, or returning a structured error) are doing the right thing.
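A minimal buffer-then-parse sketch of that graceful handling, in Python. It treats both a JSON parse failure (the /partial-stream case) and a Content-Length that disagrees with the received byte count (the /truncate case) as an incomplete body, then falls back to a caller-supplied cache or returns a structured error. The function name, the result dict shape, and leaving the retry-once step to the caller are illustrative assumptions.

```python
import json
from typing import Any, Mapping, Optional

def parse_or_degrade(body: bytes, headers: Optional[Mapping[str, str]] = None,
                     cached: Optional[Any] = None) -> dict:
    """Buffer-then-parse a 200 response, treating an incomplete body as a failure."""
    h = {k.lower(): v for k, v in (headers or {}).items()}
    declared = h.get("content-length")
    # A declared length that disagrees with what arrived means the body is incomplete,
    # even if the bytes happen to parse. Assumes `body` is the raw, undecoded payload;
    # a body your client already decompressed will not match Content-Length.
    if declared is not None and declared.isdigit() and int(declared) != len(body):
        problem = f"content-length {declared} != received {len(body)}"
    else:
        try:
            return {"ok": True, "source": "network", "data": json.loads(body)}
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            # A valid JSON prefix that never closes lands here.
            problem = str(exc)
    if cached is not None:
        return {"ok": True, "source": "cache", "data": cached}
    return {"ok": False, "error": {"type": "incomplete_body", "detail": problem}}
```

The "source" field keeps the degradation visible: downstream code can tell a cached answer from a fresh one instead of treating both as a network success.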
Retry mismatch
/retry-mismatch is about idempotency assumptions. Most retry logic in the
wild assumes that a successful retry returns the same data the original
attempt would have. That’s true for static resources and almost nothing
else. Real APIs serve cursor-paginated lists where the underlying data
shifted; auth-protected endpoints where the token rotated; rate-limited
endpoints where the response narrowed; A/B-tested endpoints where the
bucket flipped.
The endpoint tests this with a token-keyed two-shot pattern: the first
call for ?id=<token> returns version 1 of the response, and any retry for the
same id within 60 seconds returns version 2, with the same shape but different
data. Watch the data.version field, the data.items array, or the
X-Chaos-Retry-Replayed header to detect the shift. The failure mode this
catches is a client whose retry logic blindly returns the second response as
if it were the first.
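One way to avoid that failure mode is to compare the retried response against the original before using it. This Python sketch checks the three signals named above. The value format of X-Chaos-Retry-Replayed is not specified here, so treating any value other than false, 0, no, or empty as "replayed" is an assumption, as is the data-envelope shape of the bodies.

```python
from typing import Any, Mapping

def retry_shifted(first: Mapping[str, Any], retry: Mapping[str, Any],
                  retry_headers: Mapping[str, str]) -> bool:
    """Return True if a retried response may carry different data than the original."""
    h = {k.lower(): v.strip().lower() for k, v in retry_headers.items()}
    replayed = h.get("x-chaos-retry-replayed")
    if replayed is not None and replayed not in ("", "false", "0", "no"):
        return True
    a, b = first.get("data"), retry.get("data")
    if not isinstance(a, dict) or not isinstance(b, dict):
        # Without both bodies there is nothing to compare; rely on the header alone.
        return False
    return a.get("version") != b.get("version") or a.get("items") != b.get("items")
```

If the first attempt's body never arrived (the usual reason to retry), pass an empty dict as first; only the header check then applies. When a shift is detected, the safer move is to surface it to the caller rather than merge or silently substitute the new data.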
Use any token you like for ?id=. State is per-edge (in-memory cache,
not durable storage) and expires after 60s, so a fresh id always resets
the cycle.
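To reason about the 60-second window in unit tests without calling the live endpoint, here is a toy, single-process Python model of the documented behavior. It is not the endpoint's implementation: the injectable clock, measuring the window from the call that returned version 1, and the response dict shape are all assumptions. One dict stands in for one edge's in-memory cache; since state is per-edge, a retry routed to a different edge would presumably start its own cycle.

```python
import time
import uuid
from typing import Callable, Dict

class TwoShotModel:
    """Toy model of the documented two-shot pattern for a single edge (not the real endpoint)."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock                       # injectable so tests can fake elapsed time
        self._first_seen: Dict[str, float] = {}  # token -> time of the call that returned v1

    def call(self, token: str) -> dict:
        now = self.clock()
        started = self._first_seen.get(token)
        if started is not None and now - started < self.ttl:
            # Same token inside the window: same shape, different data.
            return {"data": {"version": 2}, "replayed": True}
        # Unknown token, or its entry expired: begin a fresh cycle.
        self._first_seen[token] = now
        return {"data": {"version": 1}, "replayed": False}

def fresh_token() -> str:
    """A new random id, which always starts a new cycle."""
    return uuid.uuid4().hex
```

Generating a fresh token per test run keeps runs independent, since no run can inherit another's version-1 entry.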