Agent chaos

Semantically wrong responses that pass structural validation. Tests whether agents validate content, not just shape.

GET /retry-mismatch

Token-keyed retry chaos. The first call for a given id returns version 1; any subsequent call for the same id within a 60-second window returns version 2 — same response shape, different data. Tests whether retry logic silently picks up changed data without noticing.

id: Client-supplied identity token (1–128 chars). State is tracked per id. Use any string you like (UUIDs, request IDs, anything stable across the retry).

GET /semantic-drift

Returns a structurally-valid JSON response matching one of six common discovery schemas, where every URL points at something that doesn't exist and every claim about capabilities is a lie. The shape passes schema validators; the values fail any consumer that actually tries to resolve them. Set ?ai=true to have an edge LLM generate a fresh, different drifted document on every call.

schema: Which schema to return. One of: openid-configuration, oauth-authorization-server, webfinger, jwks, host-meta, agent-card. Default: openid-configuration.

ai: If true, the response is generated by Llama-3.1-8b per request. Same shape, different invented values each call: agent name, hostnames, and skill descriptions all vary. Falls back to deterministic drift on quota exhaustion or model error; check the X-Chaos-Ai-Source header to see which path served you. Default: false.
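As a rough illustration of how the two parameters combine, here is a minimal Python sketch that builds a /semantic-drift URL. The base URL is a placeholder supplied by the caller and the function name is hypothetical; only the path, the parameter names, and the six schema values come from the description above.

```python
from urllib.parse import urlencode

# The six schema values listed for the `schema` parameter.
SCHEMAS = (
    "openid-configuration",
    "oauth-authorization-server",
    "webfinger",
    "jwks",
    "host-meta",
    "agent-card",
)


def semantic_drift_url(base_url: str, schema: str = "openid-configuration",
                       ai: bool = False) -> str:
    """Build a /semantic-drift URL (hypothetical helper, not part of the service)."""
    if schema not in SCHEMAS:
        raise ValueError(f"unknown schema: {schema!r}")
    params = {"schema": schema}
    if ai:
        # Only send ai when opting in; the documented default is false.
        params["ai"] = "true"
    return f"{base_url.rstrip('/')}/semantic-drift?{urlencode(params)}"
```

Rejecting unknown schema names on the client side keeps a typo from being mistaken for a server-side behavior.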

Structural validation isn’t enough.

An agent that fetches a discovery document, runs it through a JSON schema validator, and proceeds happily will pass /semantic-drift cleanly — right up until it tries to actually use the values. The OIDC authorization_endpoint returns 404. The webfinger links[0].href 404s. The JWKS key cannot verify any real token. The agent-card promises skills the agent has never had.

This is a common failure mode in agentic systems: the data parses, the schema matches, the orchestrator assumes the rest of the world also matches, and the failure manifests three calls deep with a misleading error.

The X-Chaos-Drift: semantic response header is a hint for monitoring: a well-behaved client could treat it as a signal that this is intentional chaos rather than a real outage. Real consumers should validate by trying the values: actually call the URL, actually verify a token with the key, actually exercise the skill before trusting the contract.
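One way to "validate by trying" is to walk the parsed document, collect every http(s) value, and probe each one before trusting it. The sketch below is a minimal Python version under stated assumptions: the function names are hypothetical, and the probe callable (which returns an HTTP status code for a URL) is injected by the caller, so the same logic works with any HTTP client or with a stub in tests.

```python
from typing import Any, Callable, Dict, Iterator, Tuple


def iter_urls(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, url) for every http(s) string value in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_urls(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_urls(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield path, node


def unresolvable(doc: Any, probe: Callable[[str], int]) -> Dict[str, int]:
    """Return {json_path: status} for every URL whose probe status is not 2xx or 3xx."""
    failures = {}
    for path, url in iter_urls(doc):
        status = probe(url)
        if not 200 <= status < 400:
            failures[path] = status
    return failures
```

A non-empty result means the document passed shape validation but should not be trusted yet. A probe that only issues GET requests may misjudge POST-only endpoints, so a real probe likely needs per-field handling.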

AI mode

/semantic-drift?ai=true swaps the hardcoded drift for one generated fresh by an edge LLM (Llama-3.1-8b) on every call. Same schema, different invented agent names, different hostnames, different skill descriptions every time — designed to break pattern-matching defenses that learn the deterministic output’s tells.

The trade-off is non-determinism and latency. Tests that assert exact response content should keep ai=true off. AI mode adds 100 ms to 10 s of latency per call (the first call after a cold load is the slow case). Each call consumes part of a shared daily inference budget.

When the budget is approaching exhaustion, or when the model errors or returns invalid JSON, the endpoint automatically falls back to deterministic drift. The X-Chaos-Ai-Source response header tells you which path served the request: ai (success) or fallback (degraded). On fallback, X-Chaos-Ai-Fallback-Reason explains why (budget-exhausted, invalid-json, ai-error, etc.).

Together these make AI mode a safe opt-in: pass ?ai=true to get varied output when you want it, accept latency and inference cost when you do, and know via response headers when you’re getting the deterministic fallback instead.
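A client that opts into AI mode can record which path actually served each response. The following Python sketch reads the two headers described above from any header mapping; the AiSource type and the function name are hypothetical, and treating a missing header as "not AI" is an assumption of this sketch rather than documented behavior.

```python
from typing import Mapping, NamedTuple, Optional


class AiSource(NamedTuple):
    served_by_ai: bool
    fallback_reason: Optional[str]


def classify_ai_source(headers: Mapping[str, str]) -> AiSource:
    """Read X-Chaos-Ai-Source and X-Chaos-Ai-Fallback-Reason case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    source = lowered.get("x-chaos-ai-source", "").strip().lower()
    if source == "ai":
        return AiSource(True, None)
    # Assumption: anything other than "ai", including a missing header,
    # is treated as the deterministic path.
    return AiSource(False, lowered.get("x-chaos-ai-fallback-reason"))
```

A test suite could use the result to skip assertions that only make sense for varied output when the fallback served the request.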

Partial streaming

/partial-stream is the same idea applied at the transport layer: the response is well-formed up to a point, then cuts off without warning. The HTTP layer reports 200 OK. The Content-Type promises JSON. The body starts as JSON. It just never finishes.

This differs from /truncate (which sends a fully self-consistent shorter body but lies about Content-Length) and from /drip (which eventually completes). Here the request looks like a network success, but the payload is unusable.

Agents that buffer-then-parse will get a parse error. Agents that stream-parse will get an incremental document that never closes. Agents that handle either gracefully — fall back to a cached response, retry once, return a structured error — are doing the right thing.
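For the buffer-then-parse case, graceful handling can be as small as turning the parse error into an explicit outcome. This is a minimal Python sketch, assuming the caller has already read the body into bytes and optionally holds a previously cached value; the function name and the shape of the returned dict are hypothetical.

```python
import json
from typing import Any, Optional


def parse_or_degrade(body: bytes, cached: Optional[Any] = None) -> dict:
    """Parse a buffered JSON body, degrading to a cache or a structured error."""
    try:
        return {"ok": True, "source": "live", "data": json.loads(body)}
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        # A body cut off mid-document lands here, even though the HTTP
        # status was 200 and the Content-Type said JSON.
        if cached is not None:
            return {"ok": True, "source": "cache", "data": cached}
        return {"ok": False, "source": "none", "error": f"unparseable body: {exc}"}
```

Returning a tagged result rather than raising lets the orchestrator decide between retrying once, using the cached value, or surfacing the error.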

Retry mismatch

/retry-mismatch is about idempotency assumptions. Most retry logic in the wild assumes that a successful retry returns the same data the original attempt would have. That’s true for static resources and almost nothing else. Real APIs serve cursor-paginated lists where the underlying data shifted; auth-protected endpoints where the token rotated; rate-limited endpoints where the response narrowed; A/B-tested endpoints where the bucket flipped.

The endpoint tests this with a token-keyed two-shot pattern: the first call for ?id=<token> returns version 1 of the response; any retry for the same id within 60 seconds returns version 2 — same shape, different data.

Watch the data.version field, the data.items array, or the X-Chaos-Retry-Replayed header to detect the shift. A client whose retry logic blindly returns the second response as if it were the first is the failure mode this catches.
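A retry wrapper can make the shift visible by comparing the retried body against the original instead of silently substituting it. The Python sketch below assumes both responses were parsed into dicts with a top-level data object holding the version and items fields named above; the function name is hypothetical, and the header check is left out because its exact values are not specified here.

```python
from typing import Any, Mapping


def retry_changed(first: Mapping[str, Any], retry: Mapping[str, Any]) -> bool:
    """Return True when the retry body differs from the first response."""
    first_data = first.get("data", {})
    retry_data = retry.get("data", {})
    return (
        first_data.get("version") != retry_data.get("version")
        or first_data.get("items") != retry_data.get("items")
    )
```

When this returns True, the caller can decide deliberately whether to discard work derived from the first response, rather than mixing data from two versions.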

Use any token you like for ?id=. State is per-edge (in-memory cache, not durable storage) and expires after 60s, so a fresh id always resets the cycle.