Crawler files

Chaotic versions of the well-known files that crawlers, scanners, and AI agents fetch by canonical path.

GET /.well-known/ai.txt

An ai.txt (the AI-scraping equivalent of robots.txt) that contradicts itself, references AI bots that don't exist, or contains malformed directives.

mode contradictory (default; Allow and Disallow for the same path on GPTBot/ClaudeBot/etc.), fake-bots (User-Agent names that aren't real AI crawlers), malformed (missing colons, invalid values).

GET /.well-known/security.txt

An RFC 9116 security.txt with broken fields: an Expires date in the past, Contact URLs that lead nowhere, nonsense in every field, or a Canonical that doesn't match the URL the file is served from.

mode expired (default; Expires: 1999-12-31, well in the past), dead-contact (Contact URLs that 404), nonsense (gibberish in every field), unsigned-canonical (Canonical claims a URL we don't actually serve from, with no signature).
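
A scanner can catch the expired mode by parsing the Expires field and comparing it with the current time. The following is a minimal sketch, not a full RFC 9116 validator: the function names are hypothetical, it accepts a date-only value such as 1999-12-31 more leniently than the RFC's timestamp format strictly allows, and it treats timestamps without an offset as UTC (an assumption).

```python
from datetime import datetime, timezone

def parse_security_txt(text):
    """Collect field values by lowercased name; skip comments and blank lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields

def expires_problem(fields, now=None):
    """Return a description of the Expires problem, or None if it looks fine."""
    now = now or datetime.now(timezone.utc)
    values = fields.get("expires", [])
    if len(values) != 1:
        return "Expires must appear exactly once"
    try:
        # Older Python versions do not accept a trailing "Z", so normalise it.
        expires = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    except ValueError:
        return "Expires is not a valid timestamp"
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    if expires <= now:
        return "Expires is in the past"
    return None
```

Passing the parsed fields of the default response to expires_problem should report the date as being in the past, since 1999-12-31 predates any current clock.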

GET /ads.txt

An IAB ads.txt with internally contradictory authorized-seller declarations, fictitious ad networks, or malformed lines that ads.txt crawlers should reject.

mode contradictory (default; same seller listed as both DIRECT and RESELLER), fake-sellers (domains that don't exist or are decommissioned), malformed (missing fields, bad delimiters).
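
To illustrate what rejecting these files could look like, here is a rough sketch of an ads.txt check that reports sellers declared as both DIRECT and RESELLER and collects lines that do not parse. The function name and the sample domains in the usage note are hypothetical, and the parser covers only the basic comma-separated record shape.

```python
def check_ads_txt(text):
    """Return (conflicts, malformed): sellers with both relationships, and bad lines."""
    seen = {}        # (domain, account_id) -> set of relationships seen
    malformed = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        if "=" in line and "," not in line:
            continue                           # variable record such as contact=...
        parts = [p.strip() for p in line.split(",")]
        if len(parts) not in (3, 4) or not all(parts[:3]):
            malformed.append(raw)
            continue
        domain, account, relation = parts[0].lower(), parts[1], parts[2].upper()
        if relation not in ("DIRECT", "RESELLER"):
            malformed.append(raw)
            continue
        seen.setdefault((domain, account), set()).add(relation)
    conflicts = sorted(key for key, rels in seen.items() if len(rels) > 1)
    return conflicts, malformed
```

For example, a body that lists "adnet.example, pub-1, DIRECT" and "adnet.example, pub-1, RESELLER" yields one conflict for that seller, while a line using semicolons instead of commas lands in the malformed list.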

GET /humans.txt

A humans.txt that contradicts itself, recursively references itself for every field, or contains time-paradox dates.

mode contradictory (default; same person listed as both Maintainer and 'does not contribute'), recursive ('see /humans.txt' for every field), time-paradox (dates like 2099-13-32, 1856-02-29, 'tomorrow').
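
The time-paradox values are a reminder that calendar validation alone is not enough: 2099-13-32 and 'tomorrow' fail to parse, but 1856-02-29 is a real date (1856 was a leap year) and only looks wrong once a plausibility window is applied. A small sketch, where the function name and the 1990 cutoff are assumptions chosen for illustration:

```python
from datetime import date

def classify_date(text, earliest=date(1990, 1, 1), today=None):
    """Classify a date string as 'invalid', 'too-early', 'future', or 'ok'."""
    today = today or date.today()
    try:
        parsed = date.fromisoformat(text.strip())
    except ValueError:
        return "invalid"          # not a real calendar date in ISO form
    if parsed < earliest:
        return "too-early"        # valid date, but before the assumed cutoff
    if parsed > today:
        return "future"
    return "ok"
```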

GET /llms.txt

An llms.txt (the proposed convention for telling LLMs about site structure) that lists pages that 404, contradicts itself, or embeds prompt-injection content. Tests whether AI agents that ingest llms.txt sanitise it before acting.

mode dead-links (default; every linked path 404s), contradictory (summary disagrees with linked-page content), prompt-injection (embeds clearly labelled 'ignore previous instructions' content).
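
As a starting point for the sanitisation this endpoint is meant to exercise, the sketch below separates lines that match a few instruction-like phrases from the rest of the file. The pattern list and function name are illustrative assumptions; phrase matching is easy to evade, so treat this as a smoke test rather than a defence.

```python
import re

# Illustrative patterns only; real injection attempts vary far more widely.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior|above) instructions"
    r"|disregard (the )?system prompt",
    re.IGNORECASE,
)

def split_untrusted(text):
    """Separate llms.txt lines into (kept, flagged) using the patterns above."""
    kept, flagged = [], []
    for line in text.splitlines():
        (flagged if SUSPICIOUS.search(line) else kept).append(line)
    return kept, flagged
```

An agent could pass only the kept lines onward and log the flagged ones, which makes it easy to confirm that the prompt-injection mode was noticed at all.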

GET /robots.txt

A robots.txt that contradicts itself, traps crawlers in looping Allow paths, sets impossible crawl delays, or contains malformed directives. Strict parsers should reject it; lenient crawlers may behave unpredictably.

mode contradictory (default; Allow and Disallow for the same path), tarpit (50 narrow Allow paths that all loop), malformed (missing colons, invalid directives), infinite-crawl-delay (Crawl-delay: 999999999).
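
One way a crawler might defend itself against three of these modes is sketched below: it records paths that carry both Allow and Disallow for the same user-agent, collects lines it cannot interpret, and caps Crawl-delay. The function name and the 60-second cap are assumptions; a lenient parser could reasonably ignore unknown directives instead of flagging them.

```python
MAX_CRAWL_DELAY = 60.0  # assumption: cap chosen for illustration

def audit_robots(text):
    """Report conflicting paths, uninterpretable lines, and a capped crawl delay."""
    rules = {}          # (agent, path) -> set of "allow" / "disallow"
    malformed = []
    agents = []         # user-agents of the group currently being read
    in_rules = False    # True once the current group has seen a rule line
    delay = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        name, sep, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if not sep:
            malformed.append(raw)             # missing colon
        elif name == "user-agent":
            if in_rules:                      # a new group starts here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif name in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                rules.setdefault((agent, value), set()).add(name)
        elif name == "crawl-delay":
            in_rules = True
            try:
                delay = min(float(value), MAX_CRAWL_DELAY)
            except ValueError:
                malformed.append(raw)
        elif name != "sitemap":
            malformed.append(raw)             # unknown directive
    conflicts = sorted(key for key, kinds in rules.items() if len(kinds) == 2)
    return {"conflicts": conflicts, "malformed": malformed, "crawl_delay": delay}
```

The tarpit mode is not covered here; it would need a budget on how many URLs the crawler follows per host.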

GET /sitemap.xml

A sitemap that crawlers will follow into bad places: 404 URLs, future-dated lastmod values, circular sitemap-index references, or a body that claims gzip encoding but isn't gzipped.

mode dead-urls (default; every <loc> 404s), future-lastmod (lastmod dates in 2099+), circular-index (sitemap-index referencing itself), wrong-encoding (Content-Encoding: gzip header on plain-text body).
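
Two of these modes can be detected without following any link. The sketch below assumes the client has the raw response bytes and the Content-Encoding header value; it only gunzips when the body really starts with the gzip magic bytes, and it lists URLs whose lastmod lies in the future. The function names are hypothetical, and because the standard-library XML parser is not hardened against malicious documents, a production crawler may prefer a safer parser.

```python
import gzip
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def decode_body(body, content_encoding):
    """Gunzip only if the header says gzip AND the body has the gzip magic bytes."""
    if content_encoding.lower() == "gzip" and body[:2] == b"\x1f\x8b":
        return gzip.decompress(body)
    return body  # header lied or no encoding: use the bytes as they are

def future_lastmods(xml_bytes, today=None):
    """Return the <loc> of every <url> whose <lastmod> date is after today."""
    today = today or date.today()
    root = ET.fromstring(xml_bytes)
    flagged = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS)
        try:
            when = date.fromisoformat(lastmod.strip()[:10])  # date part only
        except ValueError:
            continue  # missing or unparseable lastmod
        if when > today:
            flagged.append(loc.strip())
    return flagged
```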

These files are served at their canonical well-known paths on bots.catastrophic.io. Point your crawler, scanner, or AI agent at this host and observe how it handles metadata that real-world tooling rarely treats as untrusted.

Each file supports a ?mode= parameter to select between several flavours of chaos, with a sensible default per file. All responses carry an X-Chaos-*-Mode header reflecting the selection so monitoring clients can verify which mode they received.
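
A monitoring client along these lines could request a specific mode and confirm it from the response headers. This is a sketch under assumptions: the https scheme is assumed, the helper names are hypothetical, and because the exact header name is not spelled out above, the code matches any header of the form X-Chaos-*-Mode.

```python
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen

HOST = "bots.catastrophic.io"

def chaos_url(path, mode=None):
    """Build the URL for a file, adding ?mode= only when a mode is given."""
    query = urlencode({"mode": mode}) if mode else ""
    return urlunsplit(("https", HOST, path, query, ""))  # assumption: https

def chaos_mode_headers(headers):
    """Pick out headers matching X-Chaos-*-Mode, ignoring case."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower().startswith("x-chaos-") and name.lower().endswith("-mode")
    }

def fetch(path, mode=None, timeout=10):
    """Fetch a file and return (body bytes, matching chaos headers)."""
    with urlopen(chaos_url(path, mode), timeout=timeout) as resp:
        return resp.read(), chaos_mode_headers(dict(resp.headers.items()))
```

Calling fetch("/robots.txt", "tarpit") would then return the raw body together with whichever chaos-mode header the server attached, so a test can assert that the mode it asked for is the mode it received.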