Complete Guide to llms.txt

What llms.txt is, why it exists, how to use it in practice, and the limits and best practices that help agents and LLMs answer questions about your site more accurately.

Origins, purpose and status

  • llms.txt is an editorial proposal: a Markdown file at the root that provides a curated map of important content and how to interpret it.
  • It coexists with robots.txt and sitemap.xml: it doesn’t replace them; it adds a curatorial layer to reduce noise, ambiguity and context selection costs.
  • Adoption is heterogeneous: very useful for controlled agents/tools (IDE, chatbot, helpdesk), less as a universal “SEO signal”; not all AI crawlers fetch it systematically.

What it is and how to use it

  • Human- and machine-readable, structured enough for deterministic parsing.
  • Typically lives at https://yourdomain.tld/llms.txt.
  • Encourages clean Markdown mirrors of important pages (via a .md suffix, including index.html.md for paths without a filename).
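The .md mirror convention above can be sketched as a small URL mapper. This is a minimal sketch, assuming the site actually serves a `.md` twin next to each page; `markdown_mirror_url` and the example domain are illustrative, not part of any standard library.

```python
from urllib.parse import urlparse


def markdown_mirror_url(page_url: str) -> str:
    """Map a page URL to its assumed Markdown mirror.

    Follows the llms.txt convention: append .md to file-style paths,
    and use index.html.md for directory-style paths without a filename.
    """
    parsed = urlparse(page_url)
    path = parsed.path
    if path == "" or path.endswith("/"):
        path += "index.html.md"
    else:
        path += ".md"
    return parsed._replace(path=path).geturl()
```

For example, `https://yourdomain.tld/docs/quickstart.html` maps to `.../docs/quickstart.html.md`, while `https://yourdomain.tld/docs/` maps to `.../docs/index.html.md`.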

Typical operational flow

  1. Fetch /llms.txt
  2. Parse the structure: H1 title, blockquote summary, free-form notes, and H2 sections containing link lists
  3. Select links relevant to the user question
  4. Fetch the pointed content (ideally Markdown)
  5. Assemble context in the prompt or via RAG
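The five steps above can be sketched in a few functions. This is a minimal, assumption-laden sketch: `parse_llms_txt` implements a naive line-by-line parse of the common llms.txt shape (H1 title, blockquote summary, H2 link sections), and `select_links` stands in for step 3 with simple keyword matching rather than real relevance ranking.

```python
import re
import urllib.request


def fetch(url: str) -> str:
    """Steps 1 and 4: download a URL as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_llms_txt(text: str) -> dict:
    """Step 2: extract the H1 title, blockquote summary, and H2 link sections."""
    title = summary = None
    sections = {}
    current = None
    link_re = re.compile(r"-\s*\[([^\]]+)\]\(([^)]+)\)")
    for line in text.splitlines():
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("> ") and summary is None:
            summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            m = link_re.search(line)
            if m:
                sections[current].append((m.group(1), m.group(2)))
    return {"title": title, "summary": summary, "sections": sections}


def select_links(index: dict, keywords: list) -> list:
    """Step 3 (naive): keep links whose name matches any keyword."""
    hits = []
    for links in index["sections"].values():
        for name, url in links:
            if any(k.lower() in name.lower() for k in keywords):
                hits.append(url)
    return hits
```

Step 5 then amounts to concatenating the fetched Markdown into the prompt, or feeding it to your retrieval layer.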

llms-full.txt vs index

  • llms.txt as an index with links: lighter and navigable.
  • llms-full.txt as a full dump: immediate but potentially huge; often used with indexing and retrieval (RAG).
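Since a llms-full.txt dump is often too large for a single context window, a common first step toward RAG is splitting it into heading-delimited chunks for indexing. A minimal sketch, assuming the dump uses Markdown H2 headings as section boundaries (the function name and the `level` parameter are illustrative):

```python
def chunk_by_heading(full_text: str, level: int = 2) -> list:
    """Split a Markdown dump into (heading, body) chunks at the given level.

    Each chunk can then be embedded and indexed for retrieval.
    """
    marker = "#" * level + " "
    chunks, heading, body = [], "preamble", []
    for line in full_text.splitlines():
        if line.startswith(marker):
            # Flush the previous chunk; skip an empty preamble.
            if not (heading == "preamble" and not "".join(body).strip()):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = line[len(marker):].strip(), []
        else:
            body.append(line)
    chunks.append((heading, "\n".join(body).strip()))
    return chunks
```

Each resulting `(heading, body)` pair is a natural retrieval unit: the heading doubles as chunk metadata when building the index.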

How to write an effective llms.txt

Content principles

  • Concise, clear language.
  • Short informative descriptions next to links.
  • Reduce ambiguity and unexplained jargon.
  • Test empirically: expand the links and verify that model answers match the real content and policies.

What to include

  • Technical docs: Quickstart/Getting Started, API Reference, runnable examples, decision guides, compatibility/versioning.
  • Company/product site: About, Products/Services, Pricing, FAQ, Support/Contact, Security/Compliance, Privacy/Terms, returns/shipping policies.
  • Portfolio/personal: CV/bio, main projects, contacts, notable talks/publications.
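Putting the principles and inclusion lists together, a company/product llms.txt might look like the following skeleton. All names and URLs here are hypothetical placeholders, not a normative template:

```markdown
# Acme Widgets

> Acme Widgets makes embeddable payment widgets. This file lists the pages most useful for answering questions about the product.

## Docs

- [Quickstart](https://acme.example/docs/quickstart.md): install and render a first widget in five minutes
- [API Reference](https://acme.example/docs/api.md): endpoints, authentication, error codes

## Company

- [Pricing](https://acme.example/pricing.md): plans, limits, billing FAQ
- [Security](https://acme.example/security.md): compliance posture and data handling

## Optional

- [Changelog](https://acme.example/changelog.md): release history
```

Note the short description after each link: that is what lets an agent choose the right page without fetching everything.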

How long it should be

  • No hard size limit; prefer a curated “index” over indiscriminate dumps.
  • If you need full content, separate it explicitly (llms-full.txt or section files) and consider RAG for large corpora.

Limits, safety and considerations

  • Not a “hard” control mechanism: systems may use or ignore it.
  • Mixed evidence of AI bots fetching it: useful, but don’t expect automatic discovery/traffic gains.
  • Governance: avoid non-public information; ensure stability/versioning of linked pages for consistency over time.

Related resources

  • See “Specifications” for structure, syntax and semantics.
  • See “Tools” for CLIs, CMS integrations, and generators/crawlers.