ai.txt & TDM Policy
robots.txt controls crawling. llms.txt describes your content. But neither answers the question AI companies actually care about: "Are we allowed to use this?"
Questions this article answers
- What is ai.txt and do I need one for my website?
- How do I declare whether AI systems can use my content for training?
- What is the TDM Reservation Protocol and how does it affect AI crawlers?
Quick Answer
ai.txt is an emerging standard, best understood as robots.txt for licensing, that declares whether AI systems may use your content for training, retrieval, or citation. The TDM Reservation Protocol uses HTTP headers and meta tags to express reuse permissions in a machine-readable way. Fewer than 5% of sites have either.
Before & After
Before - No AI usage policy
```
# robots.txt
User-agent: *
Disallow: /admin/

# No AI-specific rules
# No ai.txt file
# No TDM headers
# AI companies guess your preferences
```
After - Clear ai.txt with TDM headers
```
# /ai.txt
Training: Disallowed
Retrieval: Allowed with Attribution
Citation: Allowed with Link

# HTTP header on content pages
TDM-Reservation: 1
```
What This Actually Measures
We're checking whether your site publishes machine-readable declarations about how AI systems are allowed to use your content. This goes beyond robots.txt (crawling access) and licensing (copyright terms) to address the specific question: "Can AI systems use this for training models, for retrieval-augmented generation, and for citation in answers?"
Three distinct policy mechanisms get checked. First, the ai.txt file: an emerging convention (like robots.txt and llms.txt) at the domain root declaring per-use-case permissions. The format typically specifies policies for training (may the content be used to train models?), retrieval (may it be fetched and summarized in real-time responses?), and citation (may it be quoted with attribution in answers?).
Second, the TDM Reservation Protocol: a W3C-drafted standard implementing the EU DSM Directive's Article 4, which allows rights holders to reserve text and data mining rights. It uses HTTP headers (TDM-Reservation: 1) and HTML meta tags (<meta name="tdm-reservation" content="1">) to declare that automated mining requires explicit permission.
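If you want to set this signal yourself, serving the header is typically a one-line change in your framework or CDN configuration. Below is a minimal sketch using Flask, purely as an illustration; the header and meta tag names come from the protocol, while the framework choice, route, and markup are assumptions.

```python
from flask import Flask

app = Flask(__name__)

@app.after_request
def reserve_tdm_rights(response):
    # "1" reserves text-and-data-mining rights; "0" declares no reservation.
    response.headers["TDM-Reservation"] = "1"
    return response

@app.route("/")
def article():
    # The equivalent per-page signal is the tdm-reservation meta tag.
    return (
        '<html><head>'
        '<meta name="tdm-reservation" content="1">'
        '</head><body>...</body></html>'
    )
```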
Third, we check related signals: robots.txt AI crawler directives, llms.txt content-usage sections, and terms-of-service pages linked from structured data. Together these form a composite "AI usage policy clarity" score: how clearly your site communicates its content reuse policy.
The primary metric is "AI policy signal presence": does the site have any machine-readable AI usage policy? The secondary metric is "policy completeness": does it address all three use cases (training, retrieval, citation) with clear permissions or restrictions?
Why "No Policy" Is the Worst Policy
The AI licensing landscape is evolving fast. OpenAI, Anthropic, Google, and Meta are building systems that attempt to respect publisher content policies. Sites with clear declarations, whether permissive or restrictive, get their preferences honored. Sites without policies get treated according to each company's default, which varies and may not match what you want.
For sites wanting maximum AI visibility, a clear permissive policy is strategically valuable. When your ai.txt explicitly states AI systems can retrieve and cite your content with attribution, AI systems checking this policy (and more do each quarter) cite your content more freely. Without this signal, some systems apply conservative defaults limiting how extensively they quote you.
For sites wanting to restrict AI usage, a clear restrictive policy is the only reliable mechanism. robots.txt blocks crawling but doesn't address training or retrieval from cached data. Copyright declarations don't address specific AI use cases. TDM Reservation and ai.txt are the purpose-built tools.
Consistency across mechanisms is critical. An ai.txt saying "retrieval allowed" paired with robots.txt blocking GPTBot sends contradictory signals. AI companies interpret these conflicts differently, leading to unpredictable behavior. We check internal consistency across all signals to ensure your site communicates one clear policy.
How We Check This
Policy signals are checked at three levels: domain-wide files, HTTP headers, and per-page meta tags.
At the domain level, we send HEAD and GET requests to conventional ai.txt URLs: /ai.txt, /.well-known/ai.txt, and /ai-policy.txt. If a file is found, we parse it for structured policy declarations: key-value pairs (Training: Allowed, Retrieval: Allowed with Attribution, Citation: Allowed with Link) and block-based per-company declarations.
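A minimal sketch of that probe in Python, assuming the simple key-value format shown earlier (block-based per-company sections are ignored here); the helper name and the use of the requests library are illustrative, not our production checker.

```python
import requests

CANDIDATE_PATHS = ["/ai.txt", "/.well-known/ai.txt", "/ai-policy.txt"]

def fetch_ai_policy(origin: str) -> dict[str, str] | None:
    """Probe conventional ai.txt locations and parse simple key-value declarations."""
    for path in CANDIDATE_PATHS:
        try:
            resp = requests.get(origin.rstrip("/") + path, timeout=10)
        except requests.RequestException:
            continue
        if resp.status_code != 200 or not resp.text.strip():
            continue
        policy = {}
        for line in resp.text.splitlines():
            line = line.strip()
            if not line or line.startswith("#") or ":" not in line:
                continue  # skip blanks, comments, and anything that isn't key: value
            key, _, value = line.partition(":")
            policy[key.strip().lower()] = value.strip()
        if policy:
            return policy
    return None

# Example: fetch_ai_policy("https://example.com")
# -> {"training": "Disallowed", "retrieval": "Allowed with Attribution", ...}
```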
We re-examine robots.txt and llms.txt for AI-specific policy content. robots.txt entries for AI crawlers (GPTBot, ClaudeBot, PerplexityBot) are interpreted as crawling policy. llms.txt sections describing usage permissions are interpreted as retrieval/citation policy.
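For the robots.txt side, the crawling policy can be read with Python's standard library. This sketch only asks whether each named AI crawler may fetch the homepage, which is a simplification of the full directive set:

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

def ai_crawler_access(origin: str) -> dict[str, bool]:
    """Report whether robots.txt permits each AI crawler to fetch the homepage."""
    parser = RobotFileParser()
    parser.set_url(origin.rstrip("/") + "/robots.txt")
    parser.read()  # fetches and parses robots.txt
    homepage = origin.rstrip("/") + "/"
    return {agent: parser.can_fetch(agent, homepage) for agent in AI_CRAWLERS}

# Example: ai_crawler_access("https://example.com")
# -> {"GPTBot": False, "ClaudeBot": True, "PerplexityBot": True}
```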
At the HTTP header level, we check responses from a page sample (homepage, content page, product page) for TDM-related headers: TDM-Reservation, X-Robots-Tag with AI-specific directives, and custom headers used by specific AI companies (X-AI-Usage headers observed on some publisher sites).
At the per-page level, we parse HTML for TDM meta tags (<meta name="tdm-reservation" content="1">), Creative Commons meta tags addressing derivative works, and custom AI-usage meta tags.
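A rough sketch of the header and meta-tag check for a single page; the regex-based scan is deliberately naive (a production checker would use a real HTML parser and also look for Creative Commons tags), and the function name is our own.

```python
import re
import requests

def detect_tdm_signals(url: str) -> dict[str, str | None]:
    """Check one page for the TDM-Reservation header and the tdm-reservation meta tag."""
    resp = requests.get(url, timeout=10)
    header_value = resp.headers.get("TDM-Reservation")
    # Naive scan assuming name comes before content in the tag.
    match = re.search(
        r'<meta\s+name=["\']tdm-reservation["\']\s+content=["\'](\d)["\']',
        resp.text,
        re.IGNORECASE,
    )
    return {"header": header_value, "meta": match.group(1) if match else None}
```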
Finally, the cross-signal consistency check: we map all detected policy signals (ai.txt, robots.txt, llms.txt, HTTP headers, per-page meta tags) and verify that they express consistent permissions. Conflicts (for example, ai.txt allows retrieval but robots.txt blocks AI crawlers) get flagged with specific remediation steps.
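Conceptually, the consistency check reduces every signal to the permissions it expresses per use case and flags any use case where sources disagree. A simplified sketch, with a made-up signals structure standing in for the real normalization step:

```python
def consistency_flags(signals: dict[str, dict[str, bool]]) -> list[str]:
    """Flag contradictions between sources that declare the same use case.

    `signals` maps a source name ("ai.txt", "robots.txt", ...) to the
    permissions it expresses, e.g. {"retrieval": True, "training": False}.
    """
    flags = []
    use_cases = {case for perms in signals.values() for case in perms}
    for case in use_cases:
        verdicts = {src: perms[case] for src, perms in signals.items() if case in perms}
        if len(set(verdicts.values())) > 1:  # sources disagree on this use case
            flags.append(f"Conflict on '{case}': {verdicts}")
    return flags

# Example: ai.txt allows retrieval, but robots.txt blocks the retrieval crawlers.
# consistency_flags({"ai.txt": {"retrieval": True}, "robots.txt": {"retrieval": False}})
# -> ["Conflict on 'retrieval': {'ai.txt': True, 'robots.txt': False}"]
```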
How We Score It
AI policy scoring evaluates presence, completeness, and consistency; a sketch of the arithmetic follows the rubric below:
1. Policy presence (4 points):
   - ai.txt file exists with parseable declarations: 4/4 points
   - No ai.txt, but clear AI policy in llms.txt or robots.txt AI crawler rules: 3/4 points
   - TDM Reservation headers or meta tags without ai.txt: 2/4 points
   - Only generic robots.txt with no AI-specific rules: 1/4 points
   - No AI-related signals detected: 0/4 points
2. Policy completeness (3 points):
   - Addresses all three use cases (training, retrieval, citation): 3/3 points
   - Addresses two of three: 2/3 points
   - Addresses one only: 1/3 points
   - Present but vague or unactionable: 0/3 points
3. Cross-signal consistency (3 points):
   - All signals (ai.txt, robots.txt, llms.txt, headers, meta tags) are consistent: 3/3 points
   - Minor inconsistencies without contradictions: 2/3 points
   - One contradiction: 1/3 points
   - Multiple contradictions: 0/3 points

Bonus:
- +0.5 points if ai.txt includes per-company policies (different rules for different AI providers)
- +0.5 points if the TDM policy links to a human-readable terms page

Deductions:
- -1 point if robots.txt blocks AI crawlers while other signals suggest a permissive policy (direct contradiction)
- -0.5 points if ai.txt exists but isn't parseable (malformed or ambiguous)
- -0.5 points if TDM-Reservation is 1 (opt-out) but there is no mechanism for researchers to request access
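Putting the rubric together, the composite score is a simple clamped sum. This sketch mirrors the point values above; the function and argument names are our own shorthand, not a published API.

```python
def ai_policy_score(presence: int, completeness: int, consistency: int,
                    bonuses: float = 0.0, deductions: float = 0.0) -> float:
    """Combine the rubric components into a 0-10 score.

    presence: 0-4, completeness: 0-3, consistency: 0-3;
    bonuses and deductions are passed as positive magnitudes.
    """
    raw = presence + completeness + consistency + bonuses - deductions
    return max(0.0, min(10.0, raw))

# Example: ai.txt present (4), all three use cases covered (3), one contradiction (1),
# plus the per-company bonus (0.5):
# ai_policy_score(4, 3, 1, bonuses=0.5)  # -> 8.5
```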
This is one of the newest criteria we check. Fewer than 5% of sites have an ai.txt file as of early 2026. Most sites score 0-2; news publishers engaged with AI policy discussions typically score 4-7; sites with complete, multi-signal AI policies score 8-10.
Key Takeaways
- An ai.txt file explicitly declares whether AI systems may train on, retrieve, or cite your content.
- No policy is the worst policy: sites without declarations get treated according to each AI company's unpredictable defaults.
- Keep all signals consistent: ai.txt, robots.txt, llms.txt, and TDM headers should express the same permissions.
- Fewer than 5% of sites have ai.txt as of 2026, so adding one is a quick win for AI policy clarity.
How does your site score on this criterion?
Get a free AEO audit and see where you stand across all 10 criteria.