Technical AuditCriterion T14

ai.txt & TDM Policy

robots.txt controls crawling. llms.txt describes your content. But neither answers the question AI companies actually care about: "Are we allowed to use this?"

Published February 14, 2026

low effortlow impact

Questions this article answers

?What is ai.txt and do I need one for my website?
?How do I declare whether AI systems can use my content for training?
?What is the TDM Reservation Protocol and how does it affect AI crawlers?

Summarize This Article With AI

Open this article in your preferred AI engine for an instant summary and analysis.

ChatGPT Perplexity Google AI

AI & TDM Policy Files

✓ai.txt file present

✓TDM licensing declared

~Opt-in for AI training

~Crawl rate specified

✓Usage restrictions clear

✓Machine-readable format

aeocontent.ai

Quick Answer

ai.txt is an emerging standard (like robots.txt for licensing) declaring whether AI systems may use your content for training, retrieval, or citation. The TDM Reservation Protocol uses HTTP headers and meta tags to express reuse permissions in a machine-readable way. Fewer than 5% of sites have either.

Before & After

Before - No AI usage policy

# robots.txt
User-agent: *
Disallow: /admin/
# No AI-specific rules
# No ai.txt file
# No TDM headers
# AI companies guess your preferences

After - Clear ai.txt with TDM headers

# /ai.txt
Training: Disallowed
Retrieval: Allowed with Attribution
Citation: Allowed with Link

# HTTP header on content pages
TDM-Reservation: 0

What This Actually Measures

We're checking whether your site publishes machine-readable declarations about how AI systems are allowed to use your content. This goes beyond robots.txt (crawling access) and licensing (copyright terms) to address the specific question: "Can AI systems use this for training models, for retrieval-augmented generation, and for citation in answers?"

Three distinct policy mechanisms get checked. First, the ai.txt file -an emerging convention (like robots.txt and llms.txt) at the domain root declaring per-use-case permissions. The format typically specifies policies for training (use for model training?), retrieval (fetch and summarize in real-time responses?), and citation (quote with attribution in answers?).

Second, the TDM Reservation Protocol -a W3C-drafted standard implementing the EU DSM Directive's Article 4, allowing rights holders to reserve text and data mining rights. It uses HTTP headers (TDM-Reservation: 1) and HTML meta tags (<meta name="tdm-reservation" content="1">) to declare that automated mining requires explicit permission.

Third, we check related signals in robots.txt AI crawler directives, llms.txt content usage sections, and terms-of-service pages linked from structured data. The combination creates a composite "AI usage policy clarity" score -how clearly your site communicates its content reuse policy.

Primary metric: "AI policy signal presence" -does the site have any machine-readable AI usage policy? Secondary: "policy completeness" -does it address all three use cases (training, retrieval, citation) with clear permissions or restrictions?

Why "No Policy" Is the Worst Policy

The AI licensing landscape is evolving fast. OpenAI, Anthropic, Google, and Meta are building systems that attempt to respect publisher content policies. Sites with clear declarations -permissive or restrictive -get their preferences honored. Sites without policies get treated according to each company's default, which varies and may not match what you want.

For sites wanting maximum AI visibility, a clear permissive policy is strategically valuable. When your ai.txt explicitly states AI systems can retrieve and cite your content with attribution, AI systems checking this policy (and more do each quarter) cite your content more freely. Without this signal, some systems apply conservative defaults limiting how extensively they quote you.

For sites wanting to restrict AI usage, a clear restrictive policy is the only reliable mechanism. robots.txt blocks crawling but doesn't address training or retrieval from cached data. Copyright declarations don't address specific AI use cases. TDM Reservation and ai.txt are the purpose-built tools.

Consistency across mechanisms is critical. An ai.txt saying "retrieval allowed" paired with robots.txt blocking GPTBot sends contradictory signals. AI companies interpret these conflicts differently, leading to unpredictable behavior. We check internal consistency across all signals to ensure your site communicates one clear policy.

How We Check This

Policy signals are checked at three levels: domain-wide files, HTTP headers, and per-page meta tags.

At the domain level, we send HEAD and GET requests to conventional ai.txt URLs: /ai.txt, /.well-known/ai.txt, /ai-policy.txt. If found, we parse for structured policy declarations -key-value pairs (Training: Allowed, Retrieval: Allowed with Attribution, Citation: Allowed with Link) and block-based per-company declarations.

We re-examine robots.txt and llms.txt for AI-specific policy content. robots.txt entries for AI crawlers (GPTBot, ClaudeBot, PerplexityBot) are interpreted as crawling policy. llms.txt sections describing usage permissions are interpreted as retrieval/citation policy.

At the HTTP header level, we check responses from a page sample (homepage, content page, product page) for TDM-related headers: TDM-Reservation, X-Robots-Tag with AI-specific directives, and custom headers used by specific AI companies (X-AI-Usage headers observed on some publisher sites).

At the per-page level, we parse HTML for TDM meta tags (<meta name="tdm-reservation" content="1">), Creative Commons meta tags addressing derivative works, and custom AI-usage meta tags.

Then the cross-signal consistency check. We map all detected policy signals -ai.txt, robots.txt, llms.txt, HTTP headers, per-page meta tags -and verify they express consistent permissions. Conflicts (ai.txt allows retrieval but robots.txt blocks AI crawlers) get flagged with specific remediation steps.

How We Score It

AI policy scoring evaluates presence, completeness, and consistency:

1. Policy presence (4 points): - ai.txt file exists with parseable declarations: 4/4 points - No ai.txt but clear AI policy in llms.txt or robots.txt AI crawler rules: 3/4 points - TDM Reservation headers or meta tags without ai.txt: 2/4 points - Only generic robots.txt with no AI-specific rules: 1/4 points - No AI-related signals detected: 0/4 points

2. Policy completeness (3 points): - Addresses all three use cases (training, retrieval, citation): 3/3 points - Two of three: 2/3 points - One only: 1/3 points - Present but vague or unactionable: 0/3 points

3. Cross-signal consistency (3 points): - All signals (ai.txt, robots.txt, llms.txt, headers, meta tags) are consistent: 3/3 points - Minor inconsistencies without contradictions: 2/3 points - One contradiction: 1/3 points - Multiple contradictions: 0/3 points

Bonus: - +0.5 points if ai.txt includes per-company policies (different rules for different AI providers) - +0.5 points if TDM policy links to a human-readable terms page

Deductions: - -1 point if robots.txt blocks AI crawlers while other signals suggest permissive policy (direct contradiction) - -0.5 points if ai.txt exists but isn't parseable (malformed, ambiguous) - -0.5 points if TDM Reservation is 1 (opt-out) but no mechanism for researchers to request access

This is one of the newest criteria. Fewer than 5% of sites have an ai.txt file as of early 2026. Most score 0-2. News publishers engaged with AI policy discussions score 4-7. Sites with multi-signal AI policies score 8-10.

Resources

W3C TDM Reservation Protocol Specification

www.w3.org/community/reports/tdmrep/CG-FINAL-tdmrep-20240510

llms.txt Specification

llmstxt.org

robots.txt Standard

www.robotstxt.org

Key Takeaways

An ai.txt file explicitly declares whether AI systems may train on, retrieve, or cite your content.
No policy is the worst policy - sites without declarations get treated according to each AI company's unpredictable defaults.
Keep all signals consistent - ai.txt, robots.txt, llms.txt, and TDM headers should express the same permissions.
Fewer than 5% of sites have ai.txt as of 2026 - adding one is a quick win for AI policy clarity.

How does your site score on this criterion?

Get a free AEO audit and see where you stand across all 10 criteria.

Written by

Alex Shortov

CTO of AEO Content, Inc. Building tools to help businesses get cited by AI answer engines.

Related FAQs

Technical Audit Criteria

What are conditional audit criteria?

Conditional criteria are checks that only apply to certain site types. Product/Offer schema applies only to e-commerce. Speakable schema only to content-heavy sites. Hreflang only to multilingual sites. ai.txt/TDM policy only to sites with substantial original content. These 4 conditional criteria supplement the 10 universal technical audit criteria.

What content licensing signals does the audit check?

The audit looks for machine-readable licensing metadata: CreativeWork license properties in JSON-LD, meta tags indicating reuse policy, copyright notices, ai.txt declarations, and TDM Reservation Protocol headers. Clear licensing tells AI systems whether they can quote or reference your content in their answers.

What This Actually Measures

Why "No Policy" Is the Worst Policy

How We Check This

Policy signals are checked at three levels: domain-wide files, HTTP headers, and per-page meta tags.

At the per-page level, we parse HTML for TDM meta tags (<meta name="tdm-reservation" content="1">), Creative Commons meta tags addressing derivative works, and custom AI-usage meta tags.

How We Score It

AI policy scoring evaluates presence, completeness, and consistency:

Bonus: - +0.5 points if ai.txt includes per-company policies (different rules for different AI providers) - +0.5 points if TDM policy links to a human-readable terms page

Related FAQs

Technical Audit Criteria

What are conditional audit criteria?

What content licensing signals does the audit check?

ai.txt & TDM Policy

What This Actually Measures

Why "No Policy" Is the Worst Policy

How We Check This

How We Score It

Resources

Related FAQs

Related Guides

ai.txt & TDM Policy

What This Actually Measures

Why "No Policy" Is the Worst Policy

How We Check This

How We Score It

Resources

Related FAQs

Related Guides