robots.txt for AI: Rolling Out the Red Carpet (or Slamming the Door)
Most sites run default platform robots.txt with zero AI-specific rules. That's not a strategy - it's an accident. Explicit Allow rules for GPTBot, ClaudeBot, and PerplexityBot signal that your content is open for citation.
Part of the AEO scoring framework - the current 48 criteria that measure how ready a website is for AI-driven search across ChatGPT, Claude, Perplexity, and Google AIO.
Quick Answer
Add explicit Allow rules in your robots.txt for AI crawlers: GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Without these rules, AI systems don't know whether you want to be cited. Block scrapers you don't want (Bytespider) with Disallow. This takes 5 minutes, and it's one of the 48 criteria we score in every audit.
Audit Note
In our audits, we've measured AI crawler directives in robots.txt on live sites, compared implementations across platforms, and documented the gaps.
Before & After
Before - Default platform robots.txt
```
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml

# No AI-specific rules at all
```
After - Explicit AI crawler policy
```
User-agent: GPTBot
Allow: /
Crawl-delay: 2

User-agent: anthropic-ai
Allow: /
Crawl-delay: 2

User-agent: Bytespider
Disallow: /
```
Which AI Crawlers Are Trying to Access Your Site?
robots.txt is a text file at your domain root (example.com/robots.txt) that tells crawlers what they can access. Simple concept. But the crawler landscape has shifted dramatically - there's now a whole category of AI-specific bots that collect content for training and retrieval.
Here's who's knocking:
- GPTBot - OpenAI (ChatGPT, GPT-based products)
- CCBot - Common Crawl (feeds into many AI training sets)
- Google-Extended - Google (Gemini, AI Overviews)
- PerplexityBot - Perplexity AI
- anthropic-ai - Anthropic (Claude)
- Bytespider - ByteDance (TikTok's AI features)
Your robots.txt can explicitly Allow or Disallow each one. That's granular control over which AI systems can use your content - and which can't.
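You can check how a given policy resolves for each bot with Python's standard-library robots.txt parser. A quick sketch - the policy text and the path being tested are illustrative:

```python
# Check which AI crawlers a sample policy admits, using Python's
# standard-library robots.txt parser. Policy and path are illustrative.
from urllib.robotparser import RobotFileParser

POLICY = """\
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bytespider
Disallow: /
"""

rp = RobotFileParser()
rp.parse(POLICY.splitlines())

for bot in ["GPTBot", "PerplexityBot", "Bytespider"]:
    verdict = "allowed" if rp.can_fetch(bot, "/blog/post") else "blocked"
    print(f"{bot}: {verdict}")
# GPTBot: allowed
# PerplexityBot: allowed
# Bytespider: blocked
```

The same `can_fetch` call works against a live file if you use `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()` instead of `parse`.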
Why Is Having No AI Crawler Policy the Worst Option?
We check robots.txt on every audit. Here's what we find 80% of the time: the default robots.txt from Shopify, WordPress, or whatever platform the site runs on. Zero AI-specific rules. Zero intentionality.
This creates two problems:
First - no signal of AI-friendliness. When you explicitly Allow AI crawlers, you're telling these systems "my content is available and welcome for citation." That's a signal. Default configs send no signal at all.
Second - no control. Without explicit rules, you're leaving it up to each bot's default behavior. Some crawl everything. Some play it safe and skip you. You have no say.
For AEO, the strategic play is clear: Allow AI crawlers on content you want cited (blog posts, product pages, FAQ, knowledge base) and block areas that don't need indexing (admin panels, checkout flows, internal tools).
This criterion carries 2% weight in the Technical Plumbing tier of our scoring model. But 2% of zero effort is still free points. And we've seen sites miss even this.
How Do You Configure robots.txt for AI Crawlers?
Add these rules to your robots.txt:
```
# AI Crawler Policy - Explicitly allow AI systems
User-agent: GPTBot
Allow: /
Crawl-delay: 2

User-agent: CCBot
Allow: /
Crawl-delay: 2

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /
Crawl-delay: 2

User-agent: anthropic-ai
Allow: /
Crawl-delay: 2

User-agent: Bytespider
Disallow: /
```
Shopify: Edit robots.txt.liquid in your theme (Online Store > Themes > Edit code > Templates > robots.txt.liquid).
WordPress: Use Yoast SEO to edit robots.txt, or edit the file directly in your root directory.
Next.js / static sites: Create a public/robots.txt file or generate it dynamically via an API route. Our site generates it statically.
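For static generation, one option is to render the file in a build step. A minimal sketch - the bot list, rule values, and sitemap URL are placeholders, not a description of any particular site's build:

```python
# Sketch: render a robots.txt with explicit AI crawler rules at build time
# (e.g. written to public/robots.txt for a static site).
# All values below are illustrative placeholders.
POLICIES = {
    "GPTBot": ("Allow", "/"),
    "CCBot": ("Allow", "/"),
    "Google-Extended": ("Allow", "/"),
    "PerplexityBot": ("Allow", "/"),
    "anthropic-ai": ("Allow", "/"),
    "Bytespider": ("Disallow", "/"),
}

def render_robots(policies, sitemap_url):
    # One block per user-agent, blank-line separated, Sitemap at the end.
    blocks = ["User-agent: {}\n{}: {}".format(bot, rule, path)
              for bot, (rule, path) in policies.items()]
    return "\n\n".join(blocks) + "\n\nSitemap: {}\n".format(sitemap_url)

robots_txt = render_robots(POLICIES, "https://example.com/sitemap.xml")
print(robots_txt)
```

Writing the returned string to `public/robots.txt` during the build keeps the policy in version control alongside the rest of the site.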
The Crawl-delay: 2 asks bots to wait 2 seconds between requests. It's polite and prevents server load issues, though support varies by crawler - Googlebot, for example, ignores the directive entirely.
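Parsers that honor Crawl-delay read it per user-agent. Python's stdlib parser exposes it, which makes a quick check easy (the policy lines are illustrative):

```python
# Read the per-agent Crawl-delay from a policy with Python's stdlib parser.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: PerplexityBot",
    "Allow: /",
    "Crawl-delay: 2",
])
print(rp.crawl_delay("PerplexityBot"))  # 2
print(rp.crawl_delay("SomeOtherBot"))   # None (no matching entry)
```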
Start here: Open yoursite.com/robots.txt right now. If you don't see GPTBot or ClaudeBot mentioned - you've got work to do. Five minutes of work.
What robots.txt Mistakes Should You Avoid?
Blocking all AI crawlers. We've seen this - sites that blanket-block every AI bot thinking they're "protecting their content." The result? Complete AI invisibility. Nobody's citing you. Nobody's recommending you. That's not protection - that's disappearance.
Forgetting robots.txt is advisory, not enforced. Well-behaved bots follow it. Malicious scrapers don't. It's not a security measure - it's a communication tool.
Missing the Sitemap reference. Always include Sitemap: https://yoursite.com/sitemap.xml in your robots.txt. It's the roadmap that makes crawling efficient.
Overly broad Disallow rules. Disallow: /blog when you meant to block /blog/drafts - now your entire blog is invisible to AI. Be specific.
Not testing after changes. A typo in robots.txt can accidentally block your entire site. Check the file with Google Search Console's robots.txt report or another validator before deploying changes.
Platform limitations. Shopify's robots.txt is partially platform-managed. Know what you can and can't control on your stack.
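Several of these mistakes can be caught at once with a pre-deploy sanity check. A sketch of what such a check might look like - the candidate policy, bot list, and path list are all illustrative:

```python
# Pre-deploy sanity check for a candidate robots.txt: assert the pages you
# want cited stay open to AI crawlers and a Sitemap reference is present.
# The candidate policy and path list below are illustrative.
from urllib.robotparser import RobotFileParser

CANDIDATE = """\
User-agent: GPTBot
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

MUST_STAY_OPEN = ["/", "/blog/", "/faq/"]
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

rp = RobotFileParser()
rp.parse(CANDIDATE.splitlines())

for path in MUST_STAY_OPEN:
    for bot in AI_BOTS:
        assert rp.can_fetch(bot, path), f"{bot} blocked from {path}"
assert rp.site_maps(), "missing Sitemap reference"
print("robots.txt sanity check passed")
```

Run as a CI step, this fails the build before an overly broad Disallow or a missing Sitemap line reaches production.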
Score Impact in Practice
The AI Crawler Directives criterion carries 2% weight in the Technical Plumbing tier. Sites with explicit Allow rules for AI crawlers score 7-9/10 on this criterion. Sites with default platform robots.txt files that contain no AI-specific rules score 3-4/10. Sites that actively block AI crawlers score 0/10.
In practice, robots.txt is one of the lowest-effort, lowest-risk criteria to max out. It takes 5 minutes to add the rules, there's virtually no downside, and it signals intentionality to every AI engine that checks. Despite this, roughly 80% of the sites we audit have no AI-specific rules in their robots.txt.
Among Y Combinator startups we've benchmarked, adoption is slightly higher - about 30% have explicit AI crawler rules. But even in this technically sophisticated cohort, the majority run default platform configs. The sites that do include AI directives tend to score higher across all technical criteria, not because robots.txt directly improves other scores, but because attention to AI crawlers correlates with attention to crawlability in general.
How AI Engines Evaluate This
Each AI crawler checks robots.txt before crawling any page on your site. The behavior on finding (or not finding) specific rules varies by engine.
GPTBot (OpenAI) respects robots.txt strictly. If your robots.txt has no GPTBot-specific rule, GPTBot falls back to the general User-agent: * rules. If those rules allow access, GPTBot will crawl - but without an explicit Allow, your permission is only implicit. An explicit User-agent: GPTBot / Allow: / removes that ambiguity and signals that you welcome AI indexing.
ClaudeBot (Anthropic) checks for both anthropic-ai and ClaudeBot user-agent strings. Anthropic has been particularly careful about respecting opt-out signals. If your robots.txt blocks either user-agent string, Claude will not use your content for responses. The flip side: an explicit Allow for anthropic-ai is a positive signal that feeds into Anthropic's source confidence scoring.
PerplexityBot checks robots.txt and also looks for a Crawl-delay directive. Perplexity's crawler is high-frequency because it builds answers in real time, so the Crawl-delay value matters more here than for training-focused crawlers. Setting Crawl-delay: 2 prevents server overload while keeping the door open for citation.
Google-Extended is the user-agent for Google's generative AI features (Gemini, AI Overviews). It's separate from Googlebot (which handles traditional search). You can allow Googlebot for search indexing while blocking Google-Extended for AI training, or vice versa. Most sites benefit from allowing both, but the distinction gives you granular control if you want AI Overviews visibility without contributing to Gemini's training data.
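Because the two user-agents are matched independently, a split policy behaves exactly as you'd expect. A sketch (the path is illustrative):

```python
# Demonstrate that Googlebot and Google-Extended are matched independently:
# this sample policy keeps search indexing open while opting out of
# generative AI use. The path tested is illustrative.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
""".splitlines())

print(rp.can_fetch("Googlebot", "/pricing"))        # True
print(rp.can_fetch("Google-Extended", "/pricing"))  # False
```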
Key Takeaways
- Add explicit Allow rules for GPTBot, ClaudeBot, PerplexityBot, and Google-Extended - without them, AI does not know you want to be cited.
- Block scrapers you do not trust (like Bytespider) with Disallow rules while keeping citation-driving crawlers open.
- Always include a Sitemap reference in your robots.txt so crawlers can navigate your site efficiently.
- Remember robots.txt is advisory, not enforced - it is a communication tool, not a security measure.
How does your site score on this criterion?
Get a free AEO audit and see where you stand across all 48 criteria.