Clean HTML: If Crawlers Can't See It, It Doesn't Exist
Most AI crawlers don't run JavaScript. If your content loads after page render - behind accordions, SPAs, or API calls - you're invisible. We've seen entire FAQ sections vanish from AI's perspective because of one accordion widget.
Part of the AEO scoring framework - the current 48 criteria that measure how ready a website is for AI-driven search across ChatGPT, Claude, Perplexity, and Google AIO.
Quick Answer
Your content must be in the raw HTML the server sends - not loaded by JavaScript after render. AI crawlers skip JS entirely. Use server-side rendering or static generation. We've found sites where 60% of their content was invisible to AI because of client-side rendering.
Audit Note
In our audits, we've measured this criterion on live sites and compared real-world implementations side by side.
Questions this article answers:
- Why is my content invisible to AI crawlers even though it shows on my website?
- Do AI crawlers like GPTBot run JavaScript or only read raw HTML?
- How do I check if my FAQ content is hidden from AI by JavaScript?
Before & After
Before - Content hidden behind JS accordion
```html
<div class="accordion" aria-expanded="false">
  <div class="content" style="display:none">
    Answer here
  </div>
</div>
```
After - Content visible in HTML source
```html
<details>
  <summary>Question here</summary>
  <p>Answer here - visible to crawlers in the HTML source</p>
</details>
```
Why Is Some Content Invisible to AI Crawlers?
Here's a scenario we run into constantly. A site has great FAQ content - 30 well-written questions with detailed answers. Score should be high. But when we pull the raw HTML source? The FAQ section is empty. The answers only appear after JavaScript fires and a user clicks each accordion.
Clean, crawlable HTML means your meaningful content is present in the initial HTML document the server sends - before any JavaScript runs. GPTBot, CCBot, PerplexityBot - they all fetch HTML and extract text. They don't execute JavaScript.
This means:
- Text visible in View Source (not Inspect Element) is what AI sees
- Proper semantic HTML elements (<main>, <article>, <section>) guide extraction
- Content behind accordion clicks, tab switches, or "Read more" buttons is often invisible
- Heavy DOM complexity from third-party scripts creates noise that drowns out your content
- Slow server responses can cause crawlers to time out and move on
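As a sketch of that semantic structure - a minimal page layout using the elements above (the headings and text are illustrative, not a required template):

```html
<!-- Minimal semantic skeleton: all meaningful text ships in the initial HTML -->
<main>
  <article>
    <h1>What Is Clean, Crawlable HTML?</h1>
    <section>
      <h2>Why it matters</h2>
      <p>Crawlers extract this text from the raw markup without running any JavaScript.</p>
    </section>
  </article>
</main>
```

The flat nesting is deliberate: semantic landmarks give extractors clear boundaries without the div soup that component frameworks tend to generate.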
How Do You Check What AI Actually Sees on Your Pages?
Try something right now. Open your site. Right-click. View Source. Search for a key phrase from your FAQ or main content.
Not there? That's what AI sees. Nothing.
This is the most common "hidden" problem we find in audits. The content exists - it's on the page if you're a human with a browser. But AI crawlers see a blank page or a loading spinner.
The damage is specific:
- FAQ answers behind aria-expanded="false" accordions? Invisible to AI.
- Single-page applications that render client-side? They appear blank to crawlers.
- Dynamic content loaded via API calls after page render? Missed entirely.
- Heavy JavaScript payloads that slow down crawling? Timeout. Next site.
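For illustration, this is the entire crawlable payload of a typical client-rendered SPA (the element id and filename are placeholders - yours will differ):

```html
<!-- What a crawler fetches from a client-rendered SPA: no content, just a mount point -->
<body>
  <div id="root"></div>
  <script src="/static/js/bundle.js"></script>
</body>
```

Every word a human sees on that page arrives after the bundle executes - which, for a crawler that never runs it, means never.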
In the live chat vertical, we found sites losing points specifically on this criterion -great content that AI literally couldn't see because of how it was rendered. The content isn't the problem. The delivery mechanism is.
How Do You Make Content Visible to AI Crawlers?
1. Server-side render everything important
Use SSR or static site generation so content is in the HTML response. Next.js, Nuxt, Astro - these frameworks handle this well. Our own site (aeocontent.ai, 88/100) is Next.js with server-side rendering on every page.
2. Replace hidden accordions with visible HTML
```html
<!-- Bad: Content requires JS to display -->
<div class="accordion" aria-expanded="false">
  <div class="content" style="display:none">Answer here</div>
</div>

<!-- Good: Content visible in HTML, CSS handles display -->
<details>
  <summary>Question here</summary>
  <p>Answer here - visible to crawlers in the HTML source</p>
</details>
```
The <details> element is the secret weapon. Content is in the HTML for crawlers. CSS handles the expand/collapse for humans.
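If you want custom styling on top of the native expand/collapse behavior, a minimal sketch (the specific styles are illustrative):

```html
<style>
  /* Style the native disclosure widget - no JavaScript needed for expand/collapse */
  details { border: 1px solid #ddd; border-radius: 4px; padding: 0.5rem; }
  summary { cursor: pointer; font-weight: bold; }
  details[open] summary { margin-bottom: 0.5rem; }
</style>
```

The content stays in the source regardless of the open/closed state, which is the whole point.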
3. Audit what crawlers actually see
```bash
# Check what crawlers see (no JS execution)
curl -s yoursite.com/page | grep -c "your key content phrase"

# If 0 results, your content is JS-dependent
```
4. Defer non-essential scripts
Load analytics, chat widgets, and social proof scripts with defer or async. Don't let third-party JavaScript compete with your actual content for parse time.
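In markup terms, that looks like the following (the script URLs are placeholders):

```html
<!-- Non-essential scripts: defer executes in order after parsing; async runs as soon as loaded -->
<script src="/js/analytics.js" defer></script>
<script src="/js/chat-widget.js" async></script>
<!-- Content-critical markup should never depend on either of these -->
```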
5. Add <noscript> fallbacks
For essential content that requires JavaScript, provide a <noscript> version with the same content. Belt and suspenders.
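A sketch of the fallback pattern - the widget, its markup, and the pricing copy are all hypothetical:

```html
<!-- JS-rendered widget for browsers... -->
<div id="pricing-calculator"></div>
<!-- ...with the same essential facts as static HTML for non-JS clients -->
<noscript>
  <p>Plans start at $29/month. The full pricing table is listed below.</p>
</noscript>
```

Note that <noscript> is a fallback, not a substitute for SSR: some crawlers read it, but content in the main document body is the only version you can count on.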
Start here: Run the curl test above on your top 5 pages. Count how many key phrases are actually in the raw HTML. The number will surprise you.
What Hidden Rendering Traps Break AI Crawlability?
"It works in my browser" - the most dangerous assumption in AEO. Your browser runs JavaScript. Crawlers don't. They're fundamentally different experiences.
Custom web components (<my-accordion>, <fancy-tabs>) that render content client-side. The HTML source shows empty custom elements. AI sees nothing.
CSS class="hidden" on blog excerpts or content sections. Some crawlers skip elements marked hidden. Don't gamble on it.
Fetch/XHR loading critical content. If your FAQ answers come from an API call after page load - crawlers won't wait for it. They've already moved on.
Blind trust in platform themes. Shopify and WordPress themes routinely render content client-side. Don't assume your platform handles this correctly. Test it.
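The custom-element trap above, as it appears in View Source (the element name and API path are illustrative):

```html
<!-- What the crawler fetches: an empty tag. The Q&A pairs only exist after JS upgrades it. -->
<my-accordion data-src="/api/faq.json"></my-accordion>
```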
Score Impact in Practice
Clean HTML / Crawlable Content carries 3% weight in the Technical Plumbing tier. Sites where 90%+ of key content is present in the raw HTML source score 8-10/10 on this criterion. Sites with significant JavaScript-dependent content rendering typically score 3-5/10, and single-page applications with client-side rendering can score 0-1/10.
The real damage goes beyond the 3% weight. When content is invisible to crawlers, every other criterion that depends on content quality - Q&A Format, Direct Answer Density, Original Data, Content Depth - scores based on what AI can see, not what exists on your page. A site with 30 excellent FAQ answers behind JavaScript accordions scores 0 on both Clean HTML and FAQ Section. That's compounding failure from a single implementation choice.
In our audits, we regularly find sites where 40-60% of the content is invisible to the curl test. One home services company had a beautiful FAQ section with 45 questions - all rendered client-side via React. Their FAQ Section score was 1/10. After switching to server-side rendering, the same content scored 8/10 with zero content changes. The words didn't change. The delivery mechanism did.
How AI Engines Evaluate This
The critical fact: none of the major AI crawlers execute JavaScript. GPTBot, ClaudeBot, PerplexityBot, and CCBot all fetch the raw HTML response and extract text from it. If your content isn't in that initial HTML payload, it doesn't exist to these systems.
GPTBot (OpenAI) sends a standard HTTP GET request and processes the HTML response. It does not render JavaScript, wait for API calls, or execute client-side routing. Content that requires React hydration, Vue mounting, or Angular bootstrapping is completely invisible to GPTBot. It processes what the server sends and moves on - typically within 2-3 seconds per page.
ClaudeBot (Anthropic) follows the same pattern but is particularly sensitive to DOM complexity. Pages with deeply nested div structures from component frameworks produce noise that makes content extraction harder even when the content is technically present. Clean semantic HTML with minimal nesting depth gives ClaudeBot the clearest extraction path.
PerplexityBot operates under tight time constraints because it builds answers in real-time. If a page is slow to respond (over 3 seconds) or returns a minimal HTML shell that expects JavaScript to populate content, PerplexityBot moves to the next source. Speed and HTML completeness are both factors.
Google-Extended (used for Gemini and AI Overviews) has slightly more rendering capability than other AI crawlers because it shares infrastructure with Googlebot, which does execute JavaScript. However, rendering is not guaranteed, and relying on it for AI visibility is a gamble. Server-side rendering remains the only reliable approach across all engines.
Key Takeaways
- Test what AI sees by running curl on your pages - if your key content is not in the raw HTML, it is invisible to every AI crawler.
- Use server-side rendering or static generation so content is in the HTML response before JavaScript runs.
- Replace JavaScript accordions with the HTML <details> element - content stays in the source for crawlers while still collapsing for users.
- Defer non-essential scripts (analytics, chat widgets, social proof) so they do not compete with your content for parse time.
How does your site score on this criterion?
Get a free AEO audit and see where you stand across all 34 criteria.