Schema Coverage Ratio
Your homepage has perfect JSON-LD. Your other 200 pages? Zero. Here's how we measure the gap - and why AI engines judge your whole domain by it.
Part of the AEO scoring framework - the current 48 criteria that measure how ready a website is for AI-driven search across ChatGPT, Claude, Perplexity, and Google AIO.
Quick Answer
Schema coverage ratio is the percentage of your indexed pages carrying relevant JSON-LD markup. Above 80% means you're solid site-wide. Below 40% means most of your pages are invisible to structured data consumers - no matter how perfect your homepage is.
Audit Note
In our audits we've measured Schema Coverage Ratio on live sites, compared implementations side by side, and catalogued the gaps that keep scores low.
Before & After
Before - Only homepage has JSON-LD
```html
<!-- Homepage: perfect schema -->
<script type="application/ld+json">
{ "@type": "Organization", "name": "Acme" }
</script>
<!-- /blog/post-1: nothing -->
<!-- /products/widget: nothing -->
```

After - Every template has correct schema
```html
<!-- /blog/post-1 -->
<script type="application/ld+json">
{ "@type": "Article", "headline": "...", "datePublished": "..." }
</script>
<!-- /products/widget -->
<script type="application/ld+json">
{ "@type": "Product", "name": "Widget", "offers": { "@type": "Offer", "price": "29.99" } }
</script>
```

What Does Schema Coverage Ratio Measure?
Schema coverage ratio answers two questions at once: what fraction of your crawlable pages have any JSON-LD at all, and how many of those use the *right* schema type for the page context?
The audit crawls every indexable URL from your sitemap and renders the source to extract all <script type="application/ld+json"> blocks. Each block gets parsed and validated against the Schema.org vocabulary. Pages land in one of four buckets: correct and relevant schema, wrong schema type for the page (Organization on a blog post instead of Article - we see this constantly), malformed JSON-LD, or no schema at all.
The final ratio: (pages with correct, relevant schema) / (total crawlable pages) × 100. We also track a secondary metric - "schema type accuracy" - the percentage of schema-bearing pages where the primary @type actually matches the content. A product page should carry Product schema, not just a generic WebPage.
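As a concrete illustration, the two metrics can be computed like this. This is a minimal Python sketch - the page list and bucket labels are hypothetical, standing in for real crawl results:

```python
# Hypothetical per-page audit results; each crawled page falls into one
# of the four buckets described above.
pages = [
    {"url": "/", "bucket": "correct"},
    {"url": "/about", "bucket": "correct"},
    {"url": "/blog/post-1", "bucket": "wrong_type"},   # Organization on an Article page
    {"url": "/contact", "bucket": "malformed"},
    {"url": "/products/widget", "bucket": "none"},
]

def coverage_ratio(pages):
    """(pages with correct, relevant schema) / (total crawlable pages) x 100."""
    correct = sum(1 for p in pages if p["bucket"] == "correct")
    return correct / len(pages) * 100

def type_accuracy(pages):
    """Share of parseable, schema-bearing pages whose @type fits the content.

    Malformed blocks are excluded because their @type cannot be read.
    """
    bearing = [p for p in pages if p["bucket"] in ("correct", "wrong_type")]
    correct = sum(1 for p in bearing if p["bucket"] == "correct")
    return correct / len(bearing) * 100

print(coverage_ratio(pages))           # 40.0
print(round(type_accuracy(pages), 1))  # 66.7
```

Note how the two numbers diverge: three of five pages carry some schema, but only two carry the right type, so coverage and type accuracy tell different stories about the same site.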
Beyond the top-level number, we measure schema depth. A Product page with just name and description scores lower than one with price, availability, brand, aggregateRating, and image. The bare minimum isn't enough when your competitors are filling in every recommended property.
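Under the same assumptions, a depth metric could look like the sketch below. The recommended-property list here is an illustrative subset we chose for the example, not the official Schema.org recommendation set:

```python
# Illustrative recommended properties per type (hypothetical subset;
# a real audit would use the full Schema.org recommendation lists).
RECOMMENDED = {
    "Product": ["name", "description", "price", "availability",
                "brand", "aggregateRating", "image"],
}

def schema_depth(schema_type, present_properties):
    """Fraction of recommended properties actually present on the page."""
    recommended = RECOMMENDED[schema_type]
    present = [p for p in recommended if p in present_properties]
    return len(present) / len(recommended)

# A bare-minimum Product page vs. a fully filled-in one:
print(round(schema_depth("Product", {"name", "description"}), 2))  # 0.29
print(schema_depth("Product", {"name", "description", "price",
                               "availability", "brand",
                               "aggregateRating", "image"}))       # 1.0
```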
Why Isn't One Perfect Homepage Enough?
Here's the mistake we see on almost every audit: beautiful structured data on the homepage, and nothing - zero - on the other 200 pages. AI engines and search crawlers evaluate your site holistically. When 90% of your pages lack schema, the domain-level signal is weak regardless of how polished that homepage markup is.
Google's Rich Results eligibility is per-page. Every product page without Product schema is a missed rich snippet. Every article without Article schema loses the chance for a featured result with author and date info. At scale, this compounds fast - a 200-page e-commerce site with schema on only 15 pages has a 7.5% coverage ratio. That's 92.5% of the catalog invisible to structured data consumers.
AI answer engines like Perplexity and ChatGPT increasingly use structured data to validate facts before citing a page. When they find proper schema confirming the content is an Article published on a specific date by a specific author, they treat that page as more trustworthy than an identically-worded page with none. Site-wide coverage means no matter which page an AI engine lands on, it gets machine-readable context.
There's an entity problem too. If your Organization schema appears on 10 pages but is absent from 190, crawlers may not associate all your pages with the same entity. Uniform coverage reinforces that every page belongs to one authoritative publisher.
How Is Schema Coverage Checked?
The automated audit performs a full-site crawl starting from the XML sitemap and following internal links up to a configurable depth (default: 3 levels). For each URL, the crawler fetches raw HTML and extracts all JSON-LD script blocks.
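The first two steps - reading URLs out of the sitemap and pulling JSON-LD blocks out of raw HTML - can be sketched with Python's standard library alone. The sitemap and HTML below are inline stand-ins for what a real crawler would fetch over HTTP:

```python
import json
import re
import xml.etree.ElementTree as ET

# Inline stand-in for a fetched sitemap (hypothetical URLs).
SITEMAP = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/post-1</loc></url>
</urlset>"""

def sitemap_urls(xml_text):
    """Extract every <loc> entry from a standard sitemap."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall(".//sm:loc", ns)]

JSONLD_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE)

def extract_jsonld(html):
    """Return every parseable JSON-LD block found in raw HTML."""
    blocks = []
    for match in JSONLD_RE.findall(html):
        try:
            blocks.append(json.loads(match))
        except json.JSONDecodeError:
            pass  # malformed blocks are counted separately in the audit
    return blocks

html = '<script type="application/ld+json">{"@type": "Article"}</script>'
print(sitemap_urls(SITEMAP))
print(extract_jsonld(html))  # [{'@type': 'Article'}]
```

A production crawler would render JavaScript-injected markup as well; this regex-based pass only sees JSON-LD present in the raw HTML response.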
Each block goes through three validation stages. First, JSON syntax -unclosed braces, trailing commas, encoding errors. Second, Schema.org vocabulary -does the @type exist in the hierarchy, are required properties present? Third, contextual relevance -a page with pricing in the body should have Product schema, not just WebPage.
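The three stages can be sketched as a single classifier. The type lists, required-property map, and path-to-type expectations below are tiny hypothetical stand-ins for the full Schema.org vocabulary a real validator would use:

```python
import json

# Tiny stand-ins for the real vocabulary (hypothetical, not exhaustive).
KNOWN_TYPES = {"Article", "Product", "Organization", "WebPage", "FAQPage"}
REQUIRED = {"Article": ["headline"], "Product": ["name"]}
EXPECTED_TYPE_BY_PATH = {"/blog/": "Article", "/products/": "Product"}

def classify_block(raw_block, page_path):
    # Stage 1: JSON syntax (unclosed braces, trailing commas, etc.).
    try:
        data = json.loads(raw_block)
    except json.JSONDecodeError:
        return "malformed"
    # Stage 2: Schema.org vocabulary and required properties.
    schema_type = data.get("@type")
    if schema_type not in KNOWN_TYPES:
        return "unknown_type"
    if any(prop not in data for prop in REQUIRED.get(schema_type, [])):
        return "missing_required"
    # Stage 3: contextual relevance - does the type fit the page?
    for prefix, expected in EXPECTED_TYPE_BY_PATH.items():
        if page_path.startswith(prefix) and schema_type != expected:
            return "wrong_type"
    return "correct"

print(classify_block('{"@type": "Article", "headline": "Hi"}', "/blog/post-1"))   # correct
print(classify_block('{"@type": "Organization", "name": "Acme"}', "/blog/post-1")) # wrong_type
print(classify_block('{"@type": "Article",}', "/blog/post-1"))                     # malformed
```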
The audit generates a per-page report card: green (correct schema), yellow (schema present but wrong type or missing required properties), orange (malformed JSON-LD), red (no schema). These roll up into the site-wide coverage ratio.
Here's where it gets powerful: pages are grouped by template type when detectable. All /blog/* pages, all /products/* pages. This reveals template-level gaps -your blog template might be missing Article schema entirely, which means every blog post inherits that gap. Template-level findings are the highest-priority fixes because patching one template improves coverage for hundreds of pages simultaneously.
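Grouping by template can be as simple as bucketing URLs by their first path segment. This sketch (with hypothetical audit results) shows how a zero-coverage blog template jumps out immediately:

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical per-page audit results, keyed by URL.
results = {
    "https://example.com/blog/a": "none",
    "https://example.com/blog/b": "none",
    "https://example.com/blog/c": "none",
    "https://example.com/products/x": "correct",
    "https://example.com/products/y": "correct",
}

def template_gaps(results):
    """Group pages by first path segment and report coverage per template."""
    groups = defaultdict(list)
    for url, status in results.items():
        segments = urlparse(url).path.strip("/").split("/")
        groups[segments[0]].append(status)
    return {tpl: statuses.count("correct") / len(statuses)
            for tpl, statuses in groups.items()}

print(template_gaps(results))  # {'blog': 0.0, 'products': 1.0}
```

The 0.0 for the blog group points straight at the template: fix Article schema once in /blog/* and every post inherits it.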
The crawler respects robots.txt directives and Crawl-delay settings. Pages returning 4xx or 5xx are excluded from the denominator but flagged separately as potential sitemap issues.
How Is Schema Coverage Scored?
Schema coverage uses a tiered rubric based on the percentage of pages with correct, contextually relevant JSON-LD:
Coverage ratio tiers:
- 90-100%: Score 10/10 - Exceptional. Nearly every page has correct schema.
- 80-89%: Score 8/10 - Strong. Minor gaps, typically on edge-case pages.
- 60-79%: Score 6/10 - Moderate. Significant template gaps exist.
- 40-59%: Score 4/10 - Weak. More pages lack schema than have it.
- 20-39%: Score 2/10 - Poor. Schema isolated to a handful of pages.
- 0-19%: Score 1/10 - Minimal. No site-wide schema strategy.
Deductions for quality issues:
- -1 point if more than 10% of schema blocks have JSON syntax errors
- -1 point if the dominant type is WebPage when more specific types apply
- -0.5 points if Organization schema is inconsistent across pages (different names, URLs, or addresses)
- -0.5 points if required properties are missing on more than 25% of typed schemas
Bonus for depth:
- +0.5 points if average schema depth (recommended properties present / total recommended) exceeds 70%
- +0.5 points if the site uses @graph to combine multiple related schemas per page
Maximum: 10. Minimum: 0 for sites with zero JSON-LD anywhere.
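The rubric above translates directly into code. This is a sketch of the tiers, deductions, and bonuses as stated - the function signature and parameter names are our own, not part of the framework:

```python
def tier_score(coverage_pct):
    """Base score from the coverage-ratio tiers."""
    for floor, base in [(90, 10), (80, 8), (60, 6), (40, 4), (20, 2), (0, 1)]:
        if coverage_pct >= floor:
            return base
    return 1

def score(coverage_pct, *, has_any_jsonld=True, syntax_error_rate=0.0,
          webpage_dominant=False, org_inconsistent=False,
          missing_required_rate=0.0, avg_depth=0.0, uses_graph=False):
    if not has_any_jsonld:
        return 0.0  # minimum: zero JSON-LD anywhere
    s = tier_score(coverage_pct)
    # Deductions for quality issues.
    if syntax_error_rate > 0.10:
        s -= 1
    if webpage_dominant:
        s -= 1
    if org_inconsistent:
        s -= 0.5
    if missing_required_rate > 0.25:
        s -= 0.5
    # Bonuses for depth.
    if avg_depth > 0.70:
        s += 0.5
    if uses_graph:
        s += 0.5
    return max(0.0, min(10.0, s))

print(score(85, avg_depth=0.75))         # 8.5
print(score(35, webpage_dominant=True))  # 1
print(score(0, has_any_jsonld=False))    # 0.0
```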
Score Impact in Practice
Sites scoring 8+ on schema coverage ratio share a common trait: they implemented structured data at the template level, not the page level. A B2B SaaS company with Article schema on every blog post template, Product schema on their pricing page, and Organization schema consistently across all pages will see scores in the 8-9 range even with 200+ pages. The template approach means every new page inherits correct schema automatically.
Sites scoring 2-3 typically have one of two patterns. Either schema exists only on the homepage (Organization and WebSite types) with zero coverage on inner pages, or a CMS plugin generates generic WebPage schema everywhere regardless of content type. Both patterns produce the same result - AI engines get no meaningful structured context on the vast majority of the domain's pages.
The gap between 4 and 8 is almost always a template problem. A site with 150 blog posts and zero Article schema on the blog template sits at 30-40% coverage. One template fix pushes it to 85%+. That single change can shift the score by 4-5 points and immediately improve how AI engines contextualize every blog post on the domain.
Common Mistakes
The most frequent mistake is using WebPage as a catch-all type. WebPage is the generic fallback - it tells AI engines nothing about what kind of content the page actually contains. Product pages should use Product, blog posts should use Article or BlogPosting, FAQ pages should use FAQPage. Generic WebPage dilutes the structured data signal and wastes the opportunity to give AI engines specific context.
Second, inconsistent Organization schema across pages. We see sites where the Organization name is "Acme" on the homepage, "Acme, Inc." on the about page, and "Acme Corp" in the footer schema. AI engines trying to resolve the publishing entity get three different signals. Pick one canonical name and use it everywhere.
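A trivial consistency check surfaces this kind of drift. The page-to-JSON-LD mapping below is a hypothetical example mirroring the "Acme" scenario:

```python
def organization_names(pages_jsonld):
    """Collect every distinct Organization name used across the site."""
    names = set()
    for blocks in pages_jsonld.values():
        for block in blocks:
            if block.get("@type") == "Organization" and "name" in block:
                names.add(block["name"])
    return names

site = {
    "/": [{"@type": "Organization", "name": "Acme"}],
    "/about": [{"@type": "Organization", "name": "Acme, Inc."}],
    "/contact": [{"@type": "Organization", "name": "Acme Corp"}],
}
print(sorted(organization_names(site)))  # ['Acme', 'Acme Corp', 'Acme, Inc.']
```

Anything more than a single name in the result set is a flag: pick one canonical string and propagate it to every template.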
Third, orphaned schema - JSON-LD blocks that reference properties pointing to non-existent URLs or contain placeholder values. Schema with "description": "Lorem ipsum" or "image": "example.com/placeholder.jpg" actively hurts credibility. AI engines parsing this data encounter invalid references and may discount the entire domain's structured data as unreliable.
Fourth, schema depth neglect. Having Product schema with only name and description is barely better than having no schema. AI engines weigh the completeness of recommended properties - price, availability, brand, aggregateRating - when deciding how much to trust and cite the data.
How AI Engines Evaluate This
ChatGPT uses structured data primarily for entity resolution and fact verification. When it encounters a page with complete Article schema including author, datePublished, and publisher properties, it can verify the content's provenance before citing it. Pages without this context require the model to infer authorship and recency from surrounding text - a less reliable process that reduces citation confidence.
Perplexity treats JSON-LD as a high-trust data source for building structured responses. When answering product comparison queries, Perplexity preferentially extracts data from pages with Product schema containing price, availability, and ratings. It displays this data in formatted comparison tables with direct citations. Pages without schema are still crawled but contribute less structured information to these responses.
Claude's retrieval system evaluates schema as a quality signal at the domain level. Consistent, correct schema across a high percentage of pages indicates a well-maintained site with reliable content. Domains with fragmented or absent schema receive lower trust scores in retrieval ranking, making their pages less likely to surface for factual queries where multiple competing sources exist.
Google AI Overviews rely heavily on structured data for featured snippets and knowledge panels. Schema coverage directly affects which pages qualify for these enhanced displays. A domain with 90%+ coverage has significantly more entry points into AI Overview results than one at 30%.
Key Takeaways
- Aim for 80%+ of indexable pages carrying the correct JSON-LD type for their content.
- Fix schema at the template level - one template change can cover hundreds of pages at once.
- Use the right @type per page context (Product for products, Article for articles) - generic WebPage dilutes the signal.
- Schema depth matters - fill in recommended properties like price, availability, and aggregateRating, not just name and description.
How does your site score on this criterion?
Get a free AEO audit and see where you stand across all 34 criteria.