Cross-Page Duplication: When Different Pages Say the Same Thing
Different URLs with substantially similar content confuse AI engines about which page to cite. Cross-Page Duplication measures content repetition across your site by comparing paragraphs from different pages. When an AI engine encounters the same claims on multiple pages, it either picks one arbitrarily or skips your site entirely.
This criterion is part of the AEO scoring framework: the current 48 criteria that measure how ready a website is for AI-driven search across ChatGPT, Claude, Perplexity, and Google AIO.
Quick Answer
Ensure each page makes unique claims with unique language. If two pages need to cover overlapping topics, differentiate them by audience, depth, or angle. The scorer compares paragraph shingles across blog posts and key pages, excluding site-wide boilerplate (CTAs, bios, footers appearing on 40%+ of pages). This criterion (3% weight, Answer Readiness pillar) catches content cannibalization at the paragraph level.
[Figure: Duplication severity across pages]
Before & After
Before - Same intro on multiple posts
<!-- /blog/live-chat-pricing -->
<p>Live chat is essential for modern businesses. It helps improve customer satisfaction and reduce response times significantly.</p>
<!-- /blog/live-chat-features -->
<p>Live chat is essential for modern businesses. It helps improve customer satisfaction and reduce response times significantly.</p>
After - Unique intros per page
<!-- /blog/live-chat-pricing -->
<p>LiveChat costs $20-69/agent/month. Intercom starts at $39/seat. Here is what you get at each price point and where the hidden fees are.</p>
<!-- /blog/live-chat-features -->
<p>The 5 features that separate good live chat from great: auto-routing, canned responses, visitor tracking, CRM sync, and analytics.</p>
What Is Cross-Page Duplication?
Cross-Page Duplication detects content repetition between different pages on your site. While the Duplicate Content Blocks criterion checks for repetition within a single page, this criterion checks whether different URLs share substantially similar paragraphs.
The scorer collects paragraphs from your homepage, key pages, and a sample of blog posts, then compares them pairwise using shingle-based Jaccard similarity. Any paragraph pair from different pages that exceeds the similarity threshold is flagged. The key innovation is boilerplate detection: paragraphs that appear on more than 40% of pages (by shingle fingerprint) are classified as template elements (CTAs, author bios, footer content, disclaimer blocks) and excluded from the comparison. This means your standard "Subscribe to our newsletter" footer block does not trigger a penalty.
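As a rough illustration of shingle-based Jaccard similarity, the Python sketch below splits each paragraph into overlapping word windows (shingles) and divides the number of shared shingles by the number of distinct shingles overall. The 3-word shingle size and the function names `shingles` and `jaccard` are assumptions made for this example; the scorer's actual shingle size and similarity threshold are not stated here.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text.

    k=3 is an assumed shingle size, chosen only for illustration.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two paragraphs: |A & B| / |A | B| over shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


intro = "Live chat is essential for modern businesses."
print(jaccard(intro, intro))                                 # 1.0 (identical text)
print(jaccard(intro, "LiveChat costs $20-69/agent/month."))  # 0.0 (no shared shingles)
```

A paragraph copied verbatim onto a second page scores 1.0, while two paragraphs with no three-word sequence in common score 0.0. Lightly edited copies fall in between.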
The remaining non-boilerplate duplicates represent genuine content reuse - paragraphs where two different pages make the same claims in the same language. This is a problem because it creates content cannibalization at the most granular level. AI engines encountering the same paragraph on two different URLs have to decide which one to cite, and the confusion often results in neither being selected.
How Does the Scorer Work?
The cross-page duplication scorer operates at the site level (not the page level) and follows these steps:
1. Collect pages: Gather the homepage, key pages (about, services, products), and a sample of blog posts (typically the 5-10 most recently published).
2. Extract paragraphs: From each page, extract meaningful text paragraphs (excluding navigation, headers, footers detected as boilerplate).
3. Detect boilerplate: Calculate a fingerprint for each paragraph using its first 5 shingles. Paragraphs whose fingerprint appears on more than 40% of pages are classified as boilerplate and excluded.
4. Pairwise comparison: Compare all non-boilerplate paragraphs from different pages using shingle Jaccard similarity. Pairs exceeding the threshold are flagged as duplicates.
5. Score calculation: Based on the number of unique page pairs with duplicated content and the severity of duplication, calculate a 0-10 score. Zero or one duplicate pair with low severity scores 8-10. Three to five pairs score 4-6. Six or more pairs with high similarity score 0-3.
If fewer than two pages are available for comparison (e.g., a single-page site), the scorer returns 5/10 with a "not enough pages" finding.
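Steps 3 and 4 above can be sketched roughly as follows. This is not the scorer's actual code: the 3-word shingle size, the 0.8 similarity threshold, and the names `shingle_list`, `fingerprint`, and `find_duplicates` are assumptions for illustration. Only the first-5-shingles fingerprint and the 40% boilerplate cutoff come from the description above.

```python
import re
from collections import defaultdict
from itertools import combinations

SHINGLE_SIZE = 3            # assumption: words per shingle
SIMILARITY_THRESHOLD = 0.8  # assumption: the real threshold is not stated
BOILERPLATE_SHARE = 0.4     # from the article: more than 40% of pages


def shingle_list(text, k=SHINGLE_SIZE):
    """Ordered k-word shingles of a paragraph (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return []
    return [tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))]


def fingerprint(text):
    """Step 3: fingerprint a paragraph by its first 5 shingles."""
    return tuple(shingle_list(text)[:5])


def find_duplicates(pages):
    """pages maps URL -> list of paragraph strings; returns flagged page pairs."""
    # Step 3: record which pages each fingerprint appears on.
    seen_on = defaultdict(set)
    for url, paragraphs in pages.items():
        for para in paragraphs:
            seen_on[fingerprint(para)].add(url)
    boilerplate = {fp for fp, urls in seen_on.items()
                   if len(urls) / len(pages) > BOILERPLATE_SHARE}

    # Keep only non-empty, non-boilerplate paragraphs, as shingle sets.
    content = {}
    for url, paragraphs in pages.items():
        content[url] = [set(shingle_list(p)) for p in paragraphs
                        if shingle_list(p) and fingerprint(p) not in boilerplate]

    # Step 4: compare paragraphs from different pages only.
    flagged = []
    for url_a, url_b in combinations(content, 2):
        for sa in content[url_a]:
            for sb in content[url_b]:
                similarity = len(sa & sb) / len(sa | sb)
                if similarity > SIMILARITY_THRESHOLD:
                    flagged.append((url_a, url_b, similarity))
    return flagged


FOOTER = "Subscribe to our newsletter for weekly product updates."
INTRO = ("Live chat is essential for modern businesses. It helps improve "
         "customer satisfaction and reduce response times significantly.")
pages = {
    "/": ["We build support software for small teams.", FOOTER],
    "/about": ["Our company was founded by two former support agents.", FOOTER],
    "/services": ["Onboarding, migration, and training packages are available.", FOOTER],
    "/blog/live-chat-pricing": [INTRO, FOOTER],
    "/blog/live-chat-features": [INTRO, FOOTER],
    "/blog/chatbots": ["Chatbots answer routine questions before an agent joins.", FOOTER],
}
print(find_duplicates(pages))
```

In this made-up six-page site, the footer appears on every page and is dropped as boilerplate, while the shared intro appears on only two of six pages (about 33%) and is therefore flagged for the pricing and features posts with a similarity of 1.0.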
How Do You Fix Cross-Page Duplication?
Step 1: Review the audit evidence
The audit report shows each duplicate pair with: the two URLs involved, the duplicate paragraphs, and the similarity percentage. Start with the highest-similarity pairs.
Step 2: Differentiate overlapping pages
If two blog posts cover similar topics, differentiate them by angle, depth, or audience:
- "Live Chat for E-Commerce" vs "Live Chat for SaaS" - same tool, different audience
- "Live Chat Pricing Guide" vs "Live Chat Feature Comparison" - same tools, different focus
- "Beginner's Guide to Live Chat" vs "Advanced Live Chat Strategies" - same topic, different depth
Step 3: Rewrite shared introductions
The most common cross-page duplicate is the introductory paragraph. Many sites use the same "context-setting" opener across related posts. Write a unique opening for each post that ties directly to its specific angle.
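One quick way to spot reused openers before running a full audit is to group posts by their normalized first paragraph. The sketch below is a hypothetical helper, not part of the scorer: `normalize` and `shared_intros` are illustrative names, and it assumes you have already extracted each post's first paragraph into a dictionary keyed by URL. It only catches openers that match exactly after lowercasing and stripping punctuation, so lightly reworded copies still need a similarity check.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase the text and collapse it to space-separated alphanumeric words."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def shared_intros(first_paragraphs):
    """first_paragraphs maps URL -> opening paragraph.

    Returns a list of URL groups whose openers are identical after normalization.
    """
    groups = defaultdict(list)
    for url, intro in first_paragraphs.items():
        groups[normalize(intro)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


first_paragraphs = {
    "/blog/live-chat-pricing": "Live chat is essential for modern businesses.",
    "/blog/live-chat-features": "Live chat is essential for modern businesses!",
    "/blog/chatbots": "Chatbots answer routine questions before an agent joins.",
}
print(shared_intros(first_paragraphs))
# [['/blog/live-chat-pricing', '/blog/live-chat-features']]
```

Each group returned is a set of posts that need distinct openings.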
Step 4: Link instead of repeating
If two pages need to reference the same concept, explain it fully on one page and link to it from the other:
<!-- Instead of repeating the explanation -->
<p>For a detailed breakdown of live chat pricing tiers,
see our <a href="/blog/live-chat-pricing">pricing guide</a>.</p>
Step 5: Audit template-injected content
Check whether your CMS injects the same content blocks (beyond CTAs) into multiple pages. If a "Why Choose Us" block appears in the body of 5 different pages but stays under the 40% boilerplate threshold, the boilerplate detector will not exclude it, and the scorer counts it as cross-page duplication.
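The arithmetic behind the 40% rule is simple enough to check by hand, and the sketch below makes it explicit. The function name `excluded_as_boilerplate` and the page counts in the example are hypothetical; only the "more than 40% of pages" cutoff comes from the scorer description.

```python
def excluded_as_boilerplate(pages_with_block: int, total_pages: int,
                            threshold: float = 0.4) -> bool:
    """True if a block appears on more than `threshold` of the compared pages,
    which means the boilerplate detector drops it from the comparison."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return pages_with_block / total_pages > threshold


# Hypothetical site: a "Why Choose Us" block injected into 5 pages.
print(excluded_as_boilerplate(5, 20))  # False: 25% of pages, still flagged as duplication
print(excluded_as_boilerplate(5, 10))  # True: 50% of pages, treated as boilerplate
```

A template block that returns False here is compared like any other paragraph, so it should be removed, rewritten per page, or replaced with a link.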
How AI Engines Evaluate This
ChatGPT encounters cross-page duplication when it retrieves multiple pages from the same domain. When two pages contain the same paragraph, ChatGPT has to decide which page is the canonical source for that information. If neither page provides unique context around the shared paragraph, ChatGPT may cite neither and instead find a more differentiated source from another domain.
Claude builds entity models across multiple pages from the same site. When Claude detects substantial content overlap between pages, it reduces the overall trust score for the domain because repeated content suggests thin editorial process. Claude also uses cross-page uniqueness as a signal for topical authority - a site where every page makes distinct claims demonstrates broader expertise than a site where the same claims appear on multiple pages.
Perplexity assembles answers from passages across sources. Cross-page duplication means Perplexity encounters the same passage twice from the same domain, which wastes its source diversity budget. Perplexity prefers domains where each page contributes unique information to its answer.
Google AI Overviews deduplicates at the URL level before selecting sources. Pages that substantially duplicate content from other pages on the same site are less likely to be selected because Google has already captured that information from the canonical page.
Key Takeaways
- Cross-Page Duplication compares content BETWEEN different pages on your site, not within a single page.
- The scorer automatically excludes site-wide boilerplate (paragraphs appearing on 40%+ of pages) like CTAs, author bios, and footer text.
- Each duplicate pair is identified by URL and section heading, with similarity percentage and sample text in the audit report.
- Severity scales with both the number of duplicate pairs and the proportion of affected content - 3+ duplicate pairs with high similarity scores poorly.
- Common cause: copying introductory paragraphs across related blog posts, or using the same product description on multiple landing pages.