
Cross-Page Duplication: When Different Pages Say the Same Thing

Different URLs with substantially similar content confuse AI engines about which page to cite. Cross-Page Duplication measures content repetition across your site by comparing paragraphs from different pages. When AI encounters the same claims on multiple pages, it either picks one arbitrarily or skips your site entirely.

Cross-Page Duplication is one of 48 criteria in AEO Rank, the citation-readiness score we run against every site we audit.

By Alex Shortov

Effort: medium | Impact: medium

Quick Answer

Ensure each page makes unique claims with unique language. If two pages need to cover overlapping topics, differentiate them by audience, depth, or angle. The scorer compares paragraph shingles across blog posts and key pages, excluding site-wide boilerplate (CTAs, bios, footers appearing on 40%+ of pages). This criterion (3% weight, Answer Readiness pillar) catches content cannibalization at the paragraph level.

Audit Note

In our audits, we have measured cross-page duplication on live sites, compared implementations, and reviewed the gaps that keep scores low.

What is Cross-Page Duplication and how does it differ from within-page duplicate content?

Cross-page duplication compares paragraphs between different URLs, while the Duplicate Content Blocks criterion checks repetition inside one page. Both drag your score down.

How does the scorer detect cross-page duplication?

The scorer extracts paragraph shingles from your homepage, key pages, and recent blog posts, filters out site-wide boilerplate, and flags pairs with high similarity.

Does boilerplate content (CTAs, bios) count as cross-page duplication?

No. Boilerplate such as CTAs and author bios is excluded automatically when it appears on 40 percent or more of pages, so only meaningful duplication counts.


Cross-Page Duplication Scoring (duplication severity across pages)

Score | Duplicate pairs | What it means
10/10 | 0 | All pages contain fully unique content
7-8/10 | 1-2 | Minor overlap between a couple of pages
4-6/10 | 3-5 | Moderate overlap: multiple pages share content
0-3/10 | 6+ | Severe: systematic content reuse across site


Key takeaways

  • Cross-Page Duplication compares content BETWEEN different pages on your site, not within a single page.
  • The scorer automatically excludes site-wide boilerplate (paragraphs appearing on 40%+ of pages) like CTAs, author bios, and footer text.
  • Each duplicate pair is identified by URL and section heading, with similarity percentage and sample text in the audit report.
  • Severity scales with both the number of duplicate pairs and the proportion of affected content - 3+ duplicate pairs with high similarity scores poorly.
  • Common cause: copying introductory paragraphs across related blog posts, or using the same product description on multiple landing pages.

What Is Cross-Page Duplication?

Cross-page duplication detects shared paragraphs between different URLs using shingle-based Jaccard similarity, filtering out boilerplate that appears on more than 40% of pages.

Cross-Page Duplication detects content repetition between different pages on your site. While the Duplicate Content Blocks criterion checks for repetition within a single page, this criterion checks whether different URLs share substantially similar paragraphs.

The scorer collects paragraphs from your homepage, key pages, and a sample of blog posts, then compares them pairwise using shingle-based Jaccard similarity. Any paragraph pair from different pages that exceeds the similarity threshold is flagged. The key innovation is boilerplate detection: paragraphs that appear on more than 40% of pages (by shingle fingerprint) are classified as template elements (CTAs, author bios, footer content, disclaimer blocks) and excluded from the comparison. This means your standard “Subscribe to our newsletter” footer block does not trigger a penalty.
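To make the similarity measure concrete, here is a minimal sketch of shingle-based Jaccard similarity. It assumes a shingle is a run of three consecutive lowercase words; the article does not state the shingle size or the tokenization the scorer uses, so `shingles`, `jaccard`, and the value `k=3` are illustrative choices, not the scorer's actual implementation.

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word n-grams) in text.

    Assumption: words are lowercased and punctuation is ignored.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short text: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

p1 = "Live chat lets support teams answer customer questions in real time."
p2 = "Live chat lets support teams answer customer questions in real time!"
p3 = "Pricing for live chat tools usually depends on the number of agent seats."

same = jaccard(shingles(p1), shingles(p2))       # identical wording, different punctuation
different = jaccard(shingles(p1), shingles(p3))  # no shared three-word runs
```

Here `same` is 1.0 because the two paragraphs contain the same words in the same order, and `different` is 0.0 because they share no three-word sequence. Real near-duplicates fall somewhere in between, which is why a threshold is needed.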

The remaining non-boilerplate duplicates represent genuine content reuse - paragraphs where two different pages make the same claims in the same language. This is a problem because it creates content cannibalization at the most granular level. AI engines encountering the same paragraph on two different URLs have to decide which one to cite, and the confusion often results in neither being selected.

How Does the Scorer Work?

The cross-page duplication scorer operates at the site level (not the page level) and follows these steps:

  1. Collect pages: Gather the homepage, key pages (about, services, products), and a sample of blog posts (typically the 5-10 most recently published).

  2. Extract paragraphs: From each page, extract meaningful text paragraphs (excluding navigation, headers, footers detected as boilerplate).

  3. Detect boilerplate: Calculate a fingerprint for each paragraph using its first 5 shingles. Paragraphs whose fingerprint appears on more than 40% of pages are classified as boilerplate and excluded.

  4. Pairwise comparison: Compare all non-boilerplate paragraphs from different pages using shingle Jaccard similarity. Pairs exceeding the threshold are flagged as duplicates.

  5. Score calculation: Based on the number of unique page pairs with duplicated content and the severity of duplication, calculate a 0-10 score. Zero duplicate pairs scores 10, one or two pairs score 7-8, three to five pairs score 4-6, and six or more pairs score 0-3.
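Steps 3 and 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the scorer's code: the 40% boilerplate cutoff and the "first 5 shingles" fingerprint come from the steps above, while the 3-word shingle size, the 0.8 similarity threshold, and the function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

BOILERPLATE_SHARE = 0.40     # from the article: more than 40% of pages
SIMILARITY_THRESHOLD = 0.8   # assumption: the article does not state the threshold
SHINGLE_SIZE = 3             # assumption: 3-word shingles

def shingle_list(text):
    """Ordered word shingles of a paragraph (lowercased, punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(words[i:i + SHINGLE_SIZE]) for i in range(len(words) - SHINGLE_SIZE + 1)]

def fingerprint(text):
    """Step 3: fingerprint a paragraph by its first 5 shingles."""
    return tuple(shingle_list(text)[:5])

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 0.0

def find_duplicate_pairs(pages):
    """pages maps URL -> list of paragraphs. Returns (url_a, url_b, similarity) per flagged page pair."""
    # Count the number of pages each fingerprint appears on (once per page).
    page_counts = Counter()
    for paragraphs in pages.values():
        page_counts.update({fingerprint(p) for p in paragraphs})
    limit = BOILERPLATE_SHARE * len(pages)
    # Drop boilerplate, keep the shingle sets of everything else.
    kept = {
        url: [set(shingle_list(p)) for p in paragraphs
              if page_counts[fingerprint(p)] <= limit]
        for url, paragraphs in pages.items()
    }
    flagged = []
    # Step 4: compare paragraphs only across different pages.
    for url_a, url_b in combinations(kept, 2):
        best = max((jaccard(a, b) for a in kept[url_a] for b in kept[url_b]), default=0.0)
        if best >= SIMILARITY_THRESHOLD:
            flagged.append((url_a, url_b, round(best, 2)))
    return flagged

footer = "Subscribe to our newsletter for weekly product updates and tips."
intro = "Live chat lets support teams answer customer questions in real time."
pages = {
    "/blog/a": [intro, "Ecommerce stores use chat to recover abandoned carts quickly.", footer],
    "/blog/b": [intro, "SaaS teams use chat to guide new users through onboarding.", footer],
    "/blog/c": ["Chat pricing usually depends on the number of agent seats.", footer],
    "/about": ["Our company was founded by two former support engineers.", footer],
    "/services": ["We offer setup, training, and ongoing chat analytics reviews.", footer],
    "/pricing": ["Plans start with a free tier for small teams.", footer],
}
pairs = find_duplicate_pairs(pages)
```

In this example the footer appears on all six pages and is filtered out as boilerplate, while the shared introduction appears on only two of six pages, so `/blog/a` and `/blog/b` are flagged as one duplicate pair.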

If fewer than two pages are available for comparison (e.g., a single-page site), the scorer returns 5/10 with a “not enough pages” finding.
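The mapping from pair count to score can be sketched as a small lookup. This assumes the bands from the Cross-Page Duplication Scoring table earlier in this article. The article does not say how severity is quantified, so the hypothetical `score_band` function returns the band rather than a single number.

```python
def score_band(duplicate_pairs: int, pages_compared: int) -> tuple[int, int]:
    """Return the (low, high) score band for a number of duplicate page pairs.

    Bands follow the article's scoring table; the exact score inside a band
    depends on duplication severity, which this sketch does not model.
    """
    if pages_compared < 2:
        return (5, 5)      # "not enough pages" finding
    if duplicate_pairs == 0:
        return (10, 10)    # all pages fully unique
    if duplicate_pairs <= 2:
        return (7, 8)      # minor overlap
    if duplicate_pairs <= 5:
        return (4, 6)      # moderate overlap
    return (0, 3)          # severe, systematic reuse

single_page_site = score_band(duplicate_pairs=0, pages_compared=1)
clean_site = score_band(duplicate_pairs=0, pages_compared=12)
reused_intros = score_band(duplicate_pairs=4, pages_compared=12)
```

The page-count check runs first so that a single-page site gets the neutral 5/10 rather than a perfect score for having no pairs to compare.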

Using paragraph fingerprints, the scorer grades each duplication tier differently:

Duplicate Ratio | Score Impact | Citation Outcome
Under 5% | Full credit | No effect
5-15% | Minor deduction | Engine may pick wrong URL
15-30% | Significant deduction | Authority fragments across copies
Over 30% | Severe deduction | Pages compete with each other
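As a rough illustration, the tiers above can be written as a lookup. The table leaves two details open, so this sketch makes assumptions: a ratio of exactly 15% is placed in the 15-30% tier, and exactly 30% stays in the 15-30% tier because the last row reads "Over 30%". The function name `duplication_tier` is hypothetical, and the ratio is passed in directly because the article does not define how it is computed.

```python
def duplication_tier(duplicate_ratio: float) -> tuple[str, str]:
    """Return (score impact, citation outcome) for a ratio between 0.0 and 1.0."""
    if not 0.0 <= duplicate_ratio <= 1.0:
        raise ValueError("duplicate_ratio must be between 0.0 and 1.0")
    if duplicate_ratio < 0.05:
        return ("Full credit", "No effect")
    if duplicate_ratio < 0.15:
        return ("Minor deduction", "Engine may pick wrong URL")
    if duplicate_ratio <= 0.30:   # assumption: 30% itself is not "over 30%"
        return ("Significant deduction", "Authority fragments across copies")
    return ("Severe deduction", "Pages compete with each other")

impact, outcome = duplication_tier(0.12)
```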

How Do You Fix Cross-Page Duplication?

Start with the highest-similarity pairs, differentiate overlapping pages by angle, depth, or audience, rewrite shared introductions, link to a single explanation instead of repeating it, and audit the content blocks your CMS injects into page bodies.

Step 1: Review the audit evidence

The audit report shows each duplicate pair with: the two URLs involved, the duplicate paragraphs, and the similarity percentage. Start with the highest-similarity pairs.
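If you export the findings, ordering them by similarity is a one-line sort. The sketch below assumes a hypothetical `DuplicatePair` record holding the fields the report lists (two URLs, a sample of the duplicated text, and the similarity); the actual export format is not described in this article.

```python
from dataclasses import dataclass

@dataclass
class DuplicatePair:
    # Hypothetical record shape, based on the fields the audit report lists.
    url_a: str
    url_b: str
    similarity: float  # 0.0 to 1.0
    sample: str

def prioritize(pairs, top=None):
    """Return pairs sorted from highest to lowest similarity, optionally truncated."""
    ranked = sorted(pairs, key=lambda p: p.similarity, reverse=True)
    return ranked if top is None else ranked[:top]

findings = [
    DuplicatePair("/blog/chat-saas", "/blog/chat-ecommerce", 0.62, "Live chat helps teams respond faster."),
    DuplicatePair("/pricing", "/blog/live-chat-pricing", 0.97, "Plans are billed per agent seat."),
    DuplicatePair("/about", "/services", 0.81, "We have supported customers for a decade."),
]
worklist = prioritize(findings, top=2)
```

Here `worklist` starts with the `/pricing` pair at 0.97, followed by the `/about` pair at 0.81, so the near-verbatim copies get rewritten first.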

Step 2: Differentiate overlapping pages

If two blog posts cover similar topics, differentiate them by angle, depth, or audience:

  • “Live Chat for E-Commerce” vs “Live Chat for SaaS” - same tool, different audience
  • “Live Chat Pricing Guide” vs “Live Chat Feature Comparison” - same tools, different focus
  • “Beginner’s Guide to Live Chat” vs “Advanced Live Chat Strategies” - same topic, different depth

Step 3: Rewrite shared introductions

The most common cross-page duplicate is the introductory paragraph. Many sites use the same “context-setting” opener across related posts. Write a unique opening for each post that ties directly to its specific angle.

Step 4: Link instead of repeating

If two pages need to reference the same concept, explain it fully on one page and link to it from the other:

<!-- Instead of repeating the explanation -->
<p>For a detailed breakdown of live chat pricing tiers,
see our <a href="/blog/live-chat-pricing">pricing guide</a>.</p>

Step 5: Audit template-injected content

Check whether your CMS injects the same content blocks (beyond CTAs) into multiple pages. If a “Why Choose Us” block appears in the body of 5 different pages but on fewer than 40% of total pages, the boilerplate detector will not classify it as boilerplate, so it counts as cross-page duplication.
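One way to find such blocks before an audit is to count how many pages each body paragraph appears on and list those that repeat but stay at or under the 40% cutoff. The sketch below is an assumption-heavy simplification: it matches paragraphs by exact text after normalizing case and whitespace rather than by shingle fingerprint, and the function names are hypothetical.

```python
from collections import defaultdict

BOILERPLATE_SHARE = 0.40  # cutoff described in this article

def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(text.lower().split())

def repeated_blocks_below_cutoff(pages):
    """pages maps URL -> list of body paragraphs.

    Returns {paragraph: sorted URLs} for paragraphs found on 2+ pages
    but on no more than 40% of all pages.
    """
    seen_on = defaultdict(set)
    for url, paragraphs in pages.items():
        for paragraph in paragraphs:
            seen_on[normalize(paragraph)].add(url)
    limit = BOILERPLATE_SHARE * len(pages)
    return {
        text: sorted(urls)
        for text, urls in seen_on.items()
        if 2 <= len(urls) <= limit
    }

why = "Why choose us: we answer every support ticket within one hour."
pages = {f"/page-{i}": [f"Unique body text for page {i}."] for i in range(1, 16)}
for i in range(1, 6):
    pages[f"/page-{i}"].append(why)  # injected into 5 of 15 pages (33%)

report = repeated_blocks_below_cutoff(pages)
```

With 15 pages, the block injected into 5 of them sits below the cutoff, so `report` lists it together with the five URLs that carry it.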

How AI Engines Evaluate This

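ChatGPT has to pick one canonical source when paragraphs repeat across pages and may cite neither, Claude lowers domain trust when it detects overlap, Perplexity prefers domains where each page contributes unique information, and Google AI Overviews deduplicates at the URL level before selecting sources.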

ChatGPT encounters cross-page duplication when it retrieves multiple pages from the same domain. When two pages contain the same paragraph, ChatGPT has to decide which page is the canonical source for that information. If neither page provides unique context around the shared paragraph, ChatGPT may cite neither and instead find a more differentiated source from another domain.

Claude builds entity models across multiple pages from the same site. When Claude detects substantial content overlap between pages, it reduces the overall trust score for the domain because repeated content suggests thin editorial process. Claude also uses cross-page uniqueness as a signal for topical authority - a site where every page makes distinct claims demonstrates broader expertise than a site where the same claims appear on multiple pages.

Perplexity assembles answers from passages across sources. Cross-page duplication means Perplexity encounters the same passage twice from the same domain, which wastes its source diversity budget. Perplexity prefers domains where each page contributes unique information to its answer.

Google AI Overviews deduplicates at the URL level before selecting sources. Pages that substantially duplicate content from other pages on the same site are less likely to be selected because Google has already captured that information from the canonical page.

