Fact Density Measurement
AI engines are citation machines - they need specific facts to quote. A page full of general advice with zero data points gives them nothing to work with.
Part of the AEO scoring framework - the current 48 criteria that measure how ready a website is for AI-driven search across ChatGPT, Claude, Perplexity, and Google AIO.
Quick Answer
Fact density counts verifiable claims per 1,000 words - named statistics, specific numbers, dated references, attributable statements. In our testing, pages above 5 claims per 1,000 words are significantly more likely to get cited by AI answer engines. Below 2? You're furniture.
Audit Note
In our audits, we've measured fact density on live sites, compared implementations, and documented the gaps that keep scores low.
How many facts per page do I need for AI engines to cite my content?
Aim for at least 5 high-value claims per 1,000 words; in our testing, pages above that threshold are significantly more likely to get cited, while pages below 2 rarely are.
What counts as a verifiable claim that AI systems can extract?
A verifiable claim contains a specific, checkable data point: a named statistic with a numerical value, a dated event, an attributed quotation, a specific measurement, or a citation to an external source.
Why does fact-sparse content get ignored by ChatGPT and Perplexity?
AI citation behavior follows a power law - a small number of high-fact-density pages generate the majority of citations, and fact-sparse pages give engines nothing quotable to extract.
Before & After
Before - Vague, fact-sparse paragraph
Live chat software can significantly improve customer satisfaction. Many companies have seen great results after implementing it. Experts agree this is an important tool for modern businesses.
After - Data-rich, citable paragraph
Live chat reduces average response time to 42 seconds compared to 17 hours for email (Zendesk 2025 CX Report). Companies using live chat see a 23% increase in customer satisfaction and 38% higher conversion rates on pages where the widget is active.
What Does Fact Density Measure?
Fact density is an objective metric: the number of verifiable, specific claims per 1,000 words. Not "content quality" - that's subjective. We're counting statements containing specific, checkable data points: named statistics with numerical values, dated events, attributed quotations, specific measurements, named entities with factual assertions, and citation references to external sources.
We distinguish between high-value and low-value facts. High-value: "Customer satisfaction increased by 23% after implementing live chat" or "The protocol was ratified in March 2024 by 47 member states." Low-value: "many companies have seen improvements" or "experts agree this is important." Only high-value facts count.
The measurement runs per page, then aggregates across the site. Site-level metrics include: mean fact density across all content pages, median (to resist skew from outlier pages), percentage below the minimum threshold (2 facts per 1,000 words), and percentage above the high-performance threshold (5 per 1,000).
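The site-level aggregation described above can be sketched in a few lines. This is a hypothetical illustration, not the audit's actual code: `site_metrics` and its input shape are assumptions, while the 2 and 5 per-1,000 thresholds come from the rubric.

```python
from statistics import mean, median

def density(fact_count: int, word_count: int) -> float:
    """High-value claims per 1,000 words for one page."""
    return fact_count / word_count * 1000 if word_count else 0.0

def site_metrics(pages: list[tuple[int, int]]) -> dict:
    """Aggregate (high_value_fact_count, word_count) pairs into site-level metrics."""
    densities = [density(f, w) for f, w in pages]
    n = len(densities)
    return {
        "mean_density": mean(densities),
        "median_density": median(densities),  # median resists outlier skew
        # share of pages below the 2-per-1,000 minimum threshold
        "pct_below_min": 100 * sum(d < 2 for d in densities) / n,
        # share of pages above the 5-per-1,000 high-performance threshold
        "pct_high_perf": 100 * sum(d > 5 for d in densities) / n,
    }
```

For example, a three-page site with densities of roughly 8.0, 0.8, and 7.8 would report about a third of pages below the minimum and two thirds above the high-performance bar.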
Here's why this matters: AI answer engines are citation machines. They need specific facts to cite. A page full of general advice gives the AI nothing quotable. A page dense with named statistics, specific numbers, and dated references gives the AI multiple citation-worthy fragments to extract. We've seen this pattern repeat across every audit we've run.
Why Do Fact-Sparse Domains Get Ignored by AI?
AI citation behavior follows a power law - a small number of high-fact-density pages generate the majority of citations. But the distribution across your site affects domain-level authority. When AI engines observe that a domain consistently produces content with high fact density, they develop a trust bias toward that domain for factual queries - even on pages they haven't crawled yet.
The flip side is brutal. A site where 90% of content is generic and fact-sparse trains AI engines to expect low-value content from that domain. Even if you publish one exceptional, data-rich article, the domain's reputation for thin content drags it down. The AI ranks it below a competitor whose entire catalog maintains higher density.
The competitive dynamics are straightforward. Two articles covering the same topic. One has 8 specific statistics with sources. The other has general advice without numbers. AI engines cite the data-rich article. At the site level, a domain where every article has 3-5 specific facts per 1,000 words creates a systematic advantage over competitors publishing fluffy, fact-sparse content.
This is especially critical for YMYL topics - healthcare, finance, legal - where AI engines apply heightened scrutiny to factual claims. For these topics, fact density combined with proper source attribution is often the deciding factor in whether an AI engine cites you or skips you entirely.
How Is Fact Density Checked?
The audit uses NLP to identify and count factual claims on each content page. Three stages: text extraction, fact detection, fact classification.
In text extraction, we strip HTML to get the body text, excluding navigation, footer, sidebar, and boilerplate. We use the <main> or <article> element as the content boundary when available, falling back to heuristic extraction for pages without semantic HTML. Word count on extracted text establishes the denominator.
In fact detection, pattern matching and NLP heuristics identify potential claims. The patterns: numerical values with context (percentages, dollar amounts, counts, measurements), date references (specific years, months, named time periods), named entity assertions (claims about specific people, companies, organizations), comparative claims with endpoints ("X is 3x faster than Y"), and source attributions ("according to [source]," "a [year] study by [org] found").
In fact classification, each detected claim is labeled high-value or low-value. High-value facts are specific and verifiable - enough detail that a reader could check them. Low-value facts are vague generalizations ("many studies show," "most experts agree") lacking the specificity needed for citation. Only high-value facts count toward the final score.
The output: per-page density scores plus a site-wide distribution. Pages are bucketed into four tiers - data-rich (above 5 per 1,000 words), adequate (3-5), thin (1-3), and empty (below 1). A healthy site has fewer than 20% of pages in "thin" or "empty."
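A minimal, regex-only sketch of the detection and classification stages (the real audit uses fuller NLP; all patterns, function names, and tier-boundary sides here are illustrative assumptions):

```python
import re

# Illustrative high-value patterns: numbers with context, dates,
# attributions, comparisons. Assumptions, not the audit's real rules.
HIGH_VALUE = [
    re.compile(r"\b\d+(?:\.\d+)?\s*(?:%|percent\b)", re.I),   # percentages
    re.compile(r"[$€£]\s?\d[\d,]*(?:\.\d+)?"),                # money amounts
    re.compile(r"\b(?:19|20)\d{2}\b"),                        # year references
    re.compile(r"\baccording to\b", re.I),                    # source attribution
    re.compile(r"\b\d+(?:\.\d+)?x\s+(?:faster|slower|higher|lower)\b", re.I),
]
# Vague generalizations that mark a sentence as low-value.
LOW_VALUE = re.compile(r"\b(?:many|most|some)\s+(?:companies|studies|experts)\b", re.I)

def count_high_value_claims(text: str) -> int:
    """Count sentences with at least one specific pattern and no vague hedge."""
    count = 0
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if LOW_VALUE.search(sentence):
            continue  # low-value: excluded from the density numerator
        if any(p.search(sentence) for p in HIGH_VALUE):
            count += 1
    return count

def tier(density: float) -> str:
    """Bucket a per-page density into the four audit tiers."""
    if density > 5:
        return "data-rich"
    if density >= 3:
        return "adequate"
    if density >= 1:
        return "thin"
    return "empty"
```

Running `count_high_value_claims` on the before/after paragraphs above shows the gap directly: the vague version yields zero countable claims, while the data-rich version yields several.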
How Is Fact Density Scored?
Fact density scoring combines per-page and site-wide metrics:
Site-wide distribution (6 points):
- 80%+ of content pages have density above 3 per 1,000 words: 6/6 points
- 60-79% above 3: 5/6 points
- 40-59% above 3: 4/6 points
- 20-39% above 3: 2/6 points
- Below 20% above 3: 1/6 points
- No pages with detectable factual claims: 0/6 points

High-density page percentage (2 points):
- 30%+ of content pages exceed 5 per 1,000 words: 2/2 points
- 15-29%: 1.5/2 points
- 5-14%: 1/2 points
- Below 5%: 0/2 points

Fact attribution quality (2 points):
- 50%+ of facts include source attribution (named study, linked reference, quoted expert): 2/2 points
- 25-49% attributed: 1.5/2 points
- 10-24% attributed: 1/2 points
- Below 10% attributed (facts stated without sources): 0.5/2 points
- No facts detected: 0/2 points

Deductions:
- -1 point if more than 20% of claims are recycled from other pages on the same site (internal duplication)
- -0.5 points if fact-dense pages lack Article schema with datePublished (facts without dates lose credibility)
Maximum: 10. Content-marketing sites with rigorous editorial standards typically score 6-8. Sites publishing primarily opinion without data typically score 1-3.
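The rubric can be encoded directly as threshold bands. A sketch assuming each band is inclusive at its lower edge; `fact_density_score` and its parameters are hypothetical names:

```python
def fact_density_score(pct_above_3: float, pct_above_5: float,
                       pct_attributed: float, any_facts: bool = True,
                       pct_recycled: float = 0.0,
                       dense_pages_have_dates: bool = True) -> float:
    """Combine the published rubric bands into a 0-10 score."""
    if not any_facts:
        dist = attr = 0.0  # no detectable claims: both components zero out
    else:
        dist = (6 if pct_above_3 >= 80 else 5 if pct_above_3 >= 60
                else 4 if pct_above_3 >= 40 else 2 if pct_above_3 >= 20 else 1)
        attr = (2 if pct_attributed >= 50 else 1.5 if pct_attributed >= 25
                else 1 if pct_attributed >= 10 else 0.5)
    high = (2 if pct_above_5 >= 30 else 1.5 if pct_above_5 >= 15
            else 1 if pct_above_5 >= 5 else 0)
    score = dist + high + attr
    if pct_recycled > 20:
        score -= 1          # internal duplication deduction
    if not dense_pages_have_dates:
        score -= 0.5        # missing Article schema with datePublished
    return max(score, 0.0)
```

For instance, a site with 85% of pages above 3 per 1,000, 35% above 5, and 60% of facts attributed lands the full 10; the same site with heavy internal duplication drops to 9.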
Score Impact in Practice
Sites scoring 8+ on fact density are typically industry analysts, research organizations, or businesses that integrate proprietary data into their content strategy. A healthcare company publishing articles with specific patient outcome statistics, named study references, and dated regulatory citations will consistently produce pages above 5 facts per 1,000 words. Their domain builds a reputation as a data-rich source that AI engines prefer for factual queries.
Sites scoring 2-3 follow a recognizable pattern: marketing-oriented content heavy on benefits language ("improve your results," "boost your performance") but light on verifiable claims. These pages read well to humans but give AI engines nothing specific to cite. When ChatGPT encounters a query like "What percentage of companies use live chat?" it needs a page with a specific number and source attribution. Pages saying "many companies use live chat" provide zero citation value.
The jump from 3 to 6 requires a content editing pass, not a rewrite. For each existing article, identify the 3-5 vaguest claims and replace them with specific data points. "Customer satisfaction improves significantly" becomes "Customer satisfaction increased 23% within 90 days of implementation (Forrester, 2025)." This targeted editing approach can shift an entire content library's fact density without producing new articles.
Common Mistakes
Recycling the same statistics across multiple pages is the most common fact density pitfall. A company's blog uses the same 3 industry statistics in every article - "73% of customers prefer live chat" appears on 15 different pages. AI engines detect this internal duplication and discount the recycled facts. Each page needs its own relevant data points, not the same figures repurposed everywhere.
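One simple way to surface this kind of internal duplication is to index which statistics appear on which pages. A sketch limited to percentage figures for brevity; the function name and the three-page cutoff are assumptions:

```python
import re

STAT = re.compile(r"\b\d+(?:\.\d+)?%")  # percentage figures only, for brevity

def recycled_stats(pages: dict[str, str], min_pages: int = 3) -> dict[str, list[str]]:
    """Map each statistic to the pages reusing it, keeping only stats that
    appear on min_pages or more distinct pages (likely recycled)."""
    seen: dict[str, set[str]] = {}
    for url, text in pages.items():
        for stat in set(STAT.findall(text)):  # count each stat once per page
            seen.setdefault(stat, set()).add(url)
    return {s: sorted(urls) for s, urls in seen.items() if len(urls) >= min_pages}
```

A "73% of customers prefer live chat" figure pasted across many URLs would show up in the output with the full list of offending pages, making the editing pass straightforward.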
Citing outdated statistics without acknowledging their age undermines credibility. A 2019 study referenced as current data in a 2026 article gets flagged by AI engines that cross-reference publication dates. Always include the year of the study or data source. If the data is more than 3 years old, acknowledge it explicitly and note whether more recent data exists.
Presenting opinions as facts inflates the raw count without improving the score. Statements like "AEO is the most important marketing strategy" or "every business needs live chat" are evaluative claims, not verifiable facts. The scoring system classifies these as low-value and excludes them from the density calculation. Only specific, checkable claims with numbers, dates, named entities, or source attributions count.
Clustering all facts in one section while leaving the rest of the article fact-sparse creates an uneven distribution. AI engines may extract content from any section of the page. An article with 10 facts in the introduction and zero in the remaining 2,000 words has pockets of density but is overall thin. Distribute data points throughout the article so every section provides citable material.
How AI Engines Evaluate This
ChatGPT extracts specific claims from retrieved content to construct its responses. When answering "How much does live chat reduce response times?" it scans retrieved pages for numerical claims with context - "42 seconds average response time" or "83% reduction compared to email." Pages with higher fact density give ChatGPT more extraction candidates, increasing the probability that your specific data point appears in the response with a citation back to your page.
Perplexity builds its cited answers by assembling facts from multiple sources. It preferentially selects pages where facts are clearly stated, attributed to sources, and formatted for easy extraction. A page where facts appear in clean sentences ("According to Zendesk's 2025 CX Report, live chat response times average 42 seconds") gets cited more frequently than one where the same data is buried in a narrative paragraph.
Claude evaluates fact density as a content quality indicator during retrieval ranking. Pages with consistent factual claims supported by source attributions receive higher relevance scores for informational queries. For YMYL topics especially, Claude's retrieval system weights attributed facts heavily - a health-related page citing specific studies and clinical outcomes outranks one making general wellness claims.
Google AI Overviews use fact extraction to populate the structured portions of their responses. Pages with high fact density and proper schema markup provide the raw material for AI Overview boxes, statistics callouts, and comparison tables. Low fact density pages may be consulted for context but rarely contribute specific data points to the visible AI Overview output.
Resources
Key Takeaways
- Target at least 5 verifiable claims per 1,000 words - named statistics, specific numbers, dated references.
- Attribute facts to sources ("a 2025 study by Forrester found...") to boost both credibility and citation likelihood.
- Replace vague language ("many companies see improvements") with specific data points ("23% reduction in response time").
- Maintain high fact density across your whole domain, not just one standout article - AI engines evaluate domains holistically.
How does your site score on this criterion?
Get a free AEO audit and see where you stand across the full set of criteria.