Fact Density Measurement
AI engines are citation machines - they need specific facts to quote. A page full of general advice with zero data points gives them nothing to work with.
Questions this article answers
- How many facts per page do I need for AI engines to cite my content?
- What counts as a verifiable claim that AI systems can extract?
- Why does fact-sparse content get ignored by ChatGPT and Perplexity?
Quick Answer
Fact density counts verifiable claims per 1,000 words - named statistics, specific numbers, dated references, attributable statements. In our testing, pages above 5 claims per 1,000 words are significantly more likely to get cited by AI answer engines. Below 2? You're furniture.
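The metric itself is simple arithmetic. A minimal sketch (the function name and thresholds-in-comments are illustrative, not from any real audit tool):

```python
def fact_density(claim_count: int, word_count: int) -> float:
    """Verifiable claims per 1,000 words of body text."""
    if word_count == 0:
        return 0.0
    return claim_count / word_count * 1000

# A 1,400-word article with 9 high-value claims:
density = fact_density(9, 1400)
print(round(density, 1))  # 6.4 - clears the 5-per-1,000 citation threshold
```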
Before & After
Before - Vague, fact-sparse paragraph
Live chat software can significantly improve customer satisfaction. Many companies have seen great results after implementing it. Experts agree this is an important tool for modern businesses.
After - Data-rich, citable paragraph
Live chat reduces average response time to 42 seconds compared to 17 hours for email (Zendesk 2025 CX Report). Companies using live chat see a 23% increase in customer satisfaction and 38% higher conversion rates on pages where the widget is active.
What This Actually Measures
Fact density is an objective metric: the number of verifiable, specific claims per 1,000 words. Not "content quality" - that's subjective. We're counting statements containing specific, checkable data points: named statistics with numerical values, dated events, attributed quotations, specific measurements, named entities with factual assertions, and citation references to external sources.
We distinguish between high-value and low-value facts. High-value: "Customer satisfaction increased by 23% after implementing live chat" or "The protocol was ratified in March 2024 by 47 member states." Low-value: "many companies have seen improvements" or "experts agree this is important." Only high-value facts count.
The measurement runs per page, then aggregates across the site. Site-level metrics include: mean fact density across all content pages, median (to resist skew from outlier pages), percentage below the minimum threshold (2 facts per 1,000 words), and percentage above the high-performance threshold (5 per 1,000).
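The per-page-to-site rollup described above can be sketched with the standard library; the function and field names here are hypothetical:

```python
from statistics import mean, median

def site_metrics(page_densities: list[float]) -> dict:
    """Aggregate per-page fact densities into site-level metrics."""
    n = len(page_densities)
    return {
        "mean": mean(page_densities),
        "median": median(page_densities),  # resists skew from outlier pages
        # share of pages below the minimum threshold (2 per 1,000 words)
        "pct_below_min": 100 * sum(d < 2 for d in page_densities) / n,
        # share above the high-performance threshold (5 per 1,000 words)
        "pct_high_perf": 100 * sum(d > 5 for d in page_densities) / n,
    }

m = site_metrics([0.5, 1.8, 3.2, 4.1, 6.7, 8.0])
print(m["median"])  # 3.65
```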
Here's why this matters: AI answer engines are citation machines. They need specific facts to cite. A page full of general advice gives the AI nothing quotable. A page dense with named statistics, specific numbers, and dated references gives the AI multiple citation-worthy fragments to extract. We've seen this pattern repeat across every audit we've run.
Why Fact-Sparse Domains Train AI to Ignore You
AI citation behavior follows a power law - a small number of high-fact-density pages generate the majority of citations. But the distribution across your site affects domain-level authority. When AI engines observe that a domain consistently produces content with high fact density, they develop a trust bias toward that domain for factual queries - even on pages they haven't crawled yet.
The flip side is brutal. A site where 90% of content is generic and fact-sparse trains AI engines to expect low-value content from that domain. Even if you publish one exceptional, data-rich article, the domain's reputation for thin content drags it down. The AI ranks it below a competitor whose entire catalog maintains higher density.
The competitive dynamics are straightforward. Two articles covering the same topic. One has 8 specific statistics with sources. The other has general advice without numbers. AI engines cite the data-rich article. At the site level, a domain where every article has 3-5 specific facts per 1,000 words creates a systematic advantage over competitors publishing fluffy, fact-sparse content.
This is especially critical for YMYL topics - healthcare, finance, legal - where AI engines apply heightened scrutiny to factual claims. For these topics, fact density combined with proper source attribution is often the deciding factor in whether an AI engine cites you or skips you entirely.
How We Check This
The audit uses NLP to identify and count factual claims on each content page. Three stages: text extraction, fact detection, fact classification.
In text extraction, we strip HTML to get the body text, excluding navigation, footer, sidebar, and boilerplate. We use the <main> or <article> element as the content boundary when available, falling back to heuristic extraction for pages without semantic HTML. Word count on extracted text establishes the denominator.
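The extraction stage can be approximated with only the standard library's HTML parser; this sketch keeps text inside `<main>`/`<article>` and skips common boilerplate tags (the class and function names are illustrative, and the heuristic fallback for pages without semantic HTML is omitted):

```python
from html.parser import HTMLParser

SKIP = {"nav", "footer", "aside", "header", "script", "style"}

class BodyText(HTMLParser):
    """Collect text inside <main>/<article>, skipping boilerplate tags."""
    def __init__(self):
        super().__init__()
        self.in_content = 0   # depth inside <main>/<article>
        self.skip_depth = 0   # depth inside boilerplate tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("main", "article"):
            self.in_content += 1
        elif tag in SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("main", "article"):
            self.in_content -= 1
        elif tag in SKIP:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_content and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_words(html: str) -> int:
    """Word count of the extracted body text - the density denominator."""
    parser = BodyText()
    parser.feed(html)
    return len(" ".join(parser.chunks).split())

html = "<nav>Home</nav><main><p>Live chat cuts response time.</p></main>"
print(extract_words(html))  # 5 - the <nav> text is excluded
```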
In fact detection, pattern matching and NLP heuristics identify potential claims. The patterns: numerical values with context (percentages, dollar amounts, counts, measurements), date references (specific years, months, named time periods), named entity assertions (claims about specific people, companies, organizations), comparative claims with endpoints ("X is 3x faster than Y"), and source attributions ("according to [source]," "a [year] study by [org] found").
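The detection patterns above can be sketched as regexes; a production detector would layer NLP on top, but these capture the shape of each claim type (pattern names and coverage are illustrative):

```python
import re

PATTERNS = {
    # numerical values with context: percentages, multipliers, amounts
    "numeric":     re.compile(r"\b\d+(?:\.\d+)?\s*(?:%|percent|x\b)", re.I),
    "currency":    re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
    # date references: specific years
    "date":        re.compile(r"\b(?:19|20)\d{2}\b"),
    # comparative claims with endpoints
    "comparative": re.compile(r"\b\d+(?:\.\d+)?x\s+(?:faster|slower|higher|lower)", re.I),
    # source attributions
    "attribution": re.compile(r"\baccording to\b|\ba \d{4} study by\b", re.I),
}

def detect_claims(text: str) -> list[str]:
    """Return the names of the claim patterns that fire on a sentence."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]

print(detect_claims("Response time fell 23% after the 2024 rollout."))
# ['numeric', 'date']
```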
In fact classification, each detected claim is labeled high-value or low-value. High-value facts are specific and verifiable - enough detail that a reader could check them. Low-value facts are vague generalizations ("many studies show," "most experts agree") lacking the specificity needed for citation. Only high-value facts hit the final score.
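A crude version of that classification step: flag known vagueness markers, then use the presence of a checkable specific (a digit, as a cheap proxy) to separate high-value from low-value. This is a heuristic sketch with an illustrative phrase list, not the audit's actual classifier:

```python
VAGUE = ("many companies", "most experts", "studies show",
         "experts agree", "great results", "significantly")

def classify(claim: str) -> str:
    """Label a detected claim 'high' or 'low' value (heuristic sketch)."""
    text = claim.lower()
    if any(phrase in text for phrase in VAGUE):
        return "low"
    # High-value claims contain something checkable; a digit is a cheap proxy.
    return "high" if any(ch.isdigit() for ch in claim) else "low"

print(classify("Satisfaction increased by 23% after launch"))  # high
print(classify("Many companies have seen improvements"))       # low
```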
The output: per-page density scores plus a site-wide distribution. Pages are bucketed into four tiers - data-rich (above 5 per 1,000 words), adequate (3-5), thin (1-3), and empty (below 1). A healthy site has fewer than 20% of pages in "thin" or "empty."
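The tiering and the 20% health check translate directly into code; function names are illustrative:

```python
def tier(density: float) -> str:
    """Bucket a page's fact density into the four audit tiers."""
    if density > 5:
        return "data-rich"
    if density >= 3:
        return "adequate"
    if density >= 1:
        return "thin"
    return "empty"

def is_healthy(densities: list[float]) -> bool:
    """Healthy site: fewer than 20% of pages land in 'thin' or 'empty'."""
    weak = sum(tier(d) in ("thin", "empty") for d in densities)
    return weak / len(densities) < 0.20

print(tier(6.4))                         # data-rich
print(is_healthy([6.0, 4.2, 3.5, 0.8]))  # False - 1 of 4 pages is weak
```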
How We Score It
Fact density scoring combines per-page and site-wide metrics:
Site-wide distribution (6 points):
- 80%+ of content pages have density above 3 per 1,000 words: 6/6 points
- 60-79% above 3: 5/6 points
- 40-59% above 3: 4/6 points
- 20-39% above 3: 2/6 points
- Below 20% above 3: 1/6 points
- No pages with detectable factual claims: 0/6 points

High-density page percentage (2 points):
- 30%+ of content pages exceed 5 per 1,000 words: 2/2 points
- 15-29%: 1.5/2 points
- 5-14%: 1/2 points
- Below 5%: 0/2 points

Fact attribution quality (2 points):
- 50%+ of facts include source attribution (named study, linked reference, quoted expert): 2/2 points
- 25-49% attributed: 1.5/2 points
- 10-24% attributed: 1/2 points
- Below 10% attributed (facts stated without sources): 0.5/2 points
- No facts detected: 0/2 points

Deductions:
- -1 point if more than 20% of claims are recycled from other pages on the same site (internal duplication)
- -0.5 points if fact-dense pages lack Article schema with datePublished (facts without dates lose credibility)
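The three scored components above can be expressed as one lookup function; this sketch mirrors the rubric's brackets (deductions are left out, and the signature is illustrative):

```python
def score_site(pct_above_3: float, pct_above_5: float,
               pct_attributed: float, any_facts: bool = True) -> float:
    """Apply the 10-point fact-density rubric (before deductions)."""
    if not any_facts:
        return 0.0
    # Site-wide distribution (6 points)
    if pct_above_3 >= 80:   dist = 6
    elif pct_above_3 >= 60: dist = 5
    elif pct_above_3 >= 40: dist = 4
    elif pct_above_3 >= 20: dist = 2
    else:                   dist = 1
    # High-density page percentage (2 points)
    if pct_above_5 >= 30:   high = 2
    elif pct_above_5 >= 15: high = 1.5
    elif pct_above_5 >= 5:  high = 1
    else:                   high = 0
    # Fact attribution quality (2 points)
    if pct_attributed >= 50:   attr = 2
    elif pct_attributed >= 25: attr = 1.5
    elif pct_attributed >= 10: attr = 1
    else:                      attr = 0.5
    return dist + high + attr

print(score_site(65, 18, 30))  # 8.0 = 5 + 1.5 + 1.5
```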
Maximum: 10. Content-marketing sites with rigorous editorial standards typically score 6-8. Sites publishing primarily opinion without data typically score 1-3.
Key Takeaways
- Target at least 5 verifiable claims per 1,000 words - named statistics, specific numbers, dated references.
- Attribute facts to sources ("a 2025 study by Forrester found...") to boost both credibility and citation likelihood.
- Replace vague language ("many companies see improvements") with specific data points ("23% reduction in response time").
- Maintain high fact density across your whole domain, not just one standout article - AI engines evaluate domains holistically.
How does your site score on this criterion?
Get a free AEO audit and see where you stand across all 10 criteria.