Image Context for AI: Your Pictures Are Worth Zero Words Without Markup
AI can't see your images. It reads the markup around them. A product photo with alt="image1.jpg" tells AI nothing. A figure with a descriptive figcaption and 8-word alt text tells AI exactly what it's looking at - and that context feeds directly into citation decisions.
Part of the AEO scoring framework - the current 48 criteria that measure how ready a website is for AI-driven search across ChatGPT, Claude, Perplexity, and Google AIO.
Quick Answer
Wrap images in figure elements with descriptive figcaption text. Write alt text longer than 5 words that describes the image content specifically - not "photo" or "image" or a filename. Place images inside article or section elements so AI knows the context. This criterion carries 1% weight in the Technical Foundation pillar, but it's one of the easiest fixes in the entire scoring framework.
Audit Note
Across our 500+ audits, we've measured Image Context for AI on live sites and compared implementations across platforms, which is where the score patterns below come from.
Before & After
Before - No image context for AI
```html
<div class="blog-post">
  <img src="chart.png" alt="chart">
  <p>Our response times improved by 40% after switching to live agents.</p>
</div>
```
After - Full semantic image context
```html
<article>
  <figure>
    <img src="chart.png"
         alt="HelpSquad response time chart showing 40% improvement over 6 months">
    <figcaption>Average response times dropped from 45 seconds to 27 seconds
    after HelpSquad deployed live agents.</figcaption>
  </figure>
</article>
```

Why Can't AI See Your Images?
AI crawlers don't process images the way humans do. GPTBot, ClaudeBot, and PerplexityBot fetch your HTML and extract text. They don't render JPEGs. They don't analyze PNGs. They don't interpret charts or screenshots.
What AI does read is every piece of text associated with your images: alt attributes, figcaption elements, surrounding paragraphs, and the semantic context provided by parent elements. That text is the only way AI knows what your images contain.
A product comparison chart with alt="chart" tells AI nothing. The same chart with alt="Live chat response time comparison showing HelpSquad at 28 seconds vs industry average of 47 seconds" and a figcaption stating "Figure 1: HelpSquad average response times outperform the industry benchmark by 40%" - that tells AI a specific, citable fact.
Image Context for AI carries 1% weight in the Technical Foundation pillar. It's the lightest criterion in the scoring framework. But it's also one of the easiest to fix, and the time investment is minimal compared to content restructuring or schema implementation. Across our 500+ audits, most sites score 2-3/10 on this criterion simply because they've never thought about image markup from AI's perspective.
What Does the Scorer Check?
The Image Context criterion evaluates three specific signals.
Figure + figcaption on 50%+ of content images. The scorer checks whether images within your content areas (article, section, main elements) are wrapped in semantic figure elements with accompanying figcaption text. The figure element tells AI "this image is a meaningful content element, not a decorative graphic." The figcaption provides the textual description AI actually processes. Sites that wrap at least half their content images in figure/figcaption pairs score well on this signal.
Alt text longer than 5 words and not generic. The scorer reads every alt attribute and checks two things: length (must exceed 5 words) and specificity (must not contain generic terms like "image," "photo," "picture," "screenshot," "logo," or filenames like "IMG_4832.jpg"). Alt text of "HelpSquad agent dashboard showing active conversation metrics" passes both checks. Alt text of "dashboard screenshot" fails both.
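As an illustration only (the scorer's actual implementation isn't published), a heuristic alt-text check along the lines described above could be sketched in a few lines of Python:

```python
import re

# Generic labels the criterion penalizes, per the checks described above
GENERIC_TERMS = {"image", "photo", "picture", "screenshot", "logo"}
# Filename-like alt text: image extensions or camera-style names such as IMG_4832
FILENAME_RE = re.compile(r"\.(jpe?g|png|gif|webp|svg)$|^img[_-]?\d+", re.I)

def alt_text_passes(alt: str) -> bool:
    """True if alt text exceeds 5 words and avoids generic/filename patterns."""
    words = alt.strip().split()
    if len(words) <= 5:
        return False
    if FILENAME_RE.search(alt.strip()):
        return False
    if any(w.lower().strip(".,") in GENERIC_TERMS for w in words):
        return False
    return True

print(alt_text_passes("HelpSquad agent dashboard showing active conversation metrics"))  # True
print(alt_text_passes("dashboard screenshot"))  # False
```

The exact generic-term list and filename patterns here are assumptions based on the examples in this section; the real scorer may use different thresholds.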
Contextual placement. Images placed inside article, section, or main elements get credit for contextual association. Images floating in generic divs outside semantic containers lose this signal because AI can't associate the image context with specific content. A chart inside an article about response times inherits the article's topic. A chart in a random div inherits nothing.
Sites that pass all three signals score 8-10/10. Sites with generic alt text and no figure elements score 1-3/10. The most common pattern we see in audits is CMS-generated alt text that defaults to the filename - "blog-post-header-image-3.webp" - which scores zero on both length and specificity.
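To see roughly where your own pages stand on the figure-wrapping and contextual-placement signals, here is a hedged sketch (our illustration, not the scorer) that counts images wrapped in figure elements and images placed inside semantic containers, using only Python's standard library:

```python
from html.parser import HTMLParser

SEMANTIC = {"article", "section", "main"}

class ImageContextAudit(HTMLParser):
    """Counts <img> tags and how many sit inside <figure> and semantic containers."""
    def __init__(self):
        super().__init__()
        self.stack = []       # currently open tags
        self.total = 0
        self.in_figure = 0
        self.in_semantic = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":      # void element: check context, don't push
            self.total += 1
            if "figure" in self.stack:
                self.in_figure += 1
            if SEMANTIC.intersection(self.stack):
                self.in_semantic += 1
        else:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

audit = ImageContextAudit()
audit.feed("""
<article>
  <figure><img src="a.png" alt="..."><figcaption>...</figcaption></figure>
  <div><img src="b.png" alt="b"></div>
</article>
<div><img src="c.png" alt="c"></div>
""")
print(audit.total, audit.in_figure, audit.in_semantic)  # 3 1 2
```

Note the sketch doesn't verify that each figure also contains a figcaption, or render JavaScript-inserted images; it's a starting point for a manual audit, not a replacement for the scorer.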
How Do You Add Proper Image Context?
1. Wrap content images in figure/figcaption
Every image that conveys information (charts, screenshots, product photos, diagrams) should be in a figure element. Decorative images (backgrounds, spacers, icons) don't need this treatment.
```html
<!-- Before: image with no semantic context -->
<img src="comparison.png" alt="comparison">

<!-- After: full semantic image context -->
<figure>
  <img src="comparison.png"
       alt="Side-by-side comparison of chatbot vs live agent satisfaction scores across 500 customer interactions">
  <figcaption>Customer satisfaction scores: live agents averaged 4.6/5 compared
  to 3.1/5 for chatbot-only support across 500 interactions.</figcaption>
</figure>
```
2. Write descriptive alt text
Alt text should answer the question: "What specific information does this image convey?" Not "what type of file is this" - what does it show.
Bad: alt="graph"
Bad: alt="company photo"
Bad: alt="IMG_20260315_142233.jpg"
Good: alt="Monthly AEO score trend showing 34-point improvement from March to September 2026"
Good: alt="HelpSquad team of 12 live chat agents at their Philadelphia office"
The 5-word minimum is a floor, not a ceiling. 8-12 words is the sweet spot. Longer than 15 words starts to feel like a paragraph rather than an attribute.
3. Place images inside semantic containers
Make sure your content images live inside article, section, or main elements. Most modern CMS platforms handle this correctly, but custom themes and landing page builders often wrap everything in generic divs.
4. Handle CMS defaults
WordPress, Shopify, and Webflow all generate default alt text from filenames. Override these manually or use a plugin that prompts for descriptive alt text during upload.
Start here: check 5 images on your site right now. View source and search for their alt attributes. If any say "image," a filename, or are empty - those are your quick wins.
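If you'd rather script that spot-check than eyeball the source, a minimal helper (assuming Python is available; adapt to your stack) can list every alt attribute on a saved page:

```python
from html.parser import HTMLParser

class AltCollector(HTMLParser):
    """Collects the alt attribute of every <img>; None means the attribute is missing."""
    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.alts.append(dict(attrs).get("alt"))

collector = AltCollector()
collector.feed('<img src="a.png" alt="chart"><img src="b.png">')
print(collector.alts)  # ['chart', None]
```

Feed it the HTML of any page (e.g. read from a saved file) and scan the output for `None`, empty strings, single words, or filenames.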
The Figcaption Advantage
Alt text is required. Figcaption is the differentiator.
Here's why figcaption matters more than most developers think. Alt text is a technical accessibility attribute - screen readers use it, and AI crawlers parse it. But figcaption is visible content that appears directly below the image on the rendered page. It's treated as regular body text by AI extraction pipelines.
This means figcaption text can be cited directly. When ChatGPT is looking for a specific data point about customer satisfaction scores, a figcaption that states "Live agents achieved 4.6/5 satisfaction vs 3.1/5 for chatbot-only support (n=500)" is a perfectly extractable sentence. Alt text serves the same function but is less likely to be cited verbatim because AI engines treat it as metadata rather than content.
The combination of figure + figcaption + descriptive alt creates a three-layer context signal:
- Alt text: machine-readable image description (what the image contains)
- Figcaption: human-readable content (what the image means)
- Figure element: semantic wrapper (this image is meaningful content)
Sites that use all three layers score maximum points. Sites that use only alt text get partial credit. Sites that use neither are invisible for image-associated content.
One more nuance: figcaption text should not duplicate alt text. The alt describes the image. The figcaption explains what the image means in context. "Bar chart showing response times" (alt) vs "Average response times dropped 40% after deploying HelpSquad's live agent model" (figcaption) - complementary, not redundant.
Score Impact in Practice
Image Context for AI carries 1% weight in the Technical Foundation pillar - the lightest individual criterion in the entire scoring framework. A perfect score on this criterion adds approximately 1 point to your overall AEO Site Rank. A zero adds nothing.
So why does this article exist? Because it's the single easiest criterion to fix. Adding figure/figcaption wrappers and writing descriptive alt text requires no content strategy, no schema knowledge, no technical infrastructure changes. A content editor can fix 20 images in an hour. That hour buys you a free point.
Across our audits, the average score on this criterion is 2.4/10. That's the lowest average of any criterion in the framework - not because it's hard, but because nobody thinks about it. Most CMS platforms default to filename-based alt text and wrap images in bare img tags. The result is that nearly every site leaves this point on the table.
In a competitive vertical where scores cluster between 55 and 65, one point can move you from "below average" to "at average" in your benchmark category. We've seen sites where fixing image context was the tiebreaker that pushed them above a competitor in our visibility rankings.
The effort-to-impact ratio makes this criterion the best starting point for teams new to AEO optimization. Fix your images first, learn the pattern of thinking about content from AI's perspective, then move on to heavier criteria like Topic Coherence and Original Data.
How AI Engines Evaluate This
AI engines process images entirely through their associated text signals. None of them perform visual analysis during the crawling phase - the image file itself is irrelevant to citation decisions.
ChatGPT reads alt text as part of its page parsing. When a user asks a question that an image caption could answer (e.g., "What is the average response time for HelpSquad?"), ChatGPT can extract the answer from a figcaption or descriptive alt attribute just as readily as from a paragraph. The key requirement is that the alt text or figcaption contains specific, factual information - not just a generic description. ChatGPT treats "alt=chart" as noise and skips it. ChatGPT treats "alt=response time comparison chart showing 28-second average for HelpSquad" as a data source.
Claude gives extra weight to figcaption elements because they're visible content that the site intentionally placed near the image. Claude's extraction pipeline treats figcaptions as first-class content - equivalent to a paragraph - while treating alt text as supplementary metadata. A well-written figcaption with a specific fact and a number can be cited by Claude as a standalone data point.
Perplexity processes images indirectly through the alt and figcaption text. Because Perplexity assembles answers from extracted sentences, a figcaption that reads as a complete, factual sentence is an ideal extraction target. "HelpSquad's live agents reduced average response time from 47 seconds to 28 seconds" in a figcaption is just as extractable as the same sentence in a paragraph.
Google AI Overviews uses image context signals from alt text and surrounding content to associate images with topics. Pages with well-described images are more likely to appear in AI Overviews with image thumbnails - a visual advantage that increases click-through rates even when the text content is comparable to competitors.
Key Takeaways
- Wrap 50% or more of your images in figure elements with descriptive figcaption text - this is the strongest image context signal for AI.
- Write alt text that is at least 6 words long and describes the specific content of the image, not a generic label.
- Place images inside article or section elements so AI engines can associate the image context with the surrounding content.
- Never use filenames, generic terms like "photo" or "image," or empty alt attributes on content images.
How does your site score on this criterion?
Get a free AEO audit and see where you stand across the full scoring framework.