We Ran a Real Test on What AI Crawlers Actually Read on Your Website, and the Results Were Eye-Opening
There is a lot of noise right now about AI optimization, schema markup, structured data, and a growing list of techniques being sold by agencies claiming they can get your business cited by AI systems like ChatGPT, Google AI Overviews, Perplexity, Gemini, and Bing Copilot. Most of it is speculation dressed up as strategy. We decided to stop guessing and run a controlled test to find out what AI crawlers actually use when retrieving and citing information about a website.
What we found confirmed our suspicions, and frankly, it should change how you think about AI optimization entirely.
The Test We Ran and How We Set It Up
At Marketing 1on1, we designed a structured experiment using live client websites. The goal was straightforward: we wanted to isolate variables and determine whether AI language models, when crawling and retrieving website information, rely on what is visibly rendered on the page, what is coded into structured data like schema markup, or something else entirely.
We made three specific changes to the website and then monitored how major AI systems, including ChatGPT, Google AI Overviews, Perplexity, and Gemini, responded to direct queries about that website over a period of more than two months.
Here is exactly what we changed:
- Operating hours were removed from visible page content but were added, fully and correctly, to the LocalBusiness schema markup only.
- The phone number anchor text was changed from the actual numeric phone number to generic text that simply read “call us,” while the underlying href still contained the number.
- Certain content was hidden with the CSS display: none property to see whether AI crawlers would still pick it up.
These were deliberate, clean changes. Nothing else on the site was altered. We then queried the AI systems with specific questions about the website and documented the responses.
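To make the three changes concrete, here is a minimal Python sketch that recreates them on a made-up page and sorts the content by where it lives. The markup, business name, hours, and phone number are hypothetical placeholders, not the tested site's code, and the `LayerSplitter` class is our own illustration rather than a description of how any AI system parses pages.

```python
import json
from html.parser import HTMLParser

# Hypothetical fixture: business name, hours, and phone number are placeholders.
PAGE = """
<html><body>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LocalBusiness",
 "name": "Example Co", "openingHours": "Mo-Fr 09:00-17:00"}
</script>
<p><a href="tel:+15555550100">call us</a></p>
<p style="display: none">Free estimates on weekends.</p>
</body></html>
"""


class LayerSplitter(HTMLParser):
    """Sort page content into text nodes, href values, and JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.text, self.hrefs, self.json_ld = [], [], []
        self._raw_type = None  # None while outside <script>/<style>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._raw_type = attrs.get("type") or ""
            self._buffer = []
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_data(self, data):
        if self._raw_type is not None:
            self._buffer.append(data)       # script/style content, not body text
        elif data.strip():
            self.text.append(data.strip())  # text nodes; CSS is never applied

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._raw_type == "application/ld+json":
                self.json_ld.append(json.loads("".join(self._buffer)))
            self._raw_type = None


splitter = LayerSplitter()
splitter.feed(PAGE)
print(splitter.text)                        # ['call us', 'Free estimates on weekends.']
print(splitter.hrefs)                       # ['tel:+15555550100']
print(splitter.json_ld[0]["openingHours"])  # Mo-Fr 09:00-17:00
```

In this sketch the hours exist only in the JSON-LD layer and the phone number only in the href layer, while the hidden paragraph still shows up as an ordinary text node because nothing in the parser evaluates CSS.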
What the AI Systems Actually Said
Schema Markup Alone Was Completely Ignored
After more than two months with operating hours listed exclusively in LocalBusiness schema markup and absent from any visible page content, every major AI model we tested stated the website did not publish its operating hours. Not one model retrieved the hours from the schema. This is a significant finding. It strongly suggests that AI crawlers, at least the large language models powering today’s AI search features, are not meaningfully parsing or prioritizing structured data when generating citations.
We asked multiple AI systems variations of the question: “What are the operating hours for [the website]?” The responses were consistent and unanimous. The models said the website did not appear to include or display operating hours. Some were more direct, stating they could not find that information published on the site.
This is not a minor gap in AI capability. The schema was properly implemented, validated, and had been indexed. The information simply was not visible to a human reading the page, and apparently, it was not usable to an AI generating a response either.
“Schema markup may be doing a lot of things for rich snippets, but our test makes it clear it is not what AI language models are pulling from when they cite your business. If it is not on the page where a human can read it, the AI is not going to say it either.”
The Phone Number Test: Anchor Text Matters More Than the Data
When we replaced the visible phone number text with the words “call us,” AI models could no longer cite the actual phone number. Despite the number still existing in the anchor’s href attribute, every AI model we tested responded that no phone number was published on the website. One model stated explicitly: “No, there is no phone number published anywhere on the company website.” The number was technically present in the code. The AI did not use it.
This tells us something important about how large language models process web content. They appear to be working primarily from rendered, human-readable text, not from parsing raw HTML attributes. The href value of an anchor tag does not count as published information from the AI’s perspective. What the visitor sees is what the AI reads.
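One way to find links like this on your own pages is a small check for tel: links whose anchor text does not show the number. The sketch below is illustrative only: `TelLinkAudit` and `hidden_numbers` are hypothetical names, the sample numbers are fictional, and the comparison assumes ten-digit national numbers.

```python
import re
from html.parser import HTMLParser


class TelLinkAudit(HTMLParser):
    """Collect (href, anchor_text) pairs for every tel: link on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._parts = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.lower().startswith("tel:"):
            self._href, self._parts = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._parts).strip()))
            self._href = None


def digits(value):
    """Keep only the digits of a string."""
    return re.sub(r"\D", "", value)


def hidden_numbers(html):
    """Return tel: links whose anchor text does not show the number."""
    audit = TelLinkAudit()
    audit.feed(html)
    flagged = []
    for href, text in audit.links:
        # Assumption: compare the last ten digits so a "+1" prefix is ignored.
        if digits(href)[-10:] not in digits(text):
            flagged.append((href, text))
    return flagged


sample = (
    '<a href="tel:+15555550100">call us</a> '
    '<a href="tel:+15555550101">(555) 555-0101</a>'
)
print(hidden_numbers(sample))  # [('tel:+15555550100', 'call us')]
```

Any link the function returns is one where a reader, and based on our test an AI model, would see a call to action but no actual number.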
Hidden Content With display:none Was Actually Retrieved
This was the one result that surprised us slightly. Content we had hidden from visual display using display: none in CSS was, in fact, retrievable by AI systems. The models were able to cite that content and reference it in their responses.
This creates an interesting distinction. AI crawlers do appear to access page source or rendered DOM content beyond just what a typical human sees on screen, but they draw a clear line at structured data formats like schema markup and at HTML attributes that are not reflected in visible text.
The implication here is nuanced. Hiding content with CSS does not necessarily remove it from AI consideration, but in our test, putting critical information only into schema markup did. These are two very different scenarios, and conflating them leads to bad strategy.
What This Confirms About Schema Markup and AI Citations
Our findings align with an Ahrefs study we referenced in a previous analysis, which similarly found that schema markup did not improve AI rankings or increase the likelihood of LLM citations. Taken together, these two independent bodies of evidence point to the same conclusion: structured data markup is not a lever for AI citation optimization, at least not with the current generation of large language models powering AI search features.
This matters because a growing number of agencies are now selling “AI optimization” packages that center on schema implementation, JSON-LD enhancements, and other structured data configurations as the primary method of improving AI visibility. Our real-world test, run over more than two months on a live website, produced zero evidence that this approach works.
| What We Changed | Where the Information Existed | AI Systems Cited It? |
|---|---|---|
| Operating hours | Schema markup only (not visible on page) | No, across all models tested |
| Phone number | href attribute only (anchor text said “call us”) | No, models said no number existed |
| Hidden text content | display:none CSS (not visible to users) | Yes, models retrieved and cited it |
Why AI Crawlers Behave This Way
Understanding why AI systems prioritize visible content over structured data requires understanding how large language models are trained and how they retrieve web information. LLMs are trained predominantly on human-readable text: web pages as they appear to readers, articles, forums, documents, and similar content. Their understanding of the world is built from what humans write and read, not from machine-readable metadata.
When AI systems crawl a page to generate a real-time citation or answer, they are largely continuing that same pattern. They are looking for content that mirrors how information is actually communicated to humans. Schema markup is a communication layer built for traditional search engine crawlers, specifically for bots that parse metadata to power rich results in SERPs. It was never designed for language model comprehension, and current evidence strongly suggests language models are not using it as an authoritative data source.
The display: none finding adds an interesting layer. It suggests AI crawlers are working at the DOM level rather than purely from visual rendering, which is why hidden text is still accessible. Structured data, by contrast, typically sits in a JSON-LD script block rather than in the page body, a separate parsing context that LLMs do not appear to treat as equivalent to body content.
The Problem With Fake AI Optimization Services
Our test validates a concern we have had for a while. There is a segment of the SEO and digital marketing industry that has pivoted to selling “AI optimization” without any real evidence base. The pitch usually involves some combination of schema enhancements, JSON-LD tweaks, entity markup, and custom coding presented as proprietary methods for getting your business cited in AI Overviews or ranked in LLM responses.
Based on our testing, none of those technical approaches appear to influence AI citations in any meaningful way. The actual determinant of whether AI systems cite your website is far simpler and far less mystical: is the information clearly written and visible on your page?
“The businesses that will win in AI-driven search are not the ones with the most elaborate schema configurations. They are the ones with the clearest, most factually complete, and most human-readable content on their pages. That has always been the real answer.”
Paying a premium for schema-heavy AI optimization packages is not just wasteful; it may also create a false sense of security while the actual content on your website remains incomplete or poorly structured for human readers.
What AI Crawlers Actually Want
Based on our findings, here is what appears to genuinely matter for AI citation and retrieval:
1. Visible, Plain-Text Information on the Page
If you want AI systems to know your phone number, publish it as readable text on the page. If you want them to cite your hours, write them out visibly on the page. Do not rely on schema markup as a substitute for real on-page content.
2. Anchor Text That Reflects the Actual Information
Our phone number test demonstrated that anchor text is what AI models read, not href values. If a link says “click here” or “call us,” the AI treats it as a navigational element, not as factual data. Descriptive, content-rich anchor text is more than an SEO best practice at this point; it is essential for AI readability.
3. Comprehensive, Factually Dense Page Content
AI systems cite pages that contain clear answers to specific questions. If you want to be cited for your business name, services, location, hours, specialties, or expertise, all of that information needs to be written out in plain, readable prose or clearly formatted content on the page itself.
4. Content Structure That Matches Query Patterns
Headers, clear paragraphs, and organized information help AI systems identify and extract relevant data. This is not about gaming the system; it is about writing content the way humans naturally seek information, which is also how AI systems are trained to find it.
5. Do Not Rely on Schema as a Content Replacement
Schema markup may still have value for traditional search rich results and for signaling entity relationships to Google’s knowledge graph. But it is not a substitute for on-page content when it comes to AI citation. Treat it as a supplement, never as a replacement.
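If your hours already live in schema, one low-effort way to treat the markup as a supplement is to generate the visible sentence from the same value. The sketch below assumes the simple schema.org openingHours form of a day or day range followed by a time range (for example "Mo-Fr 09:00-17:00"); `readable_hours` is a hypothetical helper and does not handle comma-separated day lists.

```python
# Day abbreviations used by the schema.org openingHours property.
DAYS = {
    "Mo": "Monday", "Tu": "Tuesday", "We": "Wednesday", "Th": "Thursday",
    "Fr": "Friday", "Sa": "Saturday", "Su": "Sunday",
}


def readable_hours(opening_hours):
    """Turn 'Mo-Fr 09:00-17:00' into text a visitor can read on the page."""
    days, _, times = opening_hours.partition(" ")
    opens, _, closes = times.partition("-")
    names = [DAYS[day] for day in days.split("-")]
    return f"{' to '.join(names)}: {opens} to {closes}"


print(readable_hours("Mo-Fr 09:00-17:00"))  # Monday to Friday: 09:00 to 17:00
print(readable_hours("Sa 10:00-14:00"))     # Saturday: 10:00 to 14:00
```

Rendering that sentence into the page body keeps the schema and the visible copy in sync, so the markup never becomes the only place a fact is published.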
Myths vs. Facts: AI Optimization Edition
| Myth | What Our Test Actually Found |
|---|---|
| Schema markup boosts AI citations | Schema-only information was not cited by any AI model tested |
| JSON-LD is read by AI language models | No evidence of LLMs pulling from JSON-LD structured data |
| Hidden content is invisible to AI | display:none content was retrieved and cited by AI models |
| The href in a link counts as published information | AI models ignored the phone number in an href when anchor text did not show the number |
| Custom AI optimization coding improves LLM visibility | No measurable effect observed; visible on-page content was the determining factor |
What This Means for Your Website Right Now
The practical takeaway from our study is that the fundamentals of good content still govern AI retrieval. Write your business information clearly and completely on your pages. Make sure every fact you want an AI system to know about your business is readable by a human visitor, because that appears to be the primary criterion AI crawlers use when deciding what is worth citing.
Audit your most important pages and ask: if someone read only the visible text on this page, would they have everything they need? Your address, hours, phone number, services, service area, expertise, and any other facts you want attributed to your business should all be present as readable content, not buried in markup.
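That audit can be roughed out in a few lines of Python. The sketch below strips script and style blocks, collects the remaining text, and reports which facts never appear in it. The function names, the sample page, and the list of acceptable phrasings are all hypothetical, and simple substring matching is only an approximation of what a careful human read would catch.

```python
import re
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def page_text(html):
    """Return the page's body text, lowercased with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip().lower()


def missing_facts(html, facts):
    """List the fact labels for which no accepted phrasing is in the text."""
    text = page_text(html)
    return [
        label for label, phrasings in facts.items()
        if not any(phrase.lower() in text for phrase in phrasings)
    ]


# Hypothetical page: hours exist only in JSON-LD, phone is written out.
html = (
    '<script type="application/ld+json">'
    '{"openingHours": "Mo-Fr 09:00-17:00", "telephone": "+15555550100"}'
    "</script><h1>Example Co</h1><p>Call (555) 555-0100.</p>"
)
facts = {
    "phone": ["(555) 555-0100", "555-555-0100"],
    "hours": ["9am to 5pm", "09:00 to 17:00"],
}
print(missing_facts(html, facts))  # ['hours']
```

Note that this extractor does not evaluate CSS, so text hidden with display: none still counts as present. If you want the stricter "visible to a human" standard, you would need a rendering step on top of this.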
Schema still has its place for traditional SEO purposes, and we are not suggesting you remove it. But stop treating it as an AI optimization tool, because based on everything we have tested and observed, it simply is not one.
Work With a Team That Tests Instead of Guesses
At Marketing 1on1, we do not sell theories. We run tests, document findings, and build strategies from real-world evidence. If you are investing in AI optimization or evaluating agencies claiming they can get your business cited in AI-driven search results, we can help you separate what is real from what is being fabricated for a quick sale.
Our approach to AI visibility, traditional SEO, and digital marketing is grounded in what we can actually observe and measure, not in trend-chasing or technical theater.
Frequently Asked Questions
Do AI crawlers read schema markup when generating citations?
Based on our testing over more than two months, AI language models such as ChatGPT, Google AI Overviews, Perplexity, and Gemini did not retrieve or cite information that existed only in schema markup and was absent from visible page content. Operating hours properly coded into LocalBusiness schema were consistently reported as missing by every AI model we queried. Current evidence strongly suggests that LLMs prioritize human-readable, visibly rendered content over structured data markup.
Will adding schema markup help my website get cited in AI Overviews or ChatGPT responses?
Our testing found no evidence that schema markup, including LocalBusiness schema with complete structured data, improved the likelihood of AI citation. This aligns with independent research from Ahrefs, which similarly found schema markup had no measurable effect on AI rankings or LLM citations. Schema may still benefit traditional Google search rich results, but it does not appear to influence AI language model retrieval.
Does hidden content using CSS display:none get crawled by AI systems?
Yes, in our test, content hidden using the CSS property display: none was successfully retrieved and cited by AI models. This suggests AI crawlers operate at the DOM level and can access content that is technically in the page source even when not visually rendered. However, this is distinct from schema markup, which AI models did not appear to use as a citation source.
What information do AI crawlers actually use to cite a website?
Based on our real-world study, AI crawlers primarily rely on visible, human-readable text content on the page. Anchor text is read as content, meaning the text of a link matters more to AI systems than the URL or phone number contained in the href. Information that is clearly written out on the page, formatted accessibly, and readable by a human visitor is what AI systems consistently used when generating citations in our tests.
Are AI optimization services that focus on schema and structured data worth paying for?
Our findings suggest they are not, at least not for the specific goal of improving AI citations or appearing in LLM-generated responses. Agencies selling AI optimization centered on schema enhancements, JSON-LD configurations, and custom structured data coding have not produced evidence that these methods work for AI visibility. Our controlled test found zero improvement in AI citations from schema-only information. The most effective AI optimization strategy remains producing complete, factually accurate, and clearly written on-page content.








