Technical Health for AI Visibility: What AI Crawlers Actually Check

May 18, 2026 · 8 min read · 1,704 words
Anthony (Tony) Velte, Founder & Principal of LocalStar Digital

Founder & Principal · Author of 12+ books

Technical health for AI visibility means your site is fast, reliable, server-rendered, and crawlable enough that ChatGPT, Perplexity, Claude, and Google AI Overviews can actually fetch and parse your content within their tight timeout windows. It is the 15% of our SignalScore methodology that, if you get it wrong, quietly nullifies every other optimization you have made.

Technical Health is not the most heavily weighted dimension in our methodology — Citability and Brand Authority both carry more weight in determining whether AI engines recommend you. But Technical Health is the dimension that gates everything else. If an AI crawler cannot fetch your page, render it, and parse it in the few seconds it allows itself, none of your beautifully structured content matters. Your page is, for AI search purposes, invisible.

Why AI Crawlers Are Different From Googlebot

For most of the last decade, "technical SEO" meant optimizing for one crawler — Googlebot — which got progressively more sophisticated. Googlebot today executes JavaScript, renders pages in a Chromium-based environment, and is patient enough to wait for client-side hydration to complete. Google's own guidance acknowledges that JavaScript-heavy sites can be indexed, though they recommend server-side rendering or static pre-rendering for reliability (see Google Search Central's JavaScript SEO basics).

AI crawlers are a different animal. The major ones (OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot, and Common Crawl's CCBot) were built primarily for one job: ingesting text at scale to inform AI responses. Google-Extended is often listed alongside them, but it is a robots.txt control token rather than a separate crawler: it governs whether content fetched by Google's existing crawlers may be used for Google's Gemini models. The dedicated AI crawlers are optimized for throughput, not patience. Most of them do not execute JavaScript at all. They fetch your HTML, read what is there, and move on. If your content is rendered by client-side React after the initial HTML loads, those crawlers see a blank shell.

Even the AI crawlers that do execute some JavaScript tend to operate on tighter render budgets than Googlebot. The operators do not publish exact timeout windows, but the practical pattern we observe in audit work is consistent: AI crawlers favor pages that return their main content quickly and reliably, and a page that only finishes rendering its primary content after several seconds under load is a fragile source for them. Slow, render-heavy pages are the ones most likely to be fetched as an empty or partial shell.
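
One way to observe this from the outside is to fetch a page the way a non-rendering client would: a single HTTP request, a short timeout, and no JavaScript. The sketch below is a minimal illustration using only Python's standard library. The function names, the `technical-health-check/0.1` user agent, the 5-second timeout, and the 800 ms pass line (web.dev's guide for a good TTFB) are all assumptions chosen for the example; no crawler operator publishes its actual budget.

```python
import time
import urllib.request

# Assumption: 800 ms is web.dev's guide for a "good" TTFB, not a crawler's real limit.
TTFB_TARGET_MS = 800


def measure_response(url, timeout_s=5.0, user_agent="technical-health-check/0.1"):
    """Fetch a URL once, without rendering, and time the response.

    urlopen() returns once the status line and headers have arrived, so the
    elapsed time at that point approximates TTFB (including DNS, connect, TLS).
    HTTP 4xx/5xx responses raise urllib.error.HTTPError.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        ttfb_ms = (time.perf_counter() - start) * 1000
        body = response.read()
        total_ms = (time.perf_counter() - start) * 1000
        return {
            "status": response.status,
            "ttfb_ms": ttfb_ms,
            "total_ms": total_ms,
            "html_bytes": len(body),
        }


def within_budget(result, ttfb_target_ms=TTFB_TARGET_MS):
    """True when the fetch succeeded and the first byte arrived within the target."""
    return result["status"] == 200 and result["ttfb_ms"] <= ttfb_target_ms
```

Calling `within_budget(measure_response("https://example.com/"))` from a few different networks, and under load, gives a rough sense of how a throughput-oriented client experiences the page. It is a spot check, not a substitute for field data.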

The 7 Technical Checks AI Crawlers Actually Do

When an AI crawler hits your site, these are the checks that determine whether your content makes it into the AI knowledge graph. Failing any one of them meaningfully reduces your AI visibility:

  • HTTPS: sites served over HTTP are increasingly skipped or downranked by AI crawlers as a basic trust signal. A valid TLS (SSL) certificate is table stakes.
  • Server-rendered or static HTML: the main content of the page must be present in the initial HTML response, not injected by client-side JavaScript after page load. SSR (server-side rendering) and SSG (static site generation) both satisfy this.
  • A fast initial response: Time to First Byte (TTFB) plus initial HTML transfer should land in the hundreds of milliseconds, not several seconds. A common, defensible target is a TTFB of 800 ms or less, which web.dev recommends as a guide for good Largest Contentful Paint. Slow shared hosting and unoptimized backend code routinely fail here.
  • Core Web Vitals in the "good" range: LCP, CLS, and INP targets (covered below). Crawlers that do not render pages cannot measure these directly, but they are a useful proxy for the response speed and engineering quality that determine crawlability.
  • Valid, parseable HTML: broken tags, unclosed elements, and malformed markup confuse parsers. AI crawlers tend to be less forgiving than browsers about HTML errors.
  • Accessible robots.txt that does not block AI crawlers: many sites unintentionally block GPTBot or ClaudeBot by inheriting an overly aggressive robots.txt. Check yours.
  • No JavaScript-only critical content: if your headlines, body copy, prices, or contact information only appear after JS execution, most AI crawlers will not see them. This is the single most common failure mode we encounter in audits.

These seven checks are why Technical Health carries weight in SignalScore. They are also the reason a site can have brilliant content and excellent third-party authority and still be invisible to AI search — because the crawler never successfully ingested any of it.
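
The robots.txt check in the list above is easy to automate. The following is a minimal sketch using Python's standard `urllib.robotparser`; the function name `blocked_ai_agents`, the agent list, and the sample rules are illustrative assumptions, and the parser's matching is simpler than what some crawlers implement (for example, it does not handle wildcard paths).

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article. Google-Extended is a robots.txt
# control token rather than a separate crawler, but it is matched the same way.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"]


def blocked_ai_agents(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI user agents that the given robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_USER_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt: everyone may crawl except /admin/, but GPTBot is shut out.
sample = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""
print(blocked_ai_agents(sample))  # ['GPTBot']
```

An empty result means none of the listed agents is blocked for that URL. A blanket `User-agent: *` with `Disallow: /` returns all five, which is the inherited-template failure described above.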

Core Web Vitals: What Each One Measures and What to Hit

Core Web Vitals are Google's standardized metrics for real-world page experience, and they have become the de facto industry benchmark. We use them as a proxy for the site quality that AI crawlers depend on. The current vitals and their thresholds are documented at web.dev/vitals (the canonical source maintained by Google's Chrome team). There are three:

Largest Contentful Paint (LCP) — under 2.5 seconds

LCP measures how long it takes for the largest visible content element — usually a hero image, headline block, or main content area — to render. The target is under 2.5 seconds for "good," with 2.5 to 4 seconds flagged as "needs improvement" and over 4 seconds as "poor." LCP is the strongest single proxy for perceived page speed, and it is the vital that most directly affects whether an AI crawler waits long enough to see your content.

Cumulative Layout Shift (CLS) — under 0.1

CLS measures how much the page layout shifts unexpectedly during load — the annoying experience of a button moving just as you go to tap it, or content jumping down because an ad finally loaded. The target is a CLS score under 0.1 for "good." High CLS scores signal poor engineering discipline around image dimensions, font loading, and dynamic content injection — all of which can also indicate parsing problems for AI crawlers.

Interaction to Next Paint (INP) — under 200 milliseconds

INP, which replaced First Input Delay (FID) as a Core Web Vital in March 2024, measures how responsively your page reacts to user interactions like taps, clicks, and keypresses. The target is under 200 milliseconds for "good," with 200 to 500 milliseconds as "needs improvement" and over 500 milliseconds as "poor." INP is less directly relevant to crawlers than LCP or CLS, because crawlers do not interact with the page, but a poor INP usually points to heavy JavaScript, which is the same thing that hurts crawlability.

These thresholds come directly from Google's Chrome team and are the same thresholds Google uses for Search ranking signals. There is no separate "AI crawler" standard for Core Web Vitals — the same targets apply.
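
The three thresholds can be expressed as a small lookup, which is handy when reviewing exported field data. This is a sketch under stated assumptions: the names `THRESHOLDS`, `rate`, and `rate_page` are ours, the inputs are assumed to be 75th-percentile values (the percentile web.dev uses for assessment), and the 0.25 upper bound for CLS comes from web.dev because this article only states the "good" line for that metric.

```python
# (good_max, needs_improvement_max) per metric.
# LCP and INP are in milliseconds; CLS is a unitless score.
THRESHOLDS = {
    "LCP": (2500, 4000),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),  # 0.25 is from web.dev, not stated in this article
}


def rate(metric: str, value: float) -> str:
    """Classify one Core Web Vitals measurement against the web.dev thresholds."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs improvement"
    return "poor"


def rate_page(measurements: dict[str, float]) -> dict[str, str]:
    """Classify every metric in a {metric: value} mapping."""
    return {metric: rate(metric, value) for metric, value in measurements.items()}
```

For example, `rate_page({"LCP": 3100, "INP": 180, "CLS": 0.02})` rates LCP as "needs improvement" and the other two as "good", which tells you where to spend effort first.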

The "JS-Blocking" Trap — The Single Most Common Failure

In our audit work, the most common technical failure we find is also the most consequential: client-side-only React, Vue, or Angular applications where the main content does not exist in the initial HTML. When you view-source on these sites, you see a roughly empty page with a single `<div id="root">` and a script tag. The actual headlines, body copy, product information, contact details — all of it is rendered into that div after the JavaScript bundle downloads, parses, and executes.

Browsers handle this fine. Googlebot, with its patient render budget, mostly handles it. But GPTBot, ClaudeBot, and PerplexityBot largely do not. They fetch the initial HTML, see an empty shell, and move on. The site does not get indexed for AI search regardless of how excellent the content actually is — because, from the AI engine's perspective, the content does not exist.

The fix is structural: either server-side render the site (Next.js, Remix, Nuxt, SvelteKit all support this), statically generate it at build time, or pre-render it through a service that produces real HTML. There is no shortcut and no clever middleware that solves this without doing one of those things properly.
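
To confirm whether a page has this problem, you can approximate what a non-rendering crawler reads: parse the raw HTML, discard scripts and styles, and see how much text is left. The sketch below uses Python's built-in `html.parser`. The class and function names are hypothetical, and the 200-character cutoff is an arbitrary assumption for illustration, not a figure any crawler is known to use.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text present in raw HTML, ignoring script, style, and template content."""

    SKIPPED = {"script", "style", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the whitespace-normalized text found in the raw HTML."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def looks_like_empty_shell(html: str, min_chars: int = 200) -> bool:
    """Assumed heuristic: under `min_chars` of text suggests client-side-only rendering."""
    return len(visible_text(html)) < min_chars


def contains_phrase(html: str, phrase: str) -> bool:
    """Case-insensitive check that a known headline or sentence is in the raw HTML text."""
    return " ".join(phrase.split()).lower() in visible_text(html).lower()
```

Feed it the response body from a plain HTTP fetch (not the DOM from browser dev tools, which reflects the page after JavaScript has run). If `looks_like_empty_shell` returns True, or `contains_phrase` cannot find your main headline, the content is being injected client-side.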

Our perspective on this comes from decades of enterprise systems work: if the foundation is wrong, no amount of polish on top fixes it. AI visibility works the same way. Beautiful content on a client-side-only React app is invisible content. The technical foundation has to come first — which is why we treat it as the gating dimension, not the finishing touch.

How LocalStar's Own Technical Foundation Works

We run our own site, and every custom-build client site, on Next.js 15 with server-side rendering by default, deployed to Vercel's edge network. Every page is either pre-rendered at build time or rendered on the server at request time. AI crawlers fetching `localstardigital.com` get fully populated HTML in the initial response. We monitor Core Web Vitals continuously and treat any regression as a build-breaking issue rather than a "fix it later" item.

This is not a vanity setup. It is the reason our own site scores strongly on the technical dimensions — when we audited localstardigital.com with our own methodology, Technical scored 100/100 on the quick diagnostic and 88/100 on the full GEO audit (we published the full results, including the dimensions where we still have work to do). It is the same architectural standard we apply to the custom-build websites we deliver for Strategic-tier clients. The methodology we measure clients against is the methodology we hold ourselves to — see SignalScore methodology for full dimension weightings, and /services/web-design for how we apply this architecture to client builds.

Technical Health is the 15% you cannot skip. Get it right and the other 85% of your SignalScore — Citability, Content Quality, Schema, AI Crawler Access, and Brand Authority — actually has something to land on. Get it wrong, and the rest is theater.

Want a full SignalScore audit that includes a Technical Health subscore against the seven checks above? Every LocalStar engagement starts with one. Book a strategy call and we will walk you through where your site stands and exactly what to fix first.

Frequently Asked Questions

Most of the major AI crawlers either do not execute JavaScript at all or execute it with much tighter timeouts and less rendering patience than Googlebot. GPTBot, ClaudeBot, and Common Crawl's CCBot historically fetch and parse HTML only. PerplexityBot appears to have more rendering capability but still favors server-rendered content. Google-Extended is a robots.txt token rather than a crawler: the pages it governs are fetched by Google's regular crawlers, which can render JavaScript, though Google itself recommends server-side rendering or pre-rendering for reliability. The safe assumption is: if your critical content is not in the initial HTML response, it is at risk of being invisible to AI search.

If your React app uses Next.js, Remix, or another framework with proper server-side rendering or static generation, and you can confirm the main content is present when you view-source on a deployed page, you are in good shape on the JS-blocking dimension. The check is empirical: load your page, view source, search for a unique headline or body sentence. If it is in the raw HTML, AI crawlers can see it. If you only see a `<div id="root">` and a script tag, you have a problem.

Core Web Vitals are not directly weighted by AI engines the way they are by Google Search, but they correlate strongly with the things AI crawlers do care about: response time, render stability, and the general engineering quality of the site. A site with poor LCP and CLS almost always has other technical problems that affect AI crawlability. We treat the web.dev thresholds (LCP under 2.5s, CLS under 0.1, INP under 200ms) as the floor, not the ceiling.

Technical Health carries a 15% weight in our methodology — meaningful but not the largest dimension. Citability (25%) and Brand Authority (20%) carry more weight because they have a bigger direct effect on whether AI engines actually recommend you. But Technical Health is the gating dimension: if you fail it badly, the higher-weighted dimensions cannot fire. It is the prerequisite, not the prize. See the SignalScore methodology page for the full dimension breakdown and our background for why we built it this way.
