How Google’s AI Search Architecture Actually Works: Fan-Outs, AI Overviews, and the Future of SEO


The Pulse:

  • Google processes thousands of AI-driven changes to Search per year, all tracked and evaluated through side-by-side experiments with human raters – a rigorous architecture that predates generative AI by over a decade.
  • Fan-out queries – Google’s mechanism of forking a single user prompt into multiple parallel sub-queries and recombining results – are the core retrieval engine behind AI Overviews, driving the observable trend of growing average query lengths across the platform.
  • Nikola Todorovic, Director of Software Engineering at Google Search, explicitly states that bulk AI content generation “is not going to provide a ton of value.” By contrast, an internal AI tool let Google engineers complete in seconds a code trace that would have taken 20-30 minutes by hand, although the tool did not determine whether the architectural trade-off was correct.

TL;DR: Google Search has embedded AI into its ranking architecture for over 12 years, starting with isolated convolutional neural network deployments in Safe Search. AI Overviews and AI Mode extend that foundation using fan-out query retrieval and large language model synthesis – not a replacement of the existing stack, but a layer stamped on top of it. Site owners who produce genuine human expertise will continue earning citations in both systems; those relying on undifferentiated AI content generation at scale will not.

Fan-Out Retrieval Explained

AI Overviews fork a single query into parallel sub-queries, retrieve results independently, then synthesize them into one answer – the ranking stack underneath remains unchanged.

AI Has Been in Search for 12+ Years

Safe Search deployed isolated convolutional neural networks roughly 12 years ago – one of Google’s first production AI modules in Search, long before generative AI arrived.

AI Mode vs. AI Overviews

AI Overviews is an isolated feature stamped onto the existing stack. AI Mode is Search’s full conversational platform – multi-turn, larger infrastructure, still citation-linked.

Bulk Generation Loses Citations

Google’s own Director of Software Engineering for Search warns that multiplying content cheaply through AI will not earn visibility – human experience and genuine expertise remain the citation signal.

Queries Are Getting Longer

Users now submit vague, multi-detail prompts where they once typed two-word keywords – fan-out architecture handles the disambiguation they no longer have to do themselves.

The core friction here is precision versus scale: AI makes content production cheaper by an order of magnitude, yet Google’s internal architecture is simultaneously becoming better at detecting whether a source adds information that cannot be extracted from a spec sheet or a manufacturer’s box. That tension – between the temptation to automate volume and the requirement to demonstrate irreplaceable human judgment – is the defining challenge for every authority building strategy in 2025 and beyond.

What follows is a deep technical and strategic analysis of how Google’s AI search architecture actually works, drawn from Nikola Todorovic’s detailed account of the internal experiment process, the fan-out retrieval mechanism, and the architectural distinction between AI Overviews and AI Mode – with precise implications for site owners, SEO professionals, and content teams who need to remain visible as queries grow longer and more complex.

Google Runs Thousands of AI-Driven Search Changes Per Year – Here Is the Architecture Behind Them

Google processes thousands of AI-driven changes to Search annually, each one evaluated through rigorous side-by-side experiments and human rater review before launch. Safe Search, deployed roughly 12 years ago using convolutional neural networks, became one of the first isolated AI/ML systems Google could safely embed in production Search, a breakthrough that proved AI could work reliably at scale without destabilizing the core ranking infrastructure. The launch-review process itself has remained consistent for years: engineers build experimental versions, run them against production using random user queries, collect human rater scores based on published guidelines, then present results to decision-making leads, who may send an experiment back for refinement if they detect bad loss patterns despite overall statistical gains.

In my work analyzing how search engines evolve, I’ve found that most site owners misunderstand the pace and rigor of Google’s internal innovation. They assume changes happen ad hoc or that AI features are bolted on hastily. The reality is the opposite. Nikola Todorovic, Director of Software Engineering at Google Search, explained that Google runs this standardized experiment-and-validation pipeline on every change, whether it’s a minor ranking tweak or a major AI feature. The infrastructure at Google enables teams to prototype new versions quickly, run them against the baseline in parallel, and measure outcomes using human judgment aligned with published rater guidelines. This is not intuition-driven product management; it’s empirical, repeatable, and defensive. The process catches bad patterns early. If an experiment shows overall improvement but contains pockets of severe loss (say, a particular query category performs much worse), the launch review gate stops the rollout. Engineers iterate, fix the pattern, and resubmit. Only then does it go live.
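The gate logic described above can be sketched in a few lines. This is a minimal illustration, not Google’s implementation: the rating format, the function name `launch_gate`, and both thresholds are assumptions made up for the example. It shows how an experiment with a positive overall score can still be blocked because one query category regresses badly.

```python
from collections import defaultdict
from statistics import mean


def launch_gate(ratings, min_overall_gain=0.0, max_category_loss=-0.5):
    """Decide whether an experiment may ship.

    ratings: list of (query_category, score_delta) pairs, where score_delta is a
    hypothetical rater preference for the experiment over the baseline
    (positive means the experiment looked better). Thresholds are illustrative.
    Returns (ship, overall_delta, failing_categories).
    """
    by_category = defaultdict(list)
    for category, delta in ratings:
        by_category[category].append(delta)

    overall = mean(delta for _, delta in ratings)
    # A category "fails" when its average delta falls below the loss threshold.
    failing = sorted(
        category
        for category, deltas in by_category.items()
        if mean(deltas) < max_category_loss
    )
    # A good overall number is necessary but not sufficient.
    ship = overall > min_overall_gain and not failing
    return ship, overall, failing


# Made-up ratings: a net win overall, with a concentrated loss in one category.
ratings = [
    ("recipes", 1.0), ("recipes", 0.8), ("recipes", 0.9),
    ("travel", 0.6), ("travel", 0.7),
    ("medical", -1.0), ("medical", -0.8),
]
ship, overall, failing = launch_gate(ratings)
print(ship, round(overall, 2), failing)
```

With these invented numbers the overall delta is positive, yet the experiment is held back because the "medical" category degrades, which mirrors the "send it back and fix the pattern" step in the process.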

What makes this architecture resilient is isolation. Safe Search did not integrate directly into the main ranking stack. Instead, it operated as a standalone signal: it processed images, videos, and text to determine explicitness levels, then fed that signal back into ranking decisions. Todorovic noted that convolutional neural networks were already outperforming humans at image understanding 12 years ago, which made Safe Search an ideal first deployment: the model’s decisions were auditable, the failure modes were contained, and if problems emerged, engineers could iterate on the neural network itself without destabilizing the entire search system. That isolation principle carried forward. BERT and MUM, the transformer-based systems Google announced publicly, were built as new signals layered on top of the existing ranking infrastructure, not replacements for it. Each one added capability without removing the old-school retrieval and ranking components underneath.
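To make the isolation idea concrete, here is a small sketch under stated assumptions. The `Result` type, the `explicitness_classifier` stand-in (a hard-coded lookup rather than a real model), and the threshold are all hypothetical. The point is only the shape: the classifier annotates results with a score, and the ranking code consumes that score without the classifier ever touching relevance.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float             # produced by the existing ranking stack
    explicit_score: float = 0.0  # filled in by the isolated classifier


def explicitness_classifier(url: str) -> float:
    """Stand-in for an isolated model: a hard-coded lookup, purely hypothetical."""
    return {"https://example.com/adult": 0.97}.get(url, 0.02)


def rank(results, safe_search=True, threshold=0.8):
    # The classifier only annotates results; it never changes relevance scoring.
    for r in results:
        r.explicit_score = explicitness_classifier(r.url)
    kept = [
        r for r in results
        if not (safe_search and r.explicit_score >= threshold)
    ]
    return sorted(kept, key=lambda r: r.relevance, reverse=True)


results = [
    Result("https://example.com/adult", relevance=0.9),
    Result("https://example.com/guide", relevance=0.7),
    Result("https://example.com/news", relevance=0.8),
]
filtered_urls = [r.url for r in rank(results, safe_search=True)]
unfiltered_urls = [r.url for r in rank(results, safe_search=False)]
print(filtered_urls)
print(unfiltered_urls)
```

Because the classifier sits behind a single function boundary, it could be retrained or swapped out without any change to how relevance is computed, which is the property the paragraph above attributes to Safe Search.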

The Conventional Approach | The Yacov Avrahamov Perspective (Based on Google’s Actual Architecture)
AI features are experimental, launched quickly based on promising prototypes. | Every change, AI or not, goes through side-by-side experiments, human rater review using published guidelines, and a launch-review gate where decision-makers check for bad loss patterns before rollout.
AI systems are integrated directly into the ranking stack to maximize impact. | High-risk AI systems are isolated (like Safe Search) or layered on top (like BERT, MUM) so they can be debugged, iterated, or removed without breaking core search functionality.
Google launches thousands of changes per year because new technology is available. | Google launches thousands of changes per year because each one is measured against a baseline and validated by human raters. Technology availability is a starting point, not a reason to ship.
AI models in search are black boxes; engineers accept their outputs and move on. | Engineers must understand the signals their AI systems use, the failure modes, and the trade-offs. Debuggability and observability are non-negotiable before production deployment.

The historical context matters. Todorovic emphasized that AI in Search predates the generative AI wave by years. Transformers, the architecture underlying ChatGPT, Claude, and most other modern large language models, were deployed in Search long before the public knew what a transformer was. Google announced BERT and MUM publicly, but internally, the organization had already learned how to layer new AI signals on top of the ranking system without breaking it. That experience proved invaluable when AI Overviews and AI Mode arrived. The infrastructure for running experiments, measuring human satisfaction, and isolating new features was already mature. The team did not have to invent a new launch process; they applied the same rigor they had been using for thousands of changes per year.

The Real Takeaway: Google’s ability to ship thousands of changes annually without degrading search quality rests on a proven architecture: isolated or layered AI systems, side-by-side experiments with random user queries, human rater validation, and a launch-review gate that catches bad patterns. Site owners who understand this process recognize that AI features are not experimental sideshows; they are the product of years of infrastructure investment and rigorous validation. Content that earns visibility in AI-powered search must meet the same standard: genuine value, human expertise, and measurable user satisfaction.

Fan-Out Queries: The Retrieval Mechanism Powering AI Overviews and Longer Search Prompts

Fan-out queries are parallel sub-queries that Google derives from a single user prompt: the system retrieves results for each sub-query in parallel, then recombines them into one synthesized answer. This mechanism is the architectural foundation enabling AI Overviews to handle longer, more conversational search prompts, and it helps explain why average query length is growing as users discover Search can answer increasingly complex questions. Unlike traditional keyword matching, fan-out retrieval lets a single vague or multi-faceted prompt spawn multiple targeted retrievals that feed into a language model synthesis layer, making AI Overviews an isolated feature that sits on top of Google’s existing ranking stack without replacing it.

When Nikola Todorovic introduced the term “fan-out” during the conversation, he was describing a retrieval innovation that has become essential to understanding how AI Overviews work under the hood. Here’s the mechanism: you submit a search query, say “based on dietary restrictions which restaurants would you recommend for lunch in Zurich?”, and Google’s system doesn’t just treat that as a single retrieval instruction. Instead, it identifies additional sub-queries embedded in your original prompt. It might fork a retrieval for “vegetarian restaurants Zurich,” another for “restaurants with dietary accommodations,” another for “lunch spots near me,” and potentially several others. All of these retrievals happen in parallel. Once the results come back from the ranking system, AI Overviews combines an intelligent selection of those results (snippets, titles, additional page context) and synthesizes them into a coherent summary that addresses your original, more complex question. The user never sees the sub-queries; they see one unified answer drawn from multiple sources.
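The fork, parallel retrieval, and recombination steps described above can be sketched as follows. Everything here is an assumption for illustration: the toy `INDEX`, the `decompose` stub (a real system would derive sub-queries from the prompt), the `.example` domains, and the merge rule of counting how many sub-queries surfaced each source. The language model synthesis step is deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy index standing in for the existing ranking system (hypothetical data).
INDEX = {
    "vegetarian restaurants Zurich": ["hiltl.example", "tibits.example"],
    "restaurants with dietary accommodations": ["tibits.example", "allergy-guide.example"],
    "lunch spots near me": ["lunch-zh.example", "hiltl.example"],
}


def decompose(prompt):
    """Hypothetical decomposition stub: returns fixed sub-queries for the demo."""
    return list(INDEX)


def retrieve(sub_query):
    """Look up one sub-query in the toy index."""
    return INDEX.get(sub_query, [])


def fan_out(prompt):
    sub_queries = decompose(prompt)
    # Run every retrieval in parallel; pool.map keeps the sub-query order.
    with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
        result_lists = list(pool.map(retrieve, sub_queries))

    # Merge: count how many sub-queries surfaced each source.
    counts = {}
    for results in result_lists:
        for url in results:
            counts[url] = counts.get(url, 0) + 1
    # Sources found by more sub-queries come first; ties keep first-seen order.
    return sorted(counts, key=lambda url: -counts[url])


merged = fan_out(
    "based on dietary restrictions which restaurants would you "
    "recommend for lunch in Zurich?"
)
print(merged)
```

In this sketch a source that answers several sub-queries rises to the top of the merged list. Whether Google weighs sources this way is not stated in the source material; the merge rule is only a placeholder for "an intelligent selection of those results."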

The architectural isolation of AI Overviews is critical to understanding why this works at scale. Todorovic emphasized that “the whole retrieval system, the whole ranking system is the old school” and that “AI Overviews is a feature that stamps on top of this and operates on its own in this isolated space.” This means Google did not rebuild its core ranking infrastructure to accommodate generative AI. Instead, the ranking stack, which has been refined over decades and undergoes thousands of changes per year, continues to operate exactly as it did before. AI Overviews runs as a separate feature layer that consumes the output of that proven ranking system, synthesizes it using language models, and presents the result. This isolation provides two critical benefits: it allows Google to measure the impact of AI Overviews independently from ranking changes, and it means that if an AI Overview experiment shows poor results, the fix doesn’t require touching the core ranking pipeline. The ranking system remains the source of truth for relevance and quality; the language model layer adds synthesis and conversational capability on top.

The growth in average query length is a direct consequence of users discovering what fan-out queries and AI Overviews make possible. Todorovic observed that users are “uncovering that Search can actually answer more complex questions” and that “the average query length is growing.” The evolution is stark: a decade ago, a typical search was “restaurant vegetarian Zurich”: keyword-based, short, specific. That evolved into “vegetarian restaurants in Zurich,” still keyword-focused but more natural. Now, users are typing queries like “based on dietary restrictions which restaurants would you recommend for lunch in Zurich?” or asking open-ended questions such as “What is the physical effect that makes water glow when there’s radiation there?” without knowing the technical term (Cherenkov radiation, in this case). These longer, vague, or multi-detail prompts would have been unsearchable five years ago because the ranking system alone cannot synthesize across multiple interpretations of an ambiguous query. Fan-out queries solve this by allowing the system to explore multiple interpretations in parallel and the language model to weave them into a coherent answer. Users are lengthening their queries because they now expect Search to understand intent even when they cannot articulate it precisely, and fan-out retrieval is what makes that expectation valid.

The Real Mechanism: Fan-out queries represent a fundamental shift in how retrieval feeds synthesis. Users are no longer constrained to formulating questions the ranking system can answer directly; they can ask vague, complex, or multi-faceted questions because AI Overviews will decompose them into parallel retrievals and reassemble the results into value.

AI Mode vs. AI Overviews: What the Architectural Difference Means for Expert Content and Authority Building

AI Mode is Google Search’s multi-turn conversational platform designed to compete directly with ChatGPT and Gemini, while AI Overviews remain an isolated feature layer stamped on top of the existing retrieval and ranking stack. Both systems still use fan-out queries and surface linked citations – neither is a closed language model operating in isolation. The architectural difference matters because it determines scale, infrastructure footprint, and most critically, the types of content that earn visibility and citations in each system. Only genuinely expert, human-experience-driven content survives in both.

The distinction between AI Mode and AI Overviews is not semantic – it reflects fundamental differences in how Google has architected these systems. AI Overviews function as what Nikola Todorovic, Director of Software Engineering at Google Search, describes as a feature that “stamps on top” of the existing ranking infrastructure. The retrieval system and ranking stack remain, in his words, “old school” – the same mechanisms that have powered Google Search for years. AI Overviews then layer on top, combining an interesting selection of retrieved results and synthesizing them into a summary using language models. This isolation is deliberate. It allows Google to contain the complexity, iterate on the AI layer independently, and maintain the stability of the core ranking system beneath it. AI Overviews can fail, be rolled back, or be refined without destabilizing the entire Search product.

AI Mode, by contrast, represents a different architectural commitment. Todorovic explains that AI Mode “has a kind of a bigger well, like the infrastructure is new and like all the it has kind of bigger ownership or like it’s no longer an isolation of it. It’s like the AI mode is kind of it runs on search, but it’s also has like a bigger platform for its own.” AI Mode still uses fan-out queries and still surfaces linked results and citations – it is not a closed LLM divorced from Search infrastructure. But it operates with its own platform layer, its own ownership structure, and its own infrastructure footprint. This is Google’s answer to the conversational AI market dominated by ChatGPT and Gemini. Users can transition from AI Overviews into AI Mode when they want a longer, multi-turn conversation, or they can enter AI Mode directly. The user journey is fluid, but the technical architecture underneath is distinctly different.

The implication for content creators and site owners is severe and non-negotiable: both systems will only surface and cite content that provides genuine value. Todorovic is explicit on this point. When asked about AI content generation, he states: “it’s not going to provide a ton of value.” The warning is not about volume or velocity – it is about the fundamental absence of human expertise. AI Overviews and AI Mode both operate on retrieved results. They synthesize, they summarize, they connect dots. But they cannot manufacture expertise they do not see in the source material. This is where the joystick anecdote becomes instructive. Martin, the host, recounts a moment when he asked a shop assistant what “force feedback” meant – the assistant replied, “Oh, that means that this joystick has force feedback.” The response was circular, empty, and useless. Todorovic and Martin both recognize this pattern in modern web content: articles that simply restate spec sheets, rephrase manufacturer claims, or wrap existing information in slightly different words. AI Overviews will bypass this content entirely. It has no value to synthesize. The language model can see immediately that the source is not adding insight – it is merely echoing what already exists elsewhere or what is already visible on the product packaging.

This is where legitimate AI tools enter the picture, and Todorovic is clear on the distinction. He uses NotebookLM, Google’s publicly available AI research and note-taking tool, to understand complex documentation quickly. He describes it as “a fascinating tool uh that can like in a couple minutes explain a complicated thing.” This is an acceptable use case: AI as a productivity lever for understanding dense material faster, not as a content multiplication engine. In his own engineering work, Todorovic and his team recently used an internal AI coding tool to trace how image-size data flows through 20-30 layers of abstraction in the codebase. The tool answered the question in seconds; without it, the team would have spent 20-30 minutes manually navigating those layers. The AI did not make the architectural decision – it did not determine whether the trade-off was correct or optimal. It accelerated the discovery process. That distinction is critical. AI is a force multiplier for expert work, not a replacement for expertise itself.

The Real Implication: Sites that produce undifferentiated AI-generated content will lose visibility in both AI Mode and AI Overviews; those that use AI tools to deepen and accelerate expert research, while maintaining human judgment and experience-based insight, will earn disproportionate citations as query complexity increases.

What Site Owners and SEO Professionals Must Do Now to Thrive in AI-Powered Search

The core principle is simple: continue providing genuine value to your users. Site owners who focus on building products, platforms, and content that solve real problems will remain visible in AI Overviews and AI Mode. Those treating AI as a bulk content generation engine will not. The historical pattern, from newspapers to radio to television to the internet, shows that every media transition rewards value-providers and punishes commodity creators. AI search is the next iteration of that same pattern.

According to Nikola Todorovic, Director of Software Engineering at Google Search, “site owners need to continue making sure their products and websites are providing value to the user.” This is not a new mandate. It is the constant. What changes is the medium through which that value reaches users. When you sell a product, users will continue coming to you if you provide value. When you operate a restaurant, users will visit if your menu and service merit the trip. In an AI-centric system, the mechanism shifts (queries become longer, more conversational, more vague), but the underlying truth does not: value attracts traffic, whether through direct visits or through Google.

The acceptable use of AI tools differs sharply from the problematic use. Todorovic explicitly warns against the approach many SEO professionals are considering: “it’s not going to provide a ton of value” if you simply multiply content through bulk AI generation because it is cheap and easy. This approach creates what Martin, the Search Off the Record host, calls the “force feedback problem,” a reference to a shop assistant who, when asked what force feedback meant, simply replied, “this joystick has force feedback.” The assistant provided zero context, no insight, no expertise. They restated the label without explaining the mechanism or the user experience. AI language models will do exactly this at scale. They will repackage manufacturer spec sheets, restate what is already visible on product boxes, and generate plausible-sounding explanations that contain no human judgment. Google’s ranking systems and AI Overviews will recognize this pattern and deprioritize it.

The legitimate applications of AI for content creators and site owners fall into three categories. First, use AI to improve the mechanics of writing: grammar, style, readability, tone. An AI tool can tighten prose without replacing the author’s expertise or judgment. Second, use AI for data analysis and competitive research: ask it to surface patterns in your market, summarize competitor strategies, or organize information you have already gathered. Todorovic himself uses NotebookLM, Google’s publicly available AI research tool, to understand complex documentation quickly; the tool does not decide what the documentation means or whether the architectural decisions were correct. That remains a human responsibility. Third, use AI for coding and technical problem-solving, where it can trace through layers of abstraction in seconds. In one concrete example from Google’s own engineering practice, engineers needed to find where a method’s image-size data originated, across 20-30 layers of abstraction. Using an internal AI tool, they asked, “Where does this information actually come from?” The system identified the source in seconds, work that would have required 20-30 minutes of manual investigation. The AI tool accelerated the discovery. It did not decide whether the architectural trade-off was correct; that judgment remained with the engineers.

The historical analogy Todorovic draws is essential: newspapers, radio, television, and the internet all required value-providers to adapt or lose their audience. Newspapers did not disappear when radio arrived; they did not disappear when television arrived. They transformed. Radio stations did not vanish when the internet launched. They evolved. In each transition, commodity producers (those offering only repackaged wire service copy, only syndicated content, only what competitors also offered) lost relevance. Those offering original reporting, unique perspective, and human expertise survived and often thrived. The same principle governs AI search. Site owners who position themselves as experts, who share tested experiences, explain trade-offs, and provide judgment that an AI system cannot replicate, will earn citations in both AI Overviews and AI Mode. Those offering only rewritten spec sheets and bulk-generated product descriptions will not.

Mastering AI tools is therefore not optional. Todorovic’s recommendation to all SEO professionals and site owners is direct: “continue providing value, but then do not neglect the new technology and make sure you use it in the best possible way for you.” This is a dual mandate. The first part, providing value, is unchanging. The second part, mastering the tools, is new. An SEO professional who refuses to learn how AI can accelerate research, improve writing, or analyze data will be outpaced by one who does. But the professional who uses AI to generate 500 articles per month, each indistinguishable from the others, will lose ground to one who uses AI to enhance 50 carefully researched, expert-driven articles per month. The difference is not in the volume of AI use; it is in whether AI amplifies human expertise or replaces it.

The Real Takeaway: Site owners who combine genuine user value with strategic AI tool adoption will capture disproportionate traffic as queries grow longer and more complex; those attempting bulk content generation will face algorithmic invisibility in both AI Overviews and AI Mode.

Frequently Asked Questions

What is the difference between BERT, MUM, and the transformer-based systems Google deployed before generative AI arrived?

BERT and MUM were both built on transformer architecture and deployed inside Google Search as isolated signal layers, not as replacements for the core ranking stack. According to Nikola Todorovic, Director of Software Engineering at Google Search, these systems functioned as additional signals on top of existing infrastructure – improving relevance and query understanding without rewriting the retrieval engine. The transformer architecture that underpins BERT and MUM is the same foundational technology that eventually enabled generative AI models. Google was publicly open about both deployments, announcing them as incremental quality improvements rather than architectural overhauls. The key distinction from today’s AI Overviews is that BERT and MUM operated entirely within the ranking layer, while AI Overviews operate as a separate feature stamped on top of retrieval and ranking outputs.

How does Google’s launch-review process handle experiments that show overall improvement but contain specific bad loss patterns?

An experiment that shows net improvement in side-by-side human rater evaluations does not automatically pass the launch gate. Todorovic explains that decision-making leads scrutinize not just aggregate statistics but specific loss patterns within the experiment. If a subset of queries degrades meaningfully – even when the overall score improves – engineers are sent back to isolate and fix those patterns before the change ships. This prevents a statistically favorable experiment from introducing a concentrated harm to a specific query type, topic, or user segment. The process is deliberately conservative: a good overall number is a necessary condition for launch, but it is not a sufficient one.

Can a site owner transition users from AI Overviews directly into AI Mode, and how does that flow work?

Yes. Todorovic confirms that users can move from an AI Overview directly into AI Mode when they want a longer, multi-turn conversation or deeper detail on a topic. The transition is user-initiated: after receiving an AI Overview summary, a user can choose to continue the session in AI Mode, which preserves conversational context and enables follow-up queries. From a site-owner perspective, this means a single search session can begin with a standard AI Overview citation and extend into an AI Mode dialogue – both of which surface linked results and citations drawn from the same underlying Search infrastructure. Content that earns a citation in AI Overviews is therefore well-positioned to remain visible as the user deepens their query in AI Mode.

Why was Safe Search one of the first areas where Google could safely deploy isolated AI/ML models in production Search?

Safe Search was an ideal isolation boundary because its task is narrow and self-contained: classify whether a given image, video, or text result is explicit, and return a signal score. That score feeds into the broader ranking stack without requiring the AI model to understand the full complexity of relevance or query intent. When convolutional neural networks arrived approximately 12 years ago and demonstrated the ability to understand images at or above human accuracy, Safe Search could adopt them without risking the integrity of the main ranking flow. Todorovic notes that the harder challenge with ML in Search has always been debuggability – understanding why a model produced a given output and iterating on it. Safe Search’s narrow scope made that iteration tractable in a way that deploying ML directly into core ranking was not, at least at that stage of the technology.

What is the practical risk of using AI tools for content production versus productivity, and where does Google draw the line?

Todorovic draws a clear operational line between AI as a productivity multiplier and AI as a content factory. Using AI to improve grammar, refine style, analyze competitive data, or accelerate research is explicitly described as an acceptable and recommended use. Bulk AI content generation – producing large volumes of undifferentiated articles at low cost – is not. The mechanism behind this distinction is straightforward: AI Overviews synthesize answers from retrieved sources, and a source that merely restates information already available elsewhere adds no incremental value to that synthesis. Content that cannot be replicated by an AI model – personal testing, direct experience, original analysis – is precisely what the fan-out retrieval and citation system is designed to surface. The risk of bulk generation is not a penalty per se; it is invisibility, because the content offers nothing the language model cannot already produce internally.
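The "adds no incremental value" idea can be illustrated with a deliberately crude measure. This is a toy, not how Google or any language model evaluates sources: the `novelty` function and the word-overlap heuristic are assumptions invented for the example. It simply reports what fraction of a candidate text’s distinct words do not already appear in the sources it is compared against.

```python
def novelty(candidate: str, known_sources: list[str]) -> float:
    """Fraction of the candidate's distinct words that appear in no known source.

    A toy word-overlap heuristic for illustration only.
    """
    known = set()
    for text in known_sources:
        known.update(text.lower().split())
    words = set(candidate.lower().split())
    if not words:
        return 0.0
    return len(words - known) / len(words)


spec_sheet = "this joystick has force feedback"
restated = "this joystick has force feedback"
tested = "force feedback pushes back on the stick so you feel kerbs in racing games"

restated_score = novelty(restated, [spec_sheet])
tested_score = novelty(tested, [spec_sheet])
print(restated_score, tested_score > restated_score)
```

The restated line scores zero because every word is already on the spec sheet, while the hands-on description scores well above it. Real systems are far more sophisticated, but the asymmetry is the same one the force feedback anecdote describes.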

